A best practices checker for Kubernetes clusters. 🤠
 
 
 
 
Go to file
Dylan Scott a8c53040ec
CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143)
2022-07-04 17:53:02 +01:00
.circleci Update dependencies to fix crash on Go 1.17 (#126) 2021-09-19 16:52:07 +02:00
.github Cross compile clusterlint for darwin arm64 2022-01-12 21:19:52 -05:00
checks CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00
cmd/clusterlint CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00
docs docs: running in-cluster with RBAC 2021-12-20 16:13:45 +04:00
example-plugin CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00
kube CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00
script Add CI check for license headers 2019-07-02 13:58:00 -06:00
vendor CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00
CONTRIBUTING.md Add contributing guide and license 2019-07-02 13:30:14 -06:00
Dockerfile add dockerfile to build the clusterlint image to use in-cluster 2022-01-18 15:58:27 -05:00
LICENSE Add contributing guide and license 2019-07-02 13:30:14 -06:00
README.md CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00
checks.md docs: fix anchor for resource-requirements 2022-03-12 08:38:05 +01:00
go.mod CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00
go.sum CON-5812 Extend clusterlint to raise an error-level violation when invalid snapshots are found (#143) 2022-07-04 17:53:02 +01:00

README.md

Clusterlint

CircleCI

As clusters scale and become increasingly difficult to maintain, clusterlint helps operators conform to Kubernetes best practices around resources, security and reliability to avoid common problems while operating or upgrading the clusters.

Clusterlint queries live Kubernetes clusters for resources, executes common and platform specific checks against these resources and provides actionable feedback to cluster operators. It is a non invasive tool that is run externally. Clusterlint does not alter the resource configurations.

Background

Kubernetes resources can be configured and applied in many ways. This flexibility often makes it difficult to identify problems across the cluster at the time of configuration. Clusterlint looks at live clusters to analyze all its resources and report problems, if any.

There are some common best practices to follow while applying configurations to a cluster like:

  • Namespace is used to limit the scope of the Kubernetes resources created by multiple sets of users within a team. Even though there is a default namespace, dumping all the created resources into one namespace is not recommended. It can lead to privilege escalation, resource name collisions, latency in operations as resources scale up and mismanagement of kubernetes objects. Having namespaces ensures that resource quotas can be enabled to keep track node, cpu and memory usage for individual teams.

  • Always specify resource requests and limits on pods: When containers have resource requests specified, the scheduler can make better decisions about which nodes to place pods on. And when containers have their limits specified, contention for resources on a node can be handled in a specified manner.

While there are problems that are common to clusters irrespective of the environment they are running in, the fact that different Kubernetes configurations (VMs, managed solutions, etc.) have different subtleties affect how workloads run. Clusterlint provides platform specific checks to identify issues with resources that cluster operators can fix to run in a specific environment.

Some examples of such checks are:

  • On upgrade of a cluster on DOKS, the worker nodes' hostname changes. So, if a user's pod spec relies on the hostname to schedule pods on specific nodes, pod scheduling will fail after upgrade.

Please refer to checks.md to get some background on every check that clusterlint performs.

Install

go get github.com/digitalocean/clusterlint/cmd/clusterlint

The above command creates the clusterlint binary in $GOPATH/bin

Usage

clusterlint list [options]  // list all checks available
clusterlint run [options]  // run all or specific checks

Running in-cluster

Build the docker image to run clusterlint from within a cluster by doing:

docker build -t <hub_username>/clusterlint:<tag> .
docker push <hub_username>/clusterlint:<tag>

If you're running clusterlint from within a Pod, you can use the --in-cluster flag to access the Kubernetes API from the Pod.

clusterlint --in-cluster run

Here's a simple example of CronJob definition to run clusterlint in the default namespace without RBAC :

apiVersion: batch/v1
kind: CronJob
metadata:
  name: clusterlint-cron
spec:
  schedule: "0 */1 * * *"
  concurrencyPolicy: Replace
  failedJobsHistoryLimit: 3
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: clusterlint
              image: docker.io/<hub_username>/clusterlint:<tag>
              command: ['/clusterlint', '--in-cluster', 'run']
              imagePullPolicy: IfNotPresent
          restartPolicy: Never

If you're using RBAC, see docs/RBAC.md.

Specific checks and groups

All checks that clusterlint performs are categorized into groups. A check can belong to multiple groups. This framework allows one to only run specific checks on a cluster. For instance, if a cluster is running on DOKS, then, running checks specific to AWS does not make sense. Clusterlint can blacklist aws related checks, if any while running against a DOKS cluster.

clusterlint run -g basic                // runs only checks that are part of the basic group
clusterlint run -G security            // runs all checks that are not part of the security group
clusterlint run -c default-namespace  // runs only the default-namespace check
clusterlint run -C default-namespace // exclude default-namespace check

Disabling checks via Annotations

Clusterlint provides a way to ignore some special objects in the cluster from being checked. For example, resources in the kube-system namespace often use privileged containers. This can create a lot of noise in the output when a cluster operator is looking for feedback to improve the cluster configurations. In order to avoid such a situation where objects that are exempt from being checked, the annotation clusterlint.digitalocean.com/disabled-checks can be added in the resource configuration. The annotation takes in a comma separated list of check names that should be excluded while running clusterlint.

"metadata": {
  "annotations": {
    "clusterlint.digitalocean.com/disabled-checks" : "noop,bare-pods"
  }
}

Building local checks

Some individuals and organizations have Kubernetes best practices that are not applicable to the general community, but which they would like to check with clusterlint. If your check may be useful for anyone else, we encourage you to submit it to clusterlint rather than keeping it local. However, if you have a truly specific check that is not appropriate for sharing with the broader community, you can implement it using Go plugins.

See the example plugin for documentation on how to build a plugin. Please be sure to read the caveats and consider whether you really want to maintain a plugin.

To use your plugin with clusterlint, pass its path on the commandline:

$ clusterlint --plugins=/path/to/plugin.so list
$ clusterlint --plugins=/path/to/plugin.so run -c my-plugin-check

Contributing

Contributions are welcome, in the form of either issues or pull requests. Please see the contribution guidelines for details.

License

Copyright 2022 DigitalOcean

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.