diff --git a/charts/k8s-reporter/Chart.yaml b/charts/k8s-reporter/Chart.yaml index 36f289623..9b563b283 100644 --- a/charts/k8s-reporter/Chart.yaml +++ b/charts/k8s-reporter/Chart.yaml @@ -15,7 +15,7 @@ type: application # This is the chart version. This version number should be incremented each time you make changes # to the chart and its templates, including the app version. # Versions are expected to follow Semantic Versioning (https://semver.org/) -version: 2.3.0 +version: 2.3.1 # This is the version number of the (CLI) application being deployed. This version number should be # incremented each time you make changes to the application. They should reflect the version the diff --git a/charts/k8s-reporter/README.md b/charts/k8s-reporter/README.md index 186773f82..49961692d 100644 --- a/charts/k8s-reporter/README.md +++ b/charts/k8s-reporter/README.md @@ -4,7 +4,7 @@ title: Kubernetes Reporter Helm Chart # k8s-reporter -![Version: 2.3.0](https://img.shields.io/badge/Version-2.3.0-informational?style=flat-square) +![Version: 2.3.1](https://img.shields.io/badge/Version-2.3.1-informational?style=flat-square) A Helm chart for installing the Kosli K8S reporter as a CronJob. The chart allows you to create a Kubernetes cronjob and all its necessary RBAC to report running images to Kosli at a given cron schedule. @@ -155,17 +155,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug. -There are three good ways to avoid it, in order of preference. +Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort. -### 1. Widen the report interval - -The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness. - -```yaml -cronSchedule: "*/15 * * * *" -``` - -### 2. Pin the reporter to a stable node group +### 1. Pin the reporter to a stable node group (recommended) If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted: @@ -192,9 +184,17 @@ affinity: operator: DoesNotExist ``` -### 3. Run the reporter out of the cluster +### 2. Run the reporter out of the cluster -For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/). +For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/). + +### 3. Widen the report interval (last resort) + +Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above. + +```yaml +cronSchedule: "*/15 * * * *" +``` > Note: `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter. diff --git a/charts/k8s-reporter/_mintlify_templates.gotmpl b/charts/k8s-reporter/_mintlify_templates.gotmpl index 99a330a4f..150d07665 100644 --- a/charts/k8s-reporter/_mintlify_templates.gotmpl +++ b/charts/k8s-reporter/_mintlify_templates.gotmpl @@ -130,17 +130,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug. -There are three good ways to avoid it, in order of preference. +Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort. -### 1. Widen the report interval - -The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness. - -```yaml -cronSchedule: "*/15 * * * *" -``` - -### 2. Pin the reporter to a stable node group +### 1. Pin the reporter to a stable node group (recommended) If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted: @@ -167,9 +159,17 @@ affinity: operator: DoesNotExist ``` -### 3. Run the reporter out of the cluster +### 2. Run the reporter out of the cluster -For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](/tutorials/report_k8s_envs). +For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](/tutorials/report_k8s_envs). + +### 3. Widen the report interval (last resort) + +Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above. + +```yaml +cronSchedule: "*/15 * * * *" +``` `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter. diff --git a/charts/k8s-reporter/_templates.gotmpl b/charts/k8s-reporter/_templates.gotmpl index 7cc260811..6f4cf8ab1 100644 --- a/charts/k8s-reporter/_templates.gotmpl +++ b/charts/k8s-reporter/_templates.gotmpl @@ -160,17 +160,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug. -There are three good ways to avoid it, in order of preference. +Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort. -### 1. Widen the report interval - -The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness. - -```yaml -cronSchedule: "*/15 * * * *" -``` - -### 2. Pin the reporter to a stable node group +### 1. Pin the reporter to a stable node group (recommended) If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted: @@ -197,9 +189,17 @@ affinity: operator: DoesNotExist ``` -### 3. Run the reporter out of the cluster +### 2. Run the reporter out of the cluster -For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/). +For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/). + +### 3. Widen the report interval (last resort) + +Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above. + +```yaml +cronSchedule: "*/15 * * * *" +``` > Note: `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter. {{- end }}