From a3b8cdbde9e545ec969bea9774829df590caad14 Mon Sep 17 00:00:00 2001
From: Alex Kantor
Date: Wed, 1 Jul 2026 11:23:50 +0100
Subject: [PATCH] docs(k8s-reporter): lead Karpenter guidance with pinning, widening as last resort

Reorder the 'Running on EKS with Karpenter' section in the chart README and Mintlify docs so node-group pinning is the recommended fix and widening the report interval is a caveated last resort. Frequent snapshots are how Kosli surfaces drift quickly, so slowing the reporter down trades away detection speed.

Regenerate README and bump the chart to 2.3.1 (docs only, no manifest change).

Refs #984

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 charts/k8s-reporter/Chart.yaml              |  2 +-
 charts/k8s-reporter/README.md               | 26 +++++++++----------
 .../k8s-reporter/_mintlify_templates.gotmpl | 24 ++++++++---------
 charts/k8s-reporter/_templates.gotmpl       | 24 ++++++++---------
 4 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/charts/k8s-reporter/Chart.yaml b/charts/k8s-reporter/Chart.yaml
index 36f289623..9b563b283 100644
--- a/charts/k8s-reporter/Chart.yaml
+++ b/charts/k8s-reporter/Chart.yaml
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 2.3.0
+version: 2.3.1
 
 # This is the version number of the (CLI) application being deployed. This version number should be
 # incremented each time you make changes to the application. They should reflect the version the
diff --git a/charts/k8s-reporter/README.md b/charts/k8s-reporter/README.md
index 186773f82..49961692d 100644
--- a/charts/k8s-reporter/README.md
+++ b/charts/k8s-reporter/README.md
@@ -4,7 +4,7 @@ title: Kubernetes Reporter Helm Chart
 
 # k8s-reporter
 
-![Version: 2.3.0](https://img.shields.io/badge/Version-2.3.0-informational?style=flat-square)
+![Version: 2.3.1](https://img.shields.io/badge/Version-2.3.1-informational?style=flat-square)
 
 A Helm chart for installing the Kosli K8S reporter as a CronJob.
 The chart allows you to create a Kubernetes cronjob and all its necessary RBAC to report running images to Kosli at a given cron schedule.
@@ -155,17 +155,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use
 
 The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug.
 
-There are three good ways to avoid it, in order of preference.
+Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort.
 
-### 1. Widen the report interval
-
-The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness.
-
-```yaml
-cronSchedule: "*/15 * * * *"
-```
-
-### 2. Pin the reporter to a stable node group
+### 1. Pin the reporter to a stable node group (recommended)
 
 If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted:
 
@@ -192,9 +184,17 @@ affinity:
               operator: DoesNotExist
 ```
 
-### 3. Run the reporter out of the cluster
+### 2. Run the reporter out of the cluster
 
-For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).
+For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).
+
+### 3. Widen the report interval (last resort)
+
+Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above.
+
+```yaml
+cronSchedule: "*/15 * * * *"
+```
 
 > Note: `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter.
 
diff --git a/charts/k8s-reporter/_mintlify_templates.gotmpl b/charts/k8s-reporter/_mintlify_templates.gotmpl
index 99a330a4f..150d07665 100644
--- a/charts/k8s-reporter/_mintlify_templates.gotmpl
+++ b/charts/k8s-reporter/_mintlify_templates.gotmpl
@@ -130,17 +130,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use
 
 The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug.
 
-There are three good ways to avoid it, in order of preference.
+Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort.
 
-### 1. Widen the report interval
-
-The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness.
-
-```yaml
-cronSchedule: "*/15 * * * *"
-```
-
-### 2. Pin the reporter to a stable node group
+### 1. Pin the reporter to a stable node group (recommended)
 
 If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted:
 
@@ -167,9 +159,17 @@ affinity:
               operator: DoesNotExist
 ```
 
-### 3. Run the reporter out of the cluster
+### 2. Run the reporter out of the cluster
 
-For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](/tutorials/report_k8s_envs).
+For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](/tutorials/report_k8s_envs).
+
+### 3. Widen the report interval (last resort)
+
+Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above.
+
+```yaml
+cronSchedule: "*/15 * * * *"
+```
 
 `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter.
 
diff --git a/charts/k8s-reporter/_templates.gotmpl b/charts/k8s-reporter/_templates.gotmpl
index 7cc260811..6f4cf8ab1 100644
--- a/charts/k8s-reporter/_templates.gotmpl
+++ b/charts/k8s-reporter/_templates.gotmpl
@@ -160,17 +160,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use
 
 The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug.
 
-There are three good ways to avoid it, in order of preference.
+Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort.
 
-### 1. Widen the report interval
-
-The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness.
-
-```yaml
-cronSchedule: "*/15 * * * *"
-```
-
-### 2. Pin the reporter to a stable node group
+### 1. Pin the reporter to a stable node group (recommended)
 
 If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted:
 
@@ -197,9 +189,17 @@ affinity:
               operator: DoesNotExist
 ```
 
-### 3. Run the reporter out of the cluster
+### 2. Run the reporter out of the cluster
 
-For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).
+For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).
+
+### 3. Widen the report interval (last resort)
+
+Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above.
+
+```yaml
+cronSchedule: "*/15 * * * *"
+```
 
 > Note: `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter.
 {{- end }}
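The eligibility rule these docs describe can be checked with a short sketch. The rule is that a node whose NodePool `consolidateAfter` is at least as long as the reporter interval never goes quiet long enough to consolidate. This is a simplified model, not Karpenter's actual algorithm. It assumes every reporter run lands a pod on the same node and resets the timer. It also only understands `*/N * * * *` schedules and plain `h`/`m`/`s` durations. The helpers `cron_step_minutes`, `parse_duration` and `node_can_consolidate` are hypothetical names used for illustration only.

```python
import re
from datetime import timedelta

# Hypothetical helper: only supports "every N minutes" schedules of the form "*/N * * * *".
def cron_step_minutes(schedule: str) -> int:
    fields = schedule.split()
    if len(fields) != 5 or not fields[0].startswith("*/") or any(f != "*" for f in fields[1:]):
        raise ValueError(f"unsupported schedule: {schedule!r}")
    step = int(fields[0][2:])
    # A step that does not divide 60 leaves a shorter gap at the top of each hour
    # (e.g. */7 fires at :56 and then :00), so the step would not be the real interval.
    if step <= 0 or 60 % step != 0:
        raise ValueError(f"step {step} does not divide 60 evenly")
    return step

_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")

# Hypothetical helper: parses simple durations such as "30s", "10m" or "1h30m".
def parse_duration(value: str) -> timedelta:
    match = _DURATION.fullmatch(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def node_can_consolidate(schedule: str, consolidate_after: str) -> bool:
    """Simplified model: each reporter run resets the node's consolidateAfter timer,
    so the node only gets a long enough quiet window when runs are strictly further
    apart than consolidateAfter."""
    interval = timedelta(minutes=cron_step_minutes(schedule))
    return interval > parse_duration(consolidate_after)

print(node_can_consolidate("*/5 * * * *", "10m"))   # False: the timer is reset every 5 minutes
print(node_can_consolidate("*/15 * * * *", "10m"))  # True: a 15-minute gap outlasts a 10-minute window
```

Under this model, the default `*/5 * * * *` schedule only lets a shared node consolidate when `consolidateAfter` is under 5 minutes. Pinning or running out of cluster removes the reporter pod from Karpenter-managed nodes entirely, so the comparison stops mattering.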