Merged
2 changes: 1 addition & 1 deletion charts/k8s-reporter/Chart.yaml
@@ -15,7 +15,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 2.3.0
+version: 2.3.1

# This is the version number of the (CLI) application being deployed. This version number should be
# incremented each time you make changes to the application. They should reflect the version the
26 changes: 13 additions & 13 deletions charts/k8s-reporter/README.md
@@ -4,7 +4,7 @@ title: Kubernetes Reporter Helm Chart

# k8s-reporter

-![Version: 2.3.0](https://img.shields.io/badge/Version-2.3.0-informational?style=flat-square)
+![Version: 2.3.1](https://img.shields.io/badge/Version-2.3.1-informational?style=flat-square)

A Helm chart for installing the Kosli K8S reporter as a CronJob.
The chart allows you to create a Kubernetes cronjob and all its necessary RBAC to report running images to Kosli at a given cron schedule.
@@ -155,17 +155,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use

The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug.
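
To make the timing concrete, here is a minimal sketch of a NodePool fragment that would hit this problem with the default reporter schedule. The pool name `default` and the `10m` window are illustrative assumptions, not values taken from this chart, and the fragment omits the rest of the NodePool spec.

```yaml
# Hypothetical Karpenter NodePool fragment (name and timings are assumptions).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    # A node must see no pod scheduling activity for 10 minutes
    # before Karpenter considers it for consolidation.
    consolidateAfter: 10m
# With the reporter's default 5-minute schedule, a new reporter pod can
# land on a node before the 10-minute window elapses, so the timer on
# that node keeps resetting and the node is never consolidated.
```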

-There are three good ways to avoid it, in order of preference.
+Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort.

-### 1. Widen the report interval
-
-The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness.
-
-```yaml
-cronSchedule: "*/15 * * * *"
-```
-
-### 2. Pin the reporter to a stable node group
+### 1. Pin the reporter to a stable node group (recommended)

If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted:

@@ -192,9 +184,17 @@ affinity:
operator: DoesNotExist
```

-### 3. Run the reporter out of the cluster
+### 2. Run the reporter out of the cluster

-For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).
+For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).

+### 3. Widen the report interval (last resort)
+
+Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above.
+
+```yaml
+cronSchedule: "*/15 * * * *"
+```
+
> Note: `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter.

24 changes: 12 additions & 12 deletions charts/k8s-reporter/_mintlify_templates.gotmpl
@@ -130,17 +130,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use

The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug.

-There are three good ways to avoid it, in order of preference.
+Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort.

-### 1. Widen the report interval
-
-The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness.
-
-```yaml
-cronSchedule: "*/15 * * * *"
-```
-
-### 2. Pin the reporter to a stable node group
+### 1. Pin the reporter to a stable node group (recommended)

If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted:

@@ -167,9 +159,17 @@ affinity:
operator: DoesNotExist
```

-### 3. Run the reporter out of the cluster
+### 2. Run the reporter out of the cluster

-For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](/tutorials/report_k8s_envs).
+For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](/tutorials/report_k8s_envs).

+### 3. Widen the report interval (last resort)
+
+Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above.
+
+```yaml
+cronSchedule: "*/15 * * * *"
+```
+
<Warning>
`karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter.
24 changes: 12 additions & 12 deletions charts/k8s-reporter/_templates.gotmpl
@@ -160,17 +160,9 @@ By default the reporter runs as a CronJob every 5 minutes. On clusters that use

The cause is Karpenter's `consolidateAfter` timer: Karpenter only consolidates a node once it has seen no pod scheduling activity on it for the configured window. A reporter pod arriving every 5 minutes keeps resetting that timer, so any node whose `consolidateAfter` is longer than the reporter interval never becomes eligible for consolidation (see [karpenter#1921](https://github.com/kubernetes-sigs/karpenter/issues/1921)). This is Karpenter working as designed, not a reporter bug.

-There are three good ways to avoid it, in order of preference.
+Frequent snapshots are what let Kosli surface drift or an unauthorized change quickly, so the best fix keeps the 5-minute cadence and moves the reporter out of Karpenter's way. Widening the interval trades away that detection speed and should be a last resort.

-### 1. Widen the report interval
-
-The simplest fix. Set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. Environment snapshots rarely need 5-minute freshness.
-
-```yaml
-cronSchedule: "*/15 * * * *"
-```
-
-### 2. Pin the reporter to a stable node group
+### 1. Pin the reporter to a stable node group (recommended)

If you run a stable managed node group that Karpenter does not manage, schedule the reporter there so it never disturbs Karpenter-managed nodes. Use `nodeSelector`, and `tolerations` if that node group is tainted:

@@ -197,9 +189,17 @@ affinity:
operator: DoesNotExist
```

-### 3. Run the reporter out of the cluster
+### 2. Run the reporter out of the cluster

-For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).
+For zero footprint on cluster nodes, run `kosli snapshot k8s` on a schedule outside the cluster (for example a CI cron job) with kubeconfig access, keeping your reporting cadence without placing a pod on the cluster's nodes. See the [Kubernetes environment reporting tutorial](https://docs.kosli.com/tutorials/report_k8s_envs/).

+### 3. Widen the report interval (last resort)
+
+Only if you cannot pin the reporter or move it out of cluster: set `cronSchedule` longer than your NodePool's `consolidateAfter` so nodes get quiet windows long enough to consolidate. This works, but a longer interval widens the window in which a change can go unreported, so prefer the options above.
+
+```yaml
+cronSchedule: "*/15 * * * *"
+```
+
> Note: `karpenter.sh/do-not-disrupt: "true"` is **not** a fix here. It prevents Karpenter from disrupting the pod, which protects a mid-run report from interruption but makes consolidation of that node *less* likely, not more. Likewise `cluster-autoscaler.kubernetes.io/safe-to-evict` only affects the Kubernetes Cluster Autoscaler and is ignored by Karpenter.
{{- end }}