Merged
97 changes: 97 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,103 @@ Versioning: [Semantic Versioning](https://semver.org/spec/v2.0.0.html)

---

## [0.8.0] - 2026-06-28

SDK↔backend wire-format audit. Closes a class of silent fail-OPEN
bugs in which many LLM-vendor instrumentation paths sent
`model=None` (or `model="unknown"`) on `/track`. Every such event
cost the backend a `model_pricing` lookup that returned no row, fell
through to `DEFAULT_RATE` (~$30/M), and emitted a fallback warning
that the operator couldn't reproduce, because the offending
observation was buried in another package's telemetry.
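To put the `DEFAULT_RATE` fallback in concrete terms, here is the per-event arithmetic. This is a sketch that assumes the ~$30/M figure means US dollars per million tokens; the token count is illustrative and `fallback_cost` is a hypothetical helper, not an SDK function.

```python
# Hypothetical helper: only the ~$30/M rate comes from the changelog;
# we assume it is quoted in USD per million tokens.
DEFAULT_RATE_PER_MILLION = 30.0


def fallback_cost(tokens: int, rate_per_million: float = DEFAULT_RATE_PER_MILLION) -> float:
    """USD cost of ``tokens`` billed at a per-million-token rate."""
    return tokens * rate_per_million / 1_000_000


# A single 12,000-token llm_call that misses model_pricing:
print(f"${fallback_cost(12_000):.2f}")  # $0.36
```

Each event is cheap on its own, but the gap between this figure and the model's real pricing row grows linearly with event volume.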

No public-API break. No behavior change for callers whose
instrumentation already populates `model` correctly. Pure wire-
payload hygiene.

### Fixed

- **`NullRunRuntime.track()` strips `None` values from the wire
payload.** Before 0.8.0 the runtime forwarded every key in
`enriched` except those in `_WIRE_STRIP_FIELDS`, including keys
whose value was `None`. Putting `{"model": null}` on the wire
triggered the backend's `unwrap_or("default")` and a fallback
warning. The backend treats a missing key the same way as `null`,
so dropping `None` here changes nothing server-side while keeping
the diagnostic signal loud: the new
`WARN track(): llm_call event missing 'model' field` fires on a
missing key, which is what operators need to see, whereas the
JSON-null case stayed silent. The warning fires only for
`llm_call`, so `span_start` / `span_end` / `tool_call` traffic
doesn't pollute logs.

- **All four instrumentation paths now extract `model` /
`provider` from the response object as a fallback, not just
from `invocation_params` / `self.model`.** When langchain 1.x
stopped forwarding `invocation_params` to `on_llm_end`, every
LangChain-callback track event carried `model="unknown"` and
the backend cost pipeline fell through to `DEFAULT_RATE`. The
same failure mode affected llama-index mock providers and autogen
subclasses that don't expose a `.model` attribute. New fallback
chain (per path):

- `NullRunCallback.on_llm_end` (langgraph): `invocation_params.model_name`
→ `response.response_metadata['model_name']` → AIMessage
`response_metadata` → `response.llm_output['model_name']` →
`response.model_name` / `response.model` → `'unknown'`
(truly last resort, not the common case).
- `extract_from_event` (llama_index): `event.response.model` →
`event.response.raw.model` → `usage['model']`. Mock providers
and adapter-style ChatResponse objects now ship a real model
id on the wire.
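- `on_messages` (autogen): `self.model` → `result.model`. The
configured `self.model` is preferred because it reflects what the
user asked for; `result.model` from OpenAI's response is the
fallback and carries the resolved model id, which may differ from
the request if the server resolved an alias.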
- `_emit_from_span` (auto, openai-agents): `span['model']` →
`usage['model']` → `span['response_metadata']['model_name']`.
Some custom tracer configs leave `span['model']` empty; the
other two sources usually have it.

- **Two shared helpers added to `instrumentation/langgraph.py`:**
`_extract_model_from_response` and `_extract_provider_from_response`.
These mirror the same best-effort pattern `_get_finish_reason`
already uses, giving the module a single "best-effort read from
the response object" idiom. The autogen / llama_index / agents
paths duplicate the walk inline (the response shapes differ too
much to share a single helper) and follow the same rough ordering,
not an identical one: the dedicated model attribute first, then
metadata and usage blocks, then wrapper attributes.
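Each chain above reduces to "take the first non-empty candidate, and only put `model` on the wire if one was found". A minimal sketch of that idiom, assuming a hypothetical `first_non_empty` helper (the SDK itself inlines `or` chains with `isinstance` guards rather than calling a helper like this); the span and usage dicts are hand-built stand-ins:

```python
from typing import Any, Callable, Optional


def first_non_empty(*sources: Callable[[], Any]) -> Optional[str]:
    """Return the first truthy value produced by ``sources``, as a string.

    Sources are zero-argument callables, so later lookups run only when
    earlier ones come up empty. A source that raises a lookup/type error
    is treated as empty, similar in spirit to the SDK's isinstance guards.
    """
    for source in sources:
        try:
            value = source()
        except (AttributeError, KeyError, IndexError, TypeError):
            continue
        if value:  # None and "" both count as absent
            return str(value)
    return None


# Agents-path order: span["model"] -> usage["model"] ->
# span["response_metadata"]["model_name"]. Hand-built stand-in data.
span = {"model": None, "response_metadata": {"model_name": "gpt-4.1-mini"}}
usage = {"prompt_tokens": 10, "completion_tokens": 5}

model = first_non_empty(
    lambda: span.get("model"),
    lambda: usage.get("model"),
    lambda: span["response_metadata"]["model_name"],
)

event: dict[str, Any] = {"type": "llm_call", "provider": "openai_agents"}
if model:  # omit the key entirely rather than sending model=None
    event["model"] = model
print(event)
# {'type': 'llm_call', 'provider': 'openai_agents', 'model': 'gpt-4.1-mini'}
```

Wrapping each source in a lambda keeps the walk lazy: a deeper lookup that could raise on a malformed object is only attempted after the cheaper, more authoritative sources have come up empty.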

### Operator-visible change

`logger.warning("track(): llm_call event missing 'model' field — backend will fall back to DEFAULT_RATE. event=...")` is now emitted from `NullRunRuntime.track()` whenever an `llm_call` event reaches the wire without a `model` field. This log is the one signal an operator needs to identify which observation (httpx / langchain callback / manual track / agents tracer / requests) produced an `llm_call` without `model` set. It fires only for `llm_call`; other event types are silent. The log goes wherever the host application routes the `nullrun.runtime` logger.
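A minimal sketch of the `track()` behavior described here and under **Fixed**: drop `None` values, then warn on the `nullrun.runtime` logger only for `llm_call` events that end up without `model`. `build_wire_payload` is a hypothetical name, the contents of `_WIRE_STRIP_FIELDS` are assumed, and the log text is paraphrased rather than copied from the SDK.

```python
import logging
from typing import Any

logger = logging.getLogger("nullrun.runtime")

# Assumed contents for illustration; the SDK defines its own set.
_WIRE_STRIP_FIELDS = frozenset({"_fingerprint"})


def build_wire_payload(enriched: dict[str, Any]) -> dict[str, Any]:
    """Drop internal keys and None values, then flag llm_calls without a model."""
    payload = {
        key: value
        for key, value in enriched.items()
        if key not in _WIRE_STRIP_FIELDS and value is not None
    }
    # Only llm_call is checked, so span/tool events never trigger the warning.
    if payload.get("type") == "llm_call" and "model" not in payload:
        logger.warning("track(): llm_call event missing 'model' field; event=%s", payload)
    return payload


logging.basicConfig(level=logging.WARNING)
# model=None is stripped, so the warning fires on the now-missing key.
print(build_wire_payload({"type": "llm_call", "model": None, "tokens": 42}))
# {'type': 'llm_call', 'tokens': 42}
```

Because the warning goes through the standard `logging` module, a host application can route or quiet it like any other logger, for example `logging.getLogger("nullrun.runtime").setLevel(logging.ERROR)` to mute it.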

### Tests

- Tests covering the new helper chain will land in a follow-up
release once the wire-format audit findings are stable. The
fix is a defensive best-effort read; the existing
`test_instrumentation_*` suites already pass against the
updated paths.

---

## [0.7.8] - 2026-06-28

Additive patch on top of 0.7.7. Converts two silent fail-OPEN footguns
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "nullrun"
version = "0.7.8"
version = "0.8.0"
# Long form used by PyPI page meta-description and search snippets.
# Kept under the 200-char preview threshold so the full line is visible
# without an "expand" click. Keywords are matched against likely search
2 changes: 1 addition & 1 deletion src/nullrun/__version__.py
@@ -1,4 +1,4 @@
"""NullRun Platform SDK."""

__version__ = "0.7.8"
__version__ = "0.8.0"
__platform_version__ = "1.0.0"
59 changes: 39 additions & 20 deletions src/nullrun/instrumentation/auto.py
@@ -1142,27 +1142,46 @@ def _emit_from_agents_result(runtime: Any, result: Any) -> None:
name = (tc.get("function") or {}).get("name")
if name:
tool_names.append(name)
runtime.track(
{
"type": "llm_call",
"provider": "openai_agents",
"model": span.get("model"),
"tokens": total,
"input_tokens": prompt,
"output_tokens": completion,
"cache_read_tokens": int(prompt_details.get("cached_tokens", 0) or 0),
"cache_write_tokens": 0,
"reasoning_tokens": int(completion_details.get("reasoning_tokens", 0) or 0),
"finish_reason": _normalize_finish_reason(
(usage.get("choices") or [{}])[0].get("finish_reason")
if usage.get("choices") else None
),
"tool_names": tool_names,
"has_usage": True,
"raw_usage": usage,
"_fingerprint": f"agents-{span.get('id', id(span))}",
}
# Audit 2026-06-28 (SDK↔backend wire): ``span.get("model")``
# used to be put on the wire as-is — when the agents SDK
# didn't populate the span's ``model`` field (some
# custom tracer configs), this shipped ``model=None`` →
# backend ``unwrap_or("default")`` → fallback warning.
# We also try ``usage["model"]`` (OpenAI usage payload
# sometimes carries the resolved model id) and
# ``span["response_metadata"]["model_name"]`` (langchain-
# style metadata block on the span). Empty / None are
# dropped — only set ``model`` when we have a real value.
span_model = (
span.get("model")
or (usage.get("model") if isinstance(usage, dict) else None)
or (
(span.get("response_metadata") or {}).get("model_name")
if isinstance(span.get("response_metadata"), dict)
else None
)
)
agents_event: dict[str, Any] = {
"type": "llm_call",
"provider": "openai_agents",
"tokens": total,
"input_tokens": prompt,
"output_tokens": completion,
"cache_read_tokens": int(prompt_details.get("cached_tokens", 0) or 0),
"cache_write_tokens": 0,
"reasoning_tokens": int(completion_details.get("reasoning_tokens", 0) or 0),
"finish_reason": _normalize_finish_reason(
(usage.get("choices") or [{}])[0].get("finish_reason")
if usage.get("choices") else None
),
"tool_names": tool_names,
"has_usage": True,
"raw_usage": usage,
"_fingerprint": f"agents-{span.get('id', id(span))}",
}
if span_model:
agents_event["model"] = span_model
runtime.track(agents_event)
except Exception as e: # pragma: no cover — defensive
logger.debug("NullRun: agents track failed: %s", e)

57 changes: 42 additions & 15 deletions src/nullrun/instrumentation/autogen.py
@@ -101,22 +101,49 @@ def _wrap_create(self: Any, *args: Any, **kwargs: Any) -> Any:
getattr(usage, "total_tokens", 0) or 0
) or (prompt + completion)
if prompt or completion or total:
# Audit 2026-06-28 (SDK↔backend wire): model
# used to come only from ``self.model`` with a
# bare ``None`` fallback — if the autogen client
# didn't expose a ``model`` attribute (some
# subclass / wrapper / mock provider), the wire
# event carried ``model=None`` → backend
# ``unwrap_or("default")`` → fallback warning →
# DEFAULT_RATE. Now we try three sources in
# priority order, matching the multi-source
# pattern in langgraph's
# ``_extract_model_from_response``:
# 1. ``self.model`` (autogen config — preferred
# because it reflects what the user asked for)
# 2. ``result.model`` (OpenAI's response — actual
# model id, may differ from request if the
# server aliased)
# 3. None — let the runtime-level warning log
# (added 2026-06-28 in runtime.py:track())
# surface which path produced the gap.
model = (
getattr(self, "model", None)
or getattr(result, "model", None)
)
try:
runtime.track(
{
"type": "llm_call",
"provider": "autogen",
"model": getattr(self, "model", None),
"tokens": total,
"input_tokens": prompt,
"output_tokens": completion,
"has_usage": True,
"raw_usage": {
"prompt_tokens": prompt,
"completion_tokens": completion,
},
}
)
event: dict[str, Any] = {
"type": "llm_call",
"provider": "autogen",
"tokens": total,
"input_tokens": prompt,
"output_tokens": completion,
"has_usage": True,
"raw_usage": {
"prompt_tokens": prompt,
"completion_tokens": completion,
},
}
# Only set ``model`` when we have a real value
# — putting ``None`` on the wire defeats the
# backend's ``unwrap_or("default")`` defensive
# path. Empty string is treated as absent.
if model:
event["model"] = model
runtime.track(event)
except Exception as e: # pragma: no cover
logger.debug("autogen create emit failed: %s", e)
return result
129 changes: 125 additions & 4 deletions src/nullrun/instrumentation/langgraph.py
@@ -467,12 +467,35 @@ def on_llm_end(self, response: Any, **kwargs: Any) -> None:

Extracts usage data and sends to backend for cost computation.
Does NOT compute cost - backend is source of truth.

Audit 2026-06-28 (SDK↔backend wire): the previous version pulled
``model_name`` exclusively from ``invocation_params`` with a
hard fallback to the literal string ``"unknown"``. When langchain
1.x stopped forwarding ``invocation_params`` to ``on_llm_end``,
every track event carried ``model="unknown"`` and the backend
cost pipeline fell through to ``DEFAULT_RATE``. Now we try
``invocation_params.model_name`` first, then fall back to
reading the real model id from the response object itself
(``response.response_metadata['model_name']`` or the AIMessage
on the LLMResult generation). ``"unknown"`` is now a true last
resort, not the common case.
"""
try:
# Extract provider/model from invocation params
invocation_params = kwargs.get('invocation_params', {})
model = invocation_params.get('model_name', 'unknown')
provider = invocation_params.get('model_provider', 'openai')
# Extract provider/model from invocation params first, then
# fall back to the response object. This matches the
# best-effort pattern used by ``_get_finish_reason`` /
# ``_extract_tool_names`` for the same response.
invocation_params = kwargs.get('invocation_params') or {}
model = (
invocation_params.get('model_name')
or _extract_model_from_response(response)
or 'unknown'
)
provider = (
invocation_params.get('model_provider')
or _extract_provider_from_response(response)
or 'openai'
)

# Extract usage (normalized format)
usage = extract_usage_from_response(response, provider, model)
@@ -670,3 +693,101 @@ def _extract_node_name(serialized: Any, default: str) -> str:
return name
return default


# ---------------------------------------------------------------------------
# Audit 2026-06-28 (SDK↔backend wire): model_name on the callback path
# ---------------------------------------------------------------------------
# Pre-fix: ``on_llm_end`` pulled ``model_name`` exclusively from
# ``kwargs['invocation_params']`` with a hard fallback to the literal
# string ``"unknown"``. When langchain 1.x stopped forwarding
# ``invocation_params`` to ``on_llm_end`` (or forwarded it without a
# ``model_name`` key), every track event carried ``model="unknown"``
# → backend cost pipeline hit ``model_pricing WHERE model_id='unknown'``
# → no row → fallback warning → DEFAULT_RATE (~$30/M).
#
# Real model name is usually reachable from the response itself (OpenAI
# via LangChain puts it in ``response.response_metadata['model_name']``;
# LLMResult callback path puts it on the generation's AIMessage). This
# helper walks the same fallback chain ``_get_finish_reason`` already
# uses, so we have a single pattern for "best-effort read from the
# response object" across both helpers.

def _extract_model_from_response(response: Any) -> str | None:
"""Best-effort model extraction mirroring ``_get_finish_reason``.

Returns the first non-empty value found, or ``None`` if every known
source is empty / malformed.

Sources checked, in order:

1. ``response.response_metadata['model_name']`` — OpenAI-via-LangChain
puts the real model id (e.g. ``"gpt-4.1-mini-2025-04-14"``) here.
2. ``response.generations[0][0].message.response_metadata['model_name']``
— LLMResult callback path where the metadata lives on the AIMessage
rather than the LLMResult itself.
3. ``response.llm_output['model_name']`` — legacy LLMResult where the
chat-model wrapper hoisted the field onto the LLMResult dict.
4. ``response.model`` / ``response.model_name`` — direct attributes
on the response object (rare but seen in some custom wrappers).
"""
# 1. response_metadata on the response.
resp_meta = getattr(response, "response_metadata", None)
if isinstance(resp_meta, dict):
val = resp_meta.get("model_name") or resp_meta.get("model")
if val:
return str(val)

# 2. LLMResult callback path — look on the generation's AIMessage.
gen_msg = _safe_get_gen_message(response)
if gen_msg is not None:
gm = getattr(gen_msg, "response_metadata", None)
if isinstance(gm, dict):
val = gm.get("model_name") or gm.get("model")
if val:
return str(val)
# Some wrappers put the model name directly on the AIMessage.
for attr in ("model_name", "model"):
v = getattr(gen_msg, attr, None)
if v:
return str(v)

# 3. llm_output dict (legacy LLMResult).
llm_out = getattr(response, "llm_output", None)
if isinstance(llm_out, dict):
val = llm_out.get("model_name") or llm_out.get("model")
if val:
return str(val)

# 4. Direct attribute on response.
for attr in ("model_name", "model"):
v = getattr(response, attr, None)
if v:
return str(v)

return None


def _extract_provider_from_response(response: Any) -> str | None:
"""Best-effort provider extraction mirroring ``_extract_model_from_response``.

Same fallback chain — ``model_provider`` is what langchain passes
in ``invocation_params`` and what we want to read from response
metadata when invocation_params is absent. Returns ``None`` if
nothing is found so the caller keeps the default ('openai').
"""
resp_meta = getattr(response, "response_metadata", None)
if isinstance(resp_meta, dict):
val = resp_meta.get("model_provider") or resp_meta.get("provider")
if val:
return str(val)

gen_msg = _safe_get_gen_message(response)
if gen_msg is not None:
gm = getattr(gen_msg, "response_metadata", None)
if isinstance(gm, dict):
val = gm.get("model_provider") or gm.get("provider")
if val:
return str(val)

return None
