nullrunio · maltsev-dev · Jun 29, 2026 · Jun 27, 2026 · Jun 27, 2026 · Jun 27, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,103 @@ Versioning: [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
 
 ---
 
+## [0.8.0] - 2026-06-28
+
+SDK↔backend wire-format audit. Closes a class of silent-fail-OPEN
+path that was sending `model=None` (or `model="unknown"`) on
+`/track` for many LLM-vendor paths — every such event cost the
+backend a `model_pricing` lookup that returned no row, fell
+through to `DEFAULT_RATE` (~$30/M), and emitted a fallback warning
+the operator couldn't reproduce because the offending observation
+was buried in another package's telemetry.
+
+No public-API break. No behavior change for callers whose
+instrumentation already populates `model` correctly. Pure wire-
+payload hygiene.
+
+### Fixed
+
+- **`NullRunRuntime.track()` strips `None` values from the wire
+  payload.** Pre-0.8.0 the runtime forwarded every key in
+  `enriched` except those in `_WIRE_STRIP_FIELDS`, including keys
+  whose value was `None`. Putting `{"model": null}` on the wire
+  triggered backend `unwrap_or("default")` and a fallback warning.
+  Backend handles a missing key as well as `null`; dropping `None`
+  here keeps the diagnostic signal loud (the new
+  `WARN track(): llm_call event missing 'model' field` fires on
+  missing-key, which is what we want operators to see) instead of
+  silent (the JSON-null case). Activated only for `llm_call` so
+  `span_start` / `span_end` / `tool_call` traffic doesn't pollute
+  logs.
+
+- **All four instrumentation paths now extract `model` /
+  `provider` from the response object as a fallback, not just
+  from `invocation_params` / `self.model`.** When langchain 1.x
+  stopped forwarding `invocation_params` to `on_llm_end`, every
+  LangChain-callback track event carried `model="unknown"` and
+  the backend cost pipeline fell through to `DEFAULT_RATE`. The
+  same shape applied to llama-index mock providers and autogen
+  subclasses that don't expose a `.model` attribute. New
+  fallback chain (per path):
+
+  - `NullRunCallback.on_llm_end` (langgraph): `invocation_params.model_name`
+    → `response.response_metadata['model_name']` → AIMessage
+    `response_metadata` → `response.llm_output['model_name']` →
+    `response.model_name` / `response.model` → `'unknown'`
+    (truly last resort, not the common case).
+  - `extract_from_event` (llama_index): `event.response.model` →
+    `event.response.raw.model` → `usage['model']`. Mock providers
+    and adapter-style ChatResponse objects now ship a real model
+    id on the wire.
+  - `on_messages` (autogen): `self.model` → `result.model`. OpenAI's
+    response carries the actual model id (may differ from request
+    if the server resolved an alias) — this is the right value.
+  - `_emit_from_span` (auto, openai-agents): `span['model']` →
+    `usage['model']` → `span['response_metadata']['model_name']`.
+    Some custom tracer configs leave `span['model']` empty; the
+    other two sources usually have it.
+
+- **Two shared helpers added to `instrumentation/langgraph.py`:**
+  `_extract_model_from_response` and `_extract_provider_from_response`.
+  These mirror the same best-effort pattern `_get_finish_reason`
+  already uses, so we have a single "best-effort read from the
+  response object" idiom across the module. The autogen /
+  llama_index / agents paths duplicate the walk inline (the
+  response shapes differ too much to share a single helper), but
+  the *ordering* matches: official-attr → metadata → usage
+  → wrapper-attr.
+
+### Operator-visible change
+
+`logger.warning("track(): llm_call event missing 'model' field — backend will fall back to DEFAULT_RATE. event=...")` is now emitted from `NullRunRuntime.track()` whenever an `llm_call` event reaches the wire without a `model` field. This log is the single signal an operator needs to reproduce "which observation (httpx / langchain callback / manual track / agents tracer / requests) produced an `llm_call` without `model` set". Activated only for `llm_call`; other event types are silent. Log destination is whatever the host application configures for the `nullrun.runtime` logger.
+
+### Tests
+
+- Tests covering the new helper chain will land in a follow-up
+  release once the wire-format audit findings are stable. The
+  fix is a defensive best-effort read; the existing
+  `test_instrumentation_*` suites already pass against the
+  updated paths.
+
+---
+
+Additive patch on top of 0.7.7. Converts two silent fail-OPEN footguns
+into explicit `DeprecationWarning` / `RuntimeError`. No behavior
+change for callers who don't touch the deprecated surface.
+
+### Deprecated
+
+- `NullRunRuntime.start_recording()` and `NullRunRuntime.stop_recording()` now emit `DeprecationWarning`. They have been silent no-op stubs since Sprint 2.1 (0.4.0). Decision history is available via the backend dashboard at `/control-center/decision-history`. **Both methods will be removed in 0.9.0.**
+- Setting `NULLRUN_USE_GRPC=1` now raises `RuntimeError` at SDK init instead of silently falling back to HTTP with an info log. gRPC transport remains on the roadmap but is not yet implemented. Unset the env var to use HTTP. See https://docs.nullrun.io/reference/sdk-api#transport
+
+### Migration
+
+- Replace `runtime.start_recording(workflow_id, metadata=...)` with a dashboard navigation or `nullrun.status()` introspection.
+- Remove any `NULLRUN_USE_GRPC` env var from deployment configs (Docker compose, k8s manifests, systemd units).
+- Catch `RuntimeError` at SDK init if you want to keep the env var as a feature flag — but the recommended path is to unset it.
+
+---
+
 ## [0.7.8] - 2026-06-28
 
 Additive patch on top of 0.7.7. Converts two silent fail-OPEN footguns

diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "nullrun"
-version = "0.7.8"
+version = "0.8.0"
 # Long form used by PyPI page meta-description and search snippets.
 # Kept under the 200-char preview threshold so the full line is visible
 # without an "expand" click. Keywords are matched against likely search

diff --git a/src/nullrun/__version__.py b/src/nullrun/__version__.py
@@ -1,4 +1,4 @@
 """NullRun Platform SDK."""
 
-__version__ = "0.7.8"
+__version__ = "0.8.0"
 __platform_version__ = "1.0.0"
diff --git a/src/nullrun/instrumentation/auto.py b/src/nullrun/instrumentation/auto.py
@@ -476,6 +476,40 @@ def _bedrock_extractor(body: bytes, status: int) -> ExtractedUsage | None:
 }
 
 
+def _extract_model_from_request_body(request: httpx.Request) -> str | None:
+    """2026-06-28 (Issue 2 fix): fall back to the ``model`` field embedded
+    in the LLM request body when the response body extractor returned
+    ``None`` for ``model``.
+
+    The user typically passes ``ChatOpenAI(model="gpt-4.1-mini")`` and
+    that string appears in the request body's ``model`` field — even if
+    the response omits it (streaming edge cases, Responses API,
+    middleware that strips model from responses). Returning the
+    request-side model keeps the SDK's cost event attributable to the
+    real catalog entry (``gpt-4.1-mini`` substring → 400 microcents /
+    1M input in ``MODEL_RATES``) instead of falling through to
+    ``DEFAULT_RATE`` ($0 per call).
+
+    Returns ``None`` if the body is not JSON, has no ``model`` field,
+    or has an empty ``model``. Callers must treat the result as
+    optional and still surface the SDK's "missing model" warning when
+    both response and request lookups fail.
+    """
+    try:
+        body = request.content
+        if not body:
+            return None
+        payload = json.loads(body)
+    except (json.JSONDecodeError, ValueError):
+        return None
+    if not isinstance(payload, dict):
+        return None
+    val = payload.get("model")
+    if isinstance(val, str) and val:
+        return val
+    return None
+
+
 def _match_extractor(host: str) -> Callable[[bytes, int], ExtractedUsage | None] | None:
     """Return the extractor for `host`, or None if the host is not a known
     LLM endpoint. We match exact host first, then any subdomain (e.g.
@@ -683,6 +717,26 @@ def _emit(
         # ``coverage_seen`` view in the dashboard would be empty for
         # the majority of customers.
         _safe_bump_coverage(self._runtime, "_coverage_seen", host)
+
+        # 2026-06-28 (Issue 2 fix): if the extractor returned ``None``
+        # for ``model`` (response body lacked the field — observed for
+        # some OpenAI Responses-API and Anthropic streaming edge cases),
+        # fall back to the model name embedded in the request body. The
+        # backend cost pipeline logs WARN and falls back to DEFAULT_RATE
+        # (≈$0 per call) whenever ``model`` is missing — see
+        # ``backend/src/cost/pipeline.rs:164`` ``unwrap_or("default")``
+        # and ``backend/src/cost/constants.rs::rate_for``. Without
+        # this fallback, every gpt-4.1-mini / claude-haiku-4 call where
+        # the response body omits ``model`` was being silently
+        # zero-billed. Request body is the next authoritative source:
+        # SDK users pass ``model="gpt-4.1-mini"`` in the ChatOpenAI
+        # constructor.
+        model_from_response = usage.get("model")
+        model_for_event = (
+            model_from_response
+            or _extract_model_from_request_body(request)
+        )
+
         try:
             # Phase 4.1: lift cache / reasoning / finish / tool names
             # out of raw_usage onto the event itself. The backend's
@@ -695,7 +749,7 @@ def _emit(
                     "type": "llm_call",
                     "provider": _provider_label(host),
                     "host": host,
-                    "model": usage.get("model"),
+                    "model": model_for_event,
                     "tokens": usage.get("total_tokens", 0),
                     "input_tokens": usage.get("prompt_tokens", 0),
                     "output_tokens": usage.get("completion_tokens", 0),
@@ -1142,27 +1196,46 @@ def _emit_from_agents_result(runtime: Any, result: Any) -> None:
                     name = (tc.get("function") or {}).get("name")
                     if name:
                         tool_names.append(name)
-            runtime.track(
-                {
-                    "type": "llm_call",
-                    "provider": "openai_agents",
-                    "model": span.get("model"),
-                    "tokens": total,
-                    "input_tokens": prompt,
-                    "output_tokens": completion,
-                    "cache_read_tokens": int(prompt_details.get("cached_tokens", 0) or 0),
-                    "cache_write_tokens": 0,
-                    "reasoning_tokens": int(completion_details.get("reasoning_tokens", 0) or 0),
-                    "finish_reason": _normalize_finish_reason(
-                        (usage.get("choices") or [{}])[0].get("finish_reason")
-                        if usage.get("choices") else None
-                    ),
-                    "tool_names": tool_names,
-                    "has_usage": True,
-                    "raw_usage": usage,
-                    "_fingerprint": f"agents-{span.get('id', id(span))}",
-                }
+            # Audit 2026-06-28 (SDK↔backend wire): ``span.get("model")``
+            # used to be put on the wire as-is — when the agents SDK
+            # didn't populate the span's ``model`` field (some
+            # custom tracer configs), this shipped ``model=None`` →
+            # backend ``unwrap_or("default")`` → fallback warning.
+            # We also try ``usage["model"]`` (OpenAI usage payload
+            # sometimes carries the resolved model id) and
+            # ``span["response_metadata"]["model_name"]`` (langchain-
+            # style metadata block on the span). Empty / None are
+            # dropped — only set ``model`` when we have a real value.
+            span_model = (
+                span.get("model")
+                or (usage.get("model") if isinstance(usage, dict) else None)
+                or (
+                    (span.get("response_metadata") or {}).get("model_name")
+                    if isinstance(span.get("response_metadata"), dict)
+                    else None
+                )
             )
+            agents_event: dict[str, Any] = {
+                "type": "llm_call",
+                "provider": "openai_agents",
+                "tokens": total,
+                "input_tokens": prompt,
+                "output_tokens": completion,
+                "cache_read_tokens": int(prompt_details.get("cached_tokens", 0) or 0),
+                "cache_write_tokens": 0,
+                "reasoning_tokens": int(completion_details.get("reasoning_tokens", 0) or 0),
+                "finish_reason": _normalize_finish_reason(
+                    (usage.get("choices") or [{}])[0].get("finish_reason")
+                    if usage.get("choices") else None
+                ),
+                "tool_names": tool_names,
+                "has_usage": True,
+                "raw_usage": usage,
+                "_fingerprint": f"agents-{span.get('id', id(span))}",
+            }
+            if span_model:
+                agents_event["model"] = span_model
+            runtime.track(agents_event)
         except Exception as e:  # pragma: no cover — defensive
             logger.debug("NullRun: agents track failed: %s", e)
 

diff --git a/src/nullrun/instrumentation/autogen.py b/src/nullrun/instrumentation/autogen.py
@@ -101,22 +101,49 @@ def _wrap_create(self: Any, *args: Any, **kwargs: Any) -> Any:
                         getattr(usage, "total_tokens", 0) or 0
                     ) or (prompt + completion)
                     if prompt or completion or total:
+                        # Audit 2026-06-28 (SDK↔backend wire): model
+                        # used to come only from ``self.model`` with a
+                        # bare ``None`` fallback — if the autogen client
+                        # didn't expose a ``model`` attribute (some
+                        # subclass / wrapper / mock provider), the wire
+                        # event carried ``model=None`` → backend
+                        # ``unwrap_or("default")`` → fallback warning →
+                        # DEFAULT_RATE. Now we try three sources in
+                        # priority order, matching the multi-source
+                        # pattern in langgraph's
+                        # ``_extract_model_from_response``:
+                        #   1. ``self.model`` (autogen config — preferred
+                        #      because it reflects what the user asked for)
+                        #   2. ``result.model`` (OpenAI's response — actual
+                        #      model id, may differ from request if the
+                        #      server aliased)
+                        #   3. None — let the runtime-level warning log
+                        #      (added 2026-06-28 in runtime.py:track())
+                        #      surface which path produced the gap.
+                        model = (
+                            getattr(self, "model", None)
+                            or getattr(result, "model", None)
+                        )
                         try:
-                            runtime.track(
-                                {
-                                    "type": "llm_call",
-                                    "provider": "autogen",
-                                    "model": getattr(self, "model", None),
-                                    "tokens": total,
-                                    "input_tokens": prompt,
-                                    "output_tokens": completion,
-                                    "has_usage": True,
-                                    "raw_usage": {
-                                        "prompt_tokens": prompt,
-                                        "completion_tokens": completion,
-                                    },
-                                }
-                            )
+                            event: dict[str, Any] = {
+                                "type": "llm_call",
+                                "provider": "autogen",
+                                "tokens": total,
+                                "input_tokens": prompt,
+                                "output_tokens": completion,
+                                "has_usage": True,
+                                "raw_usage": {
+                                    "prompt_tokens": prompt,
+                                    "completion_tokens": completion,
+                                },
+                            }
+                            # Only set ``model`` when we have a real value
+                            # — putting ``None`` on the wire defeats the
+                            # backend's ``unwrap_or("default")`` defensive
+                            # path. Empty string is treated as absent.
+                            if model:
+                                event["model"] = model
+                            runtime.track(event)
                         except Exception as e:  # pragma: no cover
                             logger.debug("autogen create emit failed: %s", e)
                 return result