Skip to content

Defer provider SDK imports and harden startup launchers#347

Merged
korivi-CraftOS merged 8 commits into
CraftOS-dev:improvement/fix-provider-sdk-importsfrom
namabeeru:codex/fix-provider-sdk-imports
Jun 29, 2026
Merged

Defer provider SDK imports and harden startup launchers#347
korivi-CraftOS merged 8 commits into
CraftOS-dev:improvement/fix-provider-sdk-importsfrom
namabeeru:codex/fix-provider-sdk-imports

Conversation

@namabeeru

@namabeeru namabeeru commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Lazy-load OpenAI and Anthropic SDK imports in the model factory so CraftBot can import/start without provider SDKs before a selected provider path needs them.
  • Keep OpenAI-compatible providers on the existing OpenAI SDK implementation instead of adding a second HTTP client path; if that SDK is missing when such a provider is initialized, raise a provider-specific install error.
  • Add a shared runtime preflight (app.runtime_preflight) used by both run.py and direct app.main startup. It checks startup-critical imports in the exact Python runtime that will run the backend and reports wrong-interpreter installs before the traceback path.
  • Make craftbot.py start detect early child-process failure, remove stale PID files, print the relevant log tail instead of reporting false success, and return a nonzero CLI exit code on failed start / restart.
  • Harden launcher edge cases: avoid duplicate backend preflight after run.py already checked the runtime, ignore stale CRAFTBOT IS READY lines from previous logs, and make install flows fail instead of claiming success when the service does not start.
  • Make the macOS desktop shortcut for source installs open the running frontend only when both frontend and backend respond, or start CraftBot with the Python that created the shortcut when the service is not healthy.
  • Add focused regression tests for lazy provider SDK imports, missing-SDK provider errors, runtime dependency detection, provider SDK exclusion from preflight, app.main preflight ordering, service startup failure reporting, source shortcut generation, stale readiness logs, install failure reporting, and CLI failure exit codes.

Root cause

There are four related failure modes:

  1. agent_core.core.models.factory imported openai and anthropic at module load, so backend startup could fail before the selected provider mattered.
  2. CraftBot can be launched with a different Python than the one used during install. On systems with multiple Python installs, such as macOS with Homebrew Python, users can accidentally run CraftBot with an interpreter that did not receive install.py dependencies. The crash then appears deep inside backend imports after frontend startup has already begun.
  3. craftbot.py start wrote the child PID and printed CRAFTBOT STARTED immediately after spawning run.py, even if the child exited right away. That left users with a desktop shortcut to a server that was not actually running. Loop testing also found the failed start command still exited 0; this PR changes failed start / restart CLI calls to exit 1.
  4. In source mode on macOS, CraftBot.command only opened http://localhost:7925; it did not start the service if nothing was listening there, and it could also open a frontend-only/stale service without verifying the backend.

Why this approach

This avoids introducing a parallel OpenAI-compatible HTTP client. The project already declares and uses the OpenAI SDK for DeepSeek/OpenRouter/Grok/MiniMax/Moonshot request paths, so the safer open-source fix is to import it only when needed and report a clear install error if that selected path is unavailable.

The runtime preflight handles the broader Python mismatch directly. It checks packages that are imported during backend startup, but intentionally leaves provider SDKs out so provider-specific SDK failures happen only when the selected provider path actually needs that SDK.

The service-manager changes keep craftbot.py start, craftbot.py install, and the generated desktop shortcut honest: quick startup failures surface in the terminal with the last log lines, stale PID files are removed, failed CLI starts return nonzero, stale readiness markers are ignored, and the macOS source shortcut opens the existing local service only when both frontend and backend respond.

Validation

Focused checks:

  • /usr/local/bin/python3.10 -m pytest tests/test_craftbot_service.py tests/test_model_factory.py tests/test_run_dependency_check.py -q -> 21 passed.
  • /usr/local/bin/python3.10 -m py_compile craftbot.py tests/test_craftbot_service.py agent_core/core/models/factory.py run.py app/main.py app/runtime_preflight.py tests/test_model_factory.py tests/test_run_dependency_check.py
  • git diff --check

Loop checks:

  • Focused pytest loop: 10 iterations, all passed (21 passed each iteration).
  • python3 run.py --cli with Homebrew Python 3.14: 10 iterations, all exited nonzero with the missing-dependencies message and no traceback.
  • python3 -m app.main --cli with Homebrew Python 3.14: 10 iterations, all exited nonzero with the missing-dependencies message and no traceback.
  • python3 craftbot.py start with Homebrew Python 3.14: 10 iterations, all reported CraftBot failed to start, exited nonzero, did not print CRAFTBOT STARTED, and did not leave a stale PID.
  • Current Python runtime preflight: 10 iterations, all reported no missing imports under /usr/local/bin/python3.10.
  • Imported app.main with openai and anthropic deliberately blocked: 10 iterations, all imported successfully.

Manual smoke:

  • /usr/local/bin/python3.10 craftbot.py start --cli started successfully, craftbot.py status reported the PID running, and /usr/local/bin/python3.10 craftbot.py stop stopped it cleanly.

Broader baseline:

  • /usr/local/bin/python3.10 -m pytest tests -q -> 107 passed, 4 unrelated failures in tests/test_trigger_router_and_parking.py; the fake LLM there does not accept the newer prompt_name keyword.
  • /usr/local/bin/python3.10 -m pytest -q still fails during collection in generated/template living-ui test folders:
    • app/data/living_ui_modules/auth/backend/tests/test_auth.py imports models as a top-level module.
    • app/data/living_ui_template/backend/tests conflicts with the root tests.conftest import path.

@namabeeru namabeeru changed the title [codex] Handle missing OpenAI SDK for compatible providers Defer provider SDK imports and detect wrong Python runtime Jun 26, 2026
@namabeeru namabeeru marked this pull request as ready for review June 26, 2026 15:19
@namabeeru namabeeru marked this pull request as draft June 27, 2026 00:10
@namabeeru namabeeru changed the title Defer provider SDK imports and detect wrong Python runtime Defer provider SDK imports and harden startup launchers Jun 27, 2026
@namabeeru namabeeru marked this pull request as ready for review June 27, 2026 00:57
@namabeeru namabeeru marked this pull request as draft June 27, 2026 01:08
@namabeeru namabeeru marked this pull request as ready for review June 27, 2026 01:27
@zfoong zfoong requested a review from korivi-CraftOS June 27, 2026 08:03
@zfoong

zfoong commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

@namabeeru Can you point this PR to the improvement/fix-provider-sdk-imports branch?
@korivi-CraftOS Can you do a clean installation test on Mac?

@namabeeru namabeeru changed the base branch from main to improvement/fix-provider-sdk-imports June 28, 2026 01:30
@namabeeru

Copy link
Copy Markdown
Contributor Author

Done, this PR now targets improvement/fix-provider-sdk-imports.

korivi-CraftOS added a commit that referenced this pull request Jun 29, 2026
Lazy-load OpenAI/Anthropic SDKs in the model factory; raise a friendly install hint when a needed SDK is missing. Clean portion of #347 at its final reviewed state.

Co-authored-by: namabeeru <github.body594@passmail.com>
@korivi-CraftOS

korivi-CraftOS commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Thanks @namabeeru really nice work, and the test coverage is appreciated. 🙏

I've merged the model-factory change (the lazy OpenAI/Anthropic SDK imports) into main separately in #348, since that part is clean and self-contained. I took it at its final state i.e. without the earlier HTTP-fallback client you later removed and credited you as co-author.

I'd like to hold the rest of this PR for a couple of revisions before merging:

runtime_preflight.py Good idea, but if the dependency probe fails for a reason other than missing packages (interpreter not found, JSON decode error, odd output buffering), the code treats it as "all deps missing" and calls sys.exit(1). That can hard-block startup on a healthy machine. Could you make an inconclusive/failed probe warn and continue, and only exit when deps are genuinely confirmed missing? Also curious about the reasoning behind the fixed 60s timeout.

craftbot.py The honest exit-code fix is great. But BACKEND_URL = "http://localhost:7926" is hardcoded, while the port is configurable via --backend-port, so it breaks on a custom port could you derive it from the port arg? Minor: "CRAFTBOT IS READY" is now duplicated in three places (here, installer/api.py, run.py) worth a shared constant.

The macOS shortcut changes look fine. Once the two items above are addressed I'm happy to merge the remainder. Thanks again!

@korivi-CraftOS korivi-CraftOS merged commit a86a437 into CraftOS-dev:improvement/fix-provider-sdk-imports Jun 29, 2026
korivi-CraftOS added a commit that referenced this pull request Jun 29, 2026
This reverts commit a86a437.

Holding #347 pending revisions discussed on the PR: the runtime_preflight
probe should warn-and-continue (not sys.exit) on an inconclusive result,
and craftbot.py BACKEND_URL must derive from --backend-port instead of a
hardcoded 7926. The clean model-factory change was taken separately in #348.
@korivi-CraftOS

Copy link
Copy Markdown
Collaborator

Quick follow-up @namabeeru: this PR briefly got merged into the integration branch in full by mistake, and I've now reverted that (the branch no longer carries these changes). The model-factory portion is already in via #348. Whenever you have a chance to address the two items above (the runtime_preflight warn-and-continue, and deriving the backend port from --backend-port), please open a follow-up PR and I'll get the rest in. Thanks again, and sorry for the back-and-forth!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants