feat: Geneva enterprise API V2 doc updates #274
justinrmiller
left a comment
Just a few comments to consider.
<Note>
Auto-backfill is an enterprise feature. On direct object-storage or local-filesystem
connections there is no managed agent, so `auto_backfill=True` has no effect and you must run
Not really a comment on this PR, but a question: do we alert the user that this has no effect when called on a non-enterprise connection?
We do not. We don't really have a good way to detect it AFAIK. We could do it based on the connection URI, but it's possible that we might want to support creating UDFs directly against object storage even when there's an enterprise deployment.
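For reference, the URI-based check mentioned above could look roughly like the sketch below. It is only an illustration, not existing Geneva code: `is_enterprise_uri` and `check_auto_backfill` are hypothetical helpers, and the assumption that enterprise connections use a `db://` scheme is mine.

```python
import warnings
from urllib.parse import urlparse

# Assumption: enterprise connections use the db:// scheme. Everything else
# (s3://, gs://, az://, plain file paths) is treated as a direct connection.
ENTERPRISE_SCHEMES = {"db"}


def is_enterprise_uri(uri: str) -> bool:
    """Guess from the URI scheme alone whether a connection is enterprise."""
    return urlparse(uri).scheme in ENTERPRISE_SCHEMES


def check_auto_backfill(uri: str, auto_backfill: bool) -> bool:
    """Return True if auto_backfill can take effect; warn and return False otherwise."""
    if auto_backfill and not is_enterprise_uri(uri):
        warnings.warn(
            "auto_backfill=True has no effect on this connection; "
            "run the backfill manually.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return auto_backfill
```

As noted, a scheme check like this would misfire if we later support creating UDFs directly against object storage inside an enterprise deployment, so it is a heuristic at best.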
dantasse
left a comment
ah shoot I'm too late, sorry - not 100% done but mostly
## Providing a Ray cluster

The LanceDB Helm chart can be configured to deploy a static KubeRay cluster, provision KubeRay clusters on demand per job, or
use an existing Ray cluster.

### Use default LanceDB Enterprise Ray cluster (default)

By default, LanceDB Enterprise will use a shared, statically provisioned Ray cluster for job execution.

This can be enabled in the Helm chart by setting the following values.

```yaml
raycluster:
  enabled: true

global:
  rayclusterUri: "ray://raycluster-kuberay-head-svc.lancedb.svc.cluster.local:10001"
```
I'm kind of confused how this interacts with the previous section. Like, "I just said what the default cluster was in the deployment-default bit above - now why do I have to set up raycluster: and global:?"
(I mean, I know this as an experienced user, but it still made me double take, which makes me think that it might be confusing to a new user.)
I guess the point to make is something like "geneva.defaults tells your jobs what cluster to use. But we assume you don't already have a Ray cluster, so you have to deploy one there; here's how you do that."
Set `global.rayclusterUri` to an empty value to provision ephemeral KubeRay clusters on-demand for each execution job. The default KubeRay cluster configuration
is specified in `geneva.defaults.cluster`, i.e.

```yaml
geneva:
  defaults:
I'm not good at Helm, but I'd love it if this were super explicit (do you mean `""` or some other empty value?), so like this:
- Set `global.rayclusterUri` to an empty value to provision ephemeral KubeRay clusters on-demand for each execution job. The default KubeRay cluster configuration
- is specified in `geneva.defaults.cluster`, i.e.
- ```yaml
- geneva:
-   defaults:
+ Set `global.rayclusterUri` to an empty value to provision ephemeral KubeRay clusters on-demand for each execution job. The default KubeRay cluster configuration
+ is specified in `geneva.defaults.cluster`, i.e.
+ ```yaml
+ global:
+   rayclusterUri: ""
+ geneva:
+   defaults:
<Note>
**Manifests are immutable at the column / view level.** When a transform is registered, its
manifest is snapshotted onto the column (or view) metadata. Changing the deployment-default
manifest (or the `GenevaManifest` object in your code) does **not** affect existing columns
or views: they keep using the snapshot taken at creation time. To move a column or view to a
new manifest, re-point it to a new (or updated) UDF / chunker / UDTF, for example with
`alter_columns()` for a column, or by recreating the view.
</Note>
Can we move this, maybe after the @udtf(manifest=) section? I want to get the meat of the process down first (define manifest and attach it to udf) before I can understand the caveats around immutability of this registration.
Also, maybe spell out a little bit more, in code, what this means? so if I do:
@udf(manifest=manifest_a)
def myUdf(...
and run it, then tomorrow I change it to:
@udf(manifest=manifest_b)
def myUdf(...
and manifest_b has completely different dependencies, and I try to run it again, what happens? Does it run with manifest_a or manifest_b? And what would the code look like if I want to change it to manifest_b?
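To make the question concrete, here is a toy model of the snapshot behaviour as the Note describes it, read literally. It does not use the Geneva API: `Manifest`, `ToyTable`, `add_column`, and `alter_column` are hypothetical stand-ins, and the real `alter_columns()` call may look quite different.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Hypothetical stand-in for a manifest: a name plus a pip list."""
    name: str
    pip: list = field(default_factory=list)


class ToyTable:
    """Toy table that snapshots a manifest per column at registration time."""

    def __init__(self):
        self._column_manifests = {}

    def add_column(self, column, manifest):
        # Snapshot: deep-copy so later edits to `manifest` do not leak in.
        self._column_manifests[column] = copy.deepcopy(manifest)

    def alter_column(self, column, manifest):
        # Explicit re-point: replace the stored snapshot with a new one.
        if column not in self._column_manifests:
            raise KeyError(column)
        self._column_manifests[column] = copy.deepcopy(manifest)

    def manifest_for(self, column):
        return self._column_manifests[column]


manifest_a = Manifest("manifest_a", ["numpy"])
manifest_b = Manifest("manifest_b", ["torch"])

table = ToyTable()
table.add_column("embedding", manifest_a)

# Editing the source object afterwards does not change the column's snapshot.
manifest_a.pip.append("pandas")
after_edit = table.manifest_for("embedding").pip

# Only an explicit re-point moves the column to manifest_b.
table.alter_column("embedding", manifest_b)
after_alter = table.manifest_for("embedding").name
```

If the docs mean what this model assumes, then re-decorating with `manifest_b` alone would leave the existing column on `manifest_a` until it is explicitly re-pointed. It would be good to have the page confirm that with real code.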
When iterating locally, you often want the workers to run with the *exact* packages from your
current environment rather than a curated pip list. `Connection.capture_local_environment()`
zips your workspace (and, optionally, your site-packages), uploads the archives through the
maybe this is obvious but: does "workspace" mean "your current working directory, where you are running the script from"?
  1. Install or upgrade the Geneva Helm chart (see [Helm Deployment](/geneva/deployment/helm/)).
- 2. Forward port 3000 from the geneva-console-ui service:
+ 2. In your web browser, connect to the Geneva Console UI using the external ingress/load balancer URI configured in your deployment.
ooh exciting
hmm
are we doing this now?
we don't have any authentication on the console; that seems like a problem?
closes https://linear.app/lancedb/issue/ENT-1403/v2-documentation