GG-49287 Bulk decode of primitive arrays (vectorized float-vector deserialization) #78
Draft
PakhomovAlexander wants to merge 1 commit into
…erialization)
PrimitiveArray.to_python_not_null built the Python list element-by-element over the
ctypes array (`[obj.data[i] for i in range(len)]`). For large float arrays — e.g. a
1536-d vector value per vector-query result row — this is the dominant client cost:
cProfile of a kNN query loop showed ~41% of total client CPU in this one comprehension
(~77M per-element ctypes reads for 500 queries x 100 results).
Replace it with a single bulk buffer decode: memoryview(obj.data).cast('B').cast(fmt).tolist().
The cast through bytes is needed because ctypes LittleEndianStructure reports an explicit
byte-order format (e.g. '<f') that memoryview.tolist() rejects; we strip the order prefix
and reinterpret in native order (correct on little-endian platforms). Generic across all
primitive array types (float/int/long/...).
Measured (dbpedia-openai 1536-d vector queries, single client, identical results /
key-checksums): 1.4x @ k=10 -> 1.9x @ k=800 on the vector-query result path; the win grows
with the number of result rows deserialized. Part of GG-49287 (pygridgain deserialization
fast path).
Problem
`PrimitiveArray.to_python_not_null` deserializes a primitive array element-by-element:
```python
return [ctypes_object.data[i] for i in range(ctypes_object.length)]
```
For large float arrays — e.g. the 1536-d vector returned as the value of each vector-query
result row — this is the dominant client cost. cProfile of a kNN query loop (500 queries × 100
results) against a live node put ~41% of total client CPU in this one list comprehension
(~77M per-element ctypes accesses).
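The ~77M figure follows directly from the workload shape. A quick arithmetic check, assuming every result row carries one full 1536-d vector (the variable names are illustrative only):

```python
# Workload shape quoted above: 500 queries, 100 result rows each,
# one 1536-element float vector per row.
queries = 500
results_per_query = 100
dim = 1536

# Each vector element costs one ctypes __getitem__ call in the comprehension.
element_reads = queries * results_per_query * dim
print(f"{element_reads:,}")  # 76,800,000
```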
Change (one method)
Bulk buffer decode in a single C-level pass:
```python
mv = memoryview(ctypes_object.data)
return mv.cast('B').cast(mv.format.lstrip('<>=!@')).tolist()
```
The cast through bytes is required because a ctypes `LittleEndianStructure` array reports an
explicit byte-order format (e.g. `'<f'`) that `memoryview.tolist()` rejects; we strip the
order prefix and reinterpret in native order (correct on little-endian platforms). Generic
across all primitive array element types.
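The behaviour described above can be reproduced without a server. The sketch below uses a hypothetical `FloatArray` structure as a stand-in for whatever pygridgain builds internally (the class and its layout are assumptions; only the `length`/`data` field names come from the snippet above), and it assumes a little-endian host:

```python
import ctypes

# Hypothetical stand-in for the parsed array object; not pygridgain's real class.
class FloatArray(ctypes.LittleEndianStructure):
    _fields_ = [
        ("length", ctypes.c_int),
        ("data", ctypes.c_float * 4),
    ]

obj = FloatArray(4, (ctypes.c_float * 4)(0.5, 1.5, -2.0, 3.25))

# Old path: one ctypes item access per element.
elementwise = [obj.data[i] for i in range(obj.length)]

mv = memoryview(obj.data)
fmt = mv.format.lstrip('<>=!@')  # drop the explicit byte-order prefix

# tolist() on the unmodified view is expected to be rejected because of
# the byte-order prefix in the format string.
try:
    mv.tolist()
    direct_tolist_ok = True
except NotImplementedError:
    direct_tolist_ok = False

# New path: view the same memory as raw bytes, then as native floats.
bulk = mv.cast('B').cast(fmt).tolist()

print(mv.format, fmt, direct_tolist_ok)
print(bulk == elementwise)
```

Both paths read the same buffer, so on a little-endian host the two lists compare equal. On a big-endian host the native reinterpretation would byte-swap the values, which is the caveat noted above.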
Result
Measured on dbpedia-openai 1536-d vector queries (single client, identical results, with
key-checksums unchanged at every efSearch): the vector-query result path ran 1.4x faster at
k=10, rising to 1.9x at k=800.
The win grows with the number of result rows deserialized. Combined with not over-fetching
(requesting k results, with the efSearch beam carried on the negative-threshold sentinel,
rather than setting k=efSearch), this closes most of the vector-query client gap to
RediSearch in the high-recall regime.
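To see the deserialization effect in isolation, a synthetic micro-benchmark along these lines may help. It is not the ann-benchmarks harness used for the numbers above: it times only the two decode paths on a plain ctypes float array, with no server or network involved, and the helper names (`decode_elementwise`, `decode_bulk`, `compare`) are made up for illustration:

```python
import ctypes
import timeit

DIM = 1536  # vector width used in the measurements above

def decode_elementwise(arr):
    # Old path: one ctypes item access per element.
    return [arr[i] for i in range(len(arr))]

def decode_bulk(arr):
    # New path: reinterpret the buffer in native order and convert in one pass.
    mv = memoryview(arr)
    return mv.cast('B').cast(mv.format.lstrip('<>=!@')).tolist()

def compare(rows=100, repeats=5):
    # Synthetic vector; i / 8 is exactly representable as a 32-bit float.
    arr = (ctypes.c_float * DIM)(*[i / 8 for i in range(DIM)])
    assert decode_elementwise(arr) == decode_bulk(arr)
    # "rows" decodes per timing run, best of "repeats" runs.
    slow = min(timeit.repeat(lambda: decode_elementwise(arr), number=rows, repeat=repeats))
    fast = min(timeit.repeat(lambda: decode_bulk(arr), number=rows, repeat=repeats))
    return slow, fast

if __name__ == "__main__":
    slow, fast = compare()
    print(f"element-wise: {slow:.4f}s  bulk: {fast:.4f}s  ratio: {slow / fast:.1f}x")
```

The ratio printed here will generally differ from the end-to-end speedups reported above, since a real query also pays for network I/O and the rest of response parsing.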
Notes
Draft. Validated end-to-end via the ann-benchmarks pygridgain harness; CI runs the unit suite.