GG-49287 Bulk decode of primitive arrays (vectorized float-vector deserialization) by PakhomovAlexander · Pull Request #78 · gridgain/python-thin-client

PakhomovAlexander · 2026-06-19T10:45:14Z

Draft — profiling-driven client-side deserialization fast path (part of GG-49287).

Problem

PrimitiveArray.to_python_not_null deserializes a primitive array element-by-element:
```python
return [ctypes_object.data[i] for i in range(ctypes_object.length)]
```
For large float arrays — e.g. the 1536-d vector returned as the value of each vector-query
result row — this is the dominant client cost. cProfile of a kNN query loop (500 queries × 100
results) against a live node put ~41% of total client CPU in this one list comprehension
(~77M per-element ctypes accesses).
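The per-element pattern and the access count quoted above can be reproduced with a minimal sketch. `FloatArray` and `decode_per_element` are hypothetical stand-ins, not the actual pygridgain classes; the real structure layout in the client may differ.

```python
import ctypes

DIM = 1536  # vector dimensionality quoted in the description


# Hypothetical stand-in for the ctypes object the client builds per array.
class FloatArray(ctypes.LittleEndianStructure):
    _fields_ = [
        ("length", ctypes.c_int),
        ("data", ctypes.c_float * DIM),
    ]


def decode_per_element(ctypes_object):
    # One Python-level ctypes item access per element.
    return [ctypes_object.data[i] for i in range(ctypes_object.length)]


obj = FloatArray()
obj.length = DIM
for i in range(DIM):
    obj.data[i] = float(i)

values = decode_per_element(obj)

# Access count for the profiled loop: queries x results per query x dimensions.
accesses = 500 * 100 * DIM
print(len(values), accesses)
```

The product 500 × 100 × 1536 is 76.8M, which matches the ~77M figure.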

Change (one method)

Bulk buffer decode in a single C-level pass:
```python
mv = memoryview(ctypes_object.data)
return mv.cast('B').cast(mv.format.lstrip('<>=!@')).tolist()
```
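The cast through bytes is required because a ctypes LittleEndianStructure array reports an
explicit byte-order format (e.g. '<f') that memoryview.tolist() rejects; we strip the
order prefix and reinterpret the bytes in native order. This is correct only on little-endian
platforms, where native order matches the structure's little-endian layout; on a big-endian
host the values would come out byte-swapped. The approach is generic across all primitive
array element types.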
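A self-contained sketch of the format issue and the double cast, under the assumption of a little-endian host. `FloatArray` and `decode_bulk` are hypothetical names used for illustration only; they are not the client's actual class or method.

```python
import ctypes
import sys

# This sketch assumes a little-endian host, as the description does.
assert sys.byteorder == "little"


# Hypothetical stand-in for the client's per-array ctypes object.
class FloatArray(ctypes.LittleEndianStructure):
    _fields_ = [
        ("length", ctypes.c_int),
        ("data", ctypes.c_float * 4),
    ]


def decode_bulk(ctypes_object):
    mv = memoryview(ctypes_object.data)
    # Drop the byte-order prefix, then reinterpret the raw bytes natively.
    native_format = mv.format.lstrip('<>=!@')
    return mv.cast('B').cast(native_format).tolist()


obj = FloatArray()
obj.length = 4
for i, value in enumerate([0.5, 1.0, -2.25, 3.0]):
    obj.data[i] = value

# tolist() directly on the ctypes-backed view fails because of the prefix.
try:
    memoryview(obj.data).tolist()
    direct_tolist_works = True
except NotImplementedError:
    direct_tolist_works = False

per_element = [obj.data[i] for i in range(obj.length)]
bulk = decode_bulk(obj)
print(direct_tolist_works, bulk == per_element)
```

The test values are exactly representable as 32-bit floats, so both decode paths can be compared with plain equality.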

Result

Measured on dbpedia-openai 1536-d vector queries (single client, identical results —
key-checksums unchanged at every efSearch):

| k (results/query) | baseline | this PR | speedup |
|---|---|---|---|
| 10 | 531 q/s | 736 q/s | 1.39x |
| 100 | 71 q/s | 126 q/s | 1.77x |
| 800 | 9.7 q/s | 18.2 q/s | 1.88x |
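As a quick arithmetic check, the speedup column can be recomputed from the two throughput columns. The figures below are copied from the table; nothing here is newly measured.

```python
# Throughput in queries per second, copied from the table: (baseline, this PR).
rows = {
    10: (531.0, 736.0),
    100: (71.0, 126.0),
    800: (9.7, 18.2),
}

# Speedup is the ratio of PR throughput to baseline throughput.
speedups = {k: this_pr / baseline for k, (baseline, this_pr) in rows.items()}

for k, speedup in speedups.items():
    print(f"k={k}: {speedup:.2f}x")
```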

The win grows with the number of result rows deserialized. Combined with not over-fetching
(requesting k results, with the efSearch beam carried on the negative-threshold sentinel,
rather than requesting k=efSearch results), this closes most of the vector-query client gap
to RediSearch in the high-recall regime.

Notes

Draft. Validated end-to-end via the ann-benchmarks pygridgain harness; CI runs the unit suite.

…erialization)

PrimitiveArray.to_python_not_null built the Python list element-by-element over the ctypes array (`[obj.data[i] for i in range(len)]`). For large float arrays (e.g. a 1536-d vector value per vector-query result row) this is the dominant client cost: cProfile of a kNN query loop showed ~41% of total client CPU in this one comprehension (~77M per-element ctypes reads for 500 queries x 100 results).

Replace it with a single bulk buffer decode: memoryview(obj.data).cast('B').cast(fmt).tolist(). The cast through bytes is needed because ctypes LittleEndianStructure reports an explicit byte-order format (e.g. '<f') that memoryview.tolist() rejects; we strip the order prefix and reinterpret in native order (correct on little-endian platforms). Generic across all primitive array types (float/int/long/...).

Measured (dbpedia-openai 1536-d vector queries, single client, identical results / key-checksums): 1.4x @ k=10 -> 1.9x @ k=800 on the vector-query result path; the win grows with the number of result rows deserialized.

Part of GG-49287 (pygridgain deserialization fast path).

PakhomovAlexander wants to merge 1 commit into master from gg-49287-bulk-float-decode
