Skip to content

Optimizations for random functions and methods returning ints or bytes #151520

@zahlman

Description

@zahlman

Feature or enhancement

Proposal:

While investigating the performance of random.Random vs. random.SystemRandom, I noticed some asymmetries that suggest performance enhancements for the former.

Currently, random.getrandbits is implemented (via _random_Random_getrandbits_impl) by filling a raw memory allocation with entropy (looping genrand_uint32) and passing the buffer to _PyLong_FromByteArray which creates its own copy. (Similarly, integer-range functions like randint and randrange are built on top of this, in a reasonable way — perhaps micro-optimizations are possible there, but they don't seem worthwhile to me.)

Then random.randbytes is implemented by calling random.getrandbits and feeding the result to int.to_bytes from Python. This entails two useless copies and additional Python object manipulation overhead (round-tripping raw buffer -> int object -> bytes object, which is also a lot of bit packing/unpacking for the special digit representations within the ints). By contrast, random.SystemRandom uses os.urandom to get bytes directly (from a system call, but no real adaptation is necessary).

A micro-benchmark suggests (weakly) that random.randbytes for large chunks could be made over three times as fast by avoiding the extra work:

$ py3.14 -m timeit --setup 'from random import randbytes' 'randbytes(1000000)'
50 loops, best of 5: 4.21 msec per loop
$ py3.14 -m timeit --setup 's = b"\xff"*1000000' 'int.from_bytes(s, "little").to_bytes(1000000, "little")'
100 loops, best of 5: 2.91 msec per loop

The underlying genrand_uint32, meanwhile, works (implementing the Mersenne Twister algorithm) by refilling an entropy pool in 32-bit units if necessary, extracting a value, and then scrambling a bit further with bit arithmetic. I don't know of a 64-bit version of the algorithm for filling the entropy pool (and it's more complex than I'd consider messing with); but I'm confident the final step can be extended to 64 bits. Nowadays 64-bit architectures are commonplace and this is surely worth taking advantage of.

Suggestions:

  1. Extract the portion of genrand_uint32 that retrieves a value from self->state (roughly the existing function minus the y ^= ... lines at the end) to a helper function (I'll call it _get_entropy here), which can probably be marked inline

  2. Add genrand_uint64 logic, along the lines of:

    uint64_t y = _get_entropy() << 32 | _get_entropy();
    y ^= (y >> 11) & 0xffffffff001fffffULL; /* mask out the 11 bits that would otherwise overflow from high to low */
    y ^= (y << 7) & 0x9d2c56809d2c5680ULL; /* the masks for the left-shifts already handle overflows */
    y ^= (y << 15) & 0xefc60000efc60000ULL;
    y ^= (y >> 18) & 0xffffffff00003fffULL;
    return y;
    

    (This might also be useful for _random_Random_random_impl; is the current use of 26/27 bits from each value necessary?)

  3. Add _genrand_buffer, implementing the loop to populate a buffer currently in _random_Random_getrandbits_impl

  4. Have _random_Random_getrandbits_impl proceed (after the special-case checks) by calling long_alloc, using _genrand_buffer to populate the ->long_value.ob_digit, and masking out the extra bits (this requires changing the math for the allocation size; it might also make sense to modify genrand_uint64 to be able to get a twodigits directly?)

  5. Add a C implementation for random.Random.randbytes that, similarly, populates a new bytes object's allocated memory directly

(Edit: if genrand_uint32 is an implementation detail and doesn't need to be preserved, then 64-bit versions could be used exclusively. In this case, the logic for checking the pool state can be simplified, checking only once per call, taking advantage of the fact that the pool size is even.)

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    extension-modulesC modules in the Modules dirperformancePerformance or resource usagetype-featureA feature request or enhancement
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions