Kronecker Embeddings: byte-level structured token representations for parameter-efficient language models. Reference implementation.
Updated May 29, 2026 - Python
🐍 This is a fast, lightweight, and clean CPython extension for the Byte Pair Encoding (BPE) algorithm, which is commonly used in LLM tokenization and NLP tasks.
An efficient openblocks parser module.
A deterministic byte-level BPE tokenizer in pure Python, built from scratch with strict tests, typed code, and polished docs.
Byte-level 16MB language model — the only byte-level submission in 622+ OpenAI Parameter Golf entries. Built the right size, not shrunk from the wrong one.
Engram without a tokenizer
NeuralPiece: An adaptive byte-level neural tokenizer designed to surpass traditional BPE and Unigram via deep learning chunking.
File security system using remote authentication
An editable, auditable sub-1M-param byte-level model: CRUD fact-edits with provable locality + routing-confidence abstention. Research prototype, honestly scoped.
Non-learned byte-level signal encoder for PyTorch - one modality-agnostic 27-D exact base (anchor rule, Delta = Gray code), losslessly invertible. pip install hsl-embedding