Base2026 is a local-first, public-facing source intelligence system for short-form expert videos.
The current public demo focuses on TikTok creators talking about SEO, GEO, AEO, AI search visibility, schema, keyword research, Google, Bing, and related topics.
Live demo: https://aggressorbulkit.online/knowledge/
Public API and AI access: https://aggressorbulkit.online/knowledge/api.html
Base2026 is built to be useful to humans and agents. The public site exposes a read-only, public-safe API surface so AI tools, scripts, researchers, and search systems can inspect the library without scraping the visual UI.
Start here:
- API overview: https://aggressorbulkit.online/knowledge/api.html
- Machine-readable API index: https://aggressorbulkit.online/knowledge/api-index.json
- Agent context file: https://aggressorbulkit.online/knowledge/llms.txt
- Data dictionary: https://aggressorbulkit.online/knowledge/data-dictionary.json
- Public documents JSONL: https://aggressorbulkit.online/knowledge/static/documents.jsonl
- Public passages JSONL: https://aggressorbulkit.online/knowledge/static/passages.jsonl
- Public insight cards JSONL: https://aggressorbulkit.online/knowledge/static/insight_cards.jsonl
The API surface is intentionally public-only. It does not expose raw captions, raw ASR, media files, private QA notes, local databases, credentials, logs, or unreviewed pipeline artifacts.
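As a minimal sketch of reading one of the public JSONL exports listed above, the snippet below parses JSON Lines text into Python dictionaries and tallies which top-level field names occur. The helper names (`fetch_text`, `parse_jsonl`, `field_counts`) are illustrative and not part of Base2026, and no field names are assumed; the data dictionary remains the reference for the actual schema.

```python
import json
import urllib.request
from collections import Counter

# URL taken from the "Start here" list above.
PASSAGES_URL = "https://aggressorbulkit.online/knowledge/static/passages.jsonl"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a public file and return it as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    records = []
    # Split on "\n" only; str.splitlines() would also split on characters
    # such as U+2028 that may legally appear inside JSON strings.
    for line in text.split("\n"):
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def field_counts(records: list[dict]) -> Counter:
    """Count how often each top-level field name appears across records."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

Calling `parse_jsonl(fetch_text(PASSAGES_URL))` would load the passages file into memory, and `field_counts(...)` on the result gives a quick view of which fields are present before writing any schema-specific code.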
Base2026:
- converts public creator videos into searchable English source text and evidence passages;
- keeps raw captions, raw ASR output, media, private QA notes, and unreviewed transcripts local/private;
- can expose reviewed, polished public source text or transcripts as a readable source-record view where policy allows;
- exports public-safe source records, passages, insight cards, topics, and creator metadata;
- indexes searchable passages with Meilisearch;
- serves a static read-only web UI under /knowledge/;
- generates creator, source, topic, and comparison pages from public JSONL;
- exposes agent-readable public entry points (/knowledge/llms.txt, /knowledge/data-dictionary.json, /knowledge/api-index.json) so AI tools can inspect the public library without scraping the visual UI.
The public site is designed for source discovery, attribution, comparison, citation, and searchable source reading. It is not a video re-hosting platform and not a raw transcript dump.
Latest deployed release: base2026-social-metadata-h1-ay39-20260618.
Current public export:
- 1,388 source records;
- 1,906 searchable passages;
- 1,623 insight cards;
- 1,052 public insight cards;
- 1,516 topics;
- 1,001 public topics;
- 1,482 sitemap URLs across the latest generated public sitemap files.
Recent readiness checks:
- public export policy: current live release uses reviewed public source text where policy allows and continues to forbid raw/unreviewed transcript dumps;
- publication boundary audit: passing for current changed public-safe files;
- GitHub metadata validation: passing;
- static SEO/social metadata audit: passing for 1,483 indexable HTML files with title, description, canonical URL, H1, JSON-LD schema, and OG/X metadata present; 1,929 noindex utility/detail-state files were intentionally skipped.
Public pages:
- /knowledge/
- /knowledge/creators/{handle}.html
- /knowledge/sources/{item_id}.html
- /knowledge/topics/{topic_id}.html
- /knowledge/compare/{topic_id}.html
- /knowledge/roadmap.html
- /knowledge/story.html
- /knowledge/methodology.html
- /knowledge/privacy.html
- /knowledge/source-policy.html
- /knowledge/support.html
- /knowledge/site-structure.html
- /knowledge/opt-out.html
Public data files generated locally:
- source_records.jsonl
- passages.jsonl
- insight_cards.jsonl
- topics.jsonl
- creators.jsonl
- manifest.json
Compatibility files:
- documents.jsonl
- chunks.jsonl
Agent-readable public files:
- /knowledge/llms.txt
- /knowledge/api.html
- /knowledge/api-index.json
- /knowledge/data-dictionary.json
The public search UI also uses a server-side Meilisearch proxy at /knowledge-search/multi-search. The proxy is read-only and injects the public search key server-side; integrations should prefer static JSONL for bulk/offline analysis.
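For light interactive queries, a request to the proxy might look like the sketch below. It assumes the proxy is served from the same host as the demo and forwards a standard Meilisearch multi-search body unchanged; neither is confirmed here, and the index name is borrowed from the indexing command in this README, so the live proxy may expect something different. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: the proxy lives on the demo host and accepts a standard
# Meilisearch multi-search body.
SEARCH_URL = "https://aggressorbulkit.online/knowledge-search/multi-search"
# Assumption: index name copied from the local indexing command.
INDEX_UID = "base2026_public_tiktok"


def build_payload(query: str, index_uid: str = INDEX_UID, limit: int = 5) -> dict:
    """Build a Meilisearch multi-search body containing a single query."""
    return {"queries": [{"indexUid": index_uid, "q": query, "limit": limit}]}


def search(query: str, limit: int = 5, timeout: float = 30.0) -> dict:
    """POST the payload to the proxy and return the decoded JSON response."""
    body = json.dumps(build_payload(query, limit=limit)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

No API key is sent because the proxy injects the public search key server-side. A standard Meilisearch multi-search response carries a `results` list with one entry per query, each holding a `hits` array, but the shape returned by this proxy should be checked against the API overview page.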
Generated public data and release archives are deploy artifacts, not GitHub source.
Do not commit or publish:
- private research folders;
- local SQLite databases;
- raw captions;
- ASR audio/video;
- cookies, tokens, API keys, SSH keys;
- generated release zips;
- generated public-data/;
- local Meilisearch data;
- logs.
Raw captions, raw ASR, media, private QA notes, and unreviewed transcripts are private/local. Reviewed polished public source text may be shown where policy allows. Public source pages and source detail must show attribution, original links, source context, methodology, and correction/opt-out paths.
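The do-not-commit list above can be approximated with a small path check, sketched here under stated assumptions. The glob patterns are hypothetical guesses at how those categories appear on disk, and this is not the logic of scripts/audit-publication-boundary.py, which remains the authoritative check.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns derived from the list above; adjust to the real layout.
FORBIDDEN_PATTERNS = [
    "public-data/*",                   # generated public data
    "*.sqlite", "*.sqlite3", "*.db",   # local databases
    "*.zip",                           # generated release archives
    "*.log",                           # logs
    "*.pem", "id_rsa*", ".env",        # keys and credentials
]


def find_forbidden(paths: list[str]) -> list[str]:
    """Return the paths that match any forbidden pattern, in input order."""
    flagged = []
    for path in paths:
        normalized = path.replace("\\", "/")
        name = normalized.rsplit("/", 1)[-1]
        # Check both the full path and the bare file name so that patterns
        # such as ".env" match files in any directory.
        if any(
            fnmatchcase(normalized, pattern) or fnmatchcase(name, pattern)
            for pattern in FORBIDDEN_PATTERNS
        ):
            flagged.append(path)
    return flagged
```

Feeding it the output of `git diff --cached --name-only` before a commit would give an early warning, though a filename check cannot detect raw captions or unreviewed transcripts stored under innocuous names.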
This repository is intended to show the public-safe system layer:
- data contracts for public source records, passages, insight cards, topics, and creator metadata;
- static page generation for search, creator, source, topic, comparison, roadmap, methodology, policy, support, and correction/removal pages;
- local worker scripts for export, validation, indexing, packaging, and deployment;
- project memory and runbooks for repeatable operation;
- open-source issue templates and contribution paths.
Private research data, raw platform material, local databases, and deploy archives are intentionally excluded.
creator registry
-> local intake / captions / ASR
-> transcript cleanup and QA
-> passage chunking
-> topic and insight extraction
-> public JSONL export
-> static page generation
-> Meilisearch passage index
-> read-only public UI under /knowledge/
No live LLM call is required during public search.
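To make the passage chunking stage concrete, here is one common approach: fixed-size word windows with a small overlap. This is an illustrative assumption only; the README does not describe how Base2026 actually chunks transcripts, and the function name and default sizes are hypothetical.

```python
def chunk_passages(text: str, max_words: int = 80, overlap: int = 10) -> list[str]:
    """Split text into word-window passages that overlap with their neighbors."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        # Stop once a window reaches the end, to avoid a trailing passage
        # made up only of words already covered by the overlap.
        if start + max_words >= len(words):
            break
    return passages
```

The overlap keeps a sentence that straddles a boundary searchable from both neighboring passages, at the cost of some duplicated text in the index.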
Export public TikTok data:
python3 scripts/export-public-tiktok.py
python3 scripts/check-public-export-policy.py public-data/tiktok
Do not use implicit public-card promotion for GitHub or public release preparation. Public insight cards should come from reviewed source-backed rows, not from one-off export flags.
Index passages into Meilisearch:
python3 scripts/meili-index-public.py --index base2026_public_tiktok
Package a public release:
pwsh -NoProfile -ExecutionPolicy Bypass -File ./scripts/package-public-release.ps1 -ReleaseName <release-name>
Current public packages can include reviewed public source text where policy allows. Do not use -IncludeFullTranscripts or --auto-promote-insights for public deploys; raw captions, raw ASR, media, private QA, and unreviewed transcripts stay private.
Deploy to the VPS:
pwsh -NoProfile -ExecutionPolicy Bypass -File ./scripts/deploy-public-vps.ps1 -ReleaseName <release-name>
Audit before staging for GitHub:
python3 scripts/audit-publication-boundary.py
python3 scripts/validate-github-metadata.py
pwsh -NoProfile -ExecutionPolicy Bypass -File ./scripts/preflight-github-launch.ps1 -SkipExportPolicy -SkipLiveCheck
Agents and maintainers should start from repo files, not chat memory.
Read first:
- AGENTS.md
- docs/project-memory/ACTIVE_PHASE.md
- docs/project-memory/NEXT_ACTION.md
- docs/project-memory/STATUS_BOARD.csv
- docs/project-memory/PUBLICATION_BOUNDARY.md
- docs/GIT_PUBLICATION_AUDIT.md
Base2026 is created and maintained by Alex Yarosh, an independent AI Search Visibility consultant working across SEO, GEO, AEO, local search, entity/trust signals, and public source intelligence.
Alex is building Base2026 as an independent pilot project for studying how public expert knowledge can become searchable, attributable, and useful to both humans and AI systems.
- Website: https://aggressorbulkit.online/
- Live Base2026 demo: https://aggressorbulkit.online/knowledge/
- Contact: offflinerpsy@gmail.com
Useful contributions include:
- extractor adapters for additional public short-form platforms;
- caption and ASR quality benchmarks;
- safer public export validators;
- Meilisearch ranking and faceting improvements;
- static page, schema, sitemap, and accessibility improvements;
- creator correction/removal workflow improvements;
- documentation that makes local operation easier.
Please do not submit raw third-party captions, unreviewed transcripts, media files, cookies, credentials, or private research exports.
Repository code and documentation are licensed under Apache-2.0. Third-party creator videos, platform captions, and original source content are not relicensed by this repository.