Dev/improves - full dwarf support by buzzer-re · Pull Request #8 · buzzer-re/ToCode

buzzer-re · 2026-06-20T05:20:33Z

No description provided.

Two quality gaps are addressed:

1. DWARF source info was ignored. A new shared backends/dwarf.py reads per-function source file/dir/line and type definitions via pyelftools. When a binary has debug info, the export tree now mirrors the original source files and directories (src/raw/<dir>/<file>.c) instead of only call-graph clusters; functions without debug info keep the cluster fallback. Each function also gets an `origin: file:line` annotation and decl_file/decl_dir/decl_line in functions.json / function-index.json.

2. Types were never exported (the header carried only a hardcoded pseudo-type prelude). Backends now expose a types() catalog and per-function return/param/local types, sourced from the tool rather than invented:

   - angr: DWARF DIEs (structs/unions/enums/typedefs) + subprogram protos
   - IDA: local type library (TIL) + function tinfo (return/params/calltype)
   - radare2: best-effort tsj/tuj/tej + tc (DWARF support is limited)

   These are written to a new types.json and an include/<bin>.types.h of real C declarations, included by the main header. functions.json now carries return_type/params/locals/calltype.

All wiring is defensive: missing debug info or an older session degrades to the previous behavior.

Adds DWARF unit/integration tests and an exporter source-grouping/types test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
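The source-mirroring layout described above can be sketched roughly as follows. This is a minimal illustration, not the exporter's actual code: the helper name `export_path`, the per-function file name used in the cluster fallback, and the choice to nest the full declaration directory under src/raw are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Optional


def export_path(name: str, decl_dir: Optional[str], decl_file: Optional[str],
                cluster: str) -> PurePosixPath:
    """Pick an export location for one function (hypothetical helper)."""
    root = PurePosixPath("src/raw")
    if not decl_file:
        # No debug info for this function: keep the cluster fallback.
        # The "<cluster>/<name>.c" shape is an assumption for illustration.
        return root / cluster / f"{name}.c"
    # An absolute decl_file overrides decl_dir, matching path-join semantics.
    source = PurePosixPath(decl_dir or "") / decl_file
    # Drop the root anchor and any ".." so the result stays under src/raw.
    parts = [p for p in source.parts if p.strip("/") and p != ".."]
    return root.joinpath(*parts)
```

For example, a function declared in /build/proj/src/main.c would land at src/raw/build/proj/src/main.c, while a function without DWARF data would stay in its cluster directory.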

pyelftools caches every DIE it parses onto its CompileUnit. Walking all compilation units of a large .debug_info therefore accumulated the entire DWARF tree in memory (multiple GB) and could OOM-kill analysis on big binaries.

Release each unit's DIE cache once we are done with it, capping memory to a single unit at a time. On a large real-world shared object this dropped peak RSS from ~7 GB (OOM) to ~650 MB with no loss of recovered data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
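The release-after-use pattern can be sketched with a stand-in unit class. `FakeUnit`, `iter_entries`, and `release` are hypothetical names for this example and are not pyelftools API; the real change operates on pyelftools' CompileUnit, whose cache attributes are not shown in this PR text.

```python
class FakeUnit:
    """Stand-in for a compilation unit that caches every entry it parses."""

    def __init__(self, names):
        self._names = list(names)
        self.cache = []  # grows as entries are parsed, like a per-unit DIE cache

    def iter_entries(self):
        for name in self._names:
            self.cache.append(name)  # parsed entries stay referenced here
            yield name

    def release(self):
        # Drop the cached entries so they can be garbage collected.
        self.cache.clear()


def collect_functions(units):
    """Walk every unit, keeping only plain data and freeing each unit's cache."""
    found = []
    for unit in units:
        try:
            # Copy out plain values only; holding on to the parsed objects
            # themselves would keep the whole tree alive.
            found.extend(str(entry) for entry in unit.iter_entries())
        finally:
            unit.release()  # at most one unit's cache is populated at a time
    return found
```

The important design point is that only plain values leave the loop, so clearing the cache actually frees the parsed entries.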

The DWARF reader only inspected attributes directly on each subprogram DIE. For C++ and optimized code the concrete function DIE typically omits decl_file/decl_line/type and instead references a DW_AT_specification or DW_AT_abstract_origin DIE that carries them. As a result the vast majority of functions in a C++ binary recovered no source file and fell back to the generic cluster layout.

Follow those references (decl_file is resolved against the file table and comp_dir of the CU that actually holds it, which may differ from the concrete DIE's CU). On a large real-world C++ shared object this raised source-file coverage from ~3.9k to ~38.7k of ~39k functions, with memory still bounded (~0.65 GB).

Adds a C++ regression test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
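The reference-following step might look like the sketch below. It models DIEs as plain dicts keyed by offset rather than using pyelftools objects, and `resolve_attr` and `max_hops` are hypothetical names for the example. It returns the offset of the DIE that actually holds the attribute, so a caller could resolve decl_file against that DIE's CU.

```python
REFERENCE_ATTRS = ("DW_AT_specification", "DW_AT_abstract_origin")


def resolve_attr(dies, offset, attr, max_hops=8):
    """Find `attr` on the DIE at `offset` or on a DIE it references.

    `dies` maps a DIE offset to a dict of its attributes (a simplified
    stand-in for real DWARF entries). Returns (value, holder_offset),
    or (None, None) when the attribute cannot be found.
    """
    seen = set()
    while offset is not None and offset not in seen and len(seen) < max_hops:
        seen.add(offset)  # guards against reference cycles
        die = dies.get(offset)
        if die is None:
            break  # dangling reference
        if attr in die:
            return die[attr], offset
        # Hop to the referenced declaration or abstract instance, if any.
        offset = next((die[ref] for ref in REFERENCE_ATTRS if ref in die), None)
    return None, None
```

With a concrete DIE at 0x80 that points to a declaration at 0x10 via DW_AT_specification, a lookup of DW_AT_decl_line starting at 0x80 returns the line stored on 0x10 together with the offset 0x10.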

The IDA backend used APIs incorrectly, so source files/types never came through and every function fell back to the cluster layout:

- types() iterated the numbered-type ordinal space and rendered with tinfo_t._print(), which is not exposed; switch to the idiomatic db.types (named types) and render via serialize()+idc_print_type(), falling back to tinfo_t.dstr().
- source recovery only probed the function entry ea, which is frequently not annotated; scan the first instruction heads (ida_bytes.next_head) for the first get_sourcefile()/get_source_linnum() hit.
- prototype recovery called the non-existent ._print() on types; use get_func_details() with rettype/arg .dstr().

Adds fake-IDA unit tests covering these paths (IDA cannot run in CI).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
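The head-scanning idea from the second bullet can be sketched without IDA by injecting the lookups as callables, in the spirit of the fake-IDA tests. The function name `first_source_hit`, the `max_heads` limit, and the `BADADDR` stand-in value are assumptions for the example; the callables are expected to return a falsy value when nothing is recorded at an address.

```python
BADADDR = 0xFFFFFFFFFFFFFFFF  # stand-in "no address" sentinel for this sketch


def first_source_hit(start_ea, end_ea, next_head, get_sourcefile, get_linnum,
                     max_heads=16):
    """Return (file, line) from the first annotated head in a function.

    `next_head(ea, max_ea)` should return the next item head after `ea`,
    or BADADDR when there is none. `get_sourcefile(ea)` and `get_linnum(ea)`
    look up the source annotation at one address.
    """
    ea = start_ea
    for _ in range(max_heads):
        if ea == BADADDR or ea >= end_ea:
            break
        path = get_sourcefile(ea)
        if path:
            # The line number may be missing even when the file is known.
            return path, (get_linnum(ea) or None)
        ea = next_head(ea, end_ea)
    return None
```

Bounding the scan with `max_heads` keeps the cost per function small; the appropriate limit is a judgment call and is not specified in the commit message.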

IDA's DWARF plugin defaults to DWARF_IMPORT_LNNUMS=NO, so it imported types but no source file/line information; get_sourcefile() and get_source_linnum() returned nothing and every function fell back to the cluster layout.

Pass -Odwarf:import_lnnums=1 when creating the database so the loader imports source locations, which the backend already reads.

Verified on a real IDA 9.3 install: functions now resolve to their source file/line and group by source file, while recovered types (structs/enums) continue to populate the type catalog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
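As a rough illustration of where the option goes, the sketch below builds an argument list for a headless IDA run. Only the -Odwarf:import_lnnums=1 switch comes from this commit; the helper name `ida_batch_args`, the executable path, and the use of the -A (autonomous) and -B (batch) switches are assumptions, and the project may create its databases through a different mechanism.

```python
def ida_batch_args(ida_executable, binary_path, import_line_numbers=True):
    """Build an argv list for a headless IDA run (hypothetical helper)."""
    args = [ida_executable, "-A", "-B"]  # assumed: autonomous + batch mode
    if import_line_numbers:
        # Ask the DWARF plugin to import source file/line information.
        args.append("-Odwarf:import_lnnums=1")
    args.append(binary_path)  # the input file goes last
    return args
```

The resulting list could be handed to `subprocess.run`; keeping the option behind a flag makes it easy to compare behavior with and without line-number import.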

pyelftools ships with the angr extra via cle, so it is absent from the type checking environment and mypy raised import-not-found on dwarf.py. Mirror the existing angr.* override to ignore its missing imports.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

buzzer-re and others added 6 commits June 19, 2026 09:29

buzzer-re merged commit 4f3502f into main Jun 20, 2026
4 checks passed

buzzer-re merged 6 commits into main from dev/improves
