ChainGenius-Lite
structure-first transaction fingerprinting

Theory

Mathematical foundations of structure-first transaction analysis

Execution Structure

Every EVM transaction induces a deterministic execution trace. By attributing logs to the trace frame that emitted them, each transaction defines two mathematical objects:

  • A rooted unordered labeled tree — the call hierarchy
  • An edge-labeled directed multigraph — the address topology

Tree nodes and multigraph edges carry the label trace_type[topic0s], where trace_type ∈ {call, delegatecall, staticcall, callcode, create, create2, selfdestruct} and topic0s is the list of event signatures emitted by that frame, in execution order.

Factorization

The label multiset can be factored out as a canonical sorted array — the base space. Tree and graph then reduce to unlabeled structures with index arrays indicating label placement. This completes the factorization:

  • Behavior inventory (which operations occurred)
  • Call nesting (how operations are nested)
  • Address flow (how addresses interact)

become independent components coupled only through compatible label placement.

Implementation note: ChainGenius-Lite does not perform this full factorization. Labels remain attached to tree nodes and graph edges, treating call[Transfer] and call[Approval] as distinct types rather than instances of a shared unlabeled structure. This choice privileges pattern utility over maximal decomposition.

Quotient Spaces

Quotienting by tree isomorphism forgets address identity while preserving call structure. Two transactions with identical call hierarchies — same nesting, same labels — map to the same equivalence class regardless of which contracts were involved.

Quotienting by graph isomorphism forgets call order while preserving address flow. Two transactions with identical address topologies — same flow patterns, same edge labels — map to the same equivalence class regardless of how the calls were nested.

The canonical forms (canon_tree, canon_graph) define coordinates in the product of these quotient spaces, fibered over the shared label set.

Realizability

Not every point in the product space is realizable. A tree and graph are compatible only if they admit a common realization: an actual execution whose call structure canonicalizes to the tree and whose address topology canonicalizes to the graph.

The set of valid (tree, graph) pairs forms a constraint set — the realizable equivalence classes of EVM executions. Most trees admit multiple graph realizations (different address flows can produce the same call structure). Some graphs admit multiple tree realizations (different call orderings can produce the same address topology), though this is less common.

Containment Posets

The trace_path hierarchy induces a subtree at each node. Canonicalizing these subtrees defines a containment relation on each quotient space:

Pattern A contains pattern B if B arises as the canonical form of a subtree (or induced subgraph) within some representative of A.

This yields two containment posets — one for tree equivalence classes, one for graph equivalence classes — that refine the constraint set with compositional structure:

  • Patterns decompose into sub-patterns
  • Patterns that share sub-patterns cluster in the poset
  • Recursive or repeated structures become visible

The posets are not lattices in general: joins may not exist when incompatible sub-patterns admit no common parent.

Asymmetry of the Posets

The tree poset is formally redundant. The canonical tree string is a complete encoding — all subtrees are recoverable by parsing. The containment relation adds no information beyond what the canonical form already provides.

The graph poset is not redundant. Each subtree induces a subgraph over the addresses active at that level, but these induced subgraphs are not recoverable from the root graph's canonical form alone. The root graph records which addresses interacted and how; it does not record which subset of interactions occurred at each level of call nesting.

However, the graph poset — the set of induced subgraphs at each trace path — together with their shared labels, suffices to reconstruct the tree. The nesting of induced subgraphs reveals the call hierarchy.

In this sense, the graph poset is the more fundamental object: it encodes both address topology and, implicitly, call structure. The tree is a convenient projection that makes call nesting explicit at the cost of discarding address identity.

Practical Implications

This decomposition enables queries that would otherwise require expensive trace-level scanning:

  • "Find all transactions with this call structure, regardless of which contracts" — tree lookup
  • "Find all transactions with this address flow, regardless of call ordering" — graph lookup
  • "Find all transactions where pattern X appears as a sub-component" — containment query
  • "What graph realizations exist for this tree pattern?" — fiber enumeration

The canonical forms are content-addressed (SHA-256 of the canonical string), making equality checks O(1) and enabling efficient indexing across the full transaction history.

Notation

Symbol Meaning
tree_hash SHA-256 of canon_tree string
graph_hash SHA-256 of canon_graph string
trace_path Position in call hierarchy (e.g., /0/2/1)
topic0 Event signature (first log topic)
canon_tree Canonical tree string: trace_type[topics](children...)
canon_graph Canonical graph string: node:label:neighbors;...