The blueprint is complete.
Now we build it.

Most projects ask for money to figure out what to build. We already know. Every layer of the analysis engine has been designed, specified, and is ready for development. What follows is the complete system — open for scrutiny before a single line of code is written.

Fund the Build
7 Layers · 3 Tiers of Depth Each
Raw Documents → Ingestion → Normalization → Extraction → Graph → Timeline → Legal Mapping → Finding

Raw public documents enter on the left. Structured, confidence-rated, legally mapped findings come out on the right.

The System, Layer By Layer

Every layer has been designed. What follows shows how deep the thinking goes, layer by layer.

The investigation begins before any analysis happens. Every publicly available document related to the Epstein case — court filings, FOIA releases, flight logs, financial records, trial transcripts, civil litigation, deposition exhibits — is identified, located, and brought into the system.

Nothing enters without a record of where it came from. Every document is tagged at intake: its source, its release date, the legal mechanism that made it public, and what type of document it is. The archive is not assumed to be complete. New documents are released continuously through ongoing litigation. The system is designed to receive them.

Intake metadata for each document includes: source type, release date, jurisdiction of origin, document class, redaction status, known completeness flags, and chain of custody from original release to system entry.
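To make the shape of that record concrete, here is a minimal sketch in Python. The field names are illustrative, not the project's actual schema:

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class IntakeRecord:
        """One document's intake metadata. All field names are illustrative."""
        document_id: str
        source_type: str                 # e.g. "FOIA release", "court filing"
        release_date: date
        jurisdiction: str                # jurisdiction of origin
        document_class: str              # e.g. "flight log", "deposition exhibit"
        redaction_status: str            # e.g. "none", "partial", "heavy"
        completeness_flags: list[str] = field(default_factory=list)
        custody_chain: list[str] = field(default_factory=list)  # original release to system entry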

The DOJ releases presented a specific ingestion challenge — files named "003.pdf" with no context, no descriptions, no index. Content classification at intake resolves this: documents are typed by content, not by the name they were given on release. Partial documents are flagged and cross-referenced against known complete versions. Where the same document appears across multiple release batches, deduplication logic identifies and merges the canonical version while preserving the provenance trail of each copy.
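A minimal sketch of that deduplication logic, assuming exact-content matching by hash (the real system would need fuzzier comparison for scans and partially redacted copies):

    import hashlib

    def content_key(text: str) -> str:
        """Hash of whitespace-normalized text (a stand-in for real near-duplicate matching)."""
        return hashlib.sha256(" ".join(text.split()).lower().encode()).hexdigest()

    def merge_duplicates(copies: list[dict]) -> dict[str, dict]:
        """Keep one canonical record per content key; preserve every copy's provenance."""
        canonical: dict[str, dict] = {}
        for copy in copies:
            key = content_key(copy["text"])
            record = canonical.setdefault(key, {"text": copy["text"], "provenance": []})
            record["provenance"].append(copy["source"])  # no copy's origin is lost
        return canonical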

Redaction status is cataloged as a data field, not a dead end. A document that is forty percent redacted is not forty percent of a document. It is a complete document with forty percent of its content marked absent — and that absence is analytically significant.

Without structured ingestion, the archive is a pile. Millions of pages released without context are, functionally, no more accessible than millions of pages that were never released at all. The dump is the obstacle as much as the volume.

This layer is the foundation everything else depends on. It is also where the investigation makes its first methodological commitment: every finding traces back to a specific document with a specific provenance. Nothing enters the system anonymously. Nothing that cannot be sourced can become a finding. The discipline starts here, at the door.

Documents in this archive were produced across twenty-five years, in multiple countries, by dozens of different institutions. They do not agree on names, dates, places, or spellings. A flight log entry from 2002 and a court filing from 2019 that both reference the same individual will not use the same name. They may not use the same date format. They may reference the same location three different ways.

Normalization is what makes them connectable. It converts the raw archive into a unified, consistent dataset where a person is the same person across every document that mentions them — regardless of how their name was spelled, abbreviated, or rendered in that document.

Normalization operations include:
  • Entity disambiguation — resolving name variants, initials, nicknames, and transliterations to canonical identities using cross-reference logic across the full corpus.
  • Date reconciliation — converting relative references, partial dates, and format variations to absolute ISO timestamps.
  • Geographic standardization — resolving location references across jurisdictions, languages, and naming conventions.
  • Document deduplication — identifying and merging duplicate records while preserving full provenance for each copy.
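As illustration only (the alias table and format list below are hypothetical), the first two operations might look like this:

    from datetime import datetime

    # Hypothetical alias table; in practice built by cross-reference logic across the corpus.
    ALIASES = {
        "J. Doe": "PERSON:jdoe",
        "John Doe": "PERSON:jdoe",
        "Doe, John": "PERSON:jdoe",
    }

    # Format variants observed across the archive (illustrative subset).
    DATE_FORMATS = ["%m/%d/%Y", "%d %B %Y", "%Y-%m-%d", "%b %d, %Y"]

    def canonical_entity(name: str) -> str:
        """Resolve a surface form to a canonical identity, or mark it unresolved."""
        return ALIASES.get(name.strip(), f"UNRESOLVED:{name.strip()}")

    def to_iso(raw: str) -> str | None:
        """Convert a raw date string to an absolute ISO date; None routes to manual review."""
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(raw.strip(), fmt).date().isoformat()
            except ValueError:
                continue
        return None

    print(canonical_entity("Doe, John"), to_iso("14 March 2002"))  # PERSON:jdoe 2002-03-14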

Redacted sections are treated as structured absences. Their position within a document, their approximate length, their relationship to surrounding named entities, and their pattern across documents — all of this is preserved as metadata. A name redacted in seventeen documents across four separate release batches is a data pattern. It is cataloged as such.
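A sketch of how a redaction can be held as structured data; the fields and the three-document threshold are assumptions for illustration:

    from dataclasses import dataclass

    @dataclass
    class Redaction:
        """One structured absence; position and context are preserved as metadata."""
        document_id: str
        char_start: int             # position within the document
        approx_length: int          # approximate length of the redacted span
        nearby_entities: list[str]  # named entities in the surrounding text

    def recurring_absences(redactions: list[Redaction], min_docs: int = 3) -> list[str]:
        """Flag entities that sit beside redactions in many documents: a pattern, not a gap."""
        seen: dict[str, set[str]] = {}
        for r in redactions:
            for entity in r.nearby_entities:
                seen.setdefault(entity, set()).add(r.document_id)
        return [entity for entity, docs in seen.items() if len(docs) >= min_docs]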

The Epstein case is a twenty-five-year record produced by adversarial parties across multiple jurisdictions, each with its own conventions, incentives, and levels of cooperation. The inconsistency is not incidental. Names are spelled differently across documents. Dates shift. Locations are described vaguely. Whether this is deliberate or structural does not matter analytically — the effect is the same: the archive resists connection.

Normalization is the act of insisting that the record be consistent. It is, in a real sense, the first investigative act — the refusal to let inconsistency be the end of the inquiry.

Once the archive is normalized, the system reads it — all of it — and identifies every significant element: people, organizations, locations, dates, financial transactions, vessels, aircraft, properties, and legal proceedings. Each element becomes a node. Each document that mentions two elements together becomes a connection between them.

This is where the archive stops being documents and starts being evidence. A name that appears in a flight log, a financial filing, and a court exhibit is not three search results. It is a pattern — and the system treats it as one.

Entity extraction operates across multiple entity classes simultaneously: persons, organizations, locations, financial instruments, vessels and aircraft, real property, legal proceedings, and dates. Each extracted entity is assigned a canonical identifier and linked to every document instance that references it.
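One plausible shape for that extraction output, sketched with hypothetical identifiers: every canonical entity keeps a pointer to each document instance that mentions it.

    from collections import defaultdict

    # Hypothetical mentions from an extraction pass: (document_id, surface form, canonical id).
    mentions = [
        ("EF-0001", "J. Doe", "PERSON:jdoe"),
        ("EF-0002", "John Doe", "PERSON:jdoe"),
        ("EF-0002", "Acme Holdings", "ORG:acme"),
    ]

    entity_index: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for doc_id, surface, canonical in mentions:
        entity_index[canonical].append((doc_id, surface))

    # Every instance stays recoverable: which documents, under which spelling.
    print(entity_index["PERSON:jdoe"])  # [('EF-0001', 'J. Doe'), ('EF-0002', 'John Doe')]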

Co-occurrence scoring measures the significance of entities appearing together — across documents, across time, across entity classes. Anomaly detection flags:
  • entities appearing frequently in source documents but absent from official findings;
  • documents referencing events without corresponding records in adjacent jurisdictions;
  • financial flows appearing in one document class but absent from expected corroborating filings;
  • named individuals present in primary sources but unnamed or redacted in official proceedings.
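The co-occurrence scoring could start from something as simple as counting document-level pairs. This is a sketch, not the system's actual scoring model:

    from collections import Counter
    from itertools import combinations

    def cooccurrence_counts(doc_entities: dict[str, set[str]]) -> Counter:
        """Count how many distinct documents mention each entity pair together."""
        pair_docs: Counter = Counter()
        for entities in doc_entities.values():
            for pair in combinations(sorted(entities), 2):
                pair_docs[pair] += 1
        return pair_docs

    docs = {
        "EF-0001": {"PERSON:jdoe", "ORG:acme", "LOC:island"},
        "EF-0002": {"PERSON:jdoe", "ORG:acme"},
    }
    print(cooccurrence_counts(docs).most_common(1))  # [(('ORG:acme', 'PERSON:jdoe'), 2)]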

Every prior investigation of this case worked on a fragment of the archive. One journalist's sources. One prosecutor's discovery. One lawsuit's exhibits. No single investigation has read everything simultaneously and asked what appears across all of it.

This layer does that for the first time. The patterns it surfaces are not the result of a theory being tested — they are the result of the full record being read. That distinction matters. The investigation follows the evidence. The evidence does not follow the investigation.

The extracted entities and their co-occurrences are assembled into a relationship graph — a map of every documented connection between individuals, organizations, locations, and events in the public record. The graph does not speculate. It reflects exactly what the documents show.

It shows not just that connections exist but their nature — financial, geographic, legal, associative — and their distribution across time. A connection documented once is different from a connection documented across fourteen independent sources over six years. The graph holds that distinction.

Relationship edges are typed by connection class and weighted by evidence density — the number of independent documents supporting each connection. Path analysis identifies chains of connection across the full network: how many documented steps separate any two entities, and what documents establish each step.
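A sketch of typed, weighted edges and path analysis using networkx (an assumed choice of library, with placeholder entities):

    import networkx as nx

    G = nx.Graph()
    # Each edge carries its connection class and the documents that establish it.
    G.add_edge("PERSON:a", "ORG:acme", kind="financial", documents=["EF-0101", "EF-0188"])
    G.add_edge("ORG:acme", "PERSON:b", kind="legal", documents=["EF-0240"])

    # Path analysis: how many documented steps separate two entities, and on what evidence.
    path = nx.shortest_path(G, "PERSON:a", "PERSON:b")
    for u, v in zip(path, path[1:]):
        edge = G.edges[u, v]
        print(f"{u} -> {v}: {edge['kind']}, {len(edge['documents'])} supporting document(s)")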

Graph topology is analytically significant independent of individual connections. Centrality measures identify who is structurally central to the documented network. Clustering reveals groups of entities with dense internal connections and sparse external ones. Bridge nodes — entities that connect otherwise separate clusters — receive elevated analytical attention. Gaps in the graph are cataloged as carefully as connections: where documentation is absent where it should exist, that absence is flagged.
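These topology measures correspond to standard graph algorithms. Continuing the networkx sketch, with a sample graph standing in for the real network:

    import networkx as nx

    G = nx.karate_club_graph()  # stand-in graph; the real input is the relationship graph above

    centrality = nx.betweenness_centrality(G)               # who is structurally central
    clusters = nx.community.louvain_communities(G, seed=1)  # dense internal groups
    bridge_nodes = list(nx.articulation_points(G))          # removal disconnects clusters

    print(max(centrality, key=centrality.get), len(clusters), bridge_nodes[:3])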

This is the first time the full documented network of the Epstein case exists as a single analyzable object. Every investigation to date has mapped a neighborhood. This maps the city.

The gap between those two things is not a matter of degree. It is a matter of kind. Patterns that are invisible in any fragment of the archive become visible only when the full record is held simultaneously. The relationship graph is what makes that possible. It is the infrastructure of the investigation — the thing everything after it depends on.

Every event extracted from the archive is placed on a single unified timeline. Where documents agree on when something happened, the event is confirmed. Where they conflict, both versions are retained with their sources. Where something is documented in one record but absent from adjacent records where it would be expected to appear, that absence is marked explicitly.

The timeline does not resolve conflicts by choosing a version. It holds the conflict as data. The question of why two documents disagree on a date is often more investigatively significant than either date.

Timeline construction involves:
  • absolute timestamp assignment for all events;
  • reconciliation of conflicting date references, with source attribution for each version;
  • flagging of expected-but-absent records — events that should appear in corroborating documents by institutional logic but do not;
  • temporal clustering analysis to identify significant concentrations of activity.
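A minimal sketch of an event record that holds conflicts rather than resolving them; field names are illustrative:

    from dataclasses import dataclass, field

    @dataclass
    class TimelineEvent:
        """One event; conflicting dates are retained side by side, each with its source."""
        description: str
        dated_versions: list[tuple[str, str]] = field(default_factory=list)  # (ISO date, source)
        expected_but_absent_in: list[str] = field(default_factory=list)      # missing corroboration

        @property
        def conflicted(self) -> bool:
            return len({d for d, _ in self.dated_versions}) > 1

    event = TimelineEvent("Hypothetical flight entry")
    event.dated_versions += [("2002-03-14", "flight log"), ("2002-03-15", "court filing")]
    print(event.conflicted)  # True: the disagreement itself is data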

Sequence dependency is critical for legal mapping. Most legal frameworks are order-sensitive: a transaction that precedes a legal filing carries different significance than one that follows it. The timeline layer preserves and surfaces these sequences as primary inputs to Layer 06.
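A toy version of that order sensitivity, with hypothetical framework labels:

    def sequence_flag(transaction_date: str, filing_date: str) -> str:
        """ISO dates compare lexicographically; the flag depends on order, not on either date alone."""
        if transaction_date < filing_date:
            return "transaction precedes filing: review under pre-filing framework"
        return "transaction follows filing: review under post-filing framework"

    print(sequence_flag("2005-06-01", "2005-07-09"))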

Twenty-five years of evidence has never existed on a single timeline. Prosecutors worked within statutes of limitations. Journalists worked within publication cycles. Civil litigants worked within the scope of their specific claims. Nobody has ever held the full sequence simultaneously.

What becomes visible at full sequence is invisible in any fragment: patterns of timing relative to legal proceedings, gaps in documented activity during periods of known significance, sequences that establish or undermine causality. The timeline is where the archive acquires the capacity to tell a story — not one imposed on it, but one that emerges from it.

The facts, relationships, and timelines established by the previous layers are mapped against relevant legal frameworks — federal statutes, state laws, international agreements, and applicable case law. The system asks a specific question for each potential claim type: does the evidence meet the elements the law requires?

This is not legal advice. It is structured analysis. The system identifies where evidence meets, approaches, or falls short of legal thresholds. Human legal review determines what happens next.

Legal mapping is jurisdiction-specific and statute-specific. Each potential claim type — trafficking, conspiracy, financial crime, obstruction, abuse of process — has defined evidentiary elements under applicable law. The system scores evidence density against each element independently: how many independent sources support it, at what confidence level, with what corroboration.

Where evidence meets threshold on all required elements, the finding is flagged for human legal review. Where evidence meets threshold on some elements but not others, the gap is documented with specificity — which elements are supported, which are absent, and what document type would be needed to close the gap. This gap documentation is itself a finding: it describes exactly what would need to exist in the record to support a complete claim.
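Sketched with placeholder elements and an assumed two-source threshold, the element-by-element scoring and gap report might look like this:

    # Placeholder elements for one claim type, each with its count of independent sources.
    elements = {"element_1": 4, "element_2": 2, "element_3": 0, "element_4": 3, "element_5": 1}
    THRESHOLD = 2  # assumed minimum independent sources per element

    met = {name for name, sources in elements.items() if sources >= THRESHOLD}
    gaps = sorted(set(elements) - met)

    if not gaps:
        print("all elements met: flag for human legal review")
    else:
        # The gap report is itself a finding: it names exactly what is missing.
        print(f"elements met: {sorted(met)}; elements lacking support: {gaps}")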

This layer may produce the most consequential finding of the entire investigation — not that crimes were committed, but that the legal frameworks as written cannot reach the documented conduct.

Statutes of limitations. Jurisdictional gaps. Evidentiary standards that foreclose historical prosecution even where patterns are clear and corroborated. If the law cannot reach what the documents show, that is a finding. It is a finding that belongs to victims, to legislators, and to everyone who has asked why accountability has not followed documentation.

We do not know in advance what this layer will produce. That is the point of building it.

Every finding the system generates passes through a structured review pipeline before it is published. A defined confidence threshold must be met. Findings that implicate named individuals or organizations go through independent legal review. Published findings are structured documents — sourced to specific records, rated by confidence level, with explicit statements distinguishing what is established from what is inferred.

The threshold criteria are published before analysis begins. This is not a policy that can be adjusted after findings are generated. It is a pre-commitment — the intellectual honesty of the entire project, stated in advance and held to without exception.

The publication pipeline operates as follows: system-generated finding → confidence scoring → threshold gate → human editorial review → legal review for named-entity findings → structured publication with full source citation and confidence rating.

  • Tier 1 — Documented: Directly supported by public record. No inference required. Published as established fact with source citation.
  • Tier 2 — Supported: Evidenced with moderate inference. Corroborated across multiple independent sources. Published with explicit inference disclosure.
  • Tier 3 — Indicated: Requires significant inference. Published only with heavy caveat language and only where pattern significance warrants public attention.

Findings below threshold are retained internally. They are not published. They may inform future analysis as additional documents are ingested.
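Putting the gate together as a sketch: the tier logic mirrors the list above, but the numeric cutoffs are assumptions, since the project's actual criteria are published separately.

    from enum import Enum

    class Tier(Enum):
        DOCUMENTED = 1       # no inference required
        SUPPORTED = 2        # moderate inference, multi-source corroboration
        INDICATED = 3        # significant inference, heavy caveats
        BELOW_THRESHOLD = 0  # retained internally, never published

    def gate(confidence: float, inference: str) -> Tier:
        """Pre-committed gate; the numeric cutoffs here are assumptions, fixed before analysis."""
        if inference == "none" and confidence >= 0.9:
            return Tier.DOCUMENTED
        if inference == "moderate" and confidence >= 0.75:
            return Tier.SUPPORTED
        if confidence >= 0.6:
            return Tier.INDICATED
        return Tier.BELOW_THRESHOLD

    tier = gate(0.93, "none")
    publishable = tier is not Tier.BELOW_THRESHOLD
    referable = tier in (Tier.DOCUMENTED, Tier.SUPPORTED)  # referral layer: Tiers 1 and 2 only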

The referral layer activates for Tier 1 and Tier 2 findings: structured summaries are prepared for the appropriate authorities — specific prosecutors, regulatory bodies, congressional oversight committees — and the submission guidance is made public, so that any member of the public can independently submit findings to the same recipients.

The confidence threshold system is the entire credibility of this project. Without it, this is opinion dressed as analysis. With it, this is analysis that can be examined, challenged, and built upon.

The thresholds are set before the investigation runs. The methodology is public before the first document is processed. The criteria for what gets published, what gets referred, and what stays internal are not determined by what we find — they are determined before we look. That sequence is not procedural. It is the difference between an investigation and a narrative.

This layer is also where the project meets its obligation to victims most directly. A finding that meets threshold and is referred to an appropriate authority is an act. It may not produce prosecution. It may not produce justice in the legal sense. But it is the act of saying: here is what the record shows, here is what the law says about it, and here is the person whose job it is to decide what happens next. That is what this project exists to do.

This is what the machine produces.

Every element of a finding traces back through the layers above — ingested, normalized, extracted, graphed, sequenced, mapped to law, reviewed, and published only when it meets threshold.

DOCUMENT_ID: EF-0244-LOGS
STATUS: FLAGGED FOR LEGAL REVIEW
CLAIM: Individual A appears in flight logs alongside Individual B on 14 documented occasions between 1999 and 2005.
SOURCES:
  • 2024 FOIA flight log release
  • SDNY filing, 2019
  • Maxwell trial exhibit 44
CONFIDENCE: TIER 1 — DOCUMENTED. Corroborated across 3 independent sources. No inference required.
LEGAL FRAMEWORK: Assessed against 18 U.S.C. § 1591. Elements present: 3 of 5.

The blueprint is done. Fund the build.

The system above is not theoretical. Every layer has been specified. The design decisions have been made. What does not yet exist is the development time and infrastructure to build it.

That is what this campaign funds.

Support the Investigation
  • Fund allocation public
  • Methodology open
  • Standards set before analysis begins