Federal Validation

The Framework Detected Structural Severity Years Before Each Failure

We took federal data from SEC filings, FDIC call reports, NTSB investigations, Congressional testimony, and bankruptcy court records. We mapped it through the Four Frequencies framework. Then we asked one question: did the structural severity scores escalate before the crisis, or only after?

In all six cases, the answer was the same. The framework read escalating structural conditions years before the failure event, using only publicly available data. The lead time ranged from 4.8 years (SVB) to 15 years (drug shortages). In every case, standard metrics and supervisory assessments showed no comparable signal during the same period.

Validation Summary
Sectors Tested: 6
Time Series Records: 800+
Federal Data Provenance: 80-97%
Temporal Lead Range: 4.8-15 years
Sensitivity Stability: 100%

Six Cases. Six Sectors. Six Distinct Failure Architectures.

Banking: Critical

Silicon Valley Bank

Connected Crisis: All four frequencies above 0.90 at failure

SVB put 82% of its securities into bonds it could not sell, removed the hedges that protected them, and relied on deposits that were 94% uninsured and could leave instantly. The CRO seat was empty for eight months. The framework read all four conditions escalating simultaneously, reaching Critical while the bank's supervisory rating was still Satisfactory.

Peak: 0.83 Critical
Keystone: Thinness
Lead: 4.8 years
Federal: 97%

Healthcare / Pharma: Moderate

U.S. Generic Drug Shortage

Chronic Erosion: 25-year structural degradation with no single trigger event

What hospitals pay for drugs kept going up. What manufacturers get paid to make them barely moved. That pricing squeeze has been continuous since 2001, grinding down generic manufacturing margins until producers started exiting the market. The framework picked up the signal three years before shortage counts spiked above 200.

Peak: 0.54 Moderate
Keystone: Absence
Lead: ~15 years
Federal: 85%

Aerospace: Severe

Boeing 737 MAX

Cascading Crisis: Permission leads, then each frequency amplifies the next

Boeing was allowed to certify its own safety, and the scope of what it certified kept growing. MCAS went from a minor system to a flight control that could push the nose down based on a single sensor with no pilot override. The framework tracked the cascade: weakened oversight led to safety gaps, which got locked into concentrated design risk, while internal metrics showed everything on track.

Peak: 0.80 Severe
Keystone: Permission
Lead: 5 years
Federal: 93%

Rail / Transportation: High

East Palestine Derailment

Chronic + Acute Revelation: Peak severity preceded the derailment

Norfolk Southern cut 33% of its train crews in three years. The operating ratio improved. Wall Street celebrated. But every input that drives operating ratio improvement also degrades safety capacity. The framework's composite peaked in 2022 at 0.70. The derailment happened in 2023. The structural condition was at its worst before anyone outside the industry was paying attention.

Peak: 0.70 High
Keystone: Management
Lead: 6 years
Federal: 92%

Cybersecurity: Critical

CrowdStrike Global Outage

Instantaneous Cascade: 78 minutes from update to 8.5 million blue screens

CrowdStrike got more fragile by getting more successful. Every new Fortune 500 customer (80 to 298 over five years) expanded the blast radius of a single deployment failure. The testing gaps were there the whole time, but they were invisible until they mattered. The composite severity rose every single year, driven entirely by market penetration, not degradation.

Peak: 0.80 Critical
Keystone: Thinness
Lead: 5 years
Federal: 81%

Commercial Real Estate: Severe

WeWork

Narrative Implosion: The S-1 filing collapsed a $47B valuation narrative

In 2016, WeWork lost $430 million on $436 million in revenue and carried a $16.9 billion valuation. The framework scored that gap at 0.99 from year one. After the founder left, governance reformed, and the Permission signal dropped. But the company still went bankrupt because $47.2 billion in 15-year leases cannot be fixed by board reform alone.

Peak: 0.75 Severe
Keystone: Management
Lead: 7 years
Federal: 80%

How the Backtest Works

Each case follows the same process. We identify the federal data sources that map to each of the four frequencies. Revenue and asset concentration map to Thinness. Staffing patterns and capability metrics map to Absence. Governance structures and regulatory enforcement map to Permission. The gap between reported metrics and actual conditions maps to Management.

We normalize every metric to a 0-to-1 severity scale with fixed bands that do not change between cases: Elevated (0.25), Moderate (0.40), High (0.55), Severe (0.70), Critical (0.80). The composite severity is a weighted average where the keystone frequency (the one that drives the failure architecture) receives the highest weight. Each case documents why a specific frequency is the keystone and states explicit conditions that would disprove the classification.
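
As an illustration only, the scoring step can be sketched in Python. The band thresholds are the ones listed above. Everything else is an assumption made for the sketch: min-max normalization, treating each threshold as an inclusive lower bound, the "Below Elevated" label, and the example range, scores, and weights, which are placeholders rather than values from any case.

```python
from bisect import bisect_right

# Band thresholds from the text. Treating them as inclusive lower bounds
# is an assumption of this sketch.
BANDS = [(0.25, "Elevated"), (0.40, "Moderate"), (0.55, "High"),
         (0.70, "Severe"), (0.80, "Critical")]


def normalize(value, low, high):
    """Min-max scale a raw metric to [0, 1]; low and high are hypothetical endpoints."""
    if high == low:
        raise ValueError("range endpoints must differ")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def band(score):
    """Return the label of the highest band whose threshold the score reaches."""
    thresholds = [t for t, _ in BANDS]
    i = bisect_right(thresholds, score)
    return "Below Elevated" if i == 0 else BANDS[i - 1][1]


def composite(scores, weights):
    """Weighted average of per-frequency severity scores."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total


# Placeholder inputs: Thinness is given the highest weight as the keystone.
scores = {"Thinness": 0.9, "Absence": 0.6, "Permission": 0.7, "Management": 0.8}
weights = {"Thinness": 0.4, "Absence": 0.2, "Permission": 0.2, "Management": 0.2}
severity = composite(scores, weights)
print(round(severity, 2), band(severity))
```

Keeping the thresholds in one module-level table mirrors the claim that the bands do not change between cases: only the per-metric ranges and the weights vary.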

The data provenance is classified into two tiers. FEDERAL-VERIFIED means the number comes from a filing or database that an organization submitted under legal obligation to a federal agency. DOCUMENTED-VERIFIED means the number comes from official company communications or industry analysis. Between 80% and 97% of the data across all six cases is FEDERAL-VERIFIED.
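
The federal share for a case could be computed along these lines. This is a sketch under one assumption: that the percentage is a simple count of data points per tier. The function name and the sample list are hypothetical.

```python
from collections import Counter

FEDERAL = "FEDERAL-VERIFIED"
DOCUMENTED = "DOCUMENTED-VERIFIED"


def federal_share(tiers):
    """Fraction of data points tagged FEDERAL-VERIFIED (assumes a per-point count)."""
    counts = Counter(tiers)
    unknown = set(counts) - {FEDERAL, DOCUMENTED}
    if unknown:
        raise ValueError(f"unrecognized tiers: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no data points")
    return counts[FEDERAL] / total


# Placeholder tags, not data from any of the six cases.
sample = [FEDERAL] * 9 + [DOCUMENTED]
print(f"{federal_share(sample):.0%}")
```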

Sensitivity Analysis

The most common methodological challenge to any scoring framework is that the analyst chose the normalization ranges and weights to produce the desired result. We tested this directly.

We shifted all normalization ranges by plus and minus 20% and ran 25 parameter combinations per case (5 normalization shifts times 5 weight perturbations). The temporal lead finding holds in all 25 combinations, as do the trajectory shape and the severity band at the crisis point. The findings are not artifacts of where we set the boundaries.
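
The 5-by-5 grid can be sketched as follows. The specific step values, the size of the weight perturbations, and the `evaluate` callback are assumptions for illustration. The text gives only the plus and minus 20% span and the 25-combination count.

```python
from itertools import product

# Assumed grids: five evenly spaced steps. The weight perturbation
# magnitudes are hypothetical.
SHIFTS = (-0.20, -0.10, 0.0, 0.10, 0.20)
WEIGHT_DELTAS = (-0.20, -0.10, 0.0, 0.10, 0.20)


def sensitivity_grid(evaluate):
    """Run evaluate(shift, delta) over every combination.

    evaluate is a caller-supplied function that re-scores a case under the
    perturbed parameters and returns the finding being checked, for example
    the severity band at the crisis point.
    Returns (number of combinations, share that match the unperturbed run).
    """
    baseline = evaluate(0.0, 0.0)
    results = [evaluate(s, d) for s, d in product(SHIFTS, WEIGHT_DELTAS)]
    stable = sum(r == baseline for r in results)
    return len(results), stable / len(results)


# Toy evaluator whose finding never changes, so stability is 100%.
n, stability = sensitivity_grid(lambda shift, delta: "Critical")
print(n, stability)
```

A finding counts as stable here only if it is identical to the unperturbed run, which is a strict reading of "holds".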

Federal data sources across six cases: FDIC Call Reports, FOMC target rates, SEC 10-K/S-1/10-Q filings, Federal Reserve OIG, NTSB investigation reports, FAA OIG audits, FRA enforcement databases, Congressional hearing testimony, CISA alerts, GAO reports, BLS Producer Price Index, BLS employment data, USITC import data, SoftBank Group annual reports, U.S. Bankruptcy Court filings. Full provenance documentation available for each case.

Frequently Asked Questions About the Historical Backtest

What is the Four Frequencies historical backtest?

The historical backtest applies the Four Frequencies framework to six documented failures across six sectors, using federal data (SEC filings, FDIC call reports, NTSB investigations, Congressional testimony) to measure structural severity over time. It tests whether the framework can detect escalating structural conditions before each crisis event. In all six cases, the framework identified elevated severity years before the failure occurred.

What data sources does the backtest use?

Between 80% and 97% of the data in each backtest case comes from federal sources: SEC 10-K and S-1 filings, FDIC call reports, NTSB investigation reports, FAA Office of Inspector General audits, FRA enforcement databases, Congressional hearing testimony, CISA alerts, and bankruptcy court filings. These are numbers organizations filed under legal obligation. The remaining data comes from official company communications and industry analysis, classified as DOCUMENTED-VERIFIED.

How far in advance did the framework detect structural severity?

Temporal lead varied by case: 4.8 years for SVB, approximately 15 years for the drug shortage crisis, 5 years for Boeing 737 MAX, 6 years for East Palestine, 5 years for CrowdStrike, and 7 years for WeWork. In every case, the composite severity score crossed into elevated territory well before the crisis event, using only publicly available federal data.
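
One way to compute a temporal lead from a composite time series is sketched below. It assumes the lead is measured from the first observation at or above a chosen threshold to the crisis date. The function name, the threshold, and the series are placeholders, not figures from any case.

```python
def temporal_lead(series, threshold, crisis_year):
    """Years from the first reading at or above threshold to the crisis.

    series is an iterable of (year, composite_score) pairs. Years may be
    fractional. Returns None if the threshold is never reached on or
    before the crisis year.
    """
    for year, score in sorted(series):
        if year > crisis_year:
            break
        if score >= threshold:
            return crisis_year - year
    return None


# Placeholder series: the score first reaches 0.25 in 2016.
series = [(2015, 0.20), (2016, 0.30), (2017, 0.45), (2018, 0.60)]
print(temporal_lead(series, threshold=0.25, crisis_year=2020))
```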

How does the sensitivity analysis work?

The sensitivity analysis ran 25 parameter combinations per case, crossing 5 normalization shifts (ranges moved by plus and minus 20%) with 5 perturbations of the composite weights. The key findings (temporal lead, trajectory shape, severity band at crisis) held at 100% stability across all combinations. The absolute severity numbers shift with different parameters, but the structural reading does not change.

What are the six failure architectures identified by the backtest?

The framework identified six structurally distinct failure architectures: Connected Crisis (SVB), Chronic Erosion (Drug Shortage), Cascading Crisis (Boeing), Chronic plus Acute Revelation (East Palestine), Instantaneous Cascade (CrowdStrike), and Narrative Implosion (WeWork). The fact that the framework assigns different keystones, different weights, different temporal signatures, and different failure modes to each case confirms it is reading the structure rather than projecting a template.

What is a keystone frequency?

The keystone frequency is the structural condition whose removal from the architecture would prevent the failure mode from existing at all. It is identified through a counterfactual test: if this frequency were at baseline, would the failure still have occurred? The keystone is not necessarily the highest-scoring frequency at the crisis point. It is the frequency that determines whether the failure mode is structurally available.

Can the backtest predict future failures?

Phase 1 is explicitly retrospective validation. It claims structural legibility, not prediction: the ability to read the conditions that precede failure from publicly available federal data. Prospective application is the Phase 2 objective, where the framework will read live sector conditions using the same data sources. The backtests earn the right to make that attempt by first demonstrating correct readings of known failures.

What would disprove the backtest findings?

Each case includes explicit falsification conditions that would disprove the structural reading. These are specific, testable statements about the data relationships the framework relies on. If those relationships do not hold, the reading fails. The existence of falsification conditions means the framework exposes itself to failure rather than insulating itself.


These backtests validate the framework against federal data. The same structural vocabulary applies to organizations that are still operating.

The backtests are retrospective. Applying the framework prospectively, to live sector conditions, is the Phase 2 objective.
