reference-class-forecasting

Name: reference-class-forecasting
Author: lyndonkl/claude

$ npx mdskill add lyndonkl/claude/reference-class-forecasting

Anchors predictions in historical reality using statistical baselines.

Helps establish base rates and test uniqueness claims before forecasting.
Relies on historical event databases and statistical frequency records.
Bases its recommendations on matching the current case to similar past cases.
Delivers clear validation steps and calculated probability baselines.

SKILL.md

.github/skills/reference-class-forecasting

---
name: reference-class-forecasting
description: Anchors predictions in historical reality by identifying a class of similar past events and using their statistical frequency as a baseline (outside view) before analyzing case-specific details. Use when starting a forecast, establishing base rates, testing "this time is different" claims, or when user mentions reference classes, outside view, base rates, or starting a new prediction.
---

# Reference Class Forecasting

## Table of Contents
- [Interactive Menu](#interactive-menu)
- [Quick Reference](#quick-reference)
- [Resource Files](#resource-files)

---

## Interactive Menu

**What would you like to do?**

### Core Workflows

**1. [Find My Base Rate](#1-find-my-base-rate)** - Identify reference class and get statistical baseline
- Guided process to select correct reference class
- Search strategies for finding historical frequencies
- Validation that you have the right anchor

**2. [Test "This Time Is Different"](#2-test-this-time-is-different)** - Challenge uniqueness claims
- Reversal test for uniqueness bias
- Similarity matching framework
- Burden of proof calculator

**3. [Calculate Funnel Base Rates](#3-calculate-funnel-base-rates)** - Multi-stage probability chains
- When no single base rate exists
- Sequential probability modeling
- Product rule for compound events

**4. [Validate My Reference Class](#4-validate-my-reference-class)** - Ensure you chose the right comparison set
- Too broad vs too narrow test
- Homogeneity check
- Sample size evaluation

**5. [Learn the Framework](#5-learn-the-framework)** - Deep dive into methodology
- Read [Outside View Principles](resources/outside-view-principles.md)
- Read [Reference Class Selection Guide](resources/reference-class-selection.md)
- Read [Common Pitfalls](resources/common-pitfalls.md)

**6. Exit** - Return to main forecasting workflow

---

## 1. Find My Base Rate

**Let's establish your statistical baseline.**

### Step 1: What are you forecasting?
Tell me the specific event or outcome you're predicting.

**Example prompts:**
- "Will this startup succeed?"
- "Will this bill pass Congress?"
- "Will this project launch on time?"

---

### Step 2: Identify the Reference Class

I'll help you identify what bucket this belongs to.

**Framework:**
- **Too broad:** "All companies" → meaningless
- **Just right:** "Seed-stage B2B SaaS startups in fintech"
- **Too narrow:** "Companies founded by people named Steve in 2024" → no data

**Key Questions:**
1. What type of entity is this? (company, bill, project, person, etc.)
2. What stage/size/category?
3. What industry/domain?
4. What time period is relevant?

I'll work with you to refine this until we have a specific, searchable class.
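A minimal Python sketch of the broad-versus-narrow trade-off, assuming your historical cases are already available as structured records. The `Case` fields, `select_class`, `describe`, and the sample records are all hypothetical and exist only for illustration. Each extra attribute you filter on makes the class more specific but shrinks N.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One historical case. The fields are a hypothetical schema for illustration."""
    entity: str      # e.g. "company", "bill", "project"
    stage: str       # e.g. "seed", "series-a"
    domain: str      # e.g. "fintech"
    year: int        # year the case started
    succeeded: bool  # the outcome being forecast


def select_class(cases, min_year, **attrs):
    """Keep cases from min_year onward whose attributes match every keyword given."""
    return [
        c for c in cases
        if c.year >= min_year and all(getattr(c, k) == v for k, v in attrs.items())
    ]


def describe(members):
    """Return (N, success rate); the rate is None for an empty class."""
    n = len(members)
    rate = sum(c.succeeded for c in members) / n if n else None
    return n, rate


# Made-up records, only to show how narrowing trades sample size for specificity.
history = [
    Case("company", "seed", "fintech", 2018, False),
    Case("company", "seed", "fintech", 2019, False),
    Case("company", "seed", "health", 2019, True),
    Case("company", "series-a", "fintech", 2020, True),
]

broad = select_class(history, 2015, entity="company")
narrow = select_class(history, 2015, entity="company", stage="seed", domain="fintech")
print(describe(broad))   # (4, 0.5)
print(describe(narrow))  # (2, 0.0)
```

The narrow class gives a very different rate from only two cases, which is exactly the "too narrow → no data" risk above.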

---

### Step 3: Search for Historical Data

I'll help you find the base rate using:
- **Web search** for published statistics
- **Academic studies** on success rates
- **Government/industry reports**
- **Proxy metrics** if direct data is unavailable

**Search Strategy:**
```
"historical success rate of [reference class]"
"[reference class] failure statistics"
"[reference class] survival rate"
"what percentage of [reference class]"
```
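If you run many searches, a small helper can fill the templates above for a given class. This is a sketch only; `SEARCH_TEMPLATES` and `build_queries` are hypothetical names, and the templates are copied from the search strategy above.

```python
SEARCH_TEMPLATES = [
    "historical success rate of {cls}",
    "{cls} failure statistics",
    "{cls} survival rate",
    "what percentage of {cls}",
]


def build_queries(reference_class: str) -> list[str]:
    """Fill every search template with the reference-class description."""
    return [t.format(cls=reference_class) for t in SEARCH_TEMPLATES]


for query in build_queries("seed-stage B2B SaaS startups in fintech"):
    print(query)
```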

---

### Step 4: Set Your Anchor

Once we find the base rate, that becomes your **starting probability**.

**The Rule:**
> Treat this base rate as your starting point. Adjust only when you have specific,
> evidence-based reasons from your "inside view" analysis.

**Default anchors if no data found:**
- Novel innovation: 10-20% (most innovations fail)
- Established industry: 50% (maximum uncertainty)
- Regulated/proven process: 70-80% (systems work)
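The anchoring rule can be sketched as a tiny function: use the base rate you found, and fall back to a default only when the search came up empty. The category keys and `starting_probability` are hypothetical, and collapsing each default range to its midpoint is an assumption of this sketch, not part of the rule.

```python
from typing import Optional

# Fallback anchors from the list above. Collapsing each range to its midpoint
# is an assumption made for this sketch.
DEFAULT_ANCHORS = {
    "novel_innovation": 0.15,      # midpoint of 10-20%
    "established_industry": 0.50,
    "regulated_process": 0.75,     # midpoint of 70-80%
}


def starting_probability(base_rate: Optional[float], category: str) -> float:
    """Return the found base rate if there is one, else the default anchor for the category."""
    if base_rate is not None:
        if not 0.0 <= base_rate <= 1.0:
            raise ValueError("base_rate must be a probability in [0, 1]")
        return base_rate
    return DEFAULT_ANCHORS[category]


print(starting_probability(0.32, "novel_innovation"))  # data found: 0.32
print(starting_probability(None, "novel_innovation"))  # no data: 0.15
```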

**Next:** Return to [menu](#interactive-menu) or proceed to inside view analysis.

---

## 2. Test "This Time Is Different"

**Challenge uniqueness bias.**

When someone (including yourself) believes "this case is special," we need to stress-test that belief.

### The Uniqueness Audit

**Question 1: Similarity Matching**
- What are 5 historical cases that are most similar to this one?
- For each, what was the outcome?
- How is your case materially different from these?
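The original does not say how to measure similarity. One way to make Question 1 concrete is a nearest-neighbour lookup, sketched below under loud assumptions: each case is described by numeric features pre-scaled to 0-1, similarity is plain Euclidean distance, and the feature names (`team`, `market`), the records, and both function names are hypothetical.

```python
import math


def nearest_cases(target, history, k=5):
    """Return the k historical cases closest to `target`.

    `target` is a dict of numeric features; each history item is a
    (features_dict, outcome_bool) pair. Features are assumed to be
    pre-scaled to comparable ranges; distance is plain Euclidean.
    """
    def distance(features):
        return math.sqrt(sum((features[f] - target[f]) ** 2 for f in target))

    return sorted(history, key=lambda item: distance(item[0]))[:k]


def neighbour_rate(target, history, k=5):
    """Share of the k most similar cases that had a positive outcome."""
    neighbours = nearest_cases(target, history, k)
    return sum(outcome for _, outcome in neighbours) / len(neighbours)


# Made-up cases for illustration only.
target = {"team": 0.8, "market": 0.6}
past = [
    ({"team": 0.9, "market": 0.5}, True),
    ({"team": 0.7, "market": 0.7}, False),
    ({"team": 0.8, "market": 0.4}, False),
    ({"team": 0.6, "market": 0.6}, False),
    ({"team": 0.9, "market": 0.8}, True),
    ({"team": 0.1, "market": 0.1}, True),
]
print(neighbour_rate(target, past))  # 0.4
```

If the neighbour rate sits close to the base rate, the "this case is special" claim has little support from its closest analogues.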

**Question 2: The Reversal Test**
- If someone claimed a different case was "unique" for the same reasons you're claiming, would you accept it?
- Are you applying special pleading?

**Question 3: Burden of Proof**
The base rate says [X]%. You claim it should be [Y]%.

Calculate the gap: `|Y - X|`

**Required evidence strength** (gap measured in percentage points):
- Gap < 10 points: Minimal evidence needed
- Gap 10-30 points: Moderate evidence needed (2-3 specific factors)
- Gap > 30 points: Extraordinary evidence needed (multiple independent strong signals)
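The burden-of-proof tiers map directly to a small function. This is a sketch; `required_evidence` is a hypothetical name, inputs are percentages (so the gap is in percentage points), and treating exactly 10 and 30 as "moderate" is an assumption about the inclusive "10-30" range.

```python
def required_evidence(base_rate_pct: float, claimed_pct: float) -> str:
    """Map the gap |Y - X| (percentage points) to the evidence tiers above."""
    gap = abs(claimed_pct - base_rate_pct)
    if gap < 10:
        return "minimal"
    if gap <= 30:  # assumes both ends of "10-30" count as moderate
        return "moderate (2-3 specific factors)"
    return "extraordinary (multiple independent strong signals)"


print(required_evidence(20, 25))  # gap 5  -> minimal
print(required_evidence(20, 45))  # gap 25 -> moderate
print(required_evidence(20, 70))  # gap 50 -> extraordinary
```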

### Output

I'll tell you:
1. Whether "this time is different" is justified
2. How much you can reasonably adjust from the base rate
3. What evidence would be needed to justify larger moves

**Next:** Return to [menu](#interactive-menu)

---

## 3. Calculate Funnel Base Rates

**For multi-stage processes without a single base rate.**

### When to Use
- No direct statistic exists (e.g., "success rate of X")
- Event requires multiple sequential steps
- Each stage's probability can be estimated conditional on reaching that stage

### The Funnel Method

**Example: "Will Bill X become law?"**

No direct data on "Bill X success rate," but we can model the funnel:

1. **Stage 1:** Bills introduced → Bills that reach committee
   - P(committee | introduced) = ?

2. **Stage 2:** Bills in committee → Bills that reach floor vote
   - P(floor | committee) = ?

3. **Stage 3:** Bills voted on → Bills that pass
   - P(pass | floor vote) = ?

**Final Base Rate:**
```
P(law) = P(committee | introduced) × P(floor | committee) × P(pass | floor vote)
```
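The funnel formula is the chain rule: multiply each stage's conditional probability. Below is a minimal sketch; `funnel_base_rate` is a hypothetical name, and the stage rates are placeholders, not real legislative statistics, to be replaced with the figures you find in the search step.

```python
from math import prod


def funnel_base_rate(stage_rates):
    """Chain rule: multiply each stage's probability, conditional on reaching that stage."""
    for name, p in stage_rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: {p} is not a probability")
    return prod(p for _, p in stage_rates)


# Placeholder rates for illustration only; replace with sourced statistics.
bill_funnel = [
    ("committee | introduced", 0.5),
    ("floor | committee", 0.2),
    ("pass | floor vote", 0.8),
]
print(f"P(law) = {funnel_base_rate(bill_funnel):.3f}")  # P(law) = 0.080
```

Because every factor is at most 1, adding stages can only lower the result, which is why long funnels often yield surprisingly small base rates.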

### Process

I'll help you:
1. **Decompose** the event into sequential stages
2. **Search** for statistics on each stage
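3. **Multiply** the stage probabilities using the product (chain) rule
4. **Validate** the model (is each stage's rate conditional on reaching the previous stage, rather than an unconditional rate for all cases?)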

### Common Funnels
- Startup success: Seed → Series A → Profitability → Exit
- Drug approval: Discovery → Trials → FDA → Market
- Project delivery: Planning → Development → Testing → Launch

**Next:** Return to [menu](#interactive-menu)

---

## 4. Validate My Reference Class

**Ensure you chose the right comparison set.**

### The Three Tests

**Test 1: Homogeneity**
- Are the members of this class actually similar enough?
- Is there high variance in outcomes?
- Should you subdivide further?

**Example:** "Tech startups" is too broad (consumer vs B2B vs hardware are very different). Subdivide.
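A rough homogeneity check compares outcome rates across subgroups of the class. This is a sketch: `subgroup_rates`, `looks_heterogeneous`, the 0.15 spread threshold, and the sample outcomes are all assumptions for illustration. With small subgroups the spread itself is noisy, so treat a flag as a prompt to look closer, not a verdict.

```python
from collections import defaultdict


def subgroup_rates(cases):
    """Success rate per subgroup, from (subgroup, succeeded) pairs."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [successes, total]
    for group, succeeded in cases:
        counts[group][0] += int(succeeded)
        counts[group][1] += 1
    return {group: wins / total for group, (wins, total) in counts.items()}


def looks_heterogeneous(cases, max_spread=0.15):
    """True if the best and worst subgroup rates differ by more than max_spread."""
    rates = subgroup_rates(cases)
    return max(rates.values()) - min(rates.values()) > max_spread


# Made-up outcomes for a "tech startups" class split by business model.
tech_startups = (
    [("consumer", True)] * 1 + [("consumer", False)] * 9
    + [("b2b", True)] * 3 + [("b2b", False)] * 7
)
print(subgroup_rates(tech_startups))       # {'consumer': 0.1, 'b2b': 0.3}
print(looks_heterogeneous(tech_startups))  # True -> consider subdividing
```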

---

**Test 2: Sample Size**
- Do you have enough historical cases?
- Minimum: 20-30 cases for meaningful statistics
- If N < 20: Widen the class or acknowledge high uncertainty
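To see why small samples leave high uncertainty, you can put a confidence interval around the observed rate. The sketch below uses the Wilson score interval, a standard interval for a proportion that the original does not name; `wilson_interval` is a hypothetical helper, and z = 1.96 corresponds to roughly 95% coverage.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half


# Same observed rate (50%), increasing sample sizes.
for k, n in [(5, 10), (15, 30), (50, 100)]:
    low, high = wilson_interval(k, n)
    print(f"{k}/{n}: {low:.2f} to {high:.2f}")
```

The observed rate is identical in all three rows, but the interval narrows as N grows; at N = 10 it spans roughly a quarter to three quarters.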

---

**Test 3: Relevance**
- Have conditions changed since the historical data?
- Are there structural differences (regulation, technology, market)?
- Time decay: Data from >10 years ago may be stale

### Validation Checklist

I'll walk you through:
- [ ] Class has 20+ historical examples
- [ ] Members are reasonably homogeneous
- [ ] Data is from relevant time period
- [ ] No major structural changes since data collection
- [ ] Class is specific enough to be meaningful
- [ ] Class is broad enough to have data

**Output:** Confidence level in your reference class (High/Medium/Low)
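The checklist can be scored mechanically. This is a sketch only: `CHECKS` and `class_confidence` are hypothetical names, and the cut-offs (all six passed gives High, four or five gives Medium, fewer gives Low) are an assumption, since the checklist itself does not define how answers map to a confidence level.

```python
CHECKS = [
    "Class has 20+ historical examples",
    "Members are reasonably homogeneous",
    "Data is from relevant time period",
    "No major structural changes since data collection",
    "Class is specific enough to be meaningful",
    "Class is broad enough to have data",
]


def class_confidence(passed: dict) -> str:
    """Map checklist answers to High/Medium/Low.

    The cut-offs (6 -> High, 4-5 -> Medium, else Low) are an assumption
    for illustration, not part of the checklist itself.
    """
    missing = [c for c in CHECKS if c not in passed]
    if missing:
        raise KeyError(f"unanswered checks: {missing}")
    score = sum(bool(passed[c]) for c in CHECKS)
    if score == len(CHECKS):
        return "High"
    if score >= 4:
        return "Medium"
    return "Low"


answers = {check: True for check in CHECKS}
answers["Class has 20+ historical examples"] = False
print(class_confidence(answers))  # Medium
```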

**Next:** Return to [menu](#interactive-menu)

---

## 5. Learn the Framework

**Deep dive into the methodology.**

### Resource Files

📄 **[Outside View Principles](resources/outside-view-principles.md)**
- Statistical thinking vs narrative thinking
- Why the outside view often beats expert judgment
- Kahneman's planning fallacy research
- When outside view fails

📄 **[Reference Class Selection Guide](resources/reference-class-selection.md)**
- Systematic method for choosing comparison sets
- Balancing specificity vs data availability
- Similarity metrics and matching
- Edge cases and judgment calls

📄 **[Common Pitfalls](resources/common-pitfalls.md)**
- Base rate neglect examples
- "This time is different" bias
- Overfitting to small samples
- Ignoring regression to the mean
- Availability bias in class selection

**Next:** Return to [menu](#interactive-menu)

---

## Quick Reference

### The Outside View Commandments

1. **Base Rate First:** Establish statistical baseline BEFORE analyzing specifics
2. **Assume Average:** Treat case as typical until proven otherwise
3. **Burden of Proof:** Large deviations from base rate require strong evidence
4. **Class Precision:** Reference class should be specific but data-rich
5. **No Narratives:** Resist compelling stories; trust frequencies

### One-Sentence Summary

> Find what usually happens to things like this, start there, and only move with evidence.

### Integration with Other Skills

- **Before:** Use `estimation-fermi` if you need to calculate a base rate from components
- **After:** Use `bayesian-reasoning-calibration` to update from base rate with new evidence
- **Companion:** Use `scout-mindset-bias-check` to validate you're not cherry-picking the reference class

---

## Resource Files

📁 **resources/**
- [outside-view-principles.md](resources/outside-view-principles.md) - Theory and research
- [reference-class-selection.md](resources/reference-class-selection.md) - Systematic selection method
- [common-pitfalls.md](resources/common-pitfalls.md) - What to avoid

---

**Ready to start? Choose a number from the [menu](#interactive-menu) above.**
