# testing_philosophy.md
## Purpose
This document defines what must be tested, why it must be tested, and what failure means.
Testing is not primarily for “bugs.”
Testing is for truth preservation.
If a system teaches something false, hides causality, violates constraints, or breaks determinism, the build must fail.
## Authority
Testing enforces the authority chain:
accord/ → design/ → data/ → engine/
Testing must enforce design/accord_constraints.md, design/simulation_laws.md, and design/realism_constraints.md.
## Testing Principles
- Correctness outranks convenience: convenient falsehood is still falsehood.
- Determinism is mandatory: nondeterministic outcomes are treated as defects unless explicitly modeled and bounded.
- Constraints are law: any violation of realism constraints or conservation is a hard failure.
- Explanations must match causes: if the system explains an outcome, the explanation must match the causal trace.
- Regression is unacceptable: once a truth is encoded and tested, it cannot silently change.
## Test Categories
### Unit Tests
Validate small deterministic rules.
Examples:
- unit conversions
- nutrient accounting functions
- spoilage functions
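
As a minimal sketch of this category, the test below checks a hypothetical `spoilage_fraction` helper; the name and the exponential model are illustrative, not the engine's actual API. The point is the shape of the assertions: exact values for deterministic rules, and rejection (not silent clamping) of impossible inputs.

```python
import math
import unittest

def spoilage_fraction(hours: float, rate_per_hour: float) -> float:
    # Hypothetical exponential spoilage model; the real rule lives in engine/.
    if hours < 0 or rate_per_hour < 0:
        raise ValueError("time and rate must be non-negative")
    return 1.0 - math.exp(-rate_per_hour * hours)

class SpoilageUnitTests(unittest.TestCase):
    def test_zero_time_means_zero_spoilage(self):
        self.assertEqual(spoilage_fraction(0.0, 0.05), 0.0)

    def test_identical_inputs_give_identical_outputs(self):
        # Deterministic rule: repeated evaluation must agree exactly.
        self.assertEqual(spoilage_fraction(24.0, 0.05),
                         spoilage_fraction(24.0, 0.05))

    def test_negative_inputs_are_rejected(self):
        with self.assertRaises(ValueError):
            spoilage_fraction(-1.0, 0.05)

if __name__ == "__main__":
    unittest.main()
```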
### Property-Based Tests
Verify invariants across wide input spaces.
Examples:
- conservation across transformations
- monotonic decay functions
- bounds on fatigue/error relationships
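
A sketch of such a test using the `hypothesis` library, with a stand-in `nutrient_remaining` decay rule (illustrative, not the real engine function). The property asserts monotonicity and bounds over a wide input space rather than checking single points:

```python
import math
from hypothesis import given, strategies as st

def nutrient_remaining(initial: float, hours: float, rate: float = 0.01) -> float:
    # Hypothetical monotonic decay rule standing in for the real engine function.
    return initial * math.exp(-rate * hours)

@given(initial=st.floats(min_value=0.0, max_value=1e6),
       t1=st.floats(min_value=0.0, max_value=1e4),
       t2=st.floats(min_value=0.0, max_value=1e4))
def test_decay_is_monotonic_and_bounded(initial, t1, t2):
    earlier, later = min(t1, t2), max(t1, t2)
    a = nutrient_remaining(initial, earlier)
    b = nutrient_remaining(initial, later)
    # Never negative, never increasing, never exceeding the starting amount.
    assert 0.0 <= b <= a <= initial
```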
### Simulation Replay Tests
Prove determinism.
Method:
- run simulation with seed + action log
- replay
- assert identical state hashes at milestones
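
A pytest-style sketch of the replay check, assuming a hypothetical `run_simulation` entry point that returns state snapshots keyed by milestone tick. Canonical serialization is what makes "identical state" checkable by hash:

```python
import hashlib
import json

def state_hash(state: dict) -> str:
    # Canonical serialization (sorted keys) so equal states always hash equally.
    return hashlib.sha256(
        json.dumps(state, sort_keys=True).encode("utf-8")
    ).hexdigest()

def test_replay_is_deterministic(run_simulation, seed, action_log, milestones):
    # `run_simulation` is a hypothetical fixture returning {tick: state_dict}
    # snapshots at the requested milestone ticks.
    first = run_simulation(seed=seed, actions=action_log, snapshot_at=milestones)
    second = run_simulation(seed=seed, actions=action_log, snapshot_at=milestones)
    for tick in milestones:
        assert state_hash(first[tick]) == state_hash(second[tick]), \
            f"state diverged at milestone tick {tick}"
```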
### Data Validation Tests
Ensure data is lawful.
Fail the build on any of the following:
- missing units
- invalid ranges
- unresolved references
- broken schema invariants
- impossible values
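
A sketch of a record-level validator; the field names, ranges, and error strings are placeholders for whatever the real schemas in data/ declare:

```python
def validate_record(record: dict, known_ids: set) -> list[str]:
    # Minimal shape check; real invariants would be declared in data/ schemas.
    errors = []
    for field in ("name", "mass", "mass_unit"):
        if field not in record:
            errors.append(f"missing field: {field}")
    if "mass" in record and not (0 < record["mass"] < 1e6):
        errors.append(f"mass out of range: {record['mass']}")
    for ref in record.get("references", []):
        if ref not in known_ids:
            errors.append(f"unresolved reference: {ref}")
    return errors

def test_bad_record_fails_the_build():
    bad = {"name": "grain", "mass": -3.0, "references": ["no_such_id"]}
    errors = validate_record(bad, known_ids={"soil_basic"})
    assert errors, "invalid data must never pass silently"
```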
### Integration Tests
Verify domain interactions.
Examples:
- growth depends on soil + water + temperature
- fatigue increases error rate
- preservation trades time/energy for reduced spoilage
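
A sketch of one such interaction test, with a stand-in `error_rate` coupling; the real relationship would come from design/realism_constraints.md:

```python
def error_rate(base: float, fatigue: float) -> float:
    # Hypothetical coupling: error rate grows with fatigue, capped at 1.0.
    return min(1.0, base * (1.0 + fatigue))

def test_fatigue_increases_error_rate():
    rested = error_rate(base=0.02, fatigue=0.0)
    tired = error_rate(base=0.02, fatigue=0.8)
    assert tired > rested
    assert 0.0 <= rested <= tired <= 1.0
```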
### Explanation Correctness Tests
Ensure explanations match reality.
Requirements:
- every reported failure reason maps to:
  - a constraint
  - a violated requirement
  - a defined failure case
- the system can always say:
  - what happened
  - why it happened
  - what would prevent it
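
One way to encode this as a test, assuming a hypothetical `explain_failure` API and a set of declared constraint identifiers (both names are illustrative):

```python
# Illustrative constraint identifiers; the real set would be generated
# from design/ so the test cannot drift from the design documents.
CONSTRAINT_IDS = {"conservation.mass", "time.nonzero", "nutrients.nitrogen_min"}

def test_failure_reasons_map_to_declared_constraints(explain_failure, failed_event):
    # `explain_failure` is a hypothetical API returning the three required parts.
    explanation = explain_failure(failed_event)
    assert explanation["what_happened"]
    assert explanation["why"] in CONSTRAINT_IDS, \
        "every reported reason must resolve to a declared constraint or failure case"
    assert explanation["what_would_prevent_it"]
```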
### Performance Tests
Ensure low-power viability.
Rules:
- complexity scales ~linearly with active entities
- performance improvements must not change outcomes
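
A rough sketch of the scaling rule, assuming a hypothetical `run_ticks` benchmark hook; timing assertions need generous slack to avoid flaking on noisy, low-power hardware:

```python
import time

def test_tick_cost_scales_roughly_linearly(run_ticks):
    # `run_ticks(n_entities, n_ticks)` is a hypothetical benchmark hook.
    def cost(n: int) -> float:
        start = time.perf_counter()
        run_ticks(n_entities=n, n_ticks=100)
        return time.perf_counter() - start

    small, large = cost(1_000), cost(10_000)
    # 10x entities should cost at most ~10x time, with slack for noise.
    assert large <= 15 * small
```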
## Always-Test Requirements
### Determinism
- identical inputs ⇒ identical outputs
- no cross-platform drift within supported configurations
### Conservation
- matter, energy, nutrients must balance within declared models
### Time Cost
- every process declares time
- no zero-time production
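
This rule is cheap to enforce mechanically; a sketch, assuming a hypothetical registry of declared processes:

```python
def test_no_zero_time_production(process_registry):
    # `process_registry` is a hypothetical iterable of declared processes,
    # each carrying the time cost it declares in data/.
    for process in process_registry:
        assert process.time_cost > 0, \
            f"{process.name} produces output in zero time"
```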
### Failure Cases
- every major process must fail in defined ways
- failures must be explainable
## Test Artifacts
### Golden Replays
Canonical seed + action logs for reference scenarios. Treated as constitutional fixtures.
Examples:
- basic week scenario
- nutrient stress scenario
- maintenance neglect scenario
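
A sketch of how a golden replay might be checked, assuming a fixture file that stores the seed, the action log, and an expected state hash per milestone tick (the path and layout are illustrative):

```python
import hashlib
import json
import pathlib

def _hash(state: dict) -> str:
    # Same canonical-serialization hash used by the replay determinism test.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def test_golden_replay_matches_stored_hashes(run_simulation):
    # Assumed fixture layout: {"seed": ..., "actions": [...],
    #                          "expected_hashes": {"<tick>": "<sha256>"}}.
    fixture = json.loads(pathlib.Path("tests/golden/basic_week.json").read_text())
    milestones = [int(t) for t in fixture["expected_hashes"]]
    snapshots = run_simulation(seed=fixture["seed"],
                               actions=fixture["actions"],
                               snapshot_at=milestones)
    for tick, expected in fixture["expected_hashes"].items():
        assert _hash(snapshots[int(tick)]) == expected, \
            f"golden replay diverged at tick {tick}"
```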
### Scenario Packs
Curated small scenarios representing common real-world conditions. Used for regression and demonstration.
## Failure Policy
A test failure means at least one of the following:
- truth was violated
- determinism was broken
- constraints were bypassed
- explanations no longer match causality
Response:
- fix the defect
- or explicitly update design + data + tests together
Silent changes are forbidden.
## Balance Policy
Balance is evaluated only after correctness.
Rules:
- difficulty caused by reality constraints is not a defect
- ease caused by reality constraints is not a defect
Any tuning must:
- preserve constraints
- remain explainable
## Collaboration Stance
Open collaboration increases the need for strong tests.
Tests replace trust by making violations unmergeable.
## Minimal Required Suite
Must run on every change:
- schema/unit validation
- conservation properties
- replay determinism
- explanation mapping
Recommended:
- performance regressions
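
One possible gate script, assuming a pytest-based layout with one directory per required suite (the paths are illustrative); CI would run this on every change and refuse to merge on any nonzero exit:

```python
import subprocess
import sys

# Hypothetical test layout: one directory per required suite.
REQUIRED = [
    "tests/validation",    # schema/unit validation
    "tests/properties",    # conservation properties
    "tests/replay",        # replay determinism
    "tests/explanations",  # explanation mapping
]

def main() -> int:
    for suite in REQUIRED:
        result = subprocess.run([sys.executable, "-m", "pytest", suite])
        if result.returncode != 0:
            print(f"required suite failed: {suite}", file=sys.stderr)
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(main())
```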