Ab Initio Data Quality Fix May 2026
Ab Initio Data Quality: Why You Can’t Fix Rubbish Later
Stop polishing bad data. Start building it right from the first principle. ab initio data quality
You enforce quality at the point of creation or ingestion. If a record doesn’t meet the first principles of your domain (e.g., timestamp cannot be in the future; customer_id must match a regex), it is rejected immediately. The rule: Do not allow a known violation to enter your persistent storage. Ever. 2. The "Nullable Integer" Paradox Let’s look at a classic first-principles failure: Nulls in numeric fields. Ab Initio Data Quality: Why You Can’t Fix
Audit your warehouse. Pick one critical table. Enforce NOT NULL on every single column. If you truly need a missing value, use a sentinel row (e.g., id = 0 , name = "UNKNOWN" ). You will be shocked how many bugs disappear. If a record doesn’t meet the first principles
Use tools like pydantic (Python), Great Expectations (with expect_column_values_to_not_be_null set to fatal ), or dbt 's constraints (enforced, not just documented). If the contract fails, the pipe breaks. Loudly.