Create process for Data Integrity check
Base level checks (SQL sketches after this list):
• not null
• duplicates / uniqueness
• orphaned records / referential data integrity
• accepted values
    ◦ validate ETH addresses
    ◦ wei amounts
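A minimal SQL sketch of these checks, assuming Postgres and hypothetical table/column names (a transactions table with tx_hash, member_address, and amount_wei, plus a members table for the referential check):

```sql
-- Not null: required fields that arrived empty.
SELECT COUNT(*) AS null_violations
FROM transactions
WHERE tx_hash IS NULL;

-- Duplicates / uniqueness: tx_hash should appear exactly once.
SELECT tx_hash, COUNT(*) AS occurrences
FROM transactions
GROUP BY tx_hash
HAVING COUNT(*) > 1;

-- Orphaned records / referential integrity: transactions with no matching member.
SELECT t.tx_hash
FROM transactions t
LEFT JOIN members m ON m.address = t.member_address
WHERE m.address IS NULL;

-- Accepted values: an ETH address is 0x followed by 40 hex characters.
SELECT member_address
FROM transactions
WHERE member_address !~ '^0x[0-9a-fA-F]{40}$';

-- Wei amounts: must be non-negative (optionally also cap at the uint256 max).
SELECT tx_hash, amount_wei
FROM transactions
WHERE amount_wei < 0;
```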
Next level checks (freshness sketch after this list):
• data redundancy (remove where possible)
• data freshness (a check may already be in place? map out load frequency and expected lag for each pipeline)
• data dependency (identify dependencies beyond existing or assumed FKs)
• data completeness (source of truth)
    ◦ identify ways to check against source data
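One shape the freshness check could take, assuming an inserted_at timestamp on each destination table and a hypothetical pipeline_schedule table holding the expected lag per pipeline:

```sql
-- Flag a pipeline as stale when its newest row is older than the expected lag.
SELECT
    s.pipeline,
    MAX(t.inserted_at)                           AS last_load,
    NOW() - MAX(t.inserted_at)                   AS actual_lag,
    s.expected_lag,
    NOW() - MAX(t.inserted_at) > s.expected_lag  AS is_stale
FROM transactions t
JOIN pipeline_schedule s ON s.pipeline = 'transactions'
GROUP BY s.pipeline, s.expected_lag;
```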
Source of truth ideas (SQL sketches for both after this list):
• basic solution:
    ◦ add an ETL_log table: pipeline, load dt, row count, … (add to each pipeline)
    ◦ add a DQ check to compare row counts in destination tables with the ETL_log table based on LoadDt = InsertedDt
• subgraphs: add an entity with basic aggregations (total trxn count, amount, etc.) and use timed queries to compare with the totals in the daodash db
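A sketch of the basic solution; any columns beyond the pipeline / load dt / row count listed above, and the inserted_dt column on the destination table, are assumptions:

```sql
-- Each pipeline appends one row per load.
CREATE TABLE etl_log (
    pipeline  text   NOT NULL,
    load_dt   date   NOT NULL,
    row_count bigint NOT NULL  -- rows written by that load
);

-- DQ check: destination row counts vs. the log, matched on LoadDt = InsertedDt.
-- One such query per destination table (or template it per pipeline).
SELECT
    l.pipeline,
    l.load_dt,
    l.row_count           AS logged_rows,
    COUNT(t.inserted_dt)  AS destination_rows
FROM etl_log l
LEFT JOIN transactions t ON t.inserted_dt = l.load_dt
WHERE l.pipeline = 'transactions'
GROUP BY l.pipeline, l.load_dt, l.row_count
HAVING COUNT(t.inserted_dt) <> l.row_count;
```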
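And a sketch of the subgraph comparison, assuming the timed queries first land the subgraph's aggregate entity into a hypothetical subgraph_totals staging table:

```sql
-- Rows where the subgraph's running totals disagree with the daodash db.
SELECT
    s.snapshot_at,
    s.total_txn_count,
    d.txn_count      AS db_txn_count,
    s.total_amount_wei,
    d.amount_wei     AS db_amount_wei
FROM subgraph_totals s
CROSS JOIN LATERAL (
    SELECT COUNT(*)                     AS txn_count,
           COALESCE(SUM(amount_wei), 0) AS amount_wei
    FROM transactions
    WHERE inserted_at <= s.snapshot_at  -- compare as of the snapshot time
) d
WHERE s.total_txn_count <> d.txn_count
   OR s.total_amount_wei <> d.amount_wei;
```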
Tools (results-table sketch after this list):
• SQL queries -> DQ dashboard
• dbt schema tests -> report?
    ◦ optional flag to store test results in a table
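One way the checks could feed the dashboard: every check writes its outcome to a shared results table, and the dashboard is a plain query over it (all names here are assumptions):

```sql
CREATE TABLE dq_results (
    check_name   text        NOT NULL,
    target_table text        NOT NULL,
    failed_rows  bigint      NOT NULL,
    passed       boolean     NOT NULL,
    run_at       timestamptz NOT NULL DEFAULT NOW()
);

-- Example: record the uniqueness check from the base-level list.
INSERT INTO dq_results (check_name, target_table, failed_rows, passed)
SELECT 'unique_tx_hash', 'transactions', COUNT(*), COUNT(*) = 0
FROM (
    SELECT tx_hash FROM transactions GROUP BY tx_hash HAVING COUNT(*) > 1
) dupes;

-- Dashboard query: latest status per check.
SELECT DISTINCT ON (check_name)
    check_name, target_table, failed_rows, passed, run_at
FROM dq_results
ORDER BY check_name, run_at DESC;
```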