Create process for Data Integrity check
Base level checks (SQL sketches after this list):
• not null
• duplicates / uniqueness
• orphaned records / referential data integrity
• accepted values
    ◦ validate ETH addresses
    ◦ wei amounts
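A minimal SQL sketch of these checks, assuming Postgres and hypothetical table/column names (a transactions table with tx_hash, member_address, and amount_wei, plus a members table for the referential check):

```sql
-- Not null: required fields that arrived empty.
SELECT COUNT(*) AS null_violations
FROM transactions
WHERE tx_hash IS NULL;

-- Duplicates / uniqueness: tx_hash should appear exactly once.
SELECT tx_hash, COUNT(*) AS occurrences
FROM transactions
GROUP BY tx_hash
HAVING COUNT(*) > 1;

-- Orphaned records / referential integrity: transactions with no matching member.
SELECT t.tx_hash
FROM transactions t
LEFT JOIN members m ON m.address = t.member_address
WHERE m.address IS NULL;

-- Accepted values: an ETH address is 0x followed by 40 hex characters.
SELECT member_address
FROM transactions
WHERE member_address !~ '^0x[0-9a-fA-F]{40}$';

-- Wei amounts: must be non-negative (optionally also cap at the uint256 max).
SELECT tx_hash, amount_wei
FROM transactions
WHERE amount_wei < 0;
```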
Next level checks (freshness sketch after this list):
• data redundancy (remove where possible)
• data freshness (a check may already be in place? map out load frequency and expected lag for each pipeline)
• data dependency (identify dependencies beyond existing or assumed FKs)
• data completeness (source of truth)
    ◦ identify ways to check against source data
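One shape the freshness check could take, assuming an inserted_at timestamp on each destination table and a hypothetical pipeline_schedule table holding the expected lag per pipeline:

```sql
-- Flag a pipeline as stale when its newest row is older than the expected lag.
SELECT
    s.pipeline,
    MAX(t.inserted_at)                           AS last_load,
    NOW() - MAX(t.inserted_at)                   AS actual_lag,
    s.expected_lag,
    NOW() - MAX(t.inserted_at) > s.expected_lag  AS is_stale
FROM transactions t
JOIN pipeline_schedule s ON s.pipeline = 'transactions'
GROUP BY s.pipeline, s.expected_lag;
```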
Source of truth ideas (SQL sketches for both after this list):
• basic solution:
    ◦ add an ETL_log table: pipeline, load dt, row count, … (add to each pipeline)
    ◦ add a DQ check to compare row counts in destination tables with the ETL_log table based on LoadDt = InsertedDt
• subgraphs: add an entity with basic aggregations (total trxn count, amount, etc.) and use timed queries to compare with the totals in the daodash db
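A sketch of the basic solution; any columns beyond the pipeline / load dt / row count listed above, and the inserted_dt column on the destination table, are assumptions:

```sql
-- Each pipeline appends one row per load.
CREATE TABLE etl_log (
    pipeline  text   NOT NULL,
    load_dt   date   NOT NULL,
    row_count bigint NOT NULL  -- rows written by that load
);

-- DQ check: destination row counts vs. the log, matched on LoadDt = InsertedDt.
-- One such query per destination table (or template it per pipeline).
SELECT
    l.pipeline,
    l.load_dt,
    l.row_count           AS logged_rows,
    COUNT(t.inserted_dt)  AS destination_rows
FROM etl_log l
LEFT JOIN transactions t ON t.inserted_dt = l.load_dt
WHERE l.pipeline = 'transactions'
GROUP BY l.pipeline, l.load_dt, l.row_count
HAVING COUNT(t.inserted_dt) <> l.row_count;
```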
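And a sketch of the subgraph comparison, assuming the timed queries first land the subgraph's aggregate entity into a hypothetical subgraph_totals staging table:

```sql
-- Rows where the subgraph's running totals disagree with the daodash db.
SELECT
    s.snapshot_at,
    s.total_txn_count,
    d.txn_count      AS db_txn_count,
    s.total_amount_wei,
    d.amount_wei     AS db_amount_wei
FROM subgraph_totals s
CROSS JOIN LATERAL (
    SELECT COUNT(*)                     AS txn_count,
           COALESCE(SUM(amount_wei), 0) AS amount_wei
    FROM transactions
    WHERE inserted_at <= s.snapshot_at  -- compare as of the snapshot time
) d
WHERE s.total_txn_count <> d.txn_count
   OR s.total_amount_wei <> d.amount_wei;
```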
Tools (results-table sketch after this list):
• SQL queries -> DQ dashboard
• dbt schema tests -> report?
    ◦ optional flag to store test results in a table
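One way the checks could feed the dashboard: every check writes its outcome to a shared results table, and the dashboard is a plain query over it (all names here are assumptions):

```sql
CREATE TABLE dq_results (
    check_name   text        NOT NULL,
    target_table text        NOT NULL,
    failed_rows  bigint      NOT NULL,
    passed       boolean     NOT NULL,
    run_at       timestamptz NOT NULL DEFAULT NOW()
);

-- Example: record the uniqueness check from the base-level list.
INSERT INTO dq_results (check_name, target_table, failed_rows, passed)
SELECT 'unique_tx_hash', 'transactions', COUNT(*), COUNT(*) = 0
FROM (
    SELECT tx_hash FROM transactions GROUP BY tx_hash HAVING COUNT(*) > 1
) dupes;

-- Dashboard query: latest status per check.
SELECT DISTINCT ON (check_name)
    check_name, target_table, failed_rows, passed, run_at
FROM dq_results
ORDER BY check_name, run_at DESC;
```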