Create process for Data Integrity check

Assign
Data Source
Priority
Status
Completed
Description
Base level checks:
• not null
• dupes / uniqueness
• orphaned records / referential data integrity
• accepted values
  ◦ validate ETH addresses
  ◦ wei amounts

Next level checks:
• data redundancy (remove where possible)
• data freshness (check already in place?) (map out load frequency and expected lag for each pipeline)
• data dependency (identify beyond existing or assumed fk)
• data completeness (source of truth)
  ◦ identify ways to check against source data

Source of truth ideas:
• basic solution:
  ◦ add ETL_log table: pipeline, load dt, row count, … (add to each pipeline)
  ◦ add DQ check to compare row counts in destination tables with ETL_log table based on LoadDt=InsertedDt
• subgraphs: add entity with basic aggregations (total trxn count, amount etc) and use timed queries to compare with totals in daodash db

Tools:
• sql queries -> DQ dashboard
• dbt schema model -> report?
  ◦ optional flag to store test results in a table
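Example sketch: one possible way to prototype the base level checks and the ETL_log row-count comparison as plain SQL run from Python, before wiring anything into a dashboard or dbt. Everything concrete here is an assumption made for illustration: the tables transfers, wallets and etl_log, their columns, and the sample rows are hypothetical and do not reflect the real daodash schema. Wei amounts are held as decimal strings because SQLite integers are limited to 64 bits, and the ETH address check covers the format only (0x plus 40 hex characters), not the EIP-55 checksum.

```python
import re
import sqlite3

ETH_ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
WEI_RE = re.compile(r"[0-9]+")
MAX_UINT256 = 2**256 - 1


def is_valid_eth_address(value):
    """Format check only: '0x' plus 40 hex characters (no EIP-55 checksum check)."""
    return isinstance(value, str) and ETH_ADDRESS_RE.fullmatch(value) is not None


def is_valid_wei(value):
    """Wei is a non-negative integer within uint256; held as a decimal string here."""
    return (isinstance(value, str) and WEI_RE.fullmatch(value) is not None
            and int(value) <= MAX_UINT256)


def count(conn, sql):
    """Run a query that returns a single COUNT(*) value."""
    return conn.execute(sql).fetchone()[0]


def run_checks(conn):
    """Return {check name: number of offending rows or groups}; 0 means the check passed."""
    rows = conn.execute("SELECT from_address, amount_wei FROM transfers").fetchall()
    return {
        "not_null": count(conn, "SELECT COUNT(*) FROM transfers WHERE tx_hash IS NULL"),
        # Number of tx_hash values that occur more than once.
        "unique": count(conn, """
            SELECT COUNT(*) FROM (
                SELECT tx_hash FROM transfers WHERE tx_hash IS NOT NULL
                GROUP BY tx_hash HAVING COUNT(*) > 1)"""),
        # Transfers whose from_address has no matching row in wallets.
        "orphaned": count(conn, """
            SELECT COUNT(*) FROM transfers t
            LEFT JOIN wallets w ON w.address = t.from_address
            WHERE w.address IS NULL"""),
        "eth_address": sum(not is_valid_eth_address(a) for a, _ in rows),
        "wei_amount": sum(not is_valid_wei(w) for _, w in rows),
        # Completeness: row count logged by the pipeline vs. rows actually
        # present for the same load date (LoadDt = InsertedDt).
        "row_count_mismatch": count(conn, """
            SELECT COUNT(*) FROM etl_log l
            WHERE l.pipeline = 'transfers'
              AND l.row_count <> (SELECT COUNT(*) FROM transfers t
                                  WHERE t.inserted_dt = l.load_dt)"""),
    }


def build_demo_db():
    """Create an in-memory database with deliberately flawed, made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wallets (address TEXT);
        CREATE TABLE transfers (tx_hash TEXT, from_address TEXT,
                                amount_wei TEXT, inserted_dt TEXT);
        CREATE TABLE etl_log (pipeline TEXT, load_dt TEXT, row_count INTEGER);
    """)
    a, b, c = ("0x" + ch * 40 for ch in "abc")
    conn.executemany("INSERT INTO wallets VALUES (?)", [(a,), (b,)])
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", [
        ("0x01", a, "1000000000000000000", "2024-01-01"),
        ("0x01", b, "5", "2024-01-01"),                    # duplicate tx_hash
        (None, a, "-1", "2024-01-01"),                     # null hash, negative wei
        ("0x02", c, "7", "2024-01-02"),                    # address not in wallets
        ("0x03", "not-an-address", "1.5", "2024-01-02"),   # bad address, bad wei
    ])
    conn.executemany("INSERT INTO etl_log VALUES (?, ?, ?)", [
        ("transfers", "2024-01-01", 3),   # matches the 3 rows loaded that day
        ("transfers", "2024-01-02", 5),   # pipeline logged 5, only 2 arrived
    ])
    return conn


if __name__ == "__main__":
    for name, failures in run_checks(build_demo_db()).items():
        print(f"{name}: {failures} failing")
```

Each check returns a failure count rather than raising, so the same queries could feed a DQ dashboard or be written to a results table. If dbt is adopted, the first four checks correspond roughly to its built-in not_null, unique, relationships and accepted_values tests, with the address and wei rules needing a custom test.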
Estimated Completion
Tags
Property
