Create Data Tests for BANK Subgraph Pipeline
Created SQL scripts to test for data duplicates:
-- Duplicate check: return every row whose (id, graph_id, amount_display)
-- combination appears more than once in the subgraph transactions table.
SELECT a.*
FROM subgraph_bank_transactions a
JOIN (
    SELECT id, graph_id, amount_display, COUNT(*) AS dup_count
    FROM subgraph_bank_transactions
    GROUP BY id, graph_id, amount_display
    HAVING COUNT(*) > 1
) b
  ON a.id = b.id
 AND a.graph_id = b.graph_id
 AND a.amount_display = b.amount_display
ORDER BY a.id;
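A companion sanity/integrity check in the same custom-script style could flag rows with missing values. This is a minimal sketch, assuming the id, graph_id, and amount_display columns from the duplicate query above are expected to be populated for every transaction:

-- Sanity/integrity check: rows with missing identifiers or amounts.
-- Assumes id, graph_id, and amount_display should never be NULL.
SELECT id, graph_id, amount_display
FROM subgraph_bank_transactions
WHERE id IS NULL
   OR graph_id IS NULL
   OR amount_display IS NULL
ORDER BY id;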
Notes from DaoDash meeting (March 4th, 2022):
- Pipeline update
- Getting to production is a larger task
- To-dos:
  - Automation
  - Testing: unit tests and data testing (duplicate, sanity, and integrity tests)
    - Poor man's way: our own custom scripts (SQL scripts)
    - dbt as an ELT+ tool with built-in checks (see the sketch after this list): https://www.getdbt.com/pricing/
    - Data integrity tools: greatexpectations.io
  - Continuous integration, integration testing
  - Dockerization, logging
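If the pipeline moves to dbt, the duplicate check above could be packaged as a dbt singular test: a SQL file in the project's tests/ directory that fails when it returns any rows. This is a minimal sketch, assuming the table is exposed as a dbt model or source named subgraph_bank_transactions (the file name is hypothetical):

-- tests/assert_no_duplicate_bank_transactions.sql (hypothetical file name)
-- dbt singular test: `dbt test` fails if this query returns any rows.
SELECT
    id,
    graph_id,
    amount_display,
    COUNT(*) AS dup_count
FROM {{ ref('subgraph_bank_transactions') }}
GROUP BY id, graph_id, amount_display
HAVING COUNT(*) > 1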