Create Data Tests for BANK Subgraph Pipeline
Created SQL scripts to test for data duplicates:
-- Duplicate check: return every row whose (id, graph_id, amount_display)
-- combination appears more than once in the subgraph transactions table.
SELECT a.*
FROM subgraph_bank_transactions a
JOIN (
    SELECT id, graph_id, amount_display, COUNT(*) AS dup_count
    FROM subgraph_bank_transactions
    GROUP BY id, graph_id, amount_display
    HAVING COUNT(*) > 1
) b
  ON a.id = b.id
 AND a.graph_id = b.graph_id
 AND a.amount_display = b.amount_display
ORDER BY a.id;
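A companion sanity/integrity check in the same custom-script style could flag rows with missing values. This is a minimal sketch, assuming the id, graph_id, and amount_display columns from the duplicate query above are expected to be populated for every transaction:

-- Sanity/integrity check: rows with missing identifiers or amounts.
-- Assumes id, graph_id, and amount_display should never be NULL.
SELECT id, graph_id, amount_display
FROM subgraph_bank_transactions
WHERE id IS NULL
   OR graph_id IS NULL
   OR amount_display IS NULL
ORDER BY id;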
Notes from DaoDash meeting (March 4th, 2022):
- Pipeline update
- Getting to production is a larger task
- To-dos:
  - Automation
  - Testing: unit tests and data testing (duplicate, sanity, and integrity tests)
    - Poor man's way: our own custom scripts (SQL scripts)
    - dbt as an ELT+ tool with built-in checks (see the sketch after this list): https://www.getdbt.com/pricing/
    - Data integrity tools: greatexpectations.io
  - Continuous integration, integration testing
  - Dockerization, logging
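If the pipeline moves to dbt, the duplicate check above could be packaged as a dbt singular test: a SQL file in the project's tests/ directory that fails when it returns any rows. This is a minimal sketch, assuming the table is exposed as a dbt model or source named subgraph_bank_transactions (the file name is hypothetical):

-- tests/assert_no_duplicate_bank_transactions.sql (hypothetical file name)
-- dbt singular test: `dbt test` fails if this query returns any rows.
SELECT
    id,
    graph_id,
    amount_display,
    COUNT(*) AS dup_count
FROM {{ ref('subgraph_bank_transactions') }}
GROUP BY id, graph_id, amount_display
HAVING COUNT(*) > 1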