DQ blank data cleanup/investigation

Assign
Data Source
Priority
Status
IceBox
Description
Estimated Completion
Tags
Property
DQ Blank Data assumptions
  • content column in discord_messages table shouldn’t be blank
  • proposal_id column in stg_bankless_snapshot_1 table shouldn’t be blank
 
Current results:
notion image
 
Additional checks to see users attached sending messages without content:
select d.username, count(*)
from discord_messages m
 join discord_user d on m.author_user_id = d.discord_user_id 
where coalesce(content, '') = ''
group by 1
order by 2 desc;
top results - mostly bots (with some exceptions):
notion image
There are records sent by some bots with content though:
select d.username, count(*)
from discord_messages m
 join discord_user d on m.author_user_id = d.discord_user_id 
where coalesce(content, '') != ''
 and lower(d.username) like '%bot%'
group by 1
order by 2 desc;
top results:
notion image
 
Suggested actions
  • review discord pipeline to see if there is a way to pull content for bot messages?
  • review discord pipeline to possibly back-populate blank content?
  • delete records with content=NULL?

DQ blank data cleanup/investigation

Assign
Data Source
Priority
Status
IceBox
Description
Estimated Completion
Tags
Property
DQ Blank Data assumptions
  • content column in discord_messages table shouldn’t be blank
  • proposal_id column in stg_bankless_snapshot_1 table shouldn’t be blank
 
Current results:
notion image
 
Additional checks to see users attached sending messages without content:
select d.username, count(*)
from discord_messages m
 join discord_user d on m.author_user_id = d.discord_user_id 
where coalesce(content, '') = ''
group by 1
order by 2 desc;
top results - mostly bots (with some exceptions):
notion image
There are records sent by some bots with content though:
select d.username, count(*)
from discord_messages m
 join discord_user d on m.author_user_id = d.discord_user_id 
where coalesce(content, '') != ''
 and lower(d.username) like '%bot%'
group by 1
order by 2 desc;
top results:
notion image
 
Suggested actions
  • review discord pipeline to see if there is a way to pull content for bot messages?
  • review discord pipeline to possibly back-populate blank content?
  • delete records with content=NULL?