tl;dr Twitter Spaces Gitcoin Collab - Open Data: building web3 native tools and dashboards

Speakers:
Humpty Cauldron, Crypto Sapiens
Evan Powell, Gitcoin
Erick Pinos, Ontology and Orange
Disruption Joe, Gitcoin
Philip Silva, BrightID
Boxer, Dune Analytics
Thomas Jay Rush, TrueBlocks
Matt Davis, Ceramic
Pat, Transpose
tl;dr
13:50
Evan Powell
@epowell101
Maybe I'd kick it off in the form of a question. What do people, fellow speakers and others on here, think in terms of decentralization versus centralization? How do you guys feel when you see these?
15:26
Thomas Jay Rush
@tjayrush
Yeah, I personally think that the problem lies in the node software itself. The node software is doing a great job of one of its two purposes, which is joining the network and coming to synchronization with the network, but it does a pretty poor job of its other task, which is to deliver the data that it's collected. So the RPC itself, I think, is inadequate. And I think that if you truly did want to build something that was decentralized, you would fix the RPC so it can deliver the data that people need and the data that these third-party proprietary solutions are creating. It should come from the node software itself.
16:38
Humpty Cauldron, Crypto Sapiens
@CryptoSapiens_
Can you define for the rest of us what you are describing here in terms of node software? What nodes are we describing and what is the challenge with the status quo?
16:62
Thomas Jay Rush
@tjayrush
I'll try to, so I'll use an example. The 15th issue ever written against the first Ethereum node software was "we need access to the history of an account." The very first response was "that's never going to happen." In other words, we're not going to put indexing by address into the node software. And I think that was a mistake. From the perspective of the node software developer, that makes perfect sense, because it's hard to give. But from the perspective of the users, that's the only thing they want, which is a history of their own address.
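To make the "history of an account" gap concrete, here is a minimal sketch in Python. It assumes a toy in-memory chain where each block is just a list of (sender, recipient) pairs; the names `BLOCKS`, `history_by_scan`, and `build_address_index` are hypothetical and do not correspond to any real node software or RPC method. It only contrasts scanning every block with looking an address up in a prebuilt index.

```python
from collections import defaultdict

# Hypothetical toy chain: each block is a list of (sender, recipient) pairs.
BLOCKS = [
    [("0xaa", "0xbb"), ("0xcc", "0xaa")],
    [("0xbb", "0xcc")],
    [("0xaa", "0xcc"), ("0xdd", "0xbb")],
]


def history_by_scan(blocks, address):
    """Without an index: walk every transaction in every block."""
    hits = []
    for block_number, txs in enumerate(blocks):
        for tx_index, (sender, recipient) in enumerate(txs):
            if address in (sender, recipient):
                hits.append((block_number, tx_index))
    return hits


def build_address_index(blocks):
    """With an index: one pass that maps each address to where it appears."""
    index = defaultdict(list)
    for block_number, txs in enumerate(blocks):
        for tx_index, (sender, recipient) in enumerate(txs):
            # A set avoids recording the same transaction twice for self-transfers.
            for addr in {sender, recipient}:
                index[addr].append((block_number, tx_index))
    return index


INDEX = build_address_index(BLOCKS)
```

In this sketch, `INDEX["0xaa"]` answers the question with a single lookup, while `history_by_scan(BLOCKS, "0xaa")` has to touch every transaction on the chain to produce the same list.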
17:56
Humpty Cauldron, Crypto Sapiens
@CryptoSapiens_
Got it. So this is really looking at the usability of on-chain data and some of the challenges with the way that it currently works, and this is basically a programming issue with the way that the software was developed.
19:44
Evan Powell
@epowell101
I've heard it referred to as the original sin of Ethereum. But are there other perspectives on that, on indexing, the criticality of it, and the gap? Or are there other top problems that you see that ought to be addressed in a community-centric way, if possible?
21:15
Pat
@tannishmango
I went through essentially every single type of technology that was used to index the chain, everything from The Graph to our own solution, along the spectrum from decentralized to centralized.
One of the things we noticed with respect to The Graph is that there's a lot that's been incredibly useful about it, but there's almost a lack of composability to subgraphs, so you can't really build on top of them, which creates a very inefficient structure. It almost funnels you in with the way that the RPC node is designed.
23:00
Thomas Jay Rush
@tjayrush
What we've done now is, I say it this way: we've created an index, or we've created a time-ordered log of an index of a time-ordered log. So we're kind of going up a level and creating basically a blockchain of blocks that are the... Our index is chunked on purpose because we want to preserve immutability. What we give up is sorting every address in the entire history of the chain in one file, but that's OK because we preserved immutability, and that's why the index is chunked.
Another benefit that comes from that is that we can store that chunked index on an immutable file system like IPFS. If you try to store something on IPFS that's being sorted every 14 seconds, it breaks IPFS. So I think that decentralization of this data comes by embracing the time-ordered nature of the data and no longer thinking in terms of a web2 database that sorts every time you insert a record.
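The chunking idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the TrueBlocks implementation: the toy chain, `CHUNK_SIZE`, `build_chunk`, `build_chunks`, and `query` are all hypothetical names, and a SHA-256 hex digest stands in for a content address (it is not a real IPFS CID). Each chunk is sorted only within its own block range and is never rewritten once sealed.

```python
import hashlib
import json

CHUNK_SIZE = 2  # hypothetical: number of blocks covered by one sealed chunk

# Hypothetical toy chain: each block is a list of (sender, recipient) pairs.
BLOCKS = [
    [("0xaa", "0xbb")],
    [("0xcc", "0xaa")],
    [("0xbb", "0xcc")],
    [("0xaa", "0xcc")],
    [("0xdd", "0xbb")],
    [("0xaa", "0xdd")],
]


def build_chunk(first_block, blocks):
    """Index one fixed block range; addresses are sorted only inside the chunk."""
    appearances = []
    for offset, txs in enumerate(blocks):
        for tx_index, (sender, recipient) in enumerate(txs):
            for addr in {sender, recipient}:
                appearances.append([addr, first_block + offset, tx_index])
    appearances.sort()
    payload = json.dumps(appearances, separators=(",", ":")).encode()
    # Stand-in for a content address: identical bytes always yield the same id.
    chunk_id = hashlib.sha256(payload).hexdigest()
    return {"first_block": first_block, "id": chunk_id, "appearances": appearances}


def build_chunks(blocks):
    """Seal only complete chunks; trailing blocks wait for the next chunk."""
    sealed_end = len(blocks) - len(blocks) % CHUNK_SIZE
    return [
        build_chunk(start, blocks[start:start + CHUNK_SIZE])
        for start in range(0, sealed_end, CHUNK_SIZE)
    ]


def query(chunks, address):
    """Visit chunks in time order; there is no single globally sorted file."""
    hits = []
    for chunk in chunks:
        for addr, block_number, tx_index in chunk["appearances"]:
            if addr == address:
                hits.append((block_number, tx_index))
    return hits
```

The trade-off matches the one described in the talk: a query has to visit every chunk, but appending new blocks never changes the bytes (and therefore the id) of a chunk that was already published.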
26:08
Michael Calvey (calvey.eth)
@michaeljcalvey
At what points in this data stack is decentralization necessary, and where is it not as critical to have?
27:08
Boxer
@0xBoxer
I think one of the aspects that hasn't surfaced so far in this conversation is that there's of course decentralization of where the data is actually stored, but how accessible is that data actually?
On a very core level, if I want to look at my own history or my own state, like what my accounts are actually doing, that should be a decentralized service. But if we go a step up and look at what's the status of a protocol, or what is actually happening on Ethereum today, I don't really think that those need to be decentralized services. So I think the higher we go within levels of abstraction within data...
34:32
Thomas Jay Rush
@tjayrush
The higher up you go in the data abstraction, so DEXes as an example, it's not so easy to do from my perspective, because we purposefully don't have big data machines that can churn that stuff out, right? So we're much more focused on the individual.
33:17
Boxer
@0xBoxer
What Dune is doing: we're just surfacing blockchain data, and then we use ABIs to decode the data so it's nice and human readable. And then what we do on top of that is we have a bunch of data engineering pipelines, which are also open source, where we have one table that's called dex.trades. And in that table, all decentralized exchange trades across multiple blockchains and multiple protocols all live there. So it's super easy to access the data, but it's all out there in the open.
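The "one table for all DEX trades" idea can be illustrated with a small Python sketch. It is only an analogy: Dune's pipelines are not reproduced here, and the `Trade` columns, the two raw event shapes, and the adapter functions are hypothetical. The point is that each protocol gets its own adapter into a shared schema, after which one query works across protocols and chains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """Hypothetical unified row; these column names are assumptions."""
    blockchain: str
    project: str
    token_sold: str
    token_bought: str
    amount_usd: float


def from_protocol_a(event, blockchain):
    # Hypothetical raw shape: flat keys in camelCase.
    return Trade(blockchain, "protocol_a", event["tokenIn"],
                 event["tokenOut"], float(event["usdValue"]))


def from_protocol_b(event, blockchain):
    # Hypothetical raw shape: nested token objects and a string amount.
    return Trade(blockchain, "protocol_b", event["sold"]["symbol"],
                 event["bought"]["symbol"], float(event["value_usd"]))


TRADES = [
    from_protocol_a({"tokenIn": "WETH", "tokenOut": "USDC", "usdValue": 1500.0}, "ethereum"),
    from_protocol_a({"tokenIn": "USDC", "tokenOut": "WETH", "usdValue": 500.0}, "polygon"),
    from_protocol_b({"sold": {"symbol": "DAI"}, "bought": {"symbol": "WETH"},
                     "value_usd": "250.5"}, "ethereum"),
]


def volume_by_project(trades):
    """One aggregation over the shared schema, whatever the source protocol."""
    totals = {}
    for trade in trades:
        totals[trade.project] = totals.get(trade.project, 0.0) + trade.amount_usd
    return totals
```

Once the adapters exist, a consumer never needs to know the raw event formats; `volume_by_project(TRADES)` reads only the shared columns.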
40:12
Michael Calvey (calvey.eth)
@michaeljcalvey
I think that this is exactly what it's about: coming together as a variety of different stakeholders with different opinions and different perspectives. There's obviously overlap in what a lot of us are doing, but we're all taking a different approach, and I don't think there's a right way or wrong way. I think that everyone ends up optimizing for a slightly different outcome. But at the end of the day, it needs to be an ongoing discussion, I suppose, about how much standardization is needed.
41:56
Evan Powell
@epowell101
And are there EIPs or similar that we should be getting behind, to get a little more tactical? Is there an area of standardization you'd like to see emerge? What would that area be?
43:38
Boxer
@0xBoxer
I think the bigger problem here is that different governments will have different rules, depending on your local legislation. So I don't feel like this is necessarily a data problem.
43:53
Thomas Jay Rush
@tjayrush
It is a data problem, because you can't even get the list of all the things where the assets transferred, regardless of what the local governments want me to do with that. You can't even get a list that says this is every asset that ever transferred.
44:10
Erick Pinos
@erickpinos
There definitely needs to be a lot more done on the accounting side, just to have an accurate ledger of what you've done. Even the Axie Infinity token: when they forked and airdropped, everyone looked at the Axie Infinity version 2 token and it looked like a completely new token, and these coin tracking softwares didn't even pick it up.
50:46
Evan Powell
@epowell101
On the collaboration in this group, because this has been an amazing conversation for me and, I hope, for everybody on here: one idea is that we could start to literally draft, you know, we could have issues. We could use GitHub and begin to vote on which standards we want to talk about and who wants to help draft those.
53:08
Erick Pinos
@erickpinos
At Orange, we've been focusing a lot on the model creation for assessing different kinds of reputation, whether it's your contribution level to a DAO, or your risk model credit score for access to better lending rates on a lending platform, or even for determining eligibility to apply to a certain grant or apply to a hackathon. We're focused a lot on the model design and not on being the authority on "this is how you define DAO contribution activity" or "this is how you define risk assessment."
