
Data Quality in Web3 Starts With Double Validation

Web3 has a data problem. It's not a shortage of data; it's a quality problem. That's surprising given Web3 gives us a shared ledger of data, but it's the offchain data that's the problem.

Written By

Jordan Hatcher

Category

Insights

Ecosystem leads, BD teams, investors, and marketers have all encountered this problem. You've pulled project lists that are out of date. You've found company descriptions that don't match reality. You've made decisions based on information that turned out to be wrong, incomplete, or copied from another source with the same problem. Or you've spent hours manually cleaning and checking information just to stay up to date.

The Real Cost of Bad Ecosystem Data

Bad data doesn't just cause minor inconveniences when your ecosystem runs to hundreds or thousands of projects. It compounds. One wrong categorization leads to a flawed market map and a flawed understanding of where the market actually is today. A flawed market understanding shapes your decisions in ways that work against your ecosystem instead of growing it.

That chain of consequences often doesn't show itself until much later. For ecosystem stewards of L1s, L2s, and stablecoins, getting ahead of the problem and solving it with better data means more growth and healthier ecosystems in the future.

Most ecosystem teams assume better tooling or more frequent scraping will fix it. That assumption misses the point. The root issue comes down to trust: the lack of a reliable process that stands behind the data and earns your confidence.

We strategically use AI at The Grid

At The Grid, we believe that in an age of increasing AI-generated content, the value of validated data goes up, not down. Anyone can generate plausible-sounding information about a Web3 project. Claude and ChatGPT can give you a false sense of security by sounding accurate. The plausible and the accurate look identical. That gap can cost you.

But AI still has a very clear and important role. Our approach combines AI and human review in a structured way. AI handles the work it does well: cleaning, normalizing, and flagging inconsistencies across large volumes of data. It also helps with data collection, formatting, and spotting when data may need an update. But AI does not make the final call.

That means AI never directly edits published data at The Grid.
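To make that boundary concrete, here is a minimal sketch in Python of what such a separation can look like. It is an illustration under our own assumptions, not The Grid's actual pipeline; every name in it (ProfileRecord, ai_preprocess, human_publish) is hypothetical.

```python
# Hypothetical sketch: AI may normalize and flag, but only a human decision
# can touch the published record. Not The Grid's actual implementation.
from dataclasses import dataclass, field

@dataclass
class ProfileRecord:
    project: str
    fields: dict
    ai_flags: list = field(default_factory=list)  # AI suggestions only, never published as-is
    published: bool = False

def ai_preprocess(record: ProfileRecord) -> ProfileRecord:
    """AI stage: clean, normalize, and flag possible issues for human review."""
    record.fields = {k: v.strip() for k, v in record.fields.items()}  # normalize
    if "website" not in record.fields:
        record.ai_flags.append("missing website; may need an update")  # flag, don't fix
    return record

def human_publish(record: ProfileRecord, reviewer_approves: bool) -> ProfileRecord:
    """Human stage: the only code path that sets the published flag."""
    record.published = reviewer_approves
    return record

record = ai_preprocess(ProfileRecord("Example DAO", {"name": " Example DAO "}))
record = human_publish(record, reviewer_approves=True)
```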

We see this as a strategic advantage: human validation instead of automation. And Double Validation takes this even further, giving us more robust data accuracy.

Why and how we use Double Validation

Every data point that enters The Grid goes through human review. Not once. Twice. In our primary data research flow, two trained researchers independently check the same data before it gets published in our system. It's the four-eyes principle used routinely in compliance and governance checks: two different people, two independent checks, one reliable record.

The four-eyes principle is a fundamental control in critical systems and critical work. Together with other policies (such as how you stay neutral and how you structure and monitor reviewer incentives), it is how you maintain quality and keep checks and balances in place. We believe ecosystem data is foundational and requires core controls to keep it accurate.

The four-eyes principle sounds simple. Doing it at scale takes real work. But we have no other way to stand behind what we publish. A single reviewer catches most errors, but a single reviewer also brings their own blind spots, assumptions, and fatigue.

The second reviewer doesn't just double-check the first. They bring an independent view of their own. The gap between those two perspectives surfaces the edge cases. Conflicting sources. Outdated claims. Subtly wrong categorizations. These data errors look fine until they don't.

We'd rather catch them before they reach you. And if the two reviewers can't agree on an edge case, it goes back for a broader internal discussion and, when needed, for feedback and review with our users and benchmarking against the wider industry. That's part of our process for continually updating and improving TGS, our data schema, which works like a living organism mapping what's happening in Web3.
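As a rough illustration of that decision flow, here is a sketch under our own assumptions, not our production system; the function and labels below are hypothetical:

```python
# Hypothetical sketch of the four-eyes decision flow described above:
# two independent reviews, publish only on agreement, escalate otherwise.
from typing import Literal

Decision = Literal["approve", "reject"]

def double_validate(review_a: Decision, review_b: Decision) -> str:
    # Each review is made independently; neither reviewer sees the other's call.
    if review_a == review_b == "approve":
        return "publish"
    if review_a == review_b == "reject":
        return "discard"
    # Disagreement marks an edge case (conflicting sources, outdated claims,
    # subtle miscategorization) and triggers a broader internal discussion.
    return "escalate"

print(double_validate("approve", "reject"))  # -> escalate
```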

We also extend this double-validation principle to our Network Portal, where projects can claim and update their profiles. The builders themselves are the first set of eyes; a human reviewer at The Grid is the second set before publication.

What This Means for Ecosystem Leads

Build a block explorer, a wallets page, a grants program, a dApp page, or an ecosystem directory on top of external data, and the quality of that data becomes your exposure. Errors in, errors out (garbage in, garbage out).

The Grid provides a data layer with a defined, repeatable methodology behind it: a transparent collection process and two human reviewers on every data point.

We only publish data we can stand behind. We'd rather have a gap in our coverage than a confident wrong answer.

The Bigger Picture

All of us at The Grid believe that Web3 is the infrastructure for the next phase of the global economy. That infrastructure depends on accurate information about who builds what, where, and how. Alongside onchain data, ecosystem intelligence over offchain data provides the extra context needed for adoption and growth.

The four-eyes principle protects that intelligence layer, and double validation is what makes it something you can actually rely on.
