I wanted to do something to further my understanding of the Solana ecosystem, and one of the ways I learn about something is to look at data.
So from the project ideas github, the data viz project held a great deal of attraction to me. The advantages:
- its just extract data, parse, populate database
- then build visualizations
Stuff I'm fairly familiar with.
Solana Data Viz Idea
A web-based tool through which any user can query all data on the Solana blockchain by using SQL queries. Pre-populated databases with parsed data from Solana blocks. This can all be built on Redash just like Dune (https://redash.io/)
- Any user would be able to write a query
- free users would have to open source their queries so others could work
- paid users would have the ability to write queries privately
- Automatic hourly updates on the most viewed queries and charts
- Paid users could hook an API to any publicly available query and get results
- Dashboarding functionality
- Curated queries shared in a visualized form
- The team would write several popularly requested queries to launch this product to expedite the kick-off
- Charts that should be launched with
- Number of daily transactions on Solana
- Number of daily active Solana addresses
- Number of new daily Solana addresses
- Daily volume of SOL transfers
- Daily fees of all SOL/SPL transfers
- Daily median USD cost of SOL&SPL transfer
- Daily USD revenue of Solana stakers
- Daily Solana addresses with balance over $X ($1k, $10k, $100k, $1M, $10M
- Total monthly DEX trading volume on Solana (with DEX splits - Serum&Raydium etc.)
- Number of monthly unique liquidity providers on Serum
- Number of monthly unique trading addresses on Serum
- Number of monthly trades on Serum
- Total number of unique addresses interacting with Solana DeFi apps with splits by each project
- Total stacked daily TVL of all Solana DeFi projects
- Daily outstanding debt of Jet Protocol
- Total daily supply of stablecoins issued on Solana
I put out a message on discord and got contacted by someone from from Goog who was working on the same project. We have overlapping skill sets so not sure we will work together.
In terms of architecture, this looks like it will require:
- pipeline to ingest and parse data from the network
- data warehouse to store data
- frontend api to manage SQL queries
- frontend UI for dashboards
- rest of standard SaaS infra, logins, billing, subscriptions etc.
One of the concerns would be the amount of data and costs.. the Goog guy told me he was ingesting 70GB a day. That's significant. That's 21 TB a year.
Goog guy is running a VM with BigQuery on the backend. BigQuery is expensive because they charge for processing vs just storage.
I was about to start hacking in Redshift, but then I find out that Solana integrated BigTable support, by reading through code on Github.
This begs the question though, how many people have the exact same BigTable archive running in the backend. I imagine any organization of sufficient size would want to have their own non-corrupted archive.
However BigTable is a noSQL database, not a data warehouse. So it would be good for tracing individual past transactions, not getting aggregates.
So you look at either BigQuery or Redshift. If you disregard performance considerations, assuming that our queries and data are not particularly difficult vs the scale of the two services.., then pricing becomes something to take a look at.
- BQ costs very little for storage, but charged for query processing.
- Redshift charges a lot for storage, but query processing is free
Really thorough comparison here:
In summary, BigQuery looks cheaper and easier to use in our low usage case here. But the moment you want to do some heavy and regular usage, we would have to find something cheaper.
Update: June 15th 2021 - Got distracted and didn't have time for this... joining a long line of half assed projects... Oh well, this blog is supposed to keep me honest..