I am thinking of an open-source platform built on top of IPFS (https://medium.com/@ConsenSys/an-introduction-to-ipfs-9bba4860abd0).
I recently read that the volume of genomic data is expected to grow by 4-5 orders of magnitude (http://www.nature.com/news/genome-researchers-raise-alarm-over-big-data-1.17912). The cost of storage and network bandwidth per GB is not going to fall at that rate.
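To make the cost argument explicit: total cost C is roughly data volume V times cost per GB c. If V grows by 4-5 orders of magnitude while c falls by some factor f over the same period, the total bill changes by 10^k / f. The value f = 10 in the second line is purely illustrative, not a forecast.

```latex
C = V \cdot c
\qquad\Rightarrow\qquad
\frac{C_{\text{later}}}{C_{\text{now}}}
  = \frac{V_{\text{later}}}{V_{\text{now}}} \cdot \frac{c_{\text{later}}}{c_{\text{now}}}
  = \frac{10^{k}}{f}, \qquad k \in \{4, 5\}

f = 10 \;(\text{illustrative}) \quad\Rightarrow\quad \frac{C_{\text{later}}}{C_{\text{now}}} \in \{10^{3},\, 10^{4}\}
```

Unless cost per GB falls by the same 4-5 orders of magnitude, whoever hosts the data pays more every year.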
Some open-data platforms are already struggling with this (https://ncbiinsights.ncbi.nlm.nih.gov/2017/05/09/phasing-out-support-for-non-human-genome-organism-data-in-dbsnp-and-dbvar/), and the problem will probably only get worse.
I see this more as a middle layer between an existing platform (potentially Bioconductor) and IPFS. Each dataset would be assigned an IPFS hash, and the hashes would be maintained on a website along with metadata describing what each dataset contains.
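The index described here could be as simple as a mapping from dataset names to IPFS hashes plus free-form metadata. Below is a minimal in-memory sketch; the class names, fields and placeholder hashes are hypothetical, and a real website would persist this and serve it over HTTP.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One entry on the (hypothetical) index website."""
    name: str
    ipfs_hash: str  # hash printed by `ipfs add`; treated as an opaque string here
    metadata: dict = field(default_factory=dict)  # e.g. organism, assay, original source


class DatasetIndex:
    """In-memory stand-in for the index mapping dataset names to IPFS hashes."""

    def __init__(self):
        self._records = {}

    def register(self, name, ipfs_hash, **metadata):
        """Add a dataset; names must be unique so lookups are unambiguous."""
        if name in self._records:
            raise ValueError(f"dataset {name!r} is already registered")
        record = DatasetRecord(name, ipfs_hash, dict(metadata))
        self._records[name] = record
        return record

    def lookup(self, name):
        """Return the record for a dataset name (raises KeyError if unknown)."""
        return self._records[name]

    def search(self, **criteria):
        """Return records whose metadata matches every given key/value pair."""
        return [
            record
            for record in self._records.values()
            if all(record.metadata.get(key) == value for key, value in criteria.items())
        ]
```

A user would call `lookup(...)` and hand `record.ipfs_hash` to IPFS. Keeping only hashes and metadata on the website means the index stays small no matter how large the datasets themselves grow.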
So instead of querying specific servers through server-specific APIs, users would look up the dataset and fetch it from IPFS through a uniform API (ipfs get /ipfs/<hash>).
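A minimal sketch of what the uniform access path looks like from the user's side. It only wraps the standard `ipfs get` command (with its `-o` output option) and the public `https://ipfs.io/ipfs/<hash>` gateway pattern; the name-to-hash table, the function names and the placeholder hash are hypothetical stand-ins for the index website.

```python
import shutil
import subprocess

# Hypothetical name -> hash mapping; in the proposal this comes from the index website.
# The hash below is a placeholder, not a real IPFS content identifier.
DATASET_HASHES = {"example-variants": "QmPlaceholderHashA"}


def ipfs_path(name):
    """Map a dataset name to its content path, /ipfs/<hash>."""
    return f"/ipfs/{DATASET_HASHES[name]}"


def fetch_command(name, output_dir="."):
    """The same command works for every dataset; no server address is involved."""
    return ["ipfs", "get", ipfs_path(name), "-o", output_dir]


def gateway_url(name, gateway="https://ipfs.io"):
    """HTTP fallback for users without a local IPFS node (public gateway assumed)."""
    return f"{gateway}{ipfs_path(name)}"


def fetch(name, output_dir="."):
    """Download a dataset with the local IPFS CLI; IPFS locates peers holding the content."""
    if shutil.which("ipfs") is None:
        raise RuntimeError("ipfs CLI not found on PATH")
    subprocess.run(fetch_command(name, output_dir), check=True)
```

The only dataset-specific input is the hash, so a Bioconductor/R wrapper would need the same two pieces: a lookup against the index and a call into IPFS.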
This has several advantages:
1. A uniform API: you don't need to know which server hosts the dataset, because IPFS handles the routing and fetching.
2. Redundant storage of data.
3. Distributed storage, which allows faster downloads from multiple sources in parallel.
4. Local caching, plus incentivizing users with Filecoin/Ethereum to store and distribute data, reduces the burden on non-profit organizations.
5. Backward compatibility: current data stores/servers won't have to do much extra to cut their data costs and improve access. They just have to install IPFS and pin their datasets locally so that the data is available to everybody on the IPFS network.
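Point 5 amounts to a couple of commands on the provider side. This sketch only wraps the standard `ipfs add` (`-r` for directories, `-Q` to print only the root hash; it also pins by default) and `ipfs pin add` CLI commands; the function names are hypothetical, and it assumes the IPFS (Kubo) CLI is installed and `ipfs daemon` is running so that peers can actually reach the pinned data.

```python
import shutil
import subprocess


def add_command(dataset_dir):
    """`ipfs add -r` adds a directory recursively; `-Q` prints only the root hash.
    By default `ipfs add` also pins the content on the local node."""
    return ["ipfs", "add", "-r", "-Q", dataset_dir]


def pin_command(ipfs_hash):
    """Pin content that already exists on the network, e.g. a mirror sharing the load."""
    return ["ipfs", "pin", "add", ipfs_hash]


def _run(cmd):
    """Run an ipfs command and return its trimmed stdout."""
    if shutil.which("ipfs") is None:
        raise RuntimeError("ipfs CLI not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()


def publish(dataset_dir):
    """Data provider: add and pin a local dataset, return the hash to list on the index site."""
    return _run(add_command(dataset_dir))


def mirror(ipfs_hash):
    """Third party (point 4): keep a copy of an existing dataset available on the network."""
    return _run(pin_command(ipfs_hash))
```

The existing server keeps serving its current API; publishing to IPFS is an extra step alongside it, which is what makes the approach backward compatible.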
I am not planning to build an entire platform, as there are already many great ones. Moreover, integrating such a middle layer into an existing platform would immediately benefit both suppliers and consumers of data.
I heard that Bioconductor is one of the most widely used tools in the bioinformatics community. However, after reading a bit more about it, it seems that Bioconductor is more of a collection of small packages that are called from R/RStudio.
If so, should I actually be thinking about integration with R/RStudio?
Do you see any holes in my logic?