Question: Is Bioconductor a good platform for what I am thinking of?
alpha0uu8 wrote, 2 days ago:

I am thinking of an open-source platform built on top of IPFS (https://medium.com/@ConsenSys/an-introduction-to-ipfs-9bba4860abd0).

I read recently that the volume of genomic data is expected to grow by 4-5 orders of magnitude (http://www.nature.com/news/genome-researchers-raise-alarm-over-big-data-1.17912). Storage per dollar (GB/$) and network bandwidth per dollar are not going to improve at that rate.

Some open-data platforms are already suffering from this issue (https://ncbiinsights.ncbi.nlm.nih.gov/2017/05/09/phasing-out-support-for-non-human-genome-organism-data-in-dbsnp-and-dbvar/), and it is probably only going to get worse.

I am thinking of this more as a middle layer between the platform (potentially Bioconductor) and IPFS. Each dataset would be assigned an IPFS hash, and the hash would be listed on a website along with metadata describing what the dataset contains.

So instead of querying specific servers through each server's own API, users would look up the dataset and fetch it from IPFS through a uniform API (ipfs get /ipfs/<hash>).
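A minimal Python sketch of that middle layer, to make the idea concrete. Everything here is an assumption for illustration: the `DatasetRecord` and `DatasetRegistry` names, the in-memory dictionary standing in for the website, the metadata keys, and the placeholder hash. Only the `ipfs get /ipfs/<hash>` command comes from the post; the `--output` flag and the local gateway address are the usual IPFS defaults, which may differ on a given install.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One registry entry: a dataset name, its IPFS hash, and free-form metadata."""
    name: str
    ipfs_hash: str
    metadata: dict = field(default_factory=dict)


class DatasetRegistry:
    """In-memory stand-in (hypothetical) for the website listing hashes and metadata."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.name] = record

    def lookup(self, name):
        # Raises KeyError if the dataset was never registered.
        return self._records[name]

    def search(self, **criteria):
        # Return records whose metadata matches every key/value pair given.
        return [
            r for r in self._records.values()
            if all(r.metadata.get(k) == v for k, v in criteria.items())
        ]


def ipfs_get_command(record, output_path=None):
    """Build the argv for `ipfs get /ipfs/<hash>`; the caller decides how to run it."""
    cmd = ["ipfs", "get", f"/ipfs/{record.ipfs_hash}"]
    if output_path is not None:
        cmd += ["--output", output_path]
    return cmd


def gateway_url(record, gateway="http://127.0.0.1:8080"):
    """URL for the same content via an HTTP gateway (default: assumed local node)."""
    return f"{gateway.rstrip('/')}/ipfs/{record.ipfs_hash}"


# Example usage with a placeholder hash (not a real IPFS hash).
registry = DatasetRegistry()
registry.register(
    DatasetRecord("example-variants", "QmPlaceholderHash", {"organism": "mouse"})
)
record = registry.lookup("example-variants")
command = ipfs_get_command(record, output_path="example-variants")
```

The point of the design is that the caller only ever needs a dataset name or a metadata query; which machine actually serves the bytes is left to IPFS.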

This has several advantages:

1. It gives a uniform API. You don't need to know which server a dataset is located on; IPFS handles the routing and fetching.

2. Data is stored redundantly.

3. Storage is distributed, which allows faster, multi-source downloads.

4. Local caching, plus incentivizing users with Filecoin/Ethereum to store and distribute data, reduces the burden on non-profit organizations.

5. Current datastores/servers won't have to do much extra work to bring down their data costs and improve data access. They just have to install IPFS and pin their datasets locally so that they are available to everybody on the IPFS network. So it is backward compatible.
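The provider-side step in point 5 could look roughly like the sketch below. It assumes the `ipfs` command-line tool is installed and a local node is initialized; the function names and the injectable `run` parameter are illustrative choices, not part of any existing package. Note that `ipfs add` already pins by default, so the explicit `ipfs pin add` is only there to make the intent visible.

```python
import subprocess


def add_command(path):
    # `-r` adds a directory recursively; `-Q` prints only the final root hash.
    return ["ipfs", "add", "-r", "-Q", path]


def pin_command(ipfs_hash):
    # Pinning tells the local node to keep the content instead of garbage-collecting it.
    return ["ipfs", "pin", "add", f"/ipfs/{ipfs_hash}"]


def run_ipfs(argv):
    """Run an ipfs command and return its stdout; needs a local IPFS install."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def publish_dataset(path, run=run_ipfs):
    """Add a dataset to the local node, pin it, and return its root hash.

    `run` is injectable so the flow can be exercised without IPFS installed.
    """
    root_hash = run(add_command(path)).strip()
    if not root_hash:
        raise RuntimeError(f"ipfs add returned no hash for {path!r}")
    run(pin_command(root_hash))
    return root_hash
```

The hash returned by `publish_dataset` is what would then be listed on the website next to the dataset's metadata.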

I am not thinking of building an entire platform, as there are already a lot of great platforms. Integrating such a middle layer into an existing platform would also benefit potential suppliers and consumers of data immediately.

I heard that Bioconductor is one of the most used packages in the bioinformatics community. However, now that I have read a little more about it, it seems that Bioconductor is more of a suite/library of small packages that are called from RStudio/R.

If so, do you think I should actually think of integrating with RStudio/R?

Do you see any holes in my logic?

Aaron Lun (Cambridge, United Kingdom) wrote, 2 days ago:

This sounds like a question for the BioC-devel mailing list, see https://stat.ethz.ch/mailman/listinfo/bioc-devel.

FWIW, the standard way to do this would be to make your IPFS API into an R package - possibly depending on other database-querying packages in Bioconductor (e.g., GEOquery, ArrayExpress, various TCGA-related packages) - for people to download and use. If need be, there's also a bunch of pure API packages (e.g., Rhtslib, Rhdf5lib) that are intended purely for use by other package developers.


alpha0uu8 replied, 2 days ago:

Thanks for pointing out those packages. Yes, I am trying not to reinvent the wheel as much as possible. I will look into the packages you pointed out and ask on the BioC-devel list as well.