Question

Processing of curatedMetagenomicData

Edward • 0

curatedMetagenomicData brings together data from many different research articles, which have been collected and quality-checked manually. I found some .csv files under curatedMetagenomicData/inst/extdata on GitHub that seem to contain the metadata for each included study. As far as I can tell, the raw FASTQ files are hosted on NCBI, and they were collected manually and then processed into different forms for different applications within the package. I have tried many ways to locate the original FASTQ files on NCBI. My questions are below.

1. How can I trace back to the original raw sequence data on NCBI using the information in the metadata? For example, if I want the exact raw sequence data for AsnicarF_2017, how do I find its original source on NCBI?
2. Where can I find more detailed information about the processing steps from raw sequence data to usable data (e.g. relative abundance, gene counts, pathways), aside from the published paper? I don't think the paper describes in full detail how the raw data are processed into the six output data types in curatedMetagenomicData (relative abundance, marker abundance, marker presence, pathway abundance, pathway coverage, and gene families). I also suspect there are additional quality-control requirements during processing, such as excluding certain samples or reads.
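
To make the first question concrete, here is a minimal sketch of the kind of lookup I have in mind. It assumes the metadata table has `study_name`, `sample_id`, and `NCBI_accession` columns, and that `NCBI_accession` holds one or more run accessions separated by semicolons. Those column names, the separator, and the search URL pattern are assumptions on my part, and the accessions in the example table are placeholders, not real runs.

```python
import csv
import io


def run_accessions(metadata_csv, study_name, column="NCBI_accession"):
    """Return {sample_id: [run accessions]} for one study.

    Assumes a CSV with study_name, sample_id and an accession column
    whose values are semicolon-separated (an assumption, not confirmed).
    """
    reader = csv.DictReader(io.StringIO(metadata_csv))
    result = {}
    for row in reader:
        if row.get("study_name") != study_name:
            continue
        raw = row.get(column) or ""
        # Drop empty entries and the "NA" placeholder used for missing values.
        runs = [r.strip() for r in raw.split(";") if r.strip() and r.strip() != "NA"]
        result[row["sample_id"]] = runs
    return result


def sra_search_url(accession):
    """Build an NCBI SRA search URL for one accession (assumed URL pattern)."""
    return "https://www.ncbi.nlm.nih.gov/sra/?term=" + accession


# Placeholder table: study names and accessions are made up for illustration.
EXAMPLE = """study_name,sample_id,NCBI_accession
ExampleStudy_2017,sample_A,SRR0000001;SRR0000002
ExampleStudy_2017,sample_B,NA
OtherStudy_2018,sample_C,ERR0000003
"""

if __name__ == "__main__":
    for sample, runs in run_accessions(EXAMPLE, "ExampleStudy_2017").items():
        for run in runs:
            print(sample, run, sra_search_url(run))
```

If the real metadata for AsnicarF_2017 follows this layout, replacing `EXAMPLE` with the contents of the study's metadata file would give one search link per sequencing run.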

Thanks!

curatedMetagenomicData • 525 views

20 months ago • Edward