News: Experimental data package 'seqc'
Wei Shi (Australia) wrote, 3.0 years ago:

We have created a new experimental data package called 'seqc'. It includes gene-level read count data generated by the SEQC (SEquencing Quality Control) project, which is the third stage of the well-known MAQC project (a US FDA initiative). The SEQC/MAQC-III Consortium produced benchmark RNA-seq data for the assessment of RNA sequencing technologies and data analysis methods (recently published in Nature Biotechnology: http://www.ncbi.nlm.nih.gov/pubmed/25150838).

Sequence reads were aligned to the human reference genome hg19 using the Subread aligner and then summarized to genes using the featureCounts program. This package includes the gene-level read count data for 2,758 libraries. It can be downloaded from the following link (188MB):

http://bioconductor.org/packages/release/data/experiment/html/seqc.html
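
For readers who want to see what the align-then-summarize workflow described above looks like in R, here is a minimal sketch using the Rsubread package, which provides both the Subread aligner and featureCounts. The wrapper name `count_genes` and its arguments are hypothetical, the file paths are placeholders supplied by the caller, and the options actually used to produce the 'seqc' counts are not stated in this post, so treat the defaults below as assumptions.

```r
# Hypothetical wrapper: align reads with Subread, then count reads per gene.
# Assumes an hg19 index was already built with Rsubread::buildindex().
count_genes <- function(index_basename, fastq_files, bam_files) {
  if (!requireNamespace("Rsubread", quietly = TRUE)) {
    stop("Rsubread is required: BiocManager::install('Rsubread')")
  }
  # Align each read file; one output BAM per input file.
  Rsubread::align(index = index_basename,
                  readfile1 = fastq_files,
                  output_file = bam_files)
  # Summarize aligned reads to genes using the in-built hg19 annotation
  # (an assumption; the post does not say which annotation files were used).
  fc <- Rsubread::featureCounts(files = bam_files, annot.inbuilt = "hg19")
  fc$counts  # genes x libraries matrix of read counts
}
```

Calling `count_genes("hg19_index", "sample.fastq.gz", "sample.bam")` would return a matrix with one row per gene and one column per library, provided the index and read files exist.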

In addition to the read count data, this package also includes exon-exon junction data generated for human brain reference RNA and universal human reference RNA samples. Exon-exon junctions were detected using the Subjunc aligner.

Moreover, TaqMan RT-PCR validation data for ~1000 genes and ERCC spike-in sequence data are included in this package as well.
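
Once the package is installed (for example with `BiocManager::install("seqc")`), one way to see which data sets it ships is to ask R for its data index. The sketch below is only an illustration: the helper name `list_seqc_datasets` is hypothetical, and it returns an empty vector when the package is not installed rather than failing.

```r
# Hypothetical helper: list the names of the data sets bundled with a package.
list_seqc_datasets <- function(pkg = "seqc") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; nothing to list.")
    return(character(0))
  }
  # data(package = ...) returns an index whose 'results' matrix has an
  # "Item" column holding the data set names.
  unname(data(package = pkg)$results[, "Item"])
}

datasets <- list_seqc_datasets()
print(head(datasets))
```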

We hope this package is a useful resource for the community.

Wei

Steve Lianoglou (Genentech) wrote, 3.0 years ago:

Thanks a lot for processing and annotating the data in the way that you have. This will be a super useful resource ... especially since I already have a need for it ;-)

I've written some helper functions that build a (semi-decently) annotated ExpressionSet from the data according to user-specified criteria, and put them in the gist here. Perhaps something like this would be useful to include in the package?

You would use it like so:

## Fetch all of the RefSeq data from all centers and sequencing platforms:
R> e <- seqc.eSet('gene', 'refseq')
R> head(pData(e))
    platform sample replicate lane  flowcell center
 1|      ILM      A         1  L01 FlowCellA    AGR
 2|      ILM      A         1  L01 FlowCellB    AGR
 3|      ILM      A         1  L02 FlowCellA    AGR
 4|      ILM      A         1  L02 FlowCellB    AGR
 5|      ILM      A         1  L03 FlowCellA    AGR
 6|      ILM      A         1  L03 FlowCellB    AGR

R> with(pData(e), table(platform, center))
         center
 platform AGR BGI CNL COH LIV MAY MGP NVS NWU NYU PSU SQW
      ILM 256 384 360 128   0 384   0 320   0   0   0   0
      LIF   0   0   0   0  50   0   0   0 285   0 288 288
      ROC   0   0   0   0   0   0   4   0   0   4   0   4

## Fetch just the Illumina RefSeq data from all centers:
R> ilm <- seqc.eSet('gene', 'refseq', 'ILM')
R> with(pData(ilm), table(platform, center))
         center
 platform AGR BGI CNL COH MAY NVS
      ILM 256 384 360 128 384 320

Currently I've only implemented this parsing/aggregating for gene-level features (i.e. no junction or TaqMan data), but I can add those later if you think these would be helpful to include in the package.
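
To illustrate the general idea of turning a count table into an annotated ExpressionSet, here is a minimal sketch on toy data. It is not the code from the gist: the counts are made up, the `<sample>_<replicate>_<lane>_<flowcell>` column naming and the fixed `platform` and `center` values are assumptions chosen to mirror the pData columns shown above, and the ExpressionSet step only runs if Biobase is installed.

```r
# Toy count matrix (made-up numbers); column names follow an assumed
# "<sample>_<replicate>_<lane>_<flowcell>" pattern.
counts <- matrix(
  c(10L, 0L, 5L, 12L, 1L, 7L, 3L, 4L, 9L, 8L, 2L, 6L),
  nrow = 3,
  dimnames = list(
    c("gene1", "gene2", "gene3"),
    c("A_1_L01_FlowCellA", "A_1_L01_FlowCellB",
      "B_1_L01_FlowCellA", "B_1_L02_FlowCellA")
  )
)

# Split each column name into its four annotation fields.
parts <- do.call(rbind, strsplit(colnames(counts), "_", fixed = TRUE))
pd <- data.frame(
  platform  = "ILM",  # assumed: would come from the data set's name
  sample    = parts[, 1],
  replicate = as.integer(parts[, 2]),
  lane      = parts[, 3],
  flowcell  = parts[, 4],
  center    = "AGR",  # assumed: would come from the data set's name
  row.names = colnames(counts),
  stringsAsFactors = FALSE
)

# Cross-tabulate the sample annotation, as in the session above.
sample_by_flowcell <- with(pd, table(sample, flowcell))
print(sample_by_flowcell)

# Wrap counts and annotation in an ExpressionSet when Biobase is available.
if (requireNamespace("Biobase", quietly = TRUE)) {
  e <- Biobase::ExpressionSet(
    assayData = counts,
    phenoData = Biobase::AnnotatedDataFrame(pd)
  )
  print(dim(Biobase::exprs(e)))
}
```

Keeping the sample annotation in a data frame whose row names match the count matrix's column names is what lets the two be combined safely.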

Wei Shi (Australia) wrote, 3.0 years ago:

Thanks for the code, Steve. I have just added it to the package and committed it to the svn devel repository ...


That was quick! Thanks for incorporating that ... I of course now feel compelled to round off the functionality so that one could get ExpressionSets for all of the data. I'll let you know when the gist is updated with that ...

Reply written 3.0 years ago by Steve Lianoglou

Happy to incorporate them when your code is updated! It would be helpful if you could provide .Rd files as well ...
 

Reply written 3.0 years ago by Wei Shi