Question: Is there a data package containing complex RNA-Seq data?
9 days ago, ap3637 wrote:

Does anyone know if there is a good RNA-Seq data set that is reasonably large and has multiple factors included in its metadata?

For instance, the airway data set is nicely formatted for what I hope to do, but its colData is pretty simple:

          SampleName     cell      dex    albut        Run avgLength Experiment    Sample    BioSample
             <factor> <factor> <factor> <factor>   <factor> <integer>   <factor>  <factor>     <factor>
SRR1039508 GSM1275862   N61311    untrt    untrt SRR1039508       126  SRX384345 SRS508568 SAMN02422669
SRR1039509 GSM1275863   N61311      trt    untrt SRR1039509       126  SRX384346 SRS508567 SAMN02422675
SRR1039512 GSM1275866  N052611    untrt    untrt SRR1039512       126  SRX384349 SRS508571 SAMN02422678
SRR1039513 GSM1275867  N052611      trt    untrt SRR1039513        87  SRX384350 SRS508572 SAMN02422670
SRR1039516 GSM1275870  N080611    untrt    untrt SRR1039516       120  SRX384353 SRS508575 SAMN02422682
SRR1039517 GSM1275871  N080611      trt    untrt SRR1039517       126  SRX384354 SRS508576 SAMN02422673
SRR1039520 GSM1275874  N061011    untrt    untrt SRR1039520       101  SRX384357 SRS508579 SAMN02422683
SRR1039521 GSM1275875  N061011      trt    untrt SRR1039521        98  SRX384358 SRS508580 SAMN02422677


I'm looking for something like this that has more factors to consider. Here the only informative ones are cell and dex, and I would like more: for instance infection, tissue type, sex, batch number, time point, etc. Does anyone have any ideas about a good data set on Bioconductor? Thanks very much!

9 days ago, Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:

Have a look at

8 days ago, mario.zanfardino (Naples, Italy) wrote:

Have a look at the TCGA database. You can access its data in several ways; one of them is the curatedTCGAData package.

For example:


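# Download the "RNASeqGene" assay for the BRCA (breast cancer) cohort
library(curatedTCGAData)

# dry.run = FALSE downloads the data instead of only listing what is available.
# Recent versions of the package may also require a `version` argument.
BRCA_TCGA <- curatedTCGAData(diseaseCode = "BRCA",
                             assays = c("RNASeqGene"),
                             dry.run = FALSE)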


The result is a MultiAssayExperiment object that holds a large amount of data, along with extensive colData (sample metadata).
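With that many metadata columns, it can help to shortlist the ones that could plausibly act as design factors. The following is only a rough sketch, not part of either package: `candidate_factors` is a hypothetical helper, and the cutoffs `min_levels` and `max_levels` are arbitrary assumptions. It is shown on a toy table built from the first four airway samples quoted in the question, so it runs without downloading anything.

```r
# Hypothetical helper (not from any package): return the names of metadata
# columns whose number of distinct non-missing values lies in a given range.
candidate_factors <- function(meta, min_levels = 2L, max_levels = 10L) {
  meta <- as.data.frame(meta)
  n_levels <- vapply(meta,
                     function(x) length(unique(x[!is.na(x)])),
                     integer(1))
  names(n_levels)[n_levels >= min_levels & n_levels <= max_levels]
}

# Toy stand-in for a colData table, using four airway samples from the question
toy <- data.frame(
  cell  = c("N61311", "N61311", "N052611", "N052611"),
  dex   = c("untrt", "trt", "untrt", "trt"),
  albut = rep("untrt", 4),
  Run   = c("SRR1039508", "SRR1039509", "SRR1039512", "SRR1039513"),
  stringsAsFactors = TRUE
)

# albut is constant and Run is unique per sample, so both are dropped
candidate_factors(toy, max_levels = 3L)  # returns c("cell", "dex")
```

Assuming the relevant Bioconductor packages are loaded, the same call should work on real metadata, e.g. `candidate_factors(colData(BRCA_TCGA))`, although the cutoffs would likely need adjusting for a cohort of that size.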
