DE analysis with reference transcriptome


Nicole Ertl ▴ 10

@nicole-ertl-6570

Last seen 11.4 years ago

Dear Bioconductor users,

I'm working on a novel organism (no genome, only a reference transcriptome I had prepared with Trinity) and I have to do some differential gene expression analysis using RNA-Seq data produced with the Illumina TruSeq (non-directional) kit. Most of my experiments have 2 conditions (control and treatment); one experiment has 4 conditions (control and 3 treatments). I have 6 biological replicates per control/treatment.

I've seen quite a few publications (BMC, PLOS ONE and others) that aligned their reads to a reference transcriptome and then used RSEM or eXpress (sometimes with FPKM/RPKM) to produce the count table, which they then analysed with DESeq. Most don't go into any detail, so it's hard to follow what was done. I've seen the "Count-based differential expression analysis of RNA sequencing data using R and Bioconductor" publication online, and it mentions that when there is no genome, a reference transcriptome can be built, reads aligned to it and counted, and then the standard pipeline for differential analysis used.

The documentation for DESeq (and DESeq2) says to use raw counts, not (rounded) normalised values or counts of covered base pairs. I had a look at the RSEM and eXpress documentation, and both seem to do some kind of estimation because of the isoforms inherent in a transcriptome? The RSEM website mentions that "popular differential expression (DE) analysis tools such as edgeR and DESeq do not take variance due to read mapping uncertainty into consideration. Because read mapping ambiguity is prevalent among isoforms and de novo assembled transcripts, these tools are not ideal for DE detection in such conditions." They suggest using EBSeq, but I found at most a handful of papers on Google Scholar that actually used RSEM-EBSeq.

I'm new to all this and it's getting quite confusing. Could you please help? What would I have to do with my data and/or my reference transcriptome to be able to use e.g. the RSEM - DESeq (maybe DESeq2) pipeline? Is there a pipeline that you could recommend in my situation?

Thank you so much for your time.

Kind Regards,
Nicole

University of the Sunshine Coast, Locked Bag 4, Maroochydore DC, Queensland, 4558 Australia.

Sequencing GO Organism edgeR DESeq EBSeq • 3.3k views

updated 11.7 years ago by James W. MacDonald 68k • written 11.7 years ago by Nicole Ertl ▴ 10


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 days ago

United States

Hi Nicole,

Trinity has scripts that will generate counts from the RSEM results that you can then use as inputs for either edgeR or DESeq(2).

http://trinityrnaseq.sourceforge.net/analysis/diff_expression_analysis.html

Best,

Jim

On 5/22/2014 8:06 PM, Nicole Ertl wrote:
> [...]

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
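To illustrate what "generating counts from the RSEM results" amounts to, here is a minimal Python sketch. It is not Trinity's script; it only assumes the standard tab-separated RSEM `*.genes.results` layout with `gene_id` and `expected_count` columns. The sample names, gene IDs, and values are made up for the demonstration. It merges the per-sample `expected_count` columns into one gene-by-sample table and rounds them to integers, which is the form a count-based tool expects.

```python
import csv
import io

def read_expected_counts(handle):
    """Return {gene_id: expected_count} from one RSEM *.genes.results table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}

def build_count_matrix(samples):
    """Merge per-sample tables into one matrix of rounded counts.

    samples: {sample_name: open text handle of that sample's results file}
    Returns (gene_ids, sample_names, rows), one row per gene.
    """
    per_sample = {name: read_expected_counts(h) for name, h in samples.items()}
    names = list(per_sample)
    genes = sorted(set().union(*(d.keys() for d in per_sample.values())))
    # A gene missing from one sample's table is treated as 0 (an assumption).
    rows = [[int(round(per_sample[n].get(g, 0.0))) for n in names] for g in genes]
    return genes, names, rows

# Hypothetical in-memory stand-ins for two result files.
HEADER = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
ctrl_1 = io.StringIO(HEADER
                     + "comp1_c0\tcomp1_c0_seq1\t1200\t1050.0\t10.37\t5.1\t4.8\n"
                     + "comp2_c0\tcomp2_c0_seq1,comp2_c0_seq2\t800\t650.0\t3.62\t2.0\t1.9\n")
treat_1 = io.StringIO(HEADER
                      + "comp1_c0\tcomp1_c0_seq1\t1200\t1050.0\t25.00\t9.8\t9.1\n"
                      + "comp2_c0\tcomp2_c0_seq1,comp2_c0_seq2\t800\t650.0\t0.49\t0.3\t0.3\n")

genes, names, rows = build_count_matrix({"ctrl_1": ctrl_1, "treat_1": treat_1})
for gene, row in zip(genes, rows):
    print(gene, *row, sep="\t")
```

With real data the handles would come from `open(path)` on each sample's results file. Rounding discards the fractional part of the estimate and says nothing about its uncertainty, which is the limitation the RSEM authors point out.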

11.7 years ago James W. MacDonald 68k


hi Nicole,

Here was my response to a similar question: https://stat.ethz.ch/pipermail/bioconductor/2014-May/059479.html, basically saying the same as you quote from the RSEM authors. Briefly, you can use DESeq2 on estimated counts, with the caveat that our software is not taking into account the uncertainty of the estimation of counts. I am not certain about the details of the RSEM/EBSeq handoff, so I can't comment on that.

You mention,

> I've seen the "Count-based differential expression analysis of RNA sequencing data using R and Bioconductor" publication online and in it is mentioned that in the case of no genome, a reference transcriptome can be built, reads aligned to it and counted and then the standard pipeline for differential analysis used.

It seems reasonable to try this approach as well. Regardless, I would recommend looking at the results of any pipeline by eye, and cross-referencing with raw data (or raw-ish data, e.g. aligned reads to your reference transcriptome in IGV).

Mike
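One simple way to look at results "by eye" is to compare a flagged gene's depth-scaled counts across replicates. The Python sketch below is only an illustration with made-up numbers: it uses plain counts-per-million scaling by library size, which is not the normalization DESeq2 applies internally, and the function names are hypothetical helpers rather than part of any package.

```python
def counts_per_million(rows):
    """Scale raw counts by each sample's library size.

    rows: one list per gene, one raw count per sample (same sample order).
    """
    lib_sizes = [sum(col) for col in zip(*rows)]
    return [[1e6 * c / n if n else 0.0 for c, n in zip(row, lib_sizes)]
            for row in rows]

def group_means(values, conditions):
    """Average one gene's values within each condition label."""
    grouped = {}
    for value, cond in zip(values, conditions):
        grouped.setdefault(cond, []).append(value)
    return {cond: sum(vs) / len(vs) for cond, vs in grouped.items()}

# Made-up counts: 2 genes x 4 samples (2 control, 2 treatment).
conditions = ["control", "control", "treatment", "treatment"]
raw = [
    [10, 20, 40, 80],       # gene of interest
    [990, 980, 960, 920],   # everything else, lumped together
]

cpm = counts_per_million(raw)
print("per-sample CPM:", cpm[0])
print("per-condition mean CPM:", group_means(cpm[0], conditions))
```

If the per-replicate values for a reported gene are driven by a single sample, or the direction disagrees with the reported fold change, that is a cue to inspect the alignments for that transcript.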

11.7 years ago Michael Love 43k
