Reference for differential gene expression analysis
fagire • 0
@fagire-7144
Last seen 9.4 years ago
Uruguay

Hi,

I plan to use your DESeq2 package for differential expression analysis between two conditions, and I'm wondering which transcriptome(s) I should use as the reference: a single consensus assembly or the individual per-sample assemblies. I don't have a genome for my species.

Some people suggest generating a single assembly by combining all reads across all samples as input, and then aligning the reads from each sample separately back to that single ("consensus") assembly for downstream differential expression analysis.

The other option simply consists of aligning the reads of each sample to its own assembly. I do not know to what extent the heterogeneity of individual assemblies (distinct numbers of genes and isoforms, differences in transcript lengths, etc.) can affect the analysis of differential gene expression.

Which would be the best option?

One last question: many people use RSEM to obtain expected counts (for non-model species), even though DESeq uses raw counts. Do you think, for example, that the Corset (http://genomebiology.com/2014/15/7/410) approach would be better?

Thanks in advance,

Facundo

deseq2 • 2.9k views
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 3.8 years ago
Zentrum für Molekularbiologie, Universi…

Hi

I think you have already answered your own question:

"I do not know to what extent the heterogeneity of individual assemblies (distinct numbers of genes and isoforms, differences in transcript lengths, etc.) can affect the analysis of differential gene expression."

Exactly. But it seems safe to say that it will somehow affect the analysis, and that you then cannot say whether any differences between groups that you see are really biological differences or simply due to differences in quality and content of the sample-specific references.

Therefore, your first option seems to me to be the only sensible way to go: "... generating a single assembly by combining all reads across all samples as input, and then aligning the reads from each sample separately back to that single ("consensus") assembly for downstream differential expression analysis."

"One last question: many people use RSEM to obtain expected counts (for non-model species), even though DESeq uses raw counts. Do you think, for example, that the Corset (http://genomebiology.com/2014/15/7/410) approach would be better?"

I'm not familiar enough with RSEM, and hadn't heard of Corset until now, so I can't give a qualified answer. But judging from the Corset paper's abstract, it sounds like quite a useful approach. Maybe somebody else here can share some first-hand experience in using it?

  Simon
 

@mikelove
Last seen 13 hours ago
United States

Unfortunately, I personally don't have enough experience generating new transcriptomes to say which option will give better results.

Both options, a single consensus assembly or each sample aligned to its own assembly, can be accommodated by DESeq2. The single consensus method is the typical workflow.

If you align the reads of each sample to its own assembly, you would first need to make sure you've properly matched up the genes from the different samples, and you would have to remove any genes which don't have a match across all samples.

Secondly, the count of the reads which align to a gene is proportional to the average effective transcript length, where the average is weighted by the proportional expression of each transcript of the gene. This is, for example, provided by RSEM's rsem-calculate-expression as the column "effective_length" in the *genes.results file. So you can use this column, collected across samples, to account for the effect of the differences in transcript lengths across samples on the count of reads which aligned uniquely to the genes. The way to do this is to supply a matrix of average effective transcript lengths (so a matrix which is # genes x # samples) to the normMatrix argument of estimateSizeFactors(), and then continue with DESeq(). If you go with multiple assemblies and try DESeq2, this would be my recommendation, rather than using the expected count.
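To make the two steps above concrete, here is a minimal sketch in plain Python (not DESeq2 or RSEM code). It computes an expression-weighted average effective length per gene, keeps only the genes found in every sample, and arranges the result as a genes x samples matrix. The input layout, the function names, and the toy numbers are all hypothetical and chosen only for illustration. In practice RSEM already reports the per-gene value in its "effective_length" column, so the first function mainly shows what that number means.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout (an assumption for this sketch):
# each transcript is (effective_length, relative_expression).
Transcripts = List[Tuple[float, float]]


def average_effective_length(transcripts: Transcripts) -> float:
    """Expression-weighted mean effective length of one gene's transcripts.

    Assumes the list is non-empty.
    """
    total_expr = sum(expr for _, expr in transcripts)
    if total_expr == 0:
        # No expression information: fall back to the unweighted mean.
        # This fallback is an assumption of the sketch, not a DESeq2 rule.
        return sum(length for length, _ in transcripts) / len(transcripts)
    return sum(length * expr for length, expr in transcripts) / total_expr


def length_matrix(samples: Dict[str, Dict[str, Transcripts]]):
    """Build a genes x samples matrix of average effective lengths.

    Only genes that were matched in every sample are kept.
    """
    sample_names = sorted(samples)
    shared = set.intersection(*(set(samples[s]) for s in sample_names))
    genes = sorted(shared)
    matrix = [
        [average_effective_length(samples[s][g]) for s in sample_names]
        for g in genes
    ]
    return genes, sample_names, matrix


# Toy data: gene1 is present in both samples, gene2 and gene3 are not.
samples = {
    "sampleA": {"gene1": [(1000.0, 3.0), (2000.0, 1.0)], "gene2": [(500.0, 1.0)]},
    "sampleB": {"gene1": [(1500.0, 1.0)], "gene3": [(800.0, 2.0)]},
}

genes, sample_names, matrix = length_matrix(samples)
print(genes)         # genes shared by all samples
print(sample_names)  # column order of the matrix
print(matrix)        # one row per gene, one column per sample
```

A matrix of this shape, with rows and columns in the same order as the count matrix, is what would then be exported and passed in R to the normMatrix argument of estimateSizeFactors().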

Yes, Corset looks like it is worth trying here.
