Question: RNA-Seq differential analysis with Ballgown: getting more significantly differentially expressed genes than significantly differentially expressed transcripts
3 months ago, ryanding2003 wrote:

Hi all,

I used HISAT2, StringTie and Ballgown to perform differential analysis on my data. I followed the tutorial in the HISAT, StringTie, and Ballgown paper and everything went fine until I used Ballgown to extract genes and transcripts with a q-value less than 0.05. I got 293 significantly differentially expressed genes but only 112 significantly differentially expressed transcripts, which doesn't make sense to me. I believe there should be more significantly differentially expressed transcripts than genes, since one gene corresponds to at least one transcript.

My question is: is it possible to have more differentially expressed genes than transcripts? Or maybe I made some mistakes in my analysis. Any thoughts are extremely valuable to me.

My data has two conditions, and each condition has three biological replicates.

I am totally new to RNA-Seq and differential analysis so I am sorry if my question is stupid.

Thank you in advance.

3 months ago, James W. MacDonald (United States) wrote:

In general, you should expect to have fewer differentially expressed transcripts (DET) than differentially expressed genes (DEG), for a couple of reasons. The first and most obvious is that the reads available for a given gene are apportioned among all of the transcripts that can arise from that gene. As an example, say we have a gene with four transcripts and 100 reads that align to the gene. If the reads are apportioned equally, each transcript gets only 25 reads. As the number of reads per feature you care about goes down, the variance relative to the mean goes up, and as that relative variance goes up, your ability to reliably detect differences is reduced.
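To make the 100-versus-25 reads example concrete, here is a minimal Python sketch. It assumes, purely for illustration, that counts behave like Poisson draws (real RNA-Seq counts are usually overdispersed, and this is not what Ballgown itself models). The helper names are hypothetical. It compares the variability relative to the mean (the coefficient of variation) at the gene level and at the transcript level.

```python
import math
import random
import statistics


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def coefficient_of_variation(values):
    """Standard deviation divided by the mean: noise relative to signal."""
    return statistics.pstdev(values) / statistics.fmean(values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_draws = 5000

# Assumption: 100 expected reads for the gene, split evenly over 4 transcripts.
gene_counts = [poisson_sample(rng, 100) for _ in range(n_draws)]
transcript_counts = [poisson_sample(rng, 25) for _ in range(n_draws)]

cv_gene = coefficient_of_variation(gene_counts)
cv_transcript = coefficient_of_variation(transcript_counts)

print(f"CV at gene level (mean 100):      {cv_gene:.3f}")
print(f"CV at transcript level (mean 25): {cv_transcript:.3f}")
```

Under the Poisson assumption the coefficient of variation is 1/sqrt(mean), so cutting the expected count to a quarter roughly doubles the relative noise, which is the loss of power described above.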

So, all else being equal, you should expect fewer DET than DEG. A second, less obvious issue is the inherent variability in estimating transcript counts. If reads were long enough to span a whole transcript, it would be easy to say which transcript you measured. However, most reads are far shorter than the transcript, so you have to probabilistically infer which transcript each read came from. That inference carries its own variability (you are never quite sure you assigned a read to the correct transcript, whereas you have much higher confidence at the gene level), so transcript-level counts have an additional source of variation that is probably much larger than anything comparable at the gene level. This increased variability further reduces your ability to detect DET as compared to DEG.
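The effect of assignment uncertainty can be illustrated with a toy simulation. This is only a sketch under strong assumptions and not how StringTie or Ballgown estimate abundance: a gene has two transcripts, A and B, each contributing half of 100 reads, and only 20% of reads are assumed to overlap a region that distinguishes the two. The function name and all parameter values are hypothetical. The sketch compares the spread of transcript A's count when every read can be assigned exactly with the spread when A's share must be inferred from the informative reads alone.

```python
import random
import statistics


def simulate_transcript_estimates(rng, n_reads=100, frac_a=0.5,
                                  frac_informative=0.2, n_sims=2000):
    """Return (exact, inferred) estimates of transcript A's read count.

    exact:    every read's transcript of origin is known.
    inferred: only 'informative' reads reveal their origin; A's share among
              them is scaled up to the total number of reads for the gene.
    """
    exact, inferred = [], []
    for _sim in range(n_sims):
        a_total = 0
        a_informative = 0
        informative = 0
        for _read in range(n_reads):
            from_a = rng.random() < frac_a
            is_informative = rng.random() < frac_informative
            a_total += from_a
            if is_informative:
                informative += 1
                a_informative += from_a
        exact.append(a_total)
        if informative:
            inferred.append(n_reads * a_informative / informative)
        else:
            # No informative reads at all: fall back to an even split.
            inferred.append(n_reads * 0.5)
    return exact, inferred


rng = random.Random(1)  # fixed seed so the sketch is reproducible
exact, inferred = simulate_transcript_estimates(rng)

sd_exact = statistics.pstdev(exact)
sd_inferred = statistics.pstdev(inferred)

print(f"SD of transcript A count, reads assigned exactly: {sd_exact:.2f}")
print(f"SD of transcript A count, share inferred:         {sd_inferred:.2f}")
```

In this toy setting the gene-level total is unaffected by the ambiguity, while the inferred transcript count is noticeably noisier than the exactly assigned one. That is the extra source of variation described above.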


I really appreciate your detailed explanation!

(reply written 3 months ago by ryanding2003)