Question: EdgeR differentially expressed genes vs normal boxplot visualization
17 months ago, snowru wrote:

edgeR reported one gene as differentially expressed, with log(mean CPM) = 2.2447, logFC = 11.2344, and adjusted p-value = 0.0016.

This looks neat. But the problem arises when I take the TPM (transcripts per million) values of these samples in the 2 groups and draw boxplots.

Attached is a boxplot.

It turns out that the medians of both groups are ZERO, and visually, these two groups should not be called different at all!

Here are the two arrays of TPM values (one per group) that I used for the boxplot:

[0,0,0.0363,0,0,0,0,15.1621,0,0,0,0.091,13.1992,0,0.064,0,0,27.9052,15.4516,0,0,0,22.6814,0,0.0124,5.3274]

 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

 

Here is the boxplot picture: https://drive.google.com/open?id=0B0AM3r3EIYRUVl8zNFphWWJCbEk (somehow I can't attach it to this form)

 

Has anyone already encountered this problem? I would like to know how to justify this case: the statistical package edgeR calls the gene differentially expressed, but visually it clearly is not.

Thanks. 

 

SnowRu

 

 

17 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

The statistical tests in edgeR don't care about the median. If you have two groups, then edgeR will test the null hypothesis that the mean count (normalized by effective library size) is equal between groups. In your case, the means are clearly different as one of the groups has all-zero expression and the other group has many samples with non-zero expression. The data provides evidence against the null hypothesis; ergo, you get a low p-value.
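The mean-versus-median point can be checked directly on the two arrays from the question. This is only a minimal sketch in plain Python: it uses the posted TPM values, whereas edgeR itself fits a negative binomial model to counts, so the numbers illustrate the argument and do not reproduce the edgeR test. The names `group_a` and `group_b` are labels chosen here, not from the original post.

```python
from statistics import mean, median

# TPM values copied from the question (first and second array).
group_a = [0, 0, 0.0363, 0, 0, 0, 0, 15.1621, 0, 0, 0, 0.091, 13.1992,
           0, 0.064, 0, 0, 27.9052, 15.4516, 0, 0, 0, 22.6814, 0,
           0.0124, 5.3274]
group_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Summarise each group: the medians coincide, the means do not.
summary = {
    name: {"n": len(values), "median": median(values), "mean": mean(values)}
    for name, values in (("group A", group_a), ("group B", group_b))
}

for name, stats in summary.items():
    print(f"{name}: n={stats['n']}, "
          f"median={stats['median']:.4f}, mean={stats['mean']:.4f}")
```

More than half of the samples in the first group are zero, so its median is zero as well, while its mean is pulled up by the handful of expressing samples. A boxplot is built around the median and quartiles, which is why it hides the difference that a test on means picks up.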

17 months ago, snowru wrote:

Agreed, but if you present this boxplot to your audience, you will have a very hard time persuading them that this gene is differentially expressed.


Well, I'm not sure what you want edgeR to say. Clearly, this gene is differentially expressed between your groups. Perhaps not in every sample, but the mean expression is definitely different, so what more do you want? Similar scenarios arise in analyses of single-cell RNA-seq data where a gene may not be expressed in every cell of a population, but the average population-level expression is still different between two groups. I've never found this hard to explain.

P.S. Reply to answers using the "add comment" or "add reply" buttons, not the "add answer" button.

Reply written 17 months ago by Aaron Lun
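One way to make the reply above concrete for an audience is to report how many samples in each group express the gene at all, alongside a simple test on the difference in means. The sketch below uses a one-sided permutation test as a stand-in; it is not the negative binomial test that edgeR performs, and it runs on the posted TPM values, not on counts. The function names, the number of permutations, and the seed are all assumptions made for this illustration.

```python
import random
from math import fsum

# TPM values copied from the question (first and second array).
group_a = [0, 0, 0.0363, 0, 0, 0, 0, 15.1621, 0, 0, 0, 0.091, 13.1992,
           0, 0.064, 0, 0, 27.9052, 15.4516, 0, 0, 0, 22.6814, 0,
           0.0124, 5.3274]
group_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def mean_diff(a, b):
    """Difference in means, using fsum so the result does not depend on order."""
    return fsum(a) / len(a) - fsum(b) / len(b)


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """One-sided permutation p-value for mean(a) > mean(b).

    Hypothetical helper for illustration only; NOT edgeR's method.
    """
    rng = random.Random(seed)
    observed = mean_diff(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        if mean_diff(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Number of samples per group in which the gene is detected at all.
detected_a = sum(v > 0 for v in group_a)
detected_b = sum(v > 0 for v in group_b)
p_value = permutation_p_value(group_a, group_b)

print(f"detected: {detected_a}/{len(group_a)} vs {detected_b}/{len(group_b)}")
print(f"permutation p-value (difference in means): {p_value:.4f}")
```

Presenting the detection counts, or plotting every sample as an individual point, conveys "expressed in some samples of one group, in none of the other" more directly than a boxplot whose median sits at zero in both groups.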

This isn't a problem specific to RNA-seq. Any measurement at or near the detection limit of any assay is going to have a boxplot that looks like this.

Reply written 17 months ago by Ryan C. Thompson