Why is my gene not significantly differentially expressed?
Simran • 0

Hi! I conducted a DGE analysis between two groups of cell lines: MYCN-amplified vs. MYCN non-amplified.

Cell lines in the MYCN non-amplified group had these FPKM values for the MYCN gene: 5.182582, 3.104376, 4.962478

Cell lines in the MYCN-amplified group had these FPKM values for the MYCN gene: 101.2204, 301.8182, 280.6712

Visually there is a marked difference between these two groups, and I would expect it to be significant regardless of the transformation/normalization. However, after applying logCPM, lmFit and eBayes, I get the following logFC and p-values for MYCN:

logFC: 1.27382121, p-value: 0.8228308, adjusted p-value: 0.92421833

Why is this change so insignificant?

I'm hoping to catch the eye of Gordon Smyth on this post!

Code shown below:

logCPM_SD_rep <- cpm(samp.SD.representative+1, log=TRUE, prior.count=3)
fit.sd.rep <- lmFit(logCPM_SD_rep, design.sd.rep)
fit.sd.rep <- eBayes(fit.sd.rep, trend=TRUE)
topTable(fit.sd.rep, coef=ncol(design.sd.rep))
contr_SD<-makeContrasts(group.sd.repAmp - group.sd.repNot_Amp, levels = colnames(design.sd.rep))
tmp.sd.rep <- eBayes(tmp.sd.rep)
top.table.sd.rep <- topTable(tmp.sd.rep, sort.by = "P", n = Inf)
head(top.table.sd.rep, 20)
limmaGUI edgeR limma
United States

I imagine Gordon will be along in a while. For now, if your code reflects what you have done, you are not doing it correctly.

  1. Don't use FPKM. Use the counts, either with glmQLFit/glmQLFTest or voomLmFit.
  2. If you must use FPKM, don't convert to logCPM. FPKM stands for fragments per kilobase of transcript per million mapped reads, so it already accounts for library size. You might want to take logs though? I don't know. "Don't use FPKM" is a better rule.
  3. If using lmFit/contrasts.fit/eBayes, that's the order. There is no reason to do lmFit, then eBayes, then lmFit (again!), then contrasts.fit, and then eBayes (again!). I have no idea what that will do. Maybe there is error checking to ignore repeated calls like that, but you are relying on the code to save you from your error. The limma User's Guide has tons of examples, none of which look like that. You should read the User's Guide.
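For illustration, the order described in point 3 could be sketched as follows, starting from raw counts as point 1 recommends. This is a minimal limma-trend sketch and not the original poster's analysis: the count matrix is simulated, and the object names (counts, group, design, fit2, tab) are hypothetical placeholders. It assumes the edgeR and limma packages are installed.

```r
library(edgeR)  # edgeR depends on limma, so limma is attached as well

# Hypothetical simulated raw counts: 1000 genes x 6 samples
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("Not_Amp", "Amp"), each = 3),
                levels = c("Not_Amp", "Amp"))

# Design matrix with one column per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Filter low-expressed genes and compute normalization factors on the counts
dge <- DGEList(counts = counts, group = group)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# logCPM from counts; no manual "+1" is added to the data
logCPM <- cpm(dge, log = TRUE, prior.count = 3)

# lmFit, then contrasts.fit, then eBayes, each called exactly once
fit <- lmFit(logCPM, design)
contr <- makeContrasts(AmpVsNotAmp = Amp - Not_Amp, levels = design)
fit2 <- contrasts.fit(fit, contr)
fit2 <- eBayes(fit2, trend = TRUE)

tab <- topTable(fit2, coef = "AmpVsNotAmp", sort.by = "P", number = Inf)
head(tab)
```

The contrast is given an explicit name so that topTable can refer to it by name rather than by column position, which makes it harder to extract the wrong coefficient by accident.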
WEHI, Melbourne, Australia

Given the big difference in expression values that you show for MYCN, the gene would almost certainly come up as significant in a DE analysis. Given the large p-value, I have to guess that you've made a mistake somewhere in the analysis. It's impossible to tell from the information you show, though. The code operates on data and a design matrix that are not shown or explained, and the code that extracts the p-value is also not shown, so anything could be happening. Adding 1 to the data before you run cpm() is never recommended, but I suspect that there are issues beyond that.
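As a rough sanity check on the size of the effect, the six FPKM values quoted in the question can be compared directly in base R. This is only a sketch: it uses an ordinary two-sample t-test on log2 values, not limma's moderated t-statistic, and no prior count, so the numbers will not match a limma analysis exactly. The variable names are hypothetical.

```r
# FPKM values for MYCN, copied from the question
not_amp <- c(5.182582, 3.104376, 4.962478)
amp     <- c(101.2204, 301.8182, 280.6712)

# Work on the log2 scale, as a logFC would be reported
log_not_amp <- log2(not_amp)
log_amp     <- log2(amp)

# Difference of group means on the log2 scale
logFC <- mean(log_amp) - mean(log_not_amp)

# Ordinary (unmoderated) equal-variance t-test on this single gene
tt <- t.test(log_amp, log_not_amp, var.equal = TRUE)

print(logFC)
print(tt$p.value)
```

On these values the log2 fold change comes out between 5 and 6, far from the reported 1.27, which is consistent with the suggestion that the reported result comes from a problem in the data, design matrix, or coefficient being extracted rather than from the gene itself.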

