Question

Why is my gene not significantly differentially expressed?

Simran • 0

Hi! I ran a differential gene expression (DGE) analysis between two groups of cell lines: MYCN-amplified vs. MYCN non-amplified.

Cell lines in the MYCN non-amplified group had these FPKM values for the MYCN gene: 5.182582, 3.104376, 4.962478

Cell lines in the MYCN-amplified group had these FPKM values for the MYCN gene: 101.2204, 301.8182, 280.6712

Visually, there is a marked difference between the two groups, and I would expect it to be significant regardless of the transformation or normalization. However, after applying logCPM, lmFit and eBayes, I get the following logFC and p-values for MYCN:

logFC: 1.27382121, p-value: 0.8228308, adjusted p-value: 0.92421833

Why is this change so insignificant?

I'm hoping to catch the eye of Gordon Smyth on this post!

The code is shown below:


logCPM_SD_rep <- cpm(samp.SD.representative + 1, log = TRUE, prior.count = 3)
fit.sd.rep <- lmFit(logCPM_SD_rep, design.sd.rep)
fit.sd.rep <- eBayes(fit.sd.rep, trend = TRUE)
topTable(fit.sd.rep, coef = ncol(design.sd.rep))
contr_SD <- makeContrasts(group.sd.repAmp - group.sd.repNot_Amp, levels = colnames(design.sd.rep))
tmp.sd.rep <- contrasts.fit(fit.sd.rep, contr_SD)
tmp.sd.rep <- eBayes(tmp.sd.rep)
top.table.sd.rep <- topTable(tmp.sd.rep, sort.by = "P", n = Inf)
head(top.table.sd.rep, 20)

limmaGUI edgeR limma • 1.0k views

updated 22 months ago by Gordon Smyth • written 22 months ago by Simran

score 1 · Answer 1 · 2024-04-10

I imagine Gordon will be along in a while. For now, if your code reflects what you have done, you are not doing it correctly.

Don't use FPKM. Use the counts, with either glmQLFit/glmQLFTest or voomLmFit.
If you must use FPKM, don't convert to logCPM. FPKM stands for fragments per kilobase of transcript per million mapped reads, so it already accounts for library size. You might want to take logs, though? I don't know. "Don't use FPKM" is the better rule.
If using lmFit/contrasts.fit/eBayes, that's the order. There is no reason to do lmFit, then eBayes, then lmFit (again!), then contrasts.fit, and then eBayes (again!). I have no idea what that will do. Maybe there is error checking that ignores repeated calls like that, but then you are relying on the code to save you from your own error. The limma User's Guide has tons of examples, none of which look like that. You should read the User's Guide.
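
As a rough illustration of the counts-based route and of the lmFit/contrasts.fit/eBayes order described above, here is a minimal sketch. It runs on simulated counts, not on the poster's data: the matrix `counts`, the gene name `gene1`, the sample names and the contrast name `AmpVsNot` are all made up for the example, and the filtering and normalization steps are the usual edgeR defaults rather than anything taken from the question.

```r
library(edgeR)  # attaches limma as well

set.seed(1)

# Hypothetical raw counts: 1000 genes x 6 samples (simulated, for illustration only)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("S", 1:6)))

# Make one gene clearly higher in the last three samples
counts["gene1", ] <- c(40, 50, 60, 4000, 5000, 6000)

group <- factor(rep(c("Not_Amp", "Amp"), each = 3), levels = c("Not_Amp", "Amp"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Counts go in as they are: no "+ 1" and no manual logCPM step
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# Fit once, apply the contrast, then run eBayes once
fit <- voomLmFit(y, design)
contr <- makeContrasts(AmpVsNot = Amp - Not_Amp, levels = design)
fit2 <- contrasts.fit(fit, contr)
fit2 <- eBayes(fit2)

tt <- topTable(fit2, coef = "AmpVsNot", sort.by = "P", n = Inf)
tt["gene1", c("logFC", "P.Value", "adj.P.Val")]
```

Looking up the gene by row name in the full table (`n = Inf`) avoids reading the statistics off the wrong row.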

score 0 · Answer 2 · 2024-04-14

Given the big difference in the expression values that you show for MYCN, the gene would almost certainly come up as significant in a DE analysis. Given the large p-value, I would have to guess that you've made a mistake somewhere in the analysis. It's impossible to tell from the information you show, though. The code operates on data and on a design matrix that are not shown or explained, and the code used to extract the p-value is also not shown, so anything could be happening. Adding 1 to the data before you run cpm() is never recommended, but I suspect that there are issues beyond that.
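
A quick sanity check in base R, using only the six FPKM values quoted in the question, shows roughly what size of effect to expect. This is a plain two-sample t-test on log2 values, not the moderated test that limma performs, so it is only a rough guide. The variable names are made up for the example.

```r
# FPKM values for MYCN, copied from the question
not_amp <- c(5.182582, 3.104376, 4.962478)
amp     <- c(101.2204, 301.8182, 280.6712)

# Work on the log2 scale, as limma does
log_not_amp <- log2(not_amp)
log_amp     <- log2(amp)

# Difference of group means on the log2 scale (Amp minus Not_Amp)
logFC <- mean(log_amp) - mean(log_not_amp)

# Ordinary equal-variance t-test, for a rough idea of significance only
res <- t.test(log_amp, log_not_amp, var.equal = TRUE)

logFC
res$p.value
```

The log2 fold change from these values is about 5.6, far from the reported 1.27. A gap that large suggests the reported row or coefficient does not correspond to this Amp vs. Not_Amp comparison for MYCN, which is worth checking against the design matrix and the row names of the table.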