Dear all,
I am a bit frustrated right now. I wanted to double check my DESEQ2 Differential Expression (RNA SEQ) results in Limma and now I am getting different results. I dont know what I did differently that the results are so different. I am here interested in the difference in Gender: Male vs. Female. Please find below the relevant part of the two different approaches, if you can see any obvious difference I would be very happy:
DESEQ2:
dds0 <- DESeqDataSetFromMatrix(countData = cts,
colData = colData,
design = ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis + Gender)
......
# Second approach for filtering
### Set thresholds
padj.cutoff <- 0.05
lfc.cutoff <- 1.2
Limma:
design = model.matrix( ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis + Gender, metadata)
fitDupCor <- lmFit(geneExpr, design)
# Fit Empirical Bayes for moderated t-statistics
fitDupCor <- eBayes( fitDupCor )
...
result <- topTable(fitDupCor, number = 50, adjust = "BH", p.value = 0.05, lfc = 1.2, coef = "GenderM")
I used raw counts for both analysis, maybe that was a mistake?
Thank you all!!!
Thank you. I did normalisation and filtering. I tried to show only the main analysis here. I cant find a clear answer whether I should also use raw counts for Limma as I did for DESEQ2. Do you have an opinion on that?
Thanks a lot!
I don't really understand your question. What would you be analysing if not the counts?
We can't help you if you don't show the complete analysis you did from start to finish. The whole analysis is just a few lines, so I don't quite understand why you are only showing snippets of the code in your question.
Thank you.
Sorry I think I am a bit confused.
I agree I should be more clear and show the complete analysis.
Input Limma & DESEQ2:
raw counts
Filtering for Limma & DESEQ2:
So what I did for both models is that I followed the GTEx Pipeline for the filtering: A gene was kept if it has a transcript per million (TPM) ≥ 0.1 and a raw read count ≥ 6 in ≥ 20% samples.
Normalisation for Limma and DESEQ2:
Limma:
Following https://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf I used logCPM with limma trend ("In the limma approach to RNA-seq, read counts are converted to log2-counts-per-million (logCPM) and the mean-variance relationship is modelled either with precision weights or with an empirical Bayes prior trend. The precision weights approach is called “voom” and the prior trend approach is called “limma-trend”).
DESEQ:
For DESEQ2 I understand the normalisation is done automatically by the DeSeq() function. It performs the median of ratios method of normalization (https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html).
Differential Expression Analysis:
DESEQ2:
Limma:
Please just ignore the lfc=1.2 for now, I will change it.
Thank you very much!
You are using two different algorithms. You should expect them to be close, but you can't expect them to be identical.
Thank you for checking. Yes, I agree, but e.g. 2 genes were clearly significant in one algorithm and arent at all in the other.. I think I should only see small differences or? Gordon Smyth @swbarnes2
Well, plot the log CPM and the normalized counts of each, just to eyeball what's happening. Maybe that finding is correct for those different normalizations.