Question

Different results between Limma and DESEQ2 for Differential Expression Analysis

0

Entering edit mode

Bine ▴ 50

@bine-23912

Last seen 6 months ago

UK

Dear all,

I am a bit frustrated right now. I wanted to double check my DESEQ2 Differential Expression (RNA SEQ) results in Limma and now I am getting different results. I dont know what I did differently that the results are so different. I am here interested in the difference in Gender: Male vs. Female. Please find below the relevant part of the two different approaches, if you can see any obvious difference I would be very happy:

DESEQ2:

dds0 <- DESeqDataSetFromMatrix(countData = cts,
                               colData = colData,
                               design = ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender)

......

# Second approach for filtering
### Set thresholds
padj.cutoff <- 0.05

lfc.cutoff <- 1.2

Limma:

design = model.matrix( ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender, metadata)

fitDupCor <- lmFit(geneExpr, design)

# Fit Empirical Bayes for moderated t-statistics
fitDupCor <- eBayes( fitDupCor )

...

result <- topTable(fitDupCor, number = 50, adjust = "BH", p.value = 0.05, lfc = 1.2, coef = "GenderM")

I used raw counts for both analysis, maybe that was a mistake?

Thank you all!!!

DESeq2 limma • 2.9k views

ADD COMMENT • link updated 3.2 years ago by swbarnes2 ★ 1.4k • written 3.2 years ago by Bine ▴ 50

score 2 · Answer 1 · 2022-09-25

2

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

There are a number of guides explaining how to do analyses in limma. You need to do all the steps, including normalization and filtering, not just the steps you have shown in your question.

I have already advised you on Biostars not to use lfc=1.2.

ADD COMMENT • link 3.2 years ago Gordon Smyth 53k

0

Entering edit mode

Thank you. I did normalisation and filtering. I tried to show only the main analysis here. I cant find a clear answer whether I should also use raw counts for Limma as I did for DESEQ2. Do you have an opinion on that?

Thanks a lot!

ADD REPLY • link 3.2 years ago Bine ▴ 50

0

Entering edit mode

I don't really understand your question. What would you be analysing if not the counts?

We can't help you if you don't show the complete analysis you did from start to finish. The whole analysis is just a few lines, so I don't quite understand why you are only showing snippets of the code in your question.

ADD REPLY • link 3.2 years ago Gordon Smyth 53k

0

Entering edit mode

Thank you.

Sorry I think I am a bit confused.

I agree I should be more clear and show the complete analysis.

Input Limma & DESEQ2:

raw counts

Filtering for Limma & DESEQ2:

So what I did for both models is that I followed the GTEx Pipeline for the filtering: A gene was kept if it has a transcript per million (TPM) ≥ 0.1 and a raw read count ≥ 6 in ≥ 20% samples.

Normalisation for Limma and DESEQ2:

Limma:

Following https://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf I used logCPM with limma trend ("In the limma approach to RNA-seq, read counts are converted to log2-counts-per-million (logCPM) and the mean-variance relationship is modelled either with precision weights or with an empirical Bayes prior trend. The precision weights approach is called “voom” and the prior trend approach is called “limma-trend”).

DESEQ:

For DESEQ2 I understand the normalisation is done automatically by the DeSeq() function. It performs the median of ratios method of normalization (https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html).

Differential Expression Analysis:

DESEQ2:

dds0 <- DESeqDataSetFromMatrix(countData = cts,
                               colData = colData,
                               design = ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender)

### Set thresholds
padj.cutoff <- 0.05

lfc.cutoff <- 1.2

Limma:

design = model.matrix( ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender, metadata)

 logCPM <- cpm(dge, log=TRUE, prior.count=3)

 fit <- lmFit(logCPM, design)

 fit <- eBayes(fit, trend=TRUE)

result <- topTable(fitDupCor, number = 50, adjust = "BH", p.value = 0.05, lfc = 1.2, coef = "GenderM")

Please just ignore the lfc=1.2 for now, I will change it.

Thank you very much!

ADD REPLY • link 3.2 years ago Bine ▴ 50

0

Entering edit mode

You are using two different algorithms. You should expect them to be close, but you can't expect them to be identical.

ADD REPLY • link 3.2 years ago swbarnes2 ★ 1.4k

0

Entering edit mode

Thank you for checking. Yes, I agree, but e.g. 2 genes were clearly significant in one algorithm and arent at all in the other.. I think I should only see small differences or? Gordon Smyth @swbarnes2

ADD REPLY • link 3.2 years ago Bine ▴ 50

1

Entering edit mode

Well, plot the log CPM and the normalized counts of each, just to eyeball what's happening. Maybe that finding is correct for those different normalizations.

ADD REPLY • link 3.2 years ago swbarnes2 ★ 1.4k