Question: Why do DESeq, DESeq2, and edgeR give vastly different results?
michael.steffen wrote (5 months ago):
I am trying to understand why I am getting vastly different results between these three programs. I understand why DESeq probably isn't appropriate to use anymore, but I don't understand why DESeq2 and edgeR are giving such different numbers of DEGs.

Below is the code for DESeq. This results in 146 DEGs:

#Start DESeq (v1) example
library(DESeq)

total_counts <- read.table(file="TestFile.txt", header=TRUE, row.names=1)
expt_design <- data.frame(row.names = colnames(total_counts),
                          condition = c("YQ", "YW", "OQ", "OW", "OQ", "OW", "OQ", "OW", "YQ", "YW", "YQ", "YW", "OS", "YS", "YS", "OS", "YS", "OS"))
conditions <- expt_design$condition
# Input data into DESeq
data <- newCountDataSet(total_counts, conditions)

# Estimate size factors - differences in coverage between replicates
data <- estimateSizeFactors(data)
sizeFactors(data)
# Estimate dispersions; method="blind" pools all samples regardless of condition
data <- estimateDispersions(data, method="blind", sharingMode="fit-only")
dispTable(data)

# Example pairwise test between two groups
OQ_YQ <- nbinomTest(data, "OQ", "YQ")
write.table(OQ_YQ, file="TotalResults.txt", sep="\t", row.names=rownames(OQ_YQ),
            col.names=colnames(OQ_YQ), quote=FALSE)
OQ_YQ.Sig <- OQ_YQ[OQ_YQ$padj < 0.05, ]
OQ_YQ.Sig <- OQ_YQ.Sig[complete.cases(OQ_YQ.Sig), ]
write.table(OQ_YQ.Sig, file="Results_sig.txt", sep="\t", row.names=rownames(OQ_YQ.Sig),
            col.names=colnames(OQ_YQ.Sig), quote=FALSE)
#End DESeq (v1) example

Below is the code for DESeq2. This results in zero DEGs:

#Start DESeq2 example
library(DESeq2)

total_counts <- read.table(file="TestFile.txt", header=TRUE, row.names=1)
expt_design <- data.frame(row.names = colnames(total_counts),
                          condition = c("YQ", "YW", "OQ", "OW", "OQ", "OW", "OQ", "OW", "YQ", "YW", "YQ", "YW", "OS", "YS", "YS", "OS", "YS", "OS"))

dds <- DESeqDataSetFromMatrix(countData = total_counts, colData = expt_design,
                              design = ~ condition)

# The three steps below are what DESeq() runs internally
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
dds
OQ_YQ <- results(dds, contrast=c("condition","QW","YQ"))
OQ_YQ <- as.data.frame(OQ_YQ)
write.csv(OQ_YQ, "Results.csv", row.names=TRUE)
#End DESeq2 example

Below is the code I have used for edgeR. This results in 15 DEGs:

#Start edgeR example
library(edgeR)

x <- read.delim("TestFile.txt", header=TRUE, row.names=1)
group <- factor(c("YQ", "YW", "OQ", "OW", "OQ", "OW", "OQ", "OW", "YQ", "YW", "YQ", "YW", "OS", "YS", "YS", "OS", "YS", "OS"))
y <- DGEList(counts=x, group=group)
y <- calcNormFactors(y)
# No-intercept design: one column per group, in alphabetical level order
design <- model.matrix(~0+group)
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)

#Test group 1 vs group 2
lrt <- glmLRT(fit, contrast=c(-1,1,0,0,0,0))
topTags(lrt)
tab1 <- topTags(lrt, n=Inf)
write.csv(tab1, file="Results.csv")
#End edgeR example
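
For reference, the DEG tallies quoted above could be reproduced from these objects by counting at the same adjusted-p cutoff (0.05) used in the DESeq block; this is just a minimal sketch assuming the OQ_YQ data frame and lrt object created above:

# Count genes passing FDR/padj < 0.05 in each result
sum(OQ_YQ$padj < 0.05, na.rm=TRUE)          # DESeq2 DEG count
sum(topTags(lrt, n=Inf)$table$FDR < 0.05)   # edgeR DEG count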

So why do you get such differences? It's all using the same data. Am I doing something wrong in these? All data are raw counts from an RNA-seq experiment. The thing that particularly concerns me is that when I use the same code on a different dataset, the DEG breakdown is: DESeq: 59, DESeq2: 606, edgeR: 350. That goes in a vastly different direction than the previous dataset, so consistency kind of goes out the window.

Thanks,

Mike 

Aaron Lun wrote (5 months ago):

Well, for starters, it seems like you're using edgeR's glmLRT to compare the first two groups, which are OQ and OS. But it looks like you're comparing different groups with DESeq2, namely QW and YQ (despite the fact that the variable is named OQ_YQ). And for DESeq, again a different comparison: OQ and YQ.

Needless to say, there's no guarantee of getting the same DE genes across different contrasts.
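
As a minimal sketch (assuming the data, dds, fit, and design objects built in the question), the same OQ-versus-YQ comparison could be specified in each package along these lines:

# One contrast (OQ vs YQ) expressed consistently across the three packages
# DESeq (v1): the second group named is the numerator of the reported fold change
res_deseq <- nbinomTest(data, "YQ", "OQ")

# DESeq2: log2 fold change of OQ relative to YQ
res_deseq2 <- results(dds, contrast=c("condition", "OQ", "YQ"))

# edgeR: with model.matrix(~0+group) the columns follow the alphabetical
# factor levels (OQ, OS, OW, YQ, YS, YW), so spell the contrast out by name
con <- makeContrasts(groupOQ - groupYQ, levels=design)
res_edger <- glmLRT(fit, contrast=con)
topTags(res_edger)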

michael.steffen replied:

Sorry, yeah, that was a typo. It was meant to be exactly the same comparison that I posted. I had done multiple comparisons across different things and copied the wrong code, but everything should be the same comparison across the three packages.

I have edited the question to reflect that properly.

Mike 

Michael Love replied:

Different methods will produce different sets of results; this is true even for basic statistical tests (t-test, Wilcoxon, and so on). A method's operating characteristics also depend on the dataset (sample size, dynamic range, biological variability, effect size distribution, etc.). If you look at benchmarks, the behavior is pretty consistent, and methods tend to have high overlap in their top 'x' genes. See e.g.:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878611/
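
One quick way to see this is to compare the top-ranked genes directly. A minimal sketch, assuming a DESeq2 results object res_deseq2 and an edgeR DGELRT object lrt for the same contrast on the same counts:

# Overlap of the top 100 genes, ranked by p-value, between DESeq2 and edgeR
top_deseq2 <- rownames(res_deseq2)[order(res_deseq2$pvalue)][1:100]
top_edger <- rownames(topTags(lrt, n=100)$table)
length(intersect(top_deseq2, top_edger))   # number of genes called by both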
