Validation of DESeq2 v1.30 results
@Samuel Daniel-24529

Hello!

I am currently setting up a pipeline for RNA-seq data analysis to identify differentially expressed genes (DEGs) using DESeq2. I am trying to validate the results of my pipeline against previously published experiments, but I can't seem to find any papers that used the current version of DESeq2 (v1.30.0) and made their reads available. I have found that installing previous versions of DESeq2 can be troublesome, and that different versions of the software can generate significantly different results.

Does anyone know of a dataset that I can use to validate my pipeline with DESeq2 v1.30.0?

Thanks!

@mikelove

Hi Samuel,

DESeq2 hasn't changed the default methods in DESeq() since v1.16, which was released in April 2017. The majority of benchmarks from the past five years evaluate this version.
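When comparing runs across machines or services, it helps to record the exact versions each run used. A minimal base-R sketch (the `requireNamespace` guard is only there so the snippet runs even where DESeq2 is not installed):

```r
# Record the exact versions used for an analysis, so runs on different
# machines can be compared; DESeq2 is only queried if it is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  print(packageVersion("DESeq2"))
}

# sessionInfo() captures the R version and all loaded package versions
info <- sessionInfo()
cat(info$R.version$version.string, "\n")
```

Including the full `sessionInfo()` output alongside published results makes version mismatches like the one discussed here easy to spot.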


Thank you for your response, Michael Love!

This journey started when we tried to replicate, with the current version (1.30.0), results obtained by an outside service with DESeq2 1.22.0; we found substantially different results using the same count tables with both versions. I have found a few examples on this forum of people obtaining different results with different DESeq2 versions, so I thought that could be what we're seeing. This is also the first time I've used software in R and the first time I've analyzed RNA-seq data, so the differences could also be due to some kind of misuse of the software on my part.

So is it really not possible to see different results with the same dataset for versions 1.30.0 and 1.22.0?

I'll leave the R code I use for generating the DEG tables, which I later filter at padj < 0.05 and |log2FoldChange| > 1:

# HTSeq-count files are named "XXX1_transcript_counts.txt", where XXX is
# a three-character prefix shared by samples of the same condition
sampleFiles <- grep("transcript", list.files(directory), value = TRUE)
sampleCondition <- substr(sampleFiles, 1, 3)
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
sampleTable$condition <- factor(sampleTable$condition)

library("DESeq2")
dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                  directory = directory,
                                  design = ~ condition)

dds <- DESeq(dds)

# For XXX as the control and YYY as one of the 5 genotypes under study.
# Note: contrast is c(factor, numerator, denominator), so the control
# goes last to get log2(YYY/XXX)
res <- results(dds, contrast = c("condition", "YYY", "XXX"))
resOrdered <- res[order(res$padj), ]
write.csv(as.data.frame(resOrdered),
          file = "XXX_YYY_results.csv")
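For reference, the later filtering step could look like the following. This is a minimal base-R sketch on a plain data.frame standing in for `as.data.frame(res)`; the gene names, adjusted p-values, and fold changes are made up for illustration:

```r
# Hypothetical results table standing in for as.data.frame(res)
resDF <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  padj = c(0.001, 0.20, 0.04, NA),
  log2FoldChange = c(2.1, 1.5, -1.3, 3.0)
)

# Keep genes with padj < 0.05 and |log2FoldChange| > 1;
# !is.na() drops genes whose padj was set to NA (e.g. by outlier
# handling or independent filtering in DESeq2)
deg <- resDF[!is.na(resDF$padj) &
             resDF$padj < 0.05 &
             abs(resDF$log2FoldChange) > 1, ]
nrow(deg)  # g1 and g3 pass; g2 fails on padj, g4 is NA
```

Handling the NA padj values explicitly matters when counting DEGs, since DESeq2 can report NA for outlier or low-count genes.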


Thank you for the support!


I don't think any of the core methods changed between 1.22 and 1.30 (the earlier changes were in versions < 1.16, which typically showed up when users updated to 1.16). So I'm not sure why you would have seen large changes. Did you obtain the code used by the service, such that you could also reproduce their calls with version 1.22?

Also, do the DE genes from the two methods have any intersection or none?
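A quick way to quantify the overlap between two DEG lists in base R; the gene identifiers here are made up for illustration:

```r
# Hypothetical DEG lists from the two pipelines
ours   <- c("geneA", "geneB", "geneC", "geneD")
theirs <- c("geneB", "geneC", "geneE")

shared <- intersect(ours, theirs)
length(shared)                   # number of genes found by both
length(shared) / length(theirs)  # fraction of their DEGs we recover
setdiff(theirs, ours)            # their DEGs that we miss
```

Looking at the non-overlapping genes' p-values and fold changes (rather than just the counts) often reveals whether the discrepancy is borderline calls or something more fundamental.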


We were not able to obtain the code they used; they're a private business and apparently can't share some of their steps, which is making it pretty difficult for us to replicate their results. We do, however, have their table of counts, and they claim that they use default parameters for the DESeq2 analysis.

As for the intersection, generally 85-95% of their DEGs are in our DEG list, with more discrepancies at low log2FC than at high log2FC. We have one particular case in which they report 23 DEGs while we find 62, only 14 of which are present in the intersection. Also, for some samples our number of DEGs is higher, while for others it is lower.

I have tried using the replaceOutliersWithTrimmedMean() function, as well as adding different lfcThreshold values to the results() call with altHypothesis = "greaterAbs"; the intersections vary, but they don't get much better. This is why I was looking for datasets analyzed with our current version of DESeq2 to validate our pipeline.
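As a side note on why lfcThreshold changes the DEG count: a post-hoc filter keeps any gene with |log2FC| > 1 however noisy its estimate, while the thresholded Wald test asks whether |log2FC| exceeds 1 by more than the noise allows. A rough base-R illustration of that distinction (this is a simplification of what DESeq2 does internally, not its actual code, and the estimates and standard errors are made up):

```r
# Made-up log2 fold-change estimates and their standard errors
lfc <- c(1.2, 1.2, 3.0)
se  <- c(0.1, 0.4, 0.5)

# Ordinary Wald test against H0: lfc == 0
p_zero <- 2 * pnorm(abs(lfc) / se, lower.tail = FALSE)

# Thresholded Wald test, roughly what lfcThreshold = 1 with
# altHypothesis = "greaterAbs" does: H0: |lfc| <= 1
p_thr <- pmin(1, 2 * pnorm((abs(lfc) - 1) / se, lower.tail = FALSE))

# The second gene is significant against zero and passes a post-hoc
# |lfc| > 1 filter, but is not significant under the thresholded test
round(p_zero, 4)
round(p_thr, 4)
```

This is one reason thresholded results() calls generally return fewer DEGs than post-hoc fold-change filtering, especially near the threshold.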

We will try to use alternative datasets analyzed with versions < 1.16 to validate our pipeline, since you say we shouldn't see any changes between those versions and the current one. Any other suggestion is greatly appreciated!

Thank you!


I don't have any further suggestions, except that I wouldn't recommend working with closed-source consultants. As an open-source developer, I can't help resolve the discrepancy between my side (open source) and theirs (closed source).

Re: 1.16, I think you've confused my remarks. I'm noting that, yes, changes to core methods occurred pre-1.16, but later versions did not change the core p-value generation (the changes instead involved auxiliary functions like lfcShrink).