Question

DESEQ2 1.24 versus 1.14

0

Entering edit mode

michalina.marija • 0

@michalinamarija-22617

Last seen 4.3 years ago

Hi, I performed analysis using old 1.14 version of DESeq2. I had to repeat such analysis now with version 1.24. I've received very different results. With 1.14 I've received 63 hits of differentially expressed genes, while with 1.24 - 293 hits, and 18 overlapped. I try to understand where the difference comes from? Maybe I did sth wrong? for both I've used the followed command: conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL")) conditions <- relevel(conditions, "CTRL") design <- data.frame( condition=conditions ) rownames(design) <- colnames(counts) dataset <- DESeqDataSetFromMatrix(countData = counts, colData = design, design = ~condition) dataset <- DESeq(dataset) deresults <- results(dataset) deresults <- deresults[ deresults$padj < 0.05 & complete.cases(deresults$padj), ] deresults <- deresults[ order(deresults$log2FoldChange, decreasing=T),] foldchange <- 2**deresults$log2FoldChange foldchangeinverse <- 1/foldchange newcolumns <- data.frame(GeneID=rownames(deresults), Foldchange=foldchange, Foldchangeinverse=foldchangeinverse) deresults <- cbind( newcolumns, deresults) write.table(as.data.frame(de_results), file="results.xls", sep="\t", quote=F, row.names=F)

deseq2 • 760 views

ADD COMMENT • link updated 4.4 years ago by Michael Love 41k • written 4.4 years ago by michalina.marija • 0

score 0 · Answer 1 · 2019-12-28

0

Entering edit mode

michalina.marija • 0

@michalinamarija-22617

Last seen 4.3 years ago

sorry for wrong format of the command, here it goes: conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL")); conditions <- relevel(conditions, "CTRL"); design <- data.frame( condition=conditions ); rownames(design) <- colnames(counts); dataset <- DESeqDataSetFromMatrix(countData = counts, colData = design, design = ~condition); dataset <- DESeq(dataset); deresults <- results(dataset); deresults <- deresults[ deresults$padj < 0.05 & complete.cases(deresults$padj), ]; deresults <- deresults[ order(deresults$log2FoldChange, decreasing=T),]; foldchange <- 2**deresults$log2FoldChange; foldchangeinverse <- 1/foldchange; newcolumns <- data.frame(GeneID=rownames(deresults), Foldchange=foldchange, Foldchangeinverse=foldchangeinverse); deresults <- cbind( newcolumns, deresults); write.table(as.data.frame(de_results), file="results.xls", sep="\t", quote=F, row.names=F)

ADD COMMENT • link 4.4 years ago michalina.marija • 0

0

Entering edit mode

Thanks for your answer. The code is below. Last night I've discovered quite strange behaviour. With absolutely the same counts matrix I'am receiving different number of hits of differentially expressesed genes depending on how I name condition. Namely: with this:

conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL"))" conditions <- relevel(conditions, "CTRL")

I've got more than 200 hits.

while with this:

conditions <- factor( c("PATIENTK1", "PATIENTK4", "PATIENTK10", "PATIENTK13", "PATIENTK19", "PATIENTK22", "DGCTRL", "DGCTRL", "DGCTRL", "DGCTRL", "DGCTRL")) conditions <- relevel(conditions, "DGCTRL")

I ve got about 50 hits.

Could you give me advise why is that? what should I change? I've used Deseq2 1.14, and with DEseq 1.26 was about 75 hits for first scenario and about 300 hits for the second. What is going on? I am not too much experienced (yet ;) in bioinformatics.

The full code is below.

library(DESeq2) setwd("C:/Users/Kamil/Desktop/kasia1v2betternamesDeseq21.14") counts <- read.delim("COUNTSKASIA2019betterNames.xls", sep="\t", header=T, row.names=1) counts <- as.matrix(counts) View(counts) conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL")) conditions <- relevel(conditions, "CTRL") design <- data.frame( condition=conditions ) rownames(design) <- colnames(counts) dataset <- DESeqDataSetFromMatrix(countData = counts, colData = design, design = ~condition) dataset <- DESeq(dataset) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing deresults <- results(dataset) deresults <- deresults[ deresults$padj < 0.05 & complete.cases(deresults$padj), ] deresults <- deresults[ order(deresults$log2FoldChange, decreasing=T),] foldchange <- 2**deresults$log2FoldChange foldchangeinverse <- 1/foldchange newcolumns <- data.frame(GeneID=rownames(deresults), Foldchange=foldchange, Foldchangeinverse=foldchangeinverse) deresults <- cbind( newcolumns, deresults) write.table(as.data.frame(deresults), file="resultsbetternames_31.12.xls", sep="\t", quote=F, row.names=F)

sessionInfo() R version 3.3.1 (2016-06-21) Platform: x8664-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18362) locale: [1] LCCOLLATE=EnglishUnited States.1252 LCCTYPE=EnglishUnited States.1252 LCMONETARY=EnglishUnited States.1252 LCNUMERIC=C LCTIME=EnglishUnited States.1252
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] knitr1.26 DESeq21.14.1 SummarizedExperiment1.2.3 Biobase2.34.0 GenomicRanges1.24.3 GenomeInfoDb1.2.5 IRanges2.6.1
[8] S4Vectors0.10.3 BiocGenerics0.18.0
loaded via a namespace (and not attached): [1] locfit1.5-9.1 Rcpp0.12.16 lattice0.20-33 zeallot0.1.0 digest0.6.23 R62.4.1 backports1.1.5 acepack1.4.1 RSQLite2.1.5 ggplot23.2.1
[11] pillar1.4.3 zlibbioc1.12.0 rlang0.4.2 lazyeval0.2.2 rstudioapi0.10 data.table1.12.8 annotate1.50.1 blob1.2.0 rpart4.1-10 Matrix1.2-6
[21] checkmate1.9.4 splines3.3.1 BiocParallel1.8.2 geneplotter1.50.0 stringr1.4.0 foreign0.8-66 htmlwidgets1.5.1 RCurl1.95-4.12 bit1.1-14 munsell0.5.0
[31] xfun0.11 pkgconfig2.0.3 base64enc0.1-3 htmltools0.4.0 nnet7.3-12 tibble2.1.3 gridExtra2.3 htmlTable1.13.3 Hmisc4.3-0 XML3.98-1.20
[41] crayon1.3.4 bitops1.0-6 grid3.3.1 xtable1.8-4 gtable0.3.0 lifecycle0.1.0 DBI1.1.0 magrittr1.5 scales1.1.0 stringi1.4.3
[51] XVector0.14.1 genefilter1.54.2 latticeExtra0.6-28 vctrs0.2.0 Formula1.2-3 RColorBrewer1.1-2 tools3.3.1 bit640.9-7 survival2.44-1 AnnotationDbi_1.34.4

ADD REPLY • link 4.3 years ago michalina.marija • 0

0

Entering edit mode

See levels(dds$condition)

And as I said previously check the specific contrast printed at the top of the results table when printed to the console.

The point of printing this contrast at the top of the table is so users can be reminded of what test they are performing.

ADD REPLY • link 4.3 years ago Michael Love 41k

0

Entering edit mode

Thanks, now I understand your point.

I have 5 controls and 6 patients to compare. Could you tell me what would be the correct approach:

levels(dataset$condition) [1] "CTRL" "P1" "P2" "P3" "P4" "P5" "P6"

or

levels(dds$condition) [1] "CTRL" "P"

ADD REPLY • link 4.3 years ago michalina.marija • 0

0

Entering edit mode

The analysis is up to you. This support site is for software related questions. I’d recommend discussing your analysis plan with a statistician.

ADD REPLY • link 4.3 years ago Michael Love 41k

0

Entering edit mode

Thank you very much for your help!

ADD REPLY • link 4.3 years ago michalina.marija • 0

score 0 · Answer 2 · 2019-12-28

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 hour ago

United States

This is not too surprising that the list may fluctuate. Methods change over releases and are all documented in the NEWS file.

I can’t easily look over your unformatted code but I would recommend that you ensure when you print the results table to the console that the contrast printed at the top is what you were expecting.

ADD COMMENT • link 4.4 years ago Michael Love 41k

0

Entering edit mode

Thanks for your answer. The code is below. Last night I've discovered quite strange behaviour. With absolutely the same counts matrix I'am receiving different number of hits of differentially expressesed genes depending on how I name condition. Namely: with this:

conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL"))" conditions <- relevel(conditions, "CTRL")

I've got more than 200 hits.

while with this:

conditions <- factor( c("PATIENTK1", "PATIENTK4", "PATIENTK10", "PATIENTK13", "PATIENTK19", "PATIENTK22", "DGCTRL", "DGCTRL", "DGCTRL", "DGCTRL", "DGCTRL")) conditions <- relevel(conditions, "DGCTRL")

I ve got about 50 hits.

Could you give me advise why is that? what should I change? I've used Deseq2 1.14, and with DEseq 1.26 was about 75 hits for first scenario and about 300 hits for the second. What is going on? I am not too much experienced (yet ;) in bioinformatics.

The full code is below.

library(DESeq2) setwd("C:/Users/Kamil/Desktop/kasia1v2betternamesDeseq21.14") counts <- read.delim("COUNTSKASIA2019betterNames.xls", sep="\t", header=T, row.names=1) counts <- as.matrix(counts) View(counts) conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL")) conditions <- relevel(conditions, "CTRL") design <- data.frame( condition=conditions ) rownames(design) <- colnames(counts) dataset <- DESeqDataSetFromMatrix(countData = counts, colData = design, design = ~condition) dataset <- DESeq(dataset) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing deresults <- results(dataset) deresults <- deresults[ deresults$padj < 0.05 & complete.cases(deresults$padj), ] deresults <- deresults[ order(deresults$log2FoldChange, decreasing=T),] foldchange <- 2**deresults$log2FoldChange foldchangeinverse <- 1/foldchange newcolumns <- data.frame(GeneID=rownames(deresults), Foldchange=foldchange, Foldchangeinverse=foldchangeinverse) deresults <- cbind( newcolumns, deresults) write.table(as.data.frame(deresults), file="resultsbetternames_31.12.xls", sep="\t", quote=F, row.names=F)

sessionInfo() R version 3.3.1 (2016-06-21) Platform: x8664-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18362) locale: [1] LCCOLLATE=EnglishUnited States.1252 LCCTYPE=EnglishUnited States.1252 LCMONETARY=EnglishUnited States.1252 LCNUMERIC=C LCTIME=EnglishUnited States.1252
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] knitr1.26 DESeq21.14.1 SummarizedExperiment1.2.3 Biobase2.34.0 GenomicRanges1.24.3 GenomeInfoDb1.2.5 IRanges2.6.1
[8] S4Vectors0.10.3 BiocGenerics0.18.0
loaded via a namespace (and not attached): [1] locfit1.5-9.1 Rcpp0.12.16 lattice0.20-33 zeallot0.1.0 digest0.6.23 R62.4.1 backports1.1.5 acepack1.4.1 RSQLite2.1.5 ggplot23.2.1
[11] pillar1.4.3 zlibbioc1.12.0 rlang0.4.2 lazyeval0.2.2 rstudioapi0.10 data.table1.12.8 annotate1.50.1 blob1.2.0 rpart4.1-10 Matrix1.2-6
[21] checkmate1.9.4 splines3.3.1 BiocParallel1.8.2 geneplotter1.50.0 stringr1.4.0 foreign0.8-66 htmlwidgets1.5.1 RCurl1.95-4.12 bit1.1-14 munsell0.5.0
[31] xfun0.11 pkgconfig2.0.3 base64enc0.1-3 htmltools0.4.0 nnet7.3-12 tibble2.1.3 gridExtra2.3 htmlTable1.13.3 Hmisc4.3-0 XML3.98-1.20
[41] crayon1.3.4 bitops1.0-6 grid3.3.1 xtable1.8-4 gtable0.3.0 lifecycle0.1.0 DBI1.1.0 magrittr1.5 scales1.1.0 stringi1.4.3
[51] XVector0.14.1 genefilter1.54.2 latticeExtra0.6-28 vctrs0.2.0 Formula1.2-3 RColorBrewer1.1-2 tools3.3.1 bit640.9-7 survival2.44-1 AnnotationDbi_1.34.4

ADD REPLY • link 4.3 years ago michalina.marija • 0

0

Entering edit mode

Thanks for your answer. The code is below. Last night I've discovered quite strange behaviour. With absolutely the same counts matrix I'am receiving different number of hits of differentially expressesed genes depending on how I name condition. Namely: with this:

conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL"))" conditions <- relevel(conditions, "CTRL")

I've got more than 200 hits.

while with this:

conditions <- factor( c("PATIENTK1", "PATIENTK4", "PATIENTK10", "PATIENTK13", "PATIENTK19", "PATIENTK22", "DGCTRL", "DGCTRL", "DGCTRL", "DGCTRL", "DGCTRL")) conditions <- relevel(conditions, "DGCTRL")

I ve got about 50 hits.

Could you give me advise why is that? what should I change? I've used Deseq2 1.14, and with DEseq 1.26 was about 75 hits for first scenario and about 300 hits for the second. What is going on? I am not too much experienced (yet ;) in bioinformatics.

The full code is below.

library(DESeq2) setwd("C:/Users/Kamil/Desktop/kasia1v2betternamesDeseq21.14") counts <- read.delim("COUNTSKASIA2019betterNames.xls", sep="\t", header=T, row.names=1) counts <- as.matrix(counts) View(counts) conditions <- factor( c("P1", "P2", "P3", "P4", "P5", "P6", "CTRL", "CTRL", "CTRL", "CTRL", "CTRL")) conditions <- relevel(conditions, "CTRL") design <- data.frame( condition=conditions ) rownames(design) <- colnames(counts) dataset <- DESeqDataSetFromMatrix(countData = counts, colData = design, design = ~condition) dataset <- DESeq(dataset) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing deresults <- results(dataset) deresults <- deresults[ deresults$padj < 0.05 & complete.cases(deresults$padj), ] deresults <- deresults[ order(deresults$log2FoldChange, decreasing=T),] foldchange <- 2**deresults$log2FoldChange foldchangeinverse <- 1/foldchange newcolumns <- data.frame(GeneID=rownames(deresults), Foldchange=foldchange, Foldchangeinverse=foldchangeinverse) deresults <- cbind( newcolumns, deresults) write.table(as.data.frame(deresults), file="resultsbetternames_31.12.xls", sep="\t", quote=F, row.names=F)

sessionInfo() R version 3.3.1 (2016-06-21) Platform: x8664-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18362) locale: [1] LCCOLLATE=EnglishUnited States.1252 LCCTYPE=EnglishUnited States.1252 LCMONETARY=EnglishUnited States.1252 LCNUMERIC=C LCTIME=EnglishUnited States.1252
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] knitr1.26 DESeq21.14.1 SummarizedExperiment1.2.3 Biobase2.34.0 GenomicRanges1.24.3 GenomeInfoDb1.2.5 IRanges2.6.1
[8] S4Vectors0.10.3 BiocGenerics0.18.0
loaded via a namespace (and not attached): [1] locfit1.5-9.1 Rcpp0.12.16 lattice0.20-33 zeallot0.1.0 digest0.6.23 R62.4.1 backports1.1.5 acepack1.4.1 RSQLite2.1.5 ggplot23.2.1
[11] pillar1.4.3 zlibbioc1.12.0 rlang0.4.2 lazyeval0.2.2 rstudioapi0.10 data.table1.12.8 annotate1.50.1 blob1.2.0 rpart4.1-10 Matrix1.2-6
[21] checkmate1.9.4 splines3.3.1 BiocParallel1.8.2 geneplotter1.50.0 stringr1.4.0 foreign0.8-66 htmlwidgets1.5.1 RCurl1.95-4.12 bit1.1-14 munsell0.5.0
[31] xfun0.11 pkgconfig2.0.3 base64enc0.1-3 htmltools0.4.0 nnet7.3-12 tibble2.1.3 gridExtra2.3 htmlTable1.13.3 Hmisc4.3-0 XML3.98-1.20
[41] crayon1.3.4 bitops1.0-6 grid3.3.1 xtable1.8-4 gtable0.3.0 lifecycle0.1.0 DBI1.1.0 magrittr1.5 scales1.1.0 stringi1.4.3
[51] XVector0.14.1 genefilter1.54.2 latticeExtra0.6-28 vctrs0.2.0 Formula1.2-3 RColorBrewer1.1-2 tools3.3.1 bit640.9-7 survival2.44-1 AnnotationDbi_1.34.4