Hi,
last year I ran DESeq2 on RNASeq data comparing treated vs untreated cells. When I initially ran DESeq2 I found that I had rather low log2FC and a kind of high variation. However, I now re-ran DESeq2 on the same data with the same script
ddsFullCountTable <- DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design = ~ pheno) ddsFullCountTable <- DESeq(ddsFullCountTable, parallel=F) #parallel somehow doesn`t to work anymore... res <- results(ddsFullCountTable,contrast=c("pheno","ctrl", "mf"), parallel=F) # rank genes by pvalue resOrdered <- res[order(res$padj),] resSig <- subset(resOrdered, padj<pvalue_cutoff)
and when I look at the results they look more or less comletely different, which might be illustrated by thes two volcano plots:
old (Aug 17):
new (Jun 18):
The gene in the new plot with a log2FC ~-21 is a transcript which in the DESeq normalised read counts has in the three control samples 0 counts and in two of the three replicates of the treatmean twice 0 and once 180, both in the "old" read count list and the "new" one:.
old:
> TS.28922.norm.old["comp2268393_c0_seq2",] ctrl_50112 ctrl_50113 ctrl_50114 mf_50115 mf_50116 mf_50117 comp2268393_c0_seq2 0 0 0 0 0 180.3483
new:
> TS.28922.norm.new["comp2268393_c0_seq2",] ctrl_50112 ctrl_50113 ctrl_50114 mf_50115 mf_50116 mf_50117 comp2268393_c0_seq2 0 0 0 0 0 180.3483
In the DESeq result from last year this was at the bottom of the result list marked as outlier (i.e. pvalue NA) with a log2FC of ~-0.046:
old:
> subset(DESeqResultOld, X=="comp2268393_c0_seq2") X baseMean log2FoldChange lfcSE stat pvalue padj 28833 comp2268393_c0_seq2 20.23561 -0.04622691 0.04388205 -1.053435 NA NA
new:
> subset(DESeqResultNew, X=="comp2268393_c0_seq2") X baseMean log2FoldChange lfcSE stat pvalue padj 1 comp2268393_c0_seq2 20.23561 -21.16507 4.248619 -4.981636 6.304886e-07 0.01526217
So my question now is, how can this be? Has anything changed in the default DESeq() or reslults() calls between the August 17 version (I don`t know the version number anymore) and v1.20. that I am not aware off? I thing the example should clearly be considered as an outlier (I also tested with calling cooksCutoff = TRUE
and cooksCutoff = FALSE
, but the results are completely different)?
Cheers
Sven
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices datasets utils methods
[9] base
other attached packages:
[1] ggplot2_2.2.1 gplots_3.0.1 RColorBrewer_1.1-2
[4] DESeq2_1.20.0 SummarizedExperiment_1.10.1 DelayedArray_0.6.0
[7] BiocParallel_1.14.1 matrixStats_0.53.1 Biobase_2.40.0
[10] GenomicRanges_1.32.3 GenomeInfoDb_1.16.0 IRanges_2.14.10
[13] S4Vectors_0.18.3 BiocGenerics_0.26.0 genefilter_1.62.0
[16] BiocInstaller_1.30.0 installr_0.20.0 stringr_1.3.1
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.5.0 gtools_3.5.0
[4] Formula_1.2-3 latticeExtra_0.6-28 blob_1.1.1
[7] GenomeInfoDbData_1.1.0 pillar_1.2.3 RSQLite_2.1.1
[10] backports_1.1.2 lattice_0.20-35 digest_0.6.15
[13] XVector_0.20.0 checkmate_1.8.5 colorspace_1.3-2
[16] htmltools_0.3.6 Matrix_1.2-14 plyr_1.8.4
[19] XML_3.98-1.11 zlibbioc_1.26.0 xtable_1.8-2
[22] snow_0.4-2 scales_0.5.0 gdata_2.18.0
[25] htmlTable_1.12 tibble_1.4.2 annotate_1.58.0
[28] nnet_7.3-12 lazyeval_0.2.1 survival_2.41-3
[31] magrittr_1.5 memoise_1.1.0 foreign_0.8-70
[34] tools_3.5.0 data.table_1.11.4 munsell_0.5.0
[37] locfit_1.5-9.1 cluster_2.0.7-1 AnnotationDbi_1.42.1
[40] compiler_3.5.0 caTools_1.17.1 rlang_0.2.1
[43] grid_3.5.0 RCurl_1.95-4.10 rstudioapi_0.7
[46] htmlwidgets_1.2 bitops_1.0-6 base64enc_0.1-3
[49] gtable_0.2.0 DBI_1.0.0 gridExtra_2.3
[52] knitr_1.20 bit_1.1-14 Hmisc_4.1-1
[55] KernSmooth_2.23-15 stringi_1.1.7 Rcpp_0.12.17
[58] geneplotter_1.58.0 rpart_4.1-13 acepack_1.4.1
Thanks, I actually saw that betaPrior is now set to FALSE, but I ignored it as I thought I`d anyway never used it, but if it was TRUE by default. My mistake.
When I now run DESeq with betaPrior=TRUE and get the same result as before (Aug 17), which should give the result same as if I had used the lfcShrink function, if I understood that correctly from the post you sent.