Strange results from Deseq2
1
0
Entering edit mode
swbarnes2 ★ 1.3k
@swbarnes2-14086
Last seen 17 hours ago
San Diego

I'm getting strange results from DESeq2 results.  I'm comparing 3 samples to another group of 3 samples, which I know is less than ideal

 

counts<-read.csv("raw_180803.csv", header = TRUE, row.names = 1, check.names = FALSE)
anno <- read.csv("anno_180803.csv", header = TRUE, row.names = 1)

anno$donor <- factor(anno$donor)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = anno, design =~ 1)

dds <- dds[rowSums(counts(dds)) > 1,]

design(dds) <- ~ group + donor

dds <-DESeq(dds)

res<-results(dds, contrast = c("group", "hIgG1", "unstim_none"))

summary(res)

out of 20305 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1, 0.0049%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%

(mean count < 0)

 

 

res<-results(dds, contrast = c("group", "hIgG4", "unstim_none"))

summary(res)

out of 20305 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 26, 0.13%
LFC < 0 (down)     : 10, 0.049%
outliers [1]       : 0, 0%
low counts [2]     : 12991, 64%
(mean count < 124)

But the counts for IgG4 and IgG1 just don't look that different across the board. Why is the mean count threshold set so differently between the two contrasts?  (The mean of normalized counts for the hIgG1 samples ranges between 650-790).  

Here are normalized counts for one gene, but every comparison I do, except for one, shows NA for the adjusted p-value

 

59.04707 79.40045 37.52253 63.28349 82.1004 42.47123 53.48357 68.60829 36.32574 71.87155 73.46171 46.15779 56.68832 60.98544

39.50018

 I thought that NA's meant either outliers or too few reads, but can that be the case with these normalized counts?  

Should I stop worrying and just believe my data?  What should I look at to figure out why these two contrasts are so different from each other?

EDIT:

I can force the software to filter based on normalized means at the level I choose (I chose 180 here) instead of letting the software pick for me

 

res<-results(dds, contrast = c("group", "hIgG1", "unstim_none"), filter = dds@rowRanges@elementMetadata$baseMean > 180)

summary(res)

out of 20305 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 2, 0.0098%
LFC < 0 (down)     : 1, 0.0049%
outliers [1]       : 0, 0%
low counts [2]     : 13766, 68%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I guess the issue is that there are almost no significant differences in this particular comparison, (while there are big donor differences), while for the other comparisons, dropping large number of genes which have decent expression is improving the stats for the best ones.

Is the default independent filtering in the results command maybe not doing that great a job, and I should be setting it myself?  It seems plausible that there might be good DE genes with normalized means of about 100 or lower, but the default filtering settings are getting rid of these in most comparisons.

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)

Matrix products: default
BLAS: /.automount/tools/GNU/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /.automount/tools/GNU/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] org.Hs.eg.db_3.6.0          AnnotationDbi_1.42.1
 [3] DESeq2_1.20.0               SummarizedExperiment_1.10.1
 [5] DelayedArray_0.6.2          BiocParallel_1.14.2
 [7] matrixStats_0.54.0          Biobase_2.40.0
 [9] GenomicRanges_1.32.6        GenomeInfoDb_1.16.0
[11] IRanges_2.14.10             S4Vectors_0.18.3
[13] BiocGenerics_0.26.0

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.0          Formula_1.2-3
 [4] assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1
 [7] GenomeInfoDbData_1.1.0 pillar_1.3.0           RSQLite_2.1.1
[10] backports_1.1.2        lattice_0.20-35        glue_1.3.0
[13] digest_0.6.15          RColorBrewer_1.1-2     XVector_0.20.0
[16] checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6
[19] Matrix_1.2-14          plyr_1.8.4             XML_3.98-1.13
[22] pkgconfig_2.0.1        genefilter_1.62.0      zlibbioc_1.26.0
[25] purrr_0.2.5            xtable_1.8-2           scales_0.5.0
[28] htmlTable_1.12         tibble_1.4.2           annotate_1.58.0
[31] ggplot2_3.0.0          nnet_7.3-12            lazyeval_0.2.1
[34] survival_2.42-6        magrittr_1.5           crayon_1.3.4
[37] memoise_1.1.0          foreign_0.8-71         tools_3.5.0
[40] data.table_1.11.4      stringr_1.3.1          locfit_1.5-9.1
[43] munsell_0.5.0          cluster_2.0.7-1        bindrcpp_0.2.2
[46] compiler_3.5.0         rlang_0.2.1            grid_3.5.0
[49] RCurl_1.95-4.11        rstudioapi_0.7         htmlwidgets_1.2
[52] bitops_1.0-6           base64enc_0.1-3        gtable_0.2.0
[55] DBI_1.0.0              R6_2.2.2               gridExtra_2.3
[58] knitr_1.20             dplyr_0.7.6            bit_1.1-14
[61] bindr_0.1.1            Hmisc_4.1-1            stringi_1.2.4
[64] Rcpp_0.12.18           geneplotter_1.58.0     rpart_4.1-13
[67] acepack_1.4.1          tidyselect_0.2.4

deseq2 • 608 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 7 hours ago
United States

 The threshold that is chosen by independent filtering depends on whether greater power can it be achieved by increasing the threshold. It needs to be the case that there are alternative hypotheses with sufficient power at higher counts, and that filtering out lower count rows helps to make these differences detectable at the rejection level ‘alpha’ that you specified.  So the filter threshold can go higher with one contrast because there are alternative hypotheses which are aided by filtering, where is the filter stays at 0 for another contrast because there are no alternative hypotheses and only nulls. You can turn the automatic filtering off with independentFiltering=FALSE and you can try alternative methods to filtering such as IHW using the code in the vignette.

ADD COMMENT
0
Entering edit mode

Thanks, I tried some things, and added a follow-up in my question

ADD REPLY
0
Entering edit mode

 There may be genes that you’re missing with low count, and you could turn off independent filtering to try to capture these, but you’ll end up with less differentially expresses genes in total (because that’s how the independent filter works, see the diagram in vignette).

ADD REPLY
0
Entering edit mode

Yeah, I realized that after I wrote it.  Thanks again.

ADD REPLY

Login before adding your answer.

Traffic: 842 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6