I am trying to run the DESeq2 with only 1378 genes. This subset of genes are defined by the GO term "DNA binding proteins", as I am only interested in the transcription factors in this data set. I have tried with just the standard workflow of using a common dispersion trend but I think due to the low number of genes, it is underpowered to detect any significant differences (FDR 10%). So I thought of using the dispersion per gene (based on a suggestion here: http://seqanswers.com/forums/showthread.php?t=33618).
Here are my codes, where cds.dbp is now a DESeq2 object limited to the 1378 genes:
cds.dbp <- estimateSizeFactors( cds.dbp )
cds.dbp <- estimateDispersionsGeneEst( cds.dbp)
dispersions(cds.dbp) <- mcols(cds.dbp)$dispGeneEst
cds.dbp.2 <- nbinomWaldTest( cds.dbp )
But I received the following error message from the nbinomWaldTest step:
Error in approx(cumsum(wts), x, xout = c(low, high), method = "constant", :
zero non-NA points
Here're the first few rows of mcols(cds.dbp):
DataFrame with 1378 rows and 5 columns
baseMean baseVar allZero dispGeneEst dispersion
<numeric> <numeric> <logical> <numeric> <numeric>
1 5.556475 4.148693e+00 FALSE 0.0000000100 0.0000000100
2 1438.080398 3.912841e+04 FALSE 0.0122872985 0.0122872985
3 2634.390047 2.243086e+05 FALSE 0.0100523772 0.0100523772
4 684.371282 2.216909e+03 FALSE 0.0009121098 0.0009121098
5 172.285501 1.240765e+03 FALSE 0.0270022439 0.0270022439
I am unable to figure out the problem and would appreciate some help. I am also happy to learn if there's any suggestions in general on how test just a subset of genes with the DESeq2 workflow. Thanks.
sessionInfo():
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.10.1 RcppArmadillo_0.6.400.2.2 Rcpp_0.12.2
[4] SummarizedExperiment_1.0.1 Biobase_2.30.0 GenomicRanges_1.22.2
[7] GenomeInfoDb_1.6.1 IRanges_2.4.6 S4Vectors_0.8.5
[10] BiocGenerics_0.16.1 data.table_1.9.6
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 XVector_0.10.0
[5] futile.options_1.0.0 tools_3.2.3 zlibbioc_1.16.0 rpart_4.1-10
[9] RSQLite_1.0.0 annotate_1.48.0 gtable_0.1.2 lattice_0.20-33
[13] DBI_0.3.1 gridExtra_2.0.0 genefilter_1.52.0 cluster_2.0.3
[17] locfit_1.5-9.1 grid_3.2.3 nnet_7.3-11 AnnotationDbi_1.32.3
[21] XML_3.98-1.3 survival_2.38-3 BiocParallel_1.4.3 foreign_0.8-66
[25] latticeExtra_0.6-26 Formula_1.2-1 geneplotter_1.48.0 ggplot2_2.0.0
[29] lambda.r_1.1.7 Hmisc_3.17-1 scales_0.3.0 splines_3.2.3
[33] xtable_1.8-0 colorspace_1.2-6 acepack_1.3-3.3 munsell_0.4.2
[37] chron_2.3-47