problem DESeq2 paired samples analysis
1
0
Entering edit mode
zhizhon • 0
@zhizhon-14287
Last seen 6.1 years ago

when I run DESeq2, I get some trouble with design=~patient+phenotype

> head(coldata)

                       patient phenotype

ESC00000000001N ESC00000000001    normal

ESC00000000001T ESC00000000001     tumor

ESC00000000002N ESC00000000002    normal

ESC00000000002T ESC00000000002     tumor

ESC00000000003N ESC00000000003    normal

ESC00000000003T ESC00000000003     tumor

> dim(coldata)
[1] 242   2

>dds1=DESeqDataSetFromTximport(txi,colData=coldata,design=~phenotype)

>dds1=dds1[1:500,]

> dds1=DESeq(dds1)

estimating size factors

using 'avgTxLength' from assays(dds), correcting for library size

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

-- replacing outliers and refitting for 12 genes

-- DESeq argument 'minReplicatesForReplace' = 7

-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

This takes about one minute, but when I change the design to design=~patient+phenotype , it stuck!

  > dds2=DESeqDataSetFromTximport(txi,colData=coldata,design=~patient+phenotype)
using counts and average transcript lengths from tximport
> dds2=dds2[1:500,]
> dds2=DESeq(dds2)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates

 What did I do wrong?

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

Matrix products: default
BLAS: /software/bcbio/anaconda/lib/R/lib/libRblas.so
LAPACK: /software/bcbio/anaconda/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] splines   parallel  stats4    stats     graphics  grDevices utils
 [8] datasets  methods   base

other attached packages:
 [1] dplyr_0.7.4                ggrepel_0.6.5
 [3] ggplot2_2.2.0              CancerSubtypes_1.4.0
 [5] NMF_0.20.6                 cluster_2.0.6
 [7] rngtools_1.2.4             pkgmaker_0.22
 [9] registry_0.3               sigclust_1.1.0
[11] cqn_1.24.0                 quantreg_5.33
[13] SparseM_1.76               preprocessCore_1.40.0
[15] nor1mix_1.2-3              mclust_5.3
[17] IHW_1.6.0                  DESeq2_1.18.1
[19] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
[21] matrixStats_0.52.2         Biobase_2.38.0
[23] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0
[25] IRanges_2.12.0             S4Vectors_0.16.0
[27] BiocGenerics_0.24.0        tximport_1.6.0

 

 

 

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                bit64_0.9-5
 [3] doParallel_1.0.10           RColorBrewer_1.1-2
 [5] tools_3.4.1                 backports_1.0.5
 [7] R6_2.2.0                    rpart_4.1-10
 [9] Hmisc_4.0-3                 DBI_0.6-1
[11] lazyeval_0.2.0              colorspace_1.3-2
[13] nnet_7.3-12                 gridExtra_2.2.1
[15] bit_1.1-12                  compiler_3.4.1
[17] fdrtool_1.2.15              htmlTable_1.11.1
[19] labeling_0.3                slam_0.1-40
[21] scales_0.4.1                checkmate_1.8.2
[23] readr_1.1.0                 genefilter_1.60.0
[25] stringr_1.2.0               digest_0.6.12
[27] foreign_0.8-67              XVector_0.18.0
[29] pkgconfig_2.0.1             base64enc_0.1-3
[31] htmltools_0.3.6             lpsymphony_1.6.0
[33] limma_3.34.1                htmlwidgets_0.9
[35] rlang_0.1.2                 impute_1.52.0
[37] rstudioapi_0.7              RSQLite_2.0
[39] bindr_0.1                   BiocParallel_1.12.0
[41] acepack_1.4.1               RCurl_1.95-4.8
[43] magrittr_1.5                GenomeInfoDbData_0.99.1
[45] Formula_1.2-1               heatmap.plus_1.3

[47] Matrix_1.2-7.1              Rcpp_0.12.13
[49] munsell_0.4.3               stringi_1.1.2
[51] zlibbioc_1.24.0             plyr_1.8.4
[53] grid_3.4.1                  blob_1.1.0
[55] lattice_0.20-34             annotate_1.56.0
[57] hms_0.3                     locfit_1.5-9.1
[59] knitr_1.16                  SNFtool_2.2.1
[61] rjson_0.2.15                geneplotter_1.56.0
[63] reshape2_1.4.2              codetools_0.2-15
[65] glue_1.1.1                  XML_3.98-1.6
[67] iCluster_2.1.0              latticeExtra_0.6-28
[69] data.table_1.10.4           foreach_1.4.3
[71] MatrixModels_0.4-1          gtable_0.2.0
[73] assertthat_0.1              gridBase_0.4-7
[75] xtable_1.8-2                ConsensusClusterPlus_1.42.0
[77] survival_2.40-1             tibble_1.3.3
[79] iterators_1.0.8             AnnotationDbi_1.40.0
[81] memoise_1.1.0               bindrcpp_0.2

 

Thanks for any response!

 

 

deseq2 • 521 views
ADD COMMENT
0
Entering edit mode

Hi,Michael!

I have 121 tumor vs normal pairs. and I only took genes with TPM >1 in  at least 75% samples. After filtering, 28102 genes left.  It have been running more than one day.  I interrupted it and then did the different expression analysis with "limma-voom". it is really fast! 

ADD REPLY
1
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

It's probably not "stuck" but just taking a lot longer, likely because there are many coefficients you are adding to the design. How many patients are in the dataset? My typical recommendations to speed up DESeq2 is to be more aggressive about filtering low count genes and you can use the parallel argument to use multiple cores.

ADD COMMENT

Login before adding your answer.

Traffic: 488 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6