Question

problem DESeq2 paired samples analysis

0

Entering edit mode

zhizhon • 0

@zhizhon-14287

Last seen 6.1 years ago

when I run DESeq2, I get some trouble with design=~patient+phenotype

> head(coldata)

patient phenotype

ESC00000000001N ESC00000000001 normal

ESC00000000001T ESC00000000001 tumor

ESC00000000002N ESC00000000002 normal

ESC00000000002T ESC00000000002 tumor

ESC00000000003N ESC00000000003 normal

ESC00000000003T ESC00000000003 tumor

> dim(coldata)
[1] 242 2

>dds1=DESeqDataSetFromTximport(txi,colData=coldata,design=~phenotype)

>dds1=dds1[1:500,]

> dds1=DESeq(dds1)

estimating size factors

using 'avgTxLength' from assays(dds), correcting for library size

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

-- replacing outliers and refitting for 12 genes

-- DESeq argument 'minReplicatesForReplace' = 7

-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

This takes about one minute, but when I change the design to design=~patient+phenotype , it stuck!

> dds2=DESeqDataSetFromTximport(txi,colData=coldata,design=~patient+phenotype)
using counts and average transcript lengths from tximport
> dds2=dds2[1:500,]
> dds2=DESeq(dds2)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates

What did I do wrong?

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

Matrix products: default
BLAS: /software/bcbio/anaconda/lib/R/lib/libRblas.so
LAPACK: /software/bcbio/anaconda/lib/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines parallel stats4 stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] dplyr_0.7.4 ggrepel_0.6.5
[3] ggplot2_2.2.0 CancerSubtypes_1.4.0
[5] NMF_0.20.6 cluster_2.0.6
[7] rngtools_1.2.4 pkgmaker_0.22
[9] registry_0.3 sigclust_1.1.0
[11] cqn_1.24.0 quantreg_5.33
[13] SparseM_1.76 preprocessCore_1.40.0
[15] nor1mix_1.2-3 mclust_5.3
[17] IHW_1.6.0 DESeq2_1.18.1
[19] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
[21] matrixStats_0.52.2 Biobase_2.38.0
[23] GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
[25] IRanges_2.12.0 S4Vectors_0.16.0
[27] BiocGenerics_0.24.0 tximport_1.6.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-5
[3] doParallel_1.0.10 RColorBrewer_1.1-2
[5] tools_3.4.1 backports_1.0.5
[7] R6_2.2.0 rpart_4.1-10
[9] Hmisc_4.0-3 DBI_0.6-1
[11] lazyeval_0.2.0 colorspace_1.3-2
[13] nnet_7.3-12 gridExtra_2.2.1
[15] bit_1.1-12 compiler_3.4.1
[17] fdrtool_1.2.15 htmlTable_1.11.1
[19] labeling_0.3 slam_0.1-40
[21] scales_0.4.1 checkmate_1.8.2
[23] readr_1.1.0 genefilter_1.60.0
[25] stringr_1.2.0 digest_0.6.12
[27] foreign_0.8-67 XVector_0.18.0
[29] pkgconfig_2.0.1 base64enc_0.1-3
[31] htmltools_0.3.6 lpsymphony_1.6.0
[33] limma_3.34.1 htmlwidgets_0.9
[35] rlang_0.1.2 impute_1.52.0
[37] rstudioapi_0.7 RSQLite_2.0
[39] bindr_0.1 BiocParallel_1.12.0
[41] acepack_1.4.1 RCurl_1.95-4.8
[43] magrittr_1.5 GenomeInfoDbData_0.99.1
[45] Formula_1.2-1 heatmap.plus_1.3

[47] Matrix_1.2-7.1 Rcpp_0.12.13
[49] munsell_0.4.3 stringi_1.1.2
[51] zlibbioc_1.24.0 plyr_1.8.4
[53] grid_3.4.1 blob_1.1.0
[55] lattice_0.20-34 annotate_1.56.0
[57] hms_0.3 locfit_1.5-9.1
[59] knitr_1.16 SNFtool_2.2.1
[61] rjson_0.2.15 geneplotter_1.56.0
[63] reshape2_1.4.2 codetools_0.2-15
[65] glue_1.1.1 XML_3.98-1.6
[67] iCluster_2.1.0 latticeExtra_0.6-28
[69] data.table_1.10.4 foreach_1.4.3
[71] MatrixModels_0.4-1 gtable_0.2.0
[73] assertthat_0.1 gridBase_0.4-7
[75] xtable_1.8-2 ConsensusClusterPlus_1.42.0
[77] survival_2.40-1 tibble_1.3.3
[79] iterators_1.0.8 AnnotationDbi_1.40.0
[81] memoise_1.1.0 bindrcpp_0.2

Thanks for any response!

deseq2 • 521 views

ADD COMMENT • link 6.1 years ago zhizhon • 0

0

Entering edit mode

Hi,Michael!

I have 121 tumor vs normal pairs. and I only took genes with TPM >1 in at least 75% samples. After filtering, 28102 genes left. It have been running more than one day. I interrupted it and then did the different expression analysis with "limma-voom". it is really fast!

ADD REPLY • link 6.1 years ago zhizhon • 0

score 1 · Accepted Answer · 2018-03-08

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 13 hours ago

United States

It's probably not "stuck" but just taking a lot longer, likely because there are many coefficients you are adding to the design. How many patients are in the dataset? My typical recommendations to speed up DESeq2 is to be more aggressive about filtering low count genes and you can use the parallel argument to use multiple cores.

ADD COMMENT • link 6.1 years ago Michael Love 41k