Dear Michael Love
I got a problem running DESeq2. Briefly, I have 27 different human tissue and each of them has 3 replicates and some of them only has 2 replicates with a total 76 samples. I want to study the differential expression across the human tissue. I wanna get which gene has differential alternative splicing across tissues.
sampleFiles <- c ( "brain_3b" , "brain_3c" , "brain_a" , ..."Skin"),
samplesCondition <- c ( "Brain" , "Brain" , "Brain" , ..."Skin"),
sampleNames <-c ( "ERR315477_count.txt" , "ERR315455_count.txt" , "ERR315432_count.txt", ..."ERR315339_count.txt")
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleNames, condition=sampleCondition)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
ddsHTSeq
> dds <- DESeq(ddsHTSeq)
estimating size factors
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, :
every gene contains at least one zero, cannot compute log geometric means
So, I checked the '0' value and produced below;
> rs <- rowSums( counts(ddsHTSeq) == 0 )
> table(rs)
rs
1 2 3 4 5 6 7 8 9 10 11
163108 27467 16926 12820 9801 8075 6833 6311 5607 5181 4867
12 13 14 15 16 17 18 19 20 21 22
4638 4417 4325 4102 3824 3784 3687 3560 3389 3398 3325
23 24 25 26 27 28 29 30 31 32 33
3311 3216 3259 3099 3196 3038 3128 3102 3139 3121 3027
34 35 36 37 38 39 40 41 42 43 44
3143 3118 3036 3105 3234 3213 3261 3251 3171 3245 3233
45 46 47 48 49 50 51 52 53 54 55
3363 3387 3523 3388 3553 3486 3547 3810 3730 3832 4027
56 57 58 59 60 61 62 63 64 65 66
4044 4163 4305 4424 4526 4761 4929 4941 5446 5519 5810
67 68 69 70 71 72 73 74 75 76
6139 6567 7147 7480 8735 9786 12177 18243 10859 84717
Additional information about sampleTable
> head(sampleTable)
sampleName fileName condition
1 brain_3b ERR315477_count.txt Brain
2 brain_3c ERR315455_count.txt Brain
3 brain_a ERR315432_count.txt Brain
4 thyroid_5b ERR315412_count.txt Thyroid
5 thyroid_5c ERR315363_count.txt Thyroid
6 thyroid_5d ERR315397_count.txt Thyroid
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Scientific Linux release 6.5 (Carbon)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BiocParallel_1.2.20 DESeq2_1.8.1
[3] RcppArmadillo_0.5.400.2.0 Rcpp_0.12.0
[5] GenomicRanges_1.20.5 GenomeInfoDb_1.4.2
[7] IRanges_2.2.4 S4Vectors_0.6.5
[9] BiocGenerics_0.14.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3
[4] XVector_0.8.0 futile.options_1.0.0 tools_3.2.0
[7] rpart_4.1-10 digest_0.6.8 RSQLite_1.0.0
[10] annotate_1.46.1 gtable_0.1.2 lattice_0.20-33
[13] DBI_0.3.1 proto_0.3-10 gridExtra_2.0.0
[16] genefilter_1.50.0 stringr_1.0.0 cluster_2.0.3
[19] locfit_1.5-9.1 nnet_7.3-11 grid_3.2.0
[22] Biobase_2.28.0 AnnotationDbi_1.30.1 XML_3.98-1.2
[25] survival_2.38-3 foreign_0.8-66 latticeExtra_0.6-26
[28] Formula_1.2-1 geneplotter_1.46.0 ggplot2_1.0.1
[31] reshape2_1.4.1 lambda.r_1.1.7 magrittr_1.5
[34] scales_0.3.0 Hmisc_3.16-0 MASS_7.3-44
[37] splines_3.2.0 xtable_1.7-4 colorspace_1.2-6
[40] stringi_0.5-5 acepack_1.3-3.3 munsell_0.4.2
Is there any way to solve this problem? Thanks in advance.
When exactly did the error occur?
After the error occurs can you run
Can you show:
Oops, I meant traceback().
Hmm I can't reproduce this error on my end. I have a new question though:
Are the zeros coming from a single sample? Can you do:
And then see if you have a single sample with many zeros? You could remove this one if it is an outlier.