DESeq2 - Error in estimateSizeFactorsForMatrix(counts(object)
4
0
Entering edit mode
@fahmi-nazarie-8790
Last seen 9.1 years ago
United Kingdom

Dear Michael Love

I got a problem running DESeq2. Briefly, I have 27 different human tissue and each of them has 3 replicates and some of them only has 2 replicates with a total 76 samples. I want to study the differential expression across the human tissue. I wanna get which gene has differential alternative splicing across tissues. 

 

sampleFiles <- c ( "brain_3b" , "brain_3c" , "brain_a" , ..."Skin"), 

samplesCondition <- c ( "Brain" , "Brain" , "Brain" , ..."Skin"),

sampleNames <-c ( "ERR315477_count.txt" , "ERR315455_count.txt" , "ERR315432_count.txt", ..."ERR315339_count.txt")

 

sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleNames, condition=sampleCondition)

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
ddsHTSeq

 

> dds <- DESeq(ddsHTSeq)

estimating size factors

Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  :

  every gene contains at least one zero, cannot compute log geometric means

 

So, I checked the '0' value and produced below;

 

> rs <- rowSums( counts(ddsHTSeq) == 0 )

> table(rs)

rs

     1      2      3      4      5      6      7      8      9     10     11

163108  27467  16926  12820   9801   8075   6833   6311   5607   5181   4867

    12     13     14     15     16     17     18     19     20     21     22

  4638   4417   4325   4102   3824   3784   3687   3560   3389   3398   3325

    23     24     25     26     27     28     29     30     31     32     33

  3311   3216   3259   3099   3196   3038   3128   3102   3139   3121   3027

    34     35     36     37     38     39     40     41     42     43     44

  3143   3118   3036   3105   3234   3213   3261   3251   3171   3245   3233

    45     46     47     48     49     50     51     52     53     54     55

  3363   3387   3523   3388   3553   3486   3547   3810   3730   3832   4027

    56     57     58     59     60     61     62     63     64     65     66

  4044   4163   4305   4424   4526   4761   4929   4941   5446   5519   5810

    67     68     69     70     71     72     73     74     75     76

  6139   6567   7147   7480   8735   9786  12177  18243  10859  84717

 

Additional information about sampleTable

 

> head(sampleTable)

  sampleName            fileName condition

1   brain_3b ERR315477_count.txt     Brain

2   brain_3c ERR315455_count.txt     Brain

3    brain_a ERR315432_count.txt     Brain

4 thyroid_5b ERR315412_count.txt   Thyroid

5 thyroid_5c ERR315363_count.txt   Thyroid

6 thyroid_5d ERR315397_count.txt   Thyroid

 

> sessionInfo()

R version 3.2.0 (2015-04-16)

Platform: x86_64-unknown-linux-gnu (64-bit)

Running under: Scientific Linux release 6.5 (Carbon)

 

locale:

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C

 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8

 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8

 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C

 [9] LC_ADDRESS=C               LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base

 

other attached packages:

[1] BiocParallel_1.2.20       DESeq2_1.8.1

[3] RcppArmadillo_0.5.400.2.0 Rcpp_0.12.0

[5] GenomicRanges_1.20.5      GenomeInfoDb_1.4.2

[7] IRanges_2.2.4             S4Vectors_0.6.5

[9] BiocGenerics_0.14.0

 

loaded via a namespace (and not attached):

 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3

 [4] XVector_0.8.0        futile.options_1.0.0 tools_3.2.0

 [7] rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0

[10] annotate_1.46.1      gtable_0.1.2         lattice_0.20-33

[13] DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0

[16] genefilter_1.50.0    stringr_1.0.0        cluster_2.0.3

[19] locfit_1.5-9.1       nnet_7.3-11          grid_3.2.0

[22] Biobase_2.28.0       AnnotationDbi_1.30.1 XML_3.98-1.2

[25] survival_2.38-3      foreign_0.8-66       latticeExtra_0.6-26

[28] Formula_1.2-1        geneplotter_1.46.0   ggplot2_1.0.1

[31] reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5

[34] scales_0.3.0         Hmisc_3.16-0         MASS_7.3-44

[37] splines_3.2.0        xtable_1.7-4         colorspace_1.2-6

[40] stringi_0.5-5        acepack_1.3-3.3      munsell_0.4.2

 

Is there any way to solve this problem? Thanks in advance.

deseq2 • 11k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

Take a look at this previous answer:

A: DESeq2 Error in varianceStabilizingTransformation

ADD COMMENT
0
Entering edit mode
@fahmi-nazarie-8790
Last seen 9.1 years ago
United Kingdom

Thank Michael for replying my message. I have followed your advice, but I have got an error. I removed all the gene with 0 value by using 

dds = ddsHTSeq[ rowSums(counts(ddsHTSeq)) > 5, ]

Then 

 

cts = counts(dds)

 geoMeans = apply(cts, 1, function(row) if (all(row == 0)) 0 else exp(mean(log(row[row != 0]))))

dds = estimateSizeFactors(dds, geoMeans=geoMeans)

But the error came out

Error in if (any(value <= 0)) { : missing value where TRUE/FALSE needed

Even if I add TRUE/FALSE function it still doesn't work

geoMeans = apply(cts, 1, function(row) if (all(row == 0 , na.rm = TRUE, zneg.rm = FALSE)) 0 else exp(mean(log(row[row != 0]), na.rm = TRUE, zneg.rm = FALSE)))​

Are there any missing things I missed here? Thanks

 

 

ADD COMMENT
0
Entering edit mode

When exactly did the error occur?

After the error occurs can you run

traceback()

Can you show:

summary(geoMeans)
ADD REPLY
0
Entering edit mode

Oops, I meant traceback().

Hmm I can't reproduce this error on my end. I have a new question though:

Are the zeros coming from a single sample? Can you do: 

table(colSums(counts(dds) == 0))

And then see if you have a single sample with many zeros? You could remove this one if it is an outlier.

idx <- which.max(colSums(counts(dds) == 0))

dds2 <- dds[ , -idx]
ADD REPLY
0
Entering edit mode
@fahmi-nazarie-8790
Last seen 9.1 years ago
United Kingdom

The error occured when I ran this code, which after I filtered out '0' value in my dataset.

dds = estimateSizeFactors(dds, geoMeans=geoMeans)

 

I am not sure if there is a package called trackback().

> trackback()

Error: could not find function "trackback"

 

Here is the summary for geoMeans.

> summary(geoMeans)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
     1.0      3.1      8.0     44.8     30.5 330200.0
ADD COMMENT
0
Entering edit mode
@fahmi-nazarie-8790
Last seen 9.1 years ago
United Kingdom

Hi Michael

I solved the problem. How do I get list of all genes (row) vs tissues (27 different tissues - column) with all the p-value or p-adjsuted value. I just could not find the command line in any questions or manual. Thanks

ADD COMMENT
1
Entering edit mode

Fahim, I faced a similar problem. Could you please tell me how did you solve the issue?

ADD REPLY
0
Entering edit mode
You can make a results table with the results() function. This table has p values and adjusted p values for all genes. Look at rownames(res). You can index the results table using square brackets: res[idx, ]
ADD REPLY
0
Entering edit mode

What I do is like this as says in vignette

res<- results(dds2, contrast=c("condition" , "Brain" , "Thyroid" ))

But I have 27 different tissues, if I make results one by one it will take an ages to finish. How can I extract all these in one command and place it in one file of 'res'?

 

ADD REPLY
0
Entering edit mode
We don't have a function for this. You can write a simple for loop in an R script (there are many free online resources for learning R) or you could ask someone who is familiar with R at your institution to help.
ADD REPLY
0
Entering edit mode

Alright, thanks Michael for your help.

ADD REPLY

Login before adding your answer.

Traffic: 809 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6