Question

Low number of differentially expressed genes using DESeq2


aristotele_m (Italy)

I have compared two groups of samples (4 samples vs. 2 controls). I used the standard DESeq2 pipeline, but I obtained the results below.

The PCA shows that the groups are not homogeneous.

summary(res)

out of 35000 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 2, 0.0063%
LFC < 0 (down)   : 1, 0.0031%
outliers [1]     : 13, 0.041%
low counts [2]   : 19532, 62%
(mean count < 36.7)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Any idea what could be wrong in this situation?

R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] genefilter_1.50.0         rafalib_1.0.0             ggplot2_1.0.1             limma_3.24.15            
 [5] RColorBrewer_1.1-2        gplots_2.17.0             org.Hs.eg.db_3.1.2        RSQLite_1.0.0            
 [9] DBI_0.3.1                 annotate_1.46.1           XML_3.98-1.2              AnnotationDbi_1.30.1     
[13] Biobase_2.28.0            biomaRt_2.24.0            DESeq2_1.8.1              RcppArmadillo_0.5.500.2.0
[17] Rcpp_0.12.1               GenomicRanges_1.20.6      GenomeInfoDb_1.4.2        IRanges_2.2.7            
[21] S4Vectors_0.6.5           BiocGenerics_0.14.0       Nozzle.R1_1.1-1          

loaded via a namespace (and not attached):
 [1] gtools_3.5.0         locfit_1.5-9.1       reshape2_1.4.1       splines_3.2.0       
 [5] lattice_0.20-33      colorspace_1.2-6     survival_2.38-3      foreign_0.8-66      
 [9] BiocParallel_1.2.21  lambda.r_1.1.7       plyr_1.8.3           stringr_1.0.0       
[13] munsell_0.4.2        gtable_0.1.2         futile.logger_1.4.1  caTools_1.17.1      
[17] labeling_0.3         latticeExtra_0.6-26  geneplotter_1.46.0   proto_0.3-10        
[21] KernSmooth_2.23-15   acepack_1.3-3.3      xtable_1.7-4         scales_0.3.0        
[25] gdata_2.16.1         Hmisc_3.16-0         XVector_0.8.0        gridExtra_2.0.0     
[29] digest_0.6.8         stringi_0.5-5        grid_3.2.0           tools_3.2.0         
[33] bitops_1.0-6         magrittr_1.5         RCurl_1.95-4.7       Formula_1.2-1       
[37] cluster_2.0.1        futile.options_1.0.0 MASS_7.3-44          rpart_4.1-10        
[41] nnet_7.3-11

Tag: deseq2 · question by aristotele_m, last updated by Michael Love

Answer (score 0) · 2015-09-17

Having almost no significant genes (only 3 at adjusted p-value < 0.1, even after independent filtering raised the mean-count threshold to its optimal value of 36.7 here) means that, given your sample size, the biological and technical variation in the experiment dominates any true log fold changes across conditions. In other words, the experiment was underpowered to detect the changes across conditions. Ways to increase power in RNA-seq include increasing the sequencing depth and/or the number of biological replicates.
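To get a rough feel for why 4 vs. 2 samples is underpowered, here is a minimal sketch in base R. It is an assumption-laden approximation, not what DESeq2 does: it treats a single gene's log2 expression as normally distributed and computes the power of an ordinary two-sided, equal-variance two-sample t-test via the noncentral t distribution. The function name `t_test_power` and the scenario values (a true log2 fold change of 1 and a per-sample standard deviation of 0.8 on the log2 scale) are hypothetical, chosen only for illustration.

```r
# Approximate power of a two-sided, equal-variance two-sample t-test for ONE gene
# on the log2 scale, with n1 and n2 biological replicates per group.
# Hypothetical helper for intuition only; DESeq2 uses a negative binomial GLM instead.
t_test_power <- function(n1, n2, lfc, sd, alpha = 0.05) {
  df   <- n1 + n2 - 2                          # residual degrees of freedom
  ncp  <- lfc / (sd * sqrt(1 / n1 + 1 / n2))   # noncentrality parameter
  crit <- qt(1 - alpha / 2, df)                # two-sided critical value
  # Probability that the test statistic lands in either rejection region
  pt(crit, df, ncp = ncp, lower.tail = FALSE) + pt(-crit, df, ncp = ncp)
}

# Hypothetical scenario: true log2 fold change of 1, per-sample SD of 0.8 (log2 scale)
designs <- list(c(4, 2), c(4, 4), c(6, 6), c(10, 10))
for (d in designs) {
  p <- t_test_power(d[1], d[2], lfc = 1, sd = 0.8)
  cat(sprintf("%2d vs %2d replicates: power = %.2f\n", d[1], d[2], p))
}
```

Power rises as replicates are added to both groups, and the smaller group limits the 4 vs. 2 design. Because a genome-wide analysis controls the false discovery rate across thousands of genes, the effective per-gene threshold is stricter than `alpha = 0.05`, so the real power in this setting would be lower still.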