deseq2 strange error: high number of differential genes
@jarod_v6liberoit-6654
Last seen 5.1 years ago
Italy

I have 14 samples divided into two groups (11 vs 3 samples).

After the standard DESeq2 workflow I found these results:

out of 22492 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 20748, 92%
LFC < 0 (down)   : 100, 0.44%
outliers [1]     : 0, 0%
low counts [2]   : 21, 0.093%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Why are all the genes differentially expressed?

> head(res)
log2 fold change (MLE): Intercept
Wald test p-value: Intercept
DataFrame with 6 rows and 6 columns
                  baseMean log2FoldChange      lfcSE      stat        pvalue
                 <numeric>      <numeric>  <numeric> <numeric>     <numeric>
ENSG00000000003  223.53931       7.804385 0.29525254  26.43291 5.736351e-154
ENSG00000000419  106.27997       6.731726 0.10858462  61.99520  0.000000e+00
ENSG00000000457  127.99856       6.999984 0.09604285  72.88396  0.000000e+00
ENSG00000000460   59.39814       5.892346 0.14531761  40.54805  0.000000e+00
ENSG00000000938   20.36792       4.348227 0.35805062  12.14417  6.160687e-34
ENSG00000000971 3296.12479      11.686555 0.34137287  34.23399 7.549703e-257
                         padj
                    <numeric>
ENSG00000000003 1.339792e-153
ENSG00000000419  0.000000e+00
ENSG00000000457  0.000000e+00
ENSG00000000460  0.000000e+00
ENSG00000000938  9.749256e-34
ENSG00000000971 2.170346e-256


> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] pheatmap_1.0.8             ggplot2_2.2.1              genefilter_1.54.2         
 [4] biomaRt_2.28.0             DESeq2_1.12.4              SummarizedExperiment_1.2.3
 [7] Biobase_2.32.0             GenomicRanges_1.24.3       GenomeInfoDb_1.8.7        
[10] IRanges_2.6.1              S4Vectors_0.10.3           BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1       splines_3.3.2        lattice_0.20-34      colorspace_1.3-2    
 [5] htmltools_0.3.5      base64enc_0.1-3      survival_2.40-1      XML_3.98-1.5        
 [9] foreign_0.8-67       DBI_0.5-1            BiocParallel_1.6.6   RColorBrewer_1.1-2  
[13] plyr_1.8.4           stringr_1.1.0        zlibbioc_1.18.0      munsell_0.4.3       
[17] gtable_0.2.0         htmlwidgets_0.8      memoise_1.0.0        labeling_0.3        
[21] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.50.0   AnnotationDbi_1.34.4
[25] htmlTable_1.9        Rcpp_0.12.9          acepack_1.4.1        xtable_1.8-2        
[29] backports_1.0.5      scales_0.4.1         checkmate_1.8.2      limma_3.28.21       
[33] Hmisc_4.0-2          annotate_1.50.1      XVector_0.12.1       gridExtra_2.2.1     
[37] digest_0.6.12        stringi_1.1.2        grid_3.3.2           tools_3.3.2         
[41] bitops_1.0-6         magrittr_1.5         lazyeval_0.2.0       RCurl_1.95-4.8      
[45] tibble_1.2           RSQLite_1.1-2        Formula_1.2-1        cluster_2.0.5       
[49] Matrix_1.2-8         data.table_1.10.0    assertthat_0.1       rpart_4.1-10        
[53] nnet_7.3-12   


The samples are two types of tumors. I want to understand whether there is a difference in expression profile between these groups.

The sample size is 53ML of unique reads. Some samples have a low number of reads, as you can see in the picture: http://imgur.com/a/zFrpn

SampleTable <- data.frame(SampleName = metadata$ID,
                          fileName   = metadata$NOMIFILE,
                          condition  = metadata$CONDITION,
                          ID_clinico = metadata$ID_CLINICO,
                          Corsa      = metadata$Corsa)

ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = SampleTable,
                                      directory = "Count/",
                                      design = ~ 1)
dds <- estimateSizeFactors(ddHTSeq)

# keep genes with >= 10 counts in at least 30% of the smaller group (minimum 1.5 samples)
filterCount1 <- table(dds$condition)[1] * 0.3
filterCount2 <- table(dds$condition)[2] * 0.3
filterCount  <- min(filterCount1, filterCount2)
if (filterCount < 1.5) {
  filterCount <- 1.5
}

idx <- rowSums(counts(dds) >= 10) >= filterCount
dds <- dds[idx, ]

dds$condition <- relevel(dds$condition, "A")   # make "A" the reference level
dds$condition

dds <- DESeq(dds)
res <- results(dds)
head(res)
summary(res)
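With 11 samples in one group and 3 in the other, the filter above works out as follows (a worked check of the numbers, not part of the original script):

filterCount1 <- 11 * 0.3                          # 3.3
filterCount2 <-  3 * 0.3                          # 0.9
filterCount  <- min(filterCount1, filterCount2)   # 0.9
if (filterCount < 1.5) filterCount <- 1.5         # floored at 1.5
# so a gene is kept only if it has >= 10 counts in at least 2 samples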


What about the sequencing depth across the conditions? Is it confounded?

> mean(depth[depth$condition == "A", ]$Count)
[1] 4.86
> mean(depth[depth$condition == "B", ]$Count)
[1] 9.87

> summary(depth[depth$condition == "B", ]$Count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   7.40    7.70    8.00    9.87   11.10   14.20
> summary(depth[depth$condition == "A", ]$Count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.80    2.25    3.10    4.86    5.70   16.80
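
For reference, the same per-condition depth check can be run directly from the DESeqDataSet instead of a separate depth data frame. A minimal sketch, assuming the values above are millions of reads and using the object and column names from the code posted earlier:

libSize <- colSums(counts(dds)) / 1e6              # counted reads per sample, in millions
tapply(libSize, colData(dds)$condition, summary)   # depth distribution within each condition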

@mikelove
Last seen 4 hours ago
United States
Can you provide all your code and tell us more about the samples? Also, what about the sequencing depth across the groups? Is it equal or imbalanced?
@mikelove
Last seen 4 hours ago
United States

The sequencing depth is confounded, which is not ideal but not the biggest problem. Looking over your code, I saw the real problem: your design = ~ 1.

So you are testing whether the intercept = 0.
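
Concretely, under ~ 1 the model has only an intercept, so results() reports a Wald test of intercept = 0, i.e. whether the log2 mean normalized count differs from zero, which holds for nearly every expressed gene. A minimal sketch of switching to the intended two-group comparison (assuming a condition column with levels "A" and "B", as in the code above):

resultsNames(dds)                                 # under ~ 1 this is just "Intercept"

design(dds) <- ~ condition                        # the intended design
dds <- DESeq(dds)
resultsNames(dds)                                 # now e.g. "condition_B_vs_A"
res <- results(dds, contrast = c("condition", "B", "A"))
summary(res)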


Sorry, I pasted the wrong part. For the comparison I use this; I use ~ 1 only for the unsupervised analysis:

dds1 <- DESeqDataSetFromMatrix(countRawCoding, SampleTable, design = ~ condition)
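
For clarity, a sketch of how the two designs are typically kept separate (varianceStabilizingTransformation() is used here only to illustrate the unsupervised step; object names as above):

vsd  <- varianceStabilizingTransformation(dds1, blind = TRUE)   # PCA / clustering, design not used
dds1 <- DESeq(dds1)                                             # testing uses design = ~ condition
res  <- results(dds1)
summary(res)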


Can you confirm that, when you rerun DESeq() and results() with the intended design, you still get these unexpected results? The results you report above are just what would be expected with ~ 1.

I checked and found what you suggested; it was an error in the script. Now I have this situation:

out of 20134 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0%
LFC < 0 (down)   : 1, 0.005%
outliers [1]     : 484, 2.4%
low counts [2]   : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


Do you have any suggestions?


To me this indicates that there are not large differences across conditions (not large enough to be detected at the sample size you have).
