A DESeq2 problem in dealing with case-control study
2
0
Entering edit mode
linw • 0
@linw-16024
Last seen 6.6 years ago

Hi everyone:

I have a problem with the result of DESeq2 . When I dealt with case-control research, I used following commands . case=8 , control num=8. I just want the DEGs in case and control. 

> library("DESeq2")
> countData <- read.table('featureCounts.txt', header=TRUE, row.names=1)
> countData <- countData[ ,7:ncol(countData)];
> countData <- as.matrix(countData);
> condition <- factor(c(rep('C0', 8),rep('C1', 8)));
> (colData <- data.frame(row.names=colnames(countData), condition));
           condition
SRR2422919        C0
SRR2422920        C0
SRR2422921        C0
SRR2422922        C0
SRR2422923        C0
SRR2422924        C0
SRR2422925        C0
SRR2422926        C0
SRR2422927        C1
SRR2422928        C1
SRR2422929        C1
SRR2422930        C1
SRR2422931        C1
SRR2422932        C1
SRR2422933        C1
SRR2422934        C1
> dds <- DESeqDataSetFromMatrix(countData=countData, colData=colData, design=~ condition);
> dds <- dds[ rowSums(counts(dds)) >= 2, ];dds$condition <- relevel(dds$condition, ref='C1');
> dds <- DESeq(dds); 
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 452 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
> res <- results(dds,contrast=c("condition","C0","C1"), alpha=0.050000);
> (summary(res));

 

I got a error result ,but I don't know why.  And in the result of res ,all the padj is same value. Is there a limit  number of replicate? Or there is something wrong in my commands or input data?

out of 50569 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 0, 0% 
low counts [2]   : 6, 0.012% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

NULL

Thank you for any advice you can provide!

Best wishes!

Wei.

my sessionInfo()

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices
[6] utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.18.1              SummarizedExperiment_1.8.1
 [3] DelayedArray_0.4.1         matrixStats_0.53.1        
 [5] Biobase_2.38.0             GenomicRanges_1.30.3      
 [7] GenomeInfoDb_1.14.0        IRanges_2.12.0            
 [9] S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2     checkmate_1.8.5       
 [3] cluster_2.0.7-1        rstudioapi_0.7        
 [5] magrittr_1.5           acepack_1.4.1         
 [7] gtable_0.2.0           zlibbioc_1.24.0       
 [9] memoise_1.1.0          data.table_1.11.4     
[11] RCurl_1.95-4.10        base64enc_0.1-3       
[13] XVector_0.18.0         pillar_1.2.3          
[15] htmltools_0.3.6        stringr_1.3.1         
[17] genefilter_1.60.0      splines_3.4.4         
[19] Formula_1.2-3          lattice_0.20-35       
[21] bit_1.1-14             survival_2.42-3       
[23] annotate_1.56.2        htmlwidgets_1.2       
[25] plyr_1.8.4             locfit_1.5-9.1        
[27] knitr_1.20             gridExtra_2.3         
[29] Matrix_1.2-14          GenomeInfoDbData_1.0.0
[31] digest_0.6.15          colorspace_1.3-2      
[33] AnnotationDbi_1.40.0   geneplotter_1.56.0    
[35] stringi_1.1.7          RSQLite_2.1.1         
[37] lazyeval_0.2.1         Hmisc_4.1-1           
[39] tibble_1.4.2           compiler_3.4.4        
[41] bit64_0.9-7            rpart_4.1-13          
[43] backports_1.1.2        htmlTable_1.12        
[45] BiocParallel_1.12.0    xtable_1.8-2          
[47] DBI_1.0.0              munsell_0.4.3         
[49] Rcpp_0.12.17           XML_3.98-1.11         
[51] blob_1.1.1             ggplot2_2.2.1         
[53] latticeExtra_0.6-28    tools_3.4.4           
[55] foreign_0.8-70         bitops_1.0-6          
[57] nnet_7.3-12            scales_0.5.0          
[59] rlang_0.2.1            grid_3.4.4  

deseq2 • 1.0k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 17 hours ago
United States

The above is just a message, not an error. Warnings and errors in R will be printed with the word "Warning:" or "Error:"

The above says that there were count outliers detected, and these were replaced and the model refit without the outliers. This is part of the DESeq2 procedure, which you can turn off if you don't want it to do that. See the vignette sections on outliers for more details.

If you have all padj set to 1, this means that the p-values were relatively uniform from 0 to 1. This means that your data is consistent with the null hypothesis, that there are no differences across condition that were larger relative to the variation within condition. Have you looked at a PCA plot of your data?

 

ADD COMMENT
0
Entering edit mode

Thanks for your reply! 

I plot my data ,and here is my code and picture.

vsd <- vst(dds, blind=FALSE);plotPCA(vsd,intgroup=c("condition"))


I still confuse about the result . If the padj in res is the same value ,it means there is no DEGs in my data. Am I right?

Here is part of my res, and padj is almost the same .

baseMean log2FoldChange lfcSE stat pvalue padj

ENSG00000223972.5_2 6.5451997638089 -0.0189375878914287 0.729114828678411 -0.0259733956114359 0.979278558516615 0.999831031256582

ENSG00000227232.5_2 120.847064381123 -0.186905549119038 0.329696931365903 -0.566901088052916 0.570781384906547 0.999831031256582

ENSG00000243485.5_3 0.808323540891408 -0.074012933209564 1.37926898289051 -0.0536609857306127 0.957205266868574 0.999831031256582

ENSG00000240361.2_3 0.239939440001618 1.14714451288691 3.00810143961501 0.381351671781961 0.702942316641613 0.999831031256582

ENSG00000238009.6_4 7.07309580387112 0.304727068397541 0.527410209655461 0.57777999518934 0.563412662998333 0.999831031256582

ENSG00000233750.3_2 8.51424947357038 0.481305868904192 0.645876990754807 0.74519742271932 0.456152380689177 0.999831031256582

ENSG00000237683.5 44.6016455674963 -0.482214380258619 0.608012199829965 -0.793099843051625 0.42771965427787 0.999831031256582

ENSG00000268903.1_3 0.210424498191674 -0.678764339221412 2.60481045070771 -0.260581087210011 0.794415576134089 0.999831031256582

ENSG00000239906.1_3 0.992353998475917 0.24558917118856 1.09000383248799 0.225310374026841 0.821737829837437 0.999831031256582

ENSG00000241860.6_3 40.9000005523611 0.0323096767435896 0.357865331760903 0.0902844558443465 0.928061172140727 0.999831031256582

 

ADD REPLY
0
Entering edit mode

It looks like the differences between groups are not larger in general than the differences among biological replicates, which gives you a visual sense of the lack of small p-values.

The padj being the same is a natural consequence of the method of Benjamini and Hochberg.

ADD REPLY
1
Entering edit mode
swbarnes2 ★ 1.4k
@swbarnes2-14086
Last seen 8 hours ago
San Diego

Well, just kind of eyeballing your PCA plot, it looks like the most significant thing going on in your data isn't differences caused by treatment.  It's whatever is making those four or five samples separate from the rest.  Batch effect?

So you should probably believe what DESeq2 is telling you.

ADD COMMENT

Login before adding your answer.

Traffic: 909 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6