Question

A DESeq2 problem in dealing with case-control study

0

Entering edit mode

linw • 0

@linw-16024

Last seen 5.8 years ago

Hi everyone:

I have a problem with the result of DESeq2 . When I dealt with case-control research, I used following commands . case=8 , control num=8. I just want the DEGs in case and control.

> library("DESeq2")
> countData <- read.table('featureCounts.txt', header=TRUE, row.names=1)
> countData <- countData[ ,7:ncol(countData)];
> countData <- as.matrix(countData);
> condition <- factor(c(rep('C0', 8),rep('C1', 8)));
> (colData <- data.frame(row.names=colnames(countData), condition));
           condition
SRR2422919        C0
SRR2422920        C0
SRR2422921        C0
SRR2422922        C0
SRR2422923        C0
SRR2422924        C0
SRR2422925        C0
SRR2422926        C0
SRR2422927        C1
SRR2422928        C1
SRR2422929        C1
SRR2422930        C1
SRR2422931        C1
SRR2422932        C1
SRR2422933        C1
SRR2422934        C1
> dds <- DESeqDataSetFromMatrix(countData=countData, colData=colData, design=~ condition);
> dds <- dds[ rowSums(counts(dds)) >= 2, ];dds$condition <- relevel(dds$condition, ref='C1');
> dds <- DESeq(dds); 
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 452 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
> res <- results(dds,contrast=c("condition","C0","C1"), alpha=0.050000);
> (summary(res));

I got a error result ,but I don't know why. And in the result of res ,all the padj is same value. Is there a limit number of replicate? Or there is something wrong in my commands or input data?

out of 50569 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 0, 0% LFC < 0 (down) : 0, 0% outliers [1] : 0, 0% low counts [2] : 6, 0.012% (mean count < 0) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

NULL

Thank you for any advice you can provide!

Best wishes!

Wei.

my sessionInfo()

> sessionInfo() R version 3.4.4 (2018-03-15) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 (build 7601) Service Pack 1

Matrix products: default

locale: [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 [4] LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages: [1] parallel stats4 stats graphics grDevices [6] utils datasets methods base

other attached packages: [1] DESeq2_1.18.1 SummarizedExperiment_1.8.1 [3] DelayedArray_0.4.1 matrixStats_0.53.1 [5] Biobase_2.38.0 GenomicRanges_1.30.3 [7] GenomeInfoDb_1.14.0 IRanges_2.12.0 [9] S4Vectors_0.16.0 BiocGenerics_0.24.0

loaded via a namespace (and not attached): [1] RColorBrewer_1.1-2 checkmate_1.8.5 [3] cluster_2.0.7-1 rstudioapi_0.7 [5] magrittr_1.5 acepack_1.4.1 [7] gtable_0.2.0 zlibbioc_1.24.0 [9] memoise_1.1.0 data.table_1.11.4 [11] RCurl_1.95-4.10 base64enc_0.1-3 [13] XVector_0.18.0 pillar_1.2.3 [15] htmltools_0.3.6 stringr_1.3.1 [17] genefilter_1.60.0 splines_3.4.4 [19] Formula_1.2-3 lattice_0.20-35 [21] bit_1.1-14 survival_2.42-3 [23] annotate_1.56.2 htmlwidgets_1.2 [25] plyr_1.8.4 locfit_1.5-9.1 [27] knitr_1.20 gridExtra_2.3 [29] Matrix_1.2-14 GenomeInfoDbData_1.0.0 [31] digest_0.6.15 colorspace_1.3-2 [33] AnnotationDbi_1.40.0 geneplotter_1.56.0 [35] stringi_1.1.7 RSQLite_2.1.1 [37] lazyeval_0.2.1 Hmisc_4.1-1 [39] tibble_1.4.2 compiler_3.4.4 [41] bit64_0.9-7 rpart_4.1-13 [43] backports_1.1.2 htmlTable_1.12 [45] BiocParallel_1.12.0 xtable_1.8-2 [47] DBI_1.0.0 munsell_0.4.3 [49] Rcpp_0.12.17 XML_3.98-1.11 [51] blob_1.1.1 ggplot2_2.2.1 [53] latticeExtra_0.6-28 tools_3.4.4 [55] foreign_0.8-70 bitops_1.0-6 [57] nnet_7.3-12 scales_0.5.0 [59] rlang_0.2.1 grid_3.4.4

deseq2 • 893 views

ADD COMMENT • link updated 5.9 years ago by swbarnes2 ★ 1.3k • written 5.9 years ago by linw • 0

score 2 · Answer 1 · 2018-06-05

2

Entering edit mode

Michael Love 41k

@mikelove

Last seen 19 hours ago

United States

The above is just a message, not an error. Warnings and errors in R will be printed with the word "Warning:" or "Error:"

The above says that there were count outliers detected, and these were replaced and the model refit without the outliers. This is part of the DESeq2 procedure, which you can turn off if you don't want it to do that. See the vignette sections on outliers for more details.

If you have all padj set to 1, this means that the p-values were relatively uniform from 0 to 1. This means that your data is consistent with the null hypothesis, that there are no differences across condition that were larger relative to the variation within condition. Have you looked at a PCA plot of your data?

ADD COMMENT • link 5.9 years ago Michael Love 41k

0

Entering edit mode

Thanks for your reply!

I plot my data ,and here is my code and picture.

vsd <- vst(dds, blind=FALSE);plotPCA(vsd,intgroup=c("condition"))

I still confuse about the result . If the padj in res is the same value ,it means there is no DEGs in my data. Am I right?

Here is part of my res, and padj is almost the same .

baseMean log2FoldChange lfcSE stat pvalue padj

ENSG00000223972.5_2 6.5451997638089 -0.0189375878914287 0.729114828678411 -0.0259733956114359 0.979278558516615 0.999831031256582

ENSG00000227232.5_2 120.847064381123 -0.186905549119038 0.329696931365903 -0.566901088052916 0.570781384906547 0.999831031256582

ENSG00000243485.5_3 0.808323540891408 -0.074012933209564 1.37926898289051 -0.0536609857306127 0.957205266868574 0.999831031256582

ENSG00000240361.2_3 0.239939440001618 1.14714451288691 3.00810143961501 0.381351671781961 0.702942316641613 0.999831031256582

ENSG00000238009.6_4 7.07309580387112 0.304727068397541 0.527410209655461 0.57777999518934 0.563412662998333 0.999831031256582

ENSG00000233750.3_2 8.51424947357038 0.481305868904192 0.645876990754807 0.74519742271932 0.456152380689177 0.999831031256582

ENSG00000237683.5 44.6016455674963 -0.482214380258619 0.608012199829965 -0.793099843051625 0.42771965427787 0.999831031256582

ENSG00000268903.1_3 0.210424498191674 -0.678764339221412 2.60481045070771 -0.260581087210011 0.794415576134089 0.999831031256582

ENSG00000239906.1_3 0.992353998475917 0.24558917118856 1.09000383248799 0.225310374026841 0.821737829837437 0.999831031256582

ENSG00000241860.6_3 40.9000005523611 0.0323096767435896 0.357865331760903 0.0902844558443465 0.928061172140727 0.999831031256582

ADD REPLY • link 5.9 years ago linw • 0

0

Entering edit mode

It looks like the differences between groups are not larger in general than the differences among biological replicates, which gives you a visual sense of the lack of small p-values.

The padj being the same is a natural consequence of the method of Benjamini and Hochberg.

ADD REPLY • link 5.9 years ago Michael Love 41k

score 1 · Answer 2 · 2018-06-06

1

Entering edit mode

swbarnes2 ★ 1.3k

@swbarnes2-14086

Last seen 12 hours ago

San Diego

Well, just kind of eyeballing your PCA plot, it looks like the most significant thing going on in your data isn't differences caused by treatment. It's whatever is making those four or five samples separate from the rest. Batch effect?

So you should probably believe what DESeq2 is telling you.

ADD COMMENT • link 5.9 years ago swbarnes2 ★ 1.3k