Question

DESeq2: padj value changes with varying alpha

yoursbassanio ▴ 10

@yoursbassanio-12717

Last seen 5.2 years ago

Hi,

I ran DESeq2 on the same set of genes and samples with varying alpha, as shown below.

Why does the adjusted p-value change for the same gene when alpha changes, even though the p-value remains the same?

I thought alpha was the FDR cutoff and that "NA" meant the gene didn't clear the FDR cutoff. Is my interpretation correct?

dds <- DESeq(dds, parallel=TRUE )

alpha <- 0.05

Con2vsCon1 <- results(dds, contrast=c("Condition","Con2","Con1"), alpha=alpha)

write.table(Con2vsCon1, "Con2vsCon1_0.05.xls", sep="\t")

The result file:

baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
8650.108665	1.016239114	0.144542498	7.030728877	2.05E-12	1.55E-08
423.1219264	0.935612819	0.13608645	6.875135773	6.19E-12	2.34E-08
36.5910756	1.247356344	0.189428319	6.58484618	4.55E-11	1.15E-07
364.305668	0.815937662	0.125767764	6.487653407	8.72E-11	1.35E-07
2463.544709	0.835643838	0.128866592	6.48456536	8.90E-11	1.35E-07
11.02946987	-0.000463788	0.242314322	-0.001913993	0.998472856	0.998737002

alpha <- 0.01

Con2vsCon1 <- results(dds, contrast=c("Condition","Con2","Con1"), alpha=alpha)

write.table(Con2vsCon1, "Con2vsCon1_0.01.xls", sep="\t")

baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
8650.108665	1.016239114	0.144542498	7.030728877	2.05E-12	6.56E-09
423.1219264	0.935612819	0.13608645	6.875135773	6.19E-12	9.89E-09
36.5910756	1.247356344	0.189428319	6.58484618	4.55E-11	4.85E-08
364.305668	0.815937662	0.125767764	6.487653407	8.72E-11	5.68E-08
2463.544709	0.835643838	0.128866592	6.48456536	8.90E-11	5.68E-08
11.02946987	-0.000463788	0.242314322	-0.001913993	0.998472856	NA

Session info

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
[9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] pheatmap_1.0.7             DESeq2_1.12.4             
[3] SummarizedExperiment_1.4.0 Biobase_2.32.0            
[5] GenomicRanges_1.26.1       GenomeInfoDb_1.8.7        
[7] IRanges_2.8.0              S4Vectors_0.12.0          
[9] BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
[1] Rcpp_0.12.8          RColorBrewer_1.1-2   plyr_1.8.4          
[4] XVector_0.12.1       tools_3.3.1          zlibbioc_1.18.0     
[7] rpart_4.1-10         RSQLite_1.0.0        annotate_1.50.0     
[10] tibble_1.2           gtable_0.2.0         lattice_0.20-34     
[13] Matrix_1.2-6         DBI_0.4-1            gridExtra_2.2.1     
[16] genefilter_1.56.0    cluster_2.0.5        locfit_1.5-9.1      
[19] grid_3.3.1           nnet_7.3-12          data.table_1.10.0   
[22] AnnotationDbi_1.36.0 XML_3.98-1.4         survival_2.39-4     
[25] BiocParallel_1.6.6   foreign_0.8-67       latticeExtra_0.6-28
[28] Formula_1.2-1        geneplotter_1.50.0   ggplot2_2.2.0       
[31] Hmisc_3.17-4         scales_0.4.1         splines_3.3.1       
[34] assertthat_0.1       colorspace_1.3-1     xtable_1.8-2        
[37] acepack_1.3-3.3      lazyeval_0.2.0       munsell_0.4.3

deseq2 counts rnaseq • 4.9k views

ADD COMMENT • link updated 8.1 years ago by Michael Love 43k • written 8.1 years ago by yoursbassanio ▴ 10

score 1 · Answer 1 · 2017-10-23

Michael Love 43k

@mikelove

Last seen 12 days ago

United States

What changes when you change alpha is the independent filtering. If you look in ?results for the explanation of the argument alpha:

   alpha: the significance cutoff used for optimizing the independent
          filtering (by default 0.1). If the adjusted p-value cutoff
          (FDR) will be a value other than 0.1, ‘alpha’ should be set
          to that value.

The recommendation is to set alpha to the value that will be used to threshold adjusted p-values, because that is what the independent filtering is trying to optimize (and likewise for IHW). For independent filtering, this chosen value affects the filter on mean of normalized counts for discarding genes with too little signal for testing. Removing more genes will lower the adjusted p-values for genes that survive the filter.
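To see why fewer tests means smaller adjusted p-values, here is a minimal base-R sketch of the Benjamini-Hochberg adjustment on made-up p-values. The first three p-values are copied from the table in the question; everything else, including the 400 "filtered" genes, is an assumption for illustration and is not DESeq2's own code.

```r
# Toy p-values: three strong signals followed by 997 uninformative tests.
set.seed(1)
pvals <- c(2.05e-12, 6.19e-12, 4.55e-11, runif(997))

# BH adjustment over all 1000 tests.
padj_all <- p.adjust(pvals, method = "BH")

# Pretend a filter discarded the last 400 tests (hypothetical choice),
# then adjust over the 600 survivors only. Discarded tests get NA.
keep <- seq_along(pvals) <= 600
padj_filtered <- rep(NA_real_, length(pvals))
padj_filtered[keep] <- p.adjust(pvals[keep], method = "BH")

# Same raw p-value, smaller adjusted p-value after filtering.
print(c(raw = pvals[1], all = padj_all[1], filtered = padj_filtered[1]))

# A filtered-out test reports NA, regardless of its raw p-value.
print(padj_filtered[1000])
```

The NA here marks a test that was removed before adjustment, not one that failed a significance cutoff, which matches the pattern in the question's second table.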

ADD COMMENT • link 8.1 years ago Michael Love 43k

Hi Mike,

Thank you for the reply.

Sorry, I didn't understand your answer completely. If I change my FDR from 5% (0.05) to 1% (0.01), how does that affect the adjusted p-value (q-value) of a gene that cleared the cutoff on both occasions?

As shown above, the p-value for the same gene was 2.05E-12 on both occasions, but the q-value changes.

Shouldn't the adjusted p-value remain the same, and only change to NA if it doesn't satisfy the FDR cutoff?

ADD REPLY • link 8.1 years ago yoursbassanio ▴ 10

The 'alpha' that you specify to results() is passed along to a function that applies a procedure called independent filtering (IF), or also a new method called IHW if you specify that method when running results(). We have a citation for IF in the help page for the results() function, if you want to read more, and it's also discussed in the DESeq2 paper under a section on independent filtering. In the IF case, the procedure finds a threshold for mean of normalized counts, below which the signal is too low for sufficient power in statistical testing. Changing the number of tests (how many pass the filter) changes the adjusted p-value.
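As a rough illustration of how alpha enters the filter, the following base-R sketch scans candidate cutoffs on a mean-count statistic and keeps the one that gives the most BH rejections at the requested alpha. It is a simplified stand-in, not DESeq2's implementation: the data are simulated, `choose_threshold` and `apply_filter` are hypothetical helper names, and the raw maximum is used where DESeq2 may smooth the rejection curve first.

```r
set.seed(42)
n <- 2000

# Simulated filter statistic (stands in for mean of normalized counts).
base_mean <- rexp(n, rate = 1 / 100)

# Simulated p-values: uniform noise, except that the first 200 genes get
# small p-values when their mean count is high enough to have power.
pvals <- runif(n)
powered <- seq_len(n) <= 200 & base_mean > 20
pvals[powered] <- runif(sum(powered), min = 0, max = 1e-4)

# Scan quantiles of the filter statistic; for each cutoff, count the
# BH rejections at `alpha` among the genes that pass, and keep the best.
choose_threshold <- function(pvals, filter_stat, alpha,
                             probs = seq(0, 0.95, by = 0.05)) {
  cutoffs <- quantile(filter_stat, probs = probs)
  n_rej <- vapply(cutoffs, function(cutoff) {
    keep <- filter_stat >= cutoff
    sum(p.adjust(pvals[keep], method = "BH") < alpha)
  }, integer(1))
  best <- which.max(n_rej)
  list(cutoff = unname(cutoffs[best]), n_rejections = unname(n_rej[best]))
}

# Adjust only the genes that pass the cutoff; the rest get NA.
apply_filter <- function(pvals, filter_stat, cutoff) {
  padj <- rep(NA_real_, length(pvals))
  keep <- filter_stat >= cutoff
  padj[keep] <- p.adjust(pvals[keep], method = "BH")
  padj
}

# The chosen cutoff, and hence the number of tests, may differ by alpha.
for (alpha in c(0.05, 0.01)) {
  fit <- choose_threshold(pvals, base_mean, alpha)
  padj <- apply_filter(pvals, base_mean, fit$cutoff)
  cat(sprintf("alpha = %.2f: cutoff = %.2f, rejections = %d, NA padj = %d\n",
              alpha, fit$cutoff, fit$n_rejections, sum(is.na(padj))))
}
```

Because the cutoff is picked to maximize rejections at a given alpha, a different alpha can select a different cutoff, which changes both which genes get NA and the adjusted p-values of the survivors. In DESeq2 itself, the vignette describes reading the chosen cutoff back with `metadata(res)$filterThreshold`; it is worth checking that against the installed version.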

ADD REPLY • link 8.1 years ago Michael Love 43k