Question

DESeq2: Extremely low pvalues

1

Entering edit mode

szenitha ▴ 20

@szenitha-10863

Last seen 7.6 years ago

Dear all,

I have a question regarding DESeq2 package. Please help. I am using this package to analyze a RNA seq dataset. The dataset has 3 different groups and my goal is to find out the DE genes between groups. I used the test 'LRT' to find out genes that are differentially expressed between all the groups. The problem is that I am getting extremly low p-values and do not understand why. I am getting p-values as low as 3.067392e-314.

Code:

dds.fc <- DESeqDataSetFromMatrix(dat0,colData=pheno,design=~ State) #condition
#relevel groups
dds.fc@colData$State<-factor(dds.fc@colData$State,levels=c("non-infected spleen","infected lung","infected spleen"))
dds<-DESeq(dds.fc,test = "LRT",reduced = ~1)

res<-results(dds,name =  "State_infected.lung_vs_non.infected.spleen")

#order with respect to p-value

temp<-res[with(res,order(res[,5])),]

#get result with pvalue below 0.05

sub <- subset(temp,temp$pvalue<0.05)

Result:

head(sub,25)
log2 fold change (MLE): State infected.lung vs non.infected.spleen 
LRT p-value: '~ State' vs '~ 1' 
DataFrame with 25 rows and 6 columns
          baseMean log2FoldChange     lfcSE      stat        pvalue          padj
         <numeric>      <numeric> <numeric> <numeric>     <numeric>     <numeric>
Arg1     2337.0486      14.430641 1.1909019  2463.520             0             0
Ccl12     817.8881      11.012091 0.7409714  1723.197             0             0
Ccl7      855.1171      12.086512 1.0304499  2447.651             0             0
Cd38      881.1564       6.760928 0.2047603  1766.141             0             0
Cd72      533.9823       7.131620 0.2767360  1654.121             0             0
...            ...            ...       ...       ...           ...           ...
Irg1     2944.5393       9.067204 0.2761566  1443.782 3.067392e-314 1.831233e-311
Htr7      393.6399       8.571863 0.4775459  1436.052 1.462933e-312 8.336721e-310
C3ar1    1772.5935       6.709288 0.2091927  1432.418 9.002270e-312 4.907020e-309
Rasgef1b  805.7408       4.032636 0.1431151  1407.784 2.011923e-306 1.050978e-303
Ets2     3110.7434       5.978289 0.1886460  1364.882 4.164787e-297 2.088558e-294

Session Info

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.8             data.table_1.9.6           DESeq2_1.12.3             
 [4] rgl_0.95.1441              corrplot_0.77              RColorBrewer_1.1-2        
 [7] ffpe_1.16.0                TTR_0.23-1                 RUVSeq_1.6.2              
[10] edgeR_3.14.0               limma_3.28.14              EDASeq_2.6.2              
[13] ShortRead_1.30.0           GenomicAlignments_1.8.4    Rsamtools_1.24.0          
[16] Biostrings_2.40.2          XVector_0.12.0             BiocParallel_1.6.2        
[19] RSkittleBrewer_1.1         zebrafishRNASeq_0.106.2    BiocInstaller_1.22.3      
[22] scales_0.4.0               ggplot2_2.1.0              SummarizedExperiment_1.2.3
[25] Biobase_2.32.0             GenomicRanges_1.24.2       GenomeInfoDb_1.8.2        
[28] IRanges_2.6.1              S4Vectors_0.10.1           BiocGenerics_0.18.0       
[31] sva_3.20.0                 genefilter_1.54.2          mgcv_1.8-12               
[34] nlme_3.1-128

deseq2 • 4.2k views

ADD COMMENT • link updated 10 months ago by James W. MacDonald 65k • written 7.8 years ago by szenitha ▴ 20

score 3 · Answer 1 · 2016-07-25

3

Entering edit mode

Michael Love 42k

@mikelove

Last seen 1 day ago

United States

Why don't you plot an example of one of these genes using plotCounts. I'd guess that these are all "off" in one group and have high expression in another. Note that you are testing that there are no differences across State, so if there are any large differences it makes sense that the probability of the data under the null is very small.

The p-value can be arbitrarily low when the null hypothesis is false. In fact it becomes not so interesting as a statistic when the null is clearly false, and other statistics become more interesting, such as the effect size: the log2 fold change between groups. Actually, we discuss just this topic in the DESeq2 paper.

ADD COMMENT • link 7.8 years ago Michael Love 42k

1

Entering edit mode

Thank you michael. Yes I took the plotCounts for some genes and found that they were expressed in one condition and off in the other. I also went through the paper. I hope that I have understood it correctly.
By default we check if there is no differences across condition(null hypothesis). But if there is a clear biological difference across condition then even genes with very small foldchange will be flagged as significant because the null hypothesis is clearly false.

In such a scenario I would need to focus on the magnitude of the fold change.So if I have to find out the significant DE genes between two conditions then I would need to specify a composite null hypothesis wherein the null hypothesis would be that genes with fold change > or < threshold is not significant. So if I am interested in finding DE genes between two groups "infected lung" and "non-infected spleen" that have a fold change > 2 or <-2 then is this correct?

res<- results(dds,contrast=c("State", "infected lung", "non-infected spleen"), altHypothesis="greaterAbs",lfcThreshold = 1)

res[which(res$padj<0.05),]

ADD REPLY • link 7.8 years ago szenitha ▴ 20

0

Entering edit mode

exactly.

one correction is that lfcThreshold=1 and altHypothesis="greaterAbs" corresponds to a fold change of > 2 or < 1/2.