DESeq2: Extremely low pvalues
1
1
Entering edit mode
szenitha ▴ 20
@szenitha-10863
Last seen 8.1 years ago

Dear all,

I have a question regarding DESeq2 package. Please help. I am using this package to analyze a RNA seq dataset. The dataset has  3 different groups and my goal is to find out the DE genes between groups. I used the test 'LRT' to find out genes that are differentially expressed between all the groups. The problem is that I am getting extremly low p-values and do not understand why. I am getting p-values as low as 3.067392e-314.

Code:

dds.fc <- DESeqDataSetFromMatrix(dat0,colData=pheno,design=~ State) #condition
#relevel groups
dds.fc@colData$State<-factor(dds.fc@colData$State,levels=c("non-infected spleen","infected lung","infected spleen"))
dds<-DESeq(dds.fc,test = "LRT",reduced = ~1)

res<-results(dds,name =  "State_infected.lung_vs_non.infected.spleen")

#order with respect to p-value

temp<-res[with(res,order(res[,5])),]

#get result with pvalue below 0.05

sub <- subset(temp,temp$pvalue<0.05)

Result:

head(sub,25)
log2 fold change (MLE): State infected.lung vs non.infected.spleen 
LRT p-value: '~ State' vs '~ 1' 
DataFrame with 25 rows and 6 columns
          baseMean log2FoldChange     lfcSE      stat        pvalue          padj
         <numeric>      <numeric> <numeric> <numeric>     <numeric>     <numeric>
Arg1     2337.0486      14.430641 1.1909019  2463.520             0             0
Ccl12     817.8881      11.012091 0.7409714  1723.197             0             0
Ccl7      855.1171      12.086512 1.0304499  2447.651             0             0
Cd38      881.1564       6.760928 0.2047603  1766.141             0             0
Cd72      533.9823       7.131620 0.2767360  1654.121             0             0
...            ...            ...       ...       ...           ...           ...
Irg1     2944.5393       9.067204 0.2761566  1443.782 3.067392e-314 1.831233e-311
Htr7      393.6399       8.571863 0.4775459  1436.052 1.462933e-312 8.336721e-310
C3ar1    1772.5935       6.709288 0.2091927  1432.418 9.002270e-312 4.907020e-309
Rasgef1b  805.7408       4.032636 0.1431151  1407.784 2.011923e-306 1.050978e-303
Ets2     3110.7434       5.978289 0.1886460  1364.882 4.164787e-297 2.088558e-294

Session Info

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.8             data.table_1.9.6           DESeq2_1.12.3             
 [4] rgl_0.95.1441              corrplot_0.77              RColorBrewer_1.1-2        
 [7] ffpe_1.16.0                TTR_0.23-1                 RUVSeq_1.6.2              
[10] edgeR_3.14.0               limma_3.28.14              EDASeq_2.6.2              
[13] ShortRead_1.30.0           GenomicAlignments_1.8.4    Rsamtools_1.24.0          
[16] Biostrings_2.40.2          XVector_0.12.0             BiocParallel_1.6.2        
[19] RSkittleBrewer_1.1         zebrafishRNASeq_0.106.2    BiocInstaller_1.22.3      
[22] scales_0.4.0               ggplot2_2.1.0              SummarizedExperiment_1.2.3
[25] Biobase_2.32.0             GenomicRanges_1.24.2       GenomeInfoDb_1.8.2        
[28] IRanges_2.6.1              S4Vectors_0.10.1           BiocGenerics_0.18.0       
[31] sva_3.20.0                 genefilter_1.54.2          mgcv_1.8-12               
[34] nlme_3.1-128              

 

deseq2 • 4.6k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 4 days ago
United States

Why don't you plot an example of one of these genes using plotCounts. I'd guess that these are all "off" in one group and have high expression in another. Note that you are testing that there are no differences across State, so if there are any large differences it makes sense that the probability of the data under the null is very small.

The p-value can be arbitrarily low when the null hypothesis is false. In fact it becomes not so interesting as a statistic when the null is clearly false, and other statistics become more interesting, such as the effect size: the log2 fold change between groups. Actually, we discuss just this topic in the DESeq2 paper.

ADD COMMENT
1
Entering edit mode

Thank you michael. Yes I took the plotCounts for some genes and found that they were expressed in one condition and off in the other. I also went through the paper. I hope that I have understood it correctly.
By default we check if there is no differences across condition(null hypothesis). But if there is a clear biological difference across condition then even genes with very small foldchange will be flagged as significant because the null hypothesis is clearly false. 

In such a scenario I would need to focus on the magnitude of the fold change.So if I have to find out the significant DE genes between two conditions then I would need to specify a composite null hypothesis wherein the null hypothesis would be that genes with fold change > or < threshold is not significant.  So if I am interested in finding DE genes between two groups  "infected lung" and  "non-infected spleen" that have a fold change > 2 or <-2 then is this correct?

res<- results(dds,contrast=c("State", "infected lung", "non-infected spleen"), altHypothesis="greaterAbs",lfcThreshold = 1)

res[which(res$padj<0.05),]

ADD REPLY
0
Entering edit mode

exactly.

one correction is that lfcThreshold=1 and altHypothesis="greaterAbs" corresponds to a fold change of > 2 or < 1/2.

ADD REPLY
0
Entering edit mode

Yes. I understand now. Thank you for helping me out. 

ADD REPLY

Login before adding your answer.

Traffic: 552 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6