Dear all,
I have a question regarding DESeq2 package. Please help. I am using this package to analyze a RNA seq dataset. The dataset has 3 different groups and my goal is to find out the DE genes between groups. I used the test 'LRT' to find out genes that are differentially expressed between all the groups. The problem is that I am getting extremly low p-values and do not understand why. I am getting p-values as low as 3.067392e-314.
Code:
dds.fc <- DESeqDataSetFromMatrix(dat0,colData=pheno,design=~ State) #condition #relevel groups dds.fc@colData$State<-factor(dds.fc@colData$State,levels=c("non-infected spleen","infected lung","infected spleen")) dds<-DESeq(dds.fc,test = "LRT",reduced = ~1) res<-results(dds,name = "State_infected.lung_vs_non.infected.spleen") #order with respect to p-value temp<-res[with(res,order(res[,5])),] #get result with pvalue below 0.05 sub <- subset(temp,temp$pvalue<0.05)
Result:
head(sub,25) log2 fold change (MLE): State infected.lung vs non.infected.spleen LRT p-value: '~ State' vs '~ 1' DataFrame with 25 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> Arg1 2337.0486 14.430641 1.1909019 2463.520 0 0 Ccl12 817.8881 11.012091 0.7409714 1723.197 0 0 Ccl7 855.1171 12.086512 1.0304499 2447.651 0 0 Cd38 881.1564 6.760928 0.2047603 1766.141 0 0 Cd72 533.9823 7.131620 0.2767360 1654.121 0 0 ... ... ... ... ... ... ... Irg1 2944.5393 9.067204 0.2761566 1443.782 3.067392e-314 1.831233e-311 Htr7 393.6399 8.571863 0.4775459 1436.052 1.462933e-312 8.336721e-310 C3ar1 1772.5935 6.709288 0.2091927 1432.418 9.002270e-312 4.907020e-309 Rasgef1b 805.7408 4.032636 0.1431151 1407.784 2.011923e-306 1.050978e-303 Ets2 3110.7434 5.978289 0.1886460 1364.882 4.164787e-297 2.088558e-294
Session Info
sessionInfo() R version 3.3.0 (2016-05-03) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] pheatmap_1.0.8 data.table_1.9.6 DESeq2_1.12.3 [4] rgl_0.95.1441 corrplot_0.77 RColorBrewer_1.1-2 [7] ffpe_1.16.0 TTR_0.23-1 RUVSeq_1.6.2 [10] edgeR_3.14.0 limma_3.28.14 EDASeq_2.6.2 [13] ShortRead_1.30.0 GenomicAlignments_1.8.4 Rsamtools_1.24.0 [16] Biostrings_2.40.2 XVector_0.12.0 BiocParallel_1.6.2 [19] RSkittleBrewer_1.1 zebrafishRNASeq_0.106.2 BiocInstaller_1.22.3 [22] scales_0.4.0 ggplot2_2.1.0 SummarizedExperiment_1.2.3 [25] Biobase_2.32.0 GenomicRanges_1.24.2 GenomeInfoDb_1.8.2 [28] IRanges_2.6.1 S4Vectors_0.10.1 BiocGenerics_0.18.0 [31] sva_3.20.0 genefilter_1.54.2 mgcv_1.8-12 [34] nlme_3.1-128
Thank you michael. Yes I took the plotCounts for some genes and found that they were expressed in one condition and off in the other. I also went through the paper. I hope that I have understood it correctly.
By default we check if there is no differences across condition(null hypothesis). But if there is a clear biological difference across condition then even genes with very small foldchange will be flagged as significant because the null hypothesis is clearly false.
In such a scenario I would need to focus on the magnitude of the fold change.So if I have to find out the significant DE genes between two conditions then I would need to specify a composite null hypothesis wherein the null hypothesis would be that genes with fold change > or < threshold is not significant. So if I am interested in finding DE genes between two groups "infected lung" and "non-infected spleen" that have a fold change > 2 or <-2 then is this correct?
res<- results(dds,contrast=c("State", "infected lung", "non-infected spleen"), altHypothesis="greaterAbs",lfcThreshold = 1)
res[which(res$padj<0.05),]
exactly.
one correction is that lfcThreshold=1 and altHypothesis="greaterAbs" corresponds to a fold change of > 2 or < 1/2.
Yes. I understand now. Thank you for helping me out.