Hi,
I have RNA-seq data from 4 conditions (A, B, C and D), each condition has 4 biological replicates. I'm trying to compare different conditions: A vs B, C vs D, A vs C and B vs D. I need to account for batch effects so I include batch in my design (design =~ batch+condition) and I use "reduced =~batch" to account for batch effects.I then use the contrast function to do my comparisons. The problem is when I get my results for
resAvsB<-results(dds, contrast=c("condition", "A", "B"), test = "Wald", alpha = 0.1)
the significant genes (i.e. padj<0.05) are 100 genes, 70 of which have very low counts (mean counts <5).
In prior versions of Deseq2 I believe the p values were calculated after shrinking and hence these low count genes were not deemed significant (rightfully so). I know these low count genes have high variation and thus shrinking seems appropriate to me but using lfcshrink function will only return the shrunken LFCs, but the output significance is the LRT pvalues not the wald test between my conditions of interest. Any ideas how I can get the wald test statistic after shrinking to avoid these low count genes from skewing my data. (betaPrior=TRUE can not be used with "reduced")
Here are my commands:
sample.files <- list.files(".","\.counts")
sample.condition<-(rep(c('A','B','C','D'),each=4))
sample.batch<-c('1','1','2','1','1','2','1','2','2','1','2','1','2','2','2','1')
sample.table<-data.frame(sampleName=sample.files, fileName=sample.files, batch=sample.batch, condition=sample.condition)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sample.table, directory=".", design=~batch+condition)
colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c('A','B','C','D'))
colData(ddsHTSeq)$batch<-factor(colData(ddsHTSeq)$batch, levels=c('1','2'))
dds <- DESeq(ddsHTSeq, full=design(ddsHTSeq), test="LRT", reduced = ~ batch)
resAvsB<-results(dds, contrast=c("condition", "A", "B"), test = "Wald", alpha = 0.1)
resAvsB<-resAvsB[order(resAvsB$padj),]
head(resAvsB)
log2 fold change (MLE): condition A vs B
Wald test p-value: condition A vs B
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
Fbll1 1.17428447171314 -17.0469009393416 2.26925685441964 -7.51210728135105 0.0000000000000581830757191954
Slc22a29 1.20452478413844 -17.1064474703088 2.30610421435124 -7.41789870720185 0.000000000000118993046272735
Ear-ps10 0.520353920227651 30.4779917237484 4.23683675071066 7.19357235528045 0.000000000000631176794531819
HOTAIR_4 0.450959418951852 -30.4261449322599 4.24803134545744 -7.16241064576787 0.000000000000792705380200065
Cyp2f2 4972.09702191989 -1.20390527857106 0.170578612233484 -7.05777390733595 0.0000000000016919104161979
Abca12 0.393536742904633 -29.4679059060166 4.24019495123902 -6.94965826922788 0.00000000000366172113754251
padj
<numeric>
Fbll1 0.00000000145987155287033
Slc22a29 0.00000000149282726201459
Ear-ps10 0.00000000497244267364996
HOTAIR_4 0.00000000497244267364996
Cyp2f2 0.0000000084903448505643
Abca12 0.00000000918762450620792
#notice the baseMean for most of the top hits is 0-1 counts
#OR
resAvsB<-lfcShrink(dds, contrast=c("condition", "A", "B"), test = "Wald", alpha = 0.1)
resAvsB<-resAvsB[order(resAvsB$padj),]
head(resAvsB)
log2 fold change (MAP): condition A vs B
LRT p-value: '~ batch + condition' vs '~ batch'
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
Mrgprb1 49.2561157295109 0.0887778772861696 0.162630765632405 73.4243837264061 7.88417712183387e-16
Inhba 44.6994040317682 -0.303556877218687 0.158827437979449 65.8612506573468 0.0000000000000328183131327727
Cyp2f2 4972.1571585663 0.134397608032975 0.135253702129119 61.8940021105774 0.000000000000231476366292174
Tnnt2 5387.45642663505 0.490515304615329 0.161930009303795 56.2213786531343 0.00000000000376792276396379
Cpa3 238.510239063982 -0.00834103545100407 0.16263440418453 51.5758174468965 0.0000000000368811885821633
mmu-mir-6236 2167.30015604407 0.478958075604286 0.16208980386688 50.8388011376263 0.0000000000529463493849958
padj
<numeric>
Mrgprb1 0.0000000000128283445949359
Inhba 0.000000000266993386491672
Cyp2f2 0.00000000125545065197999
Tnnt2 0.0000000153269678231137
Cpa3 0.000000120018763884076
mmu-mir-6236 0.000000143581675140544