Using Wald test after lfcShrink (dds, contrast ())
@genomeeditor-18326

Hi,

I have RNA-seq data from 4 conditions (A, B, C and D), each with 4 biological replicates. I want to compare conditions: A vs B, C vs D, A vs C, and B vs D. To account for batch effects I include batch in my design (design = ~ batch + condition) and use reduced = ~ batch. I then use the contrast argument of results() for my comparisons. The problem is that when I get my results with

resAvsB<-results(dds, contrast=c("condition", "A", "B"), test = "Wald", alpha = 0.1)

there are 100 significant genes (padj < 0.05), 70 of which have very low counts (baseMean < 5).

In prior versions of DESeq2 I believe the p-values were calculated after shrinking, so these low-count genes were not deemed significant (rightfully so). I know these low-count genes have high variation, so shrinking seems appropriate to me, but the lfcShrink function only returns the shrunken LFCs; the significance values in its output are the LRT p-values, not the Wald test between my conditions of interest. Any ideas how I can get the Wald test statistic after shrinking, so that these low-count genes don't skew my results? (betaPrior=TRUE cannot be used with "reduced".)

Here are my commands:

# sample sheet: 16 HTSeq count files, 4 conditions x 4 replicates, 2 batches
sample.files <- list.files(".", "\\.counts")
sample.condition <- rep(c("A", "B", "C", "D"), each = 4)
sample.batch <- c("1","1","2","1","1","2","1","2","2","1","2","1","2","2","2","1")

sample.table <- data.frame(sampleName = sample.files, fileName = sample.files,
                           batch = sample.batch, condition = sample.condition)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sample.table, directory = ".",
                                       design = ~ batch + condition)

colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels = c("A","B","C","D"))
colData(ddsHTSeq)$batch <- factor(colData(ddsHTSeq)$batch, levels = c("1","2"))

# LRT against the reduced model to account for batch
dds <- DESeq(ddsHTSeq, full = design(ddsHTSeq), test = "LRT", reduced = ~ batch)

resAvsB <- results(dds, contrast = c("condition", "A", "B"), test = "Wald", alpha = 0.1)
resAvsB <- resAvsB[order(resAvsB$padj), ]
head(resAvsB)

log2 fold change (MLE): condition A vs B 
Wald test p-value: condition A vs B 
DataFrame with 6 rows and 6 columns
          baseMean log2FoldChange   lfcSE   stat   pvalue     padj
         <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
Fbll1        1.174        -17.047   2.269 -7.512 5.82e-14 1.46e-09
Slc22a29     1.205        -17.106   2.306 -7.418 1.19e-13 1.49e-09
Ear-ps10     0.520         30.478   4.237  7.194 6.31e-13 4.97e-09
HOTAIR_4     0.451        -30.426   4.248 -7.162 7.93e-13 4.97e-09
Cyp2f2    4972.097         -1.204   0.171 -7.058 1.69e-12 8.49e-09
Abca12       0.394        -29.468   4.240 -6.950 3.66e-12 9.19e-09

# notice the baseMean for most of the top hits is ~0-1 counts

#OR

resAvsB<-lfcShrink(dds, contrast=c("condition", "A", "B"), test = "Wald", alpha = 0.1)

resAvsB<-resAvsB[order(resAvsB$padj),]
head(resAvsB)

log2 fold change (MAP): condition A vs B 
LRT p-value: '~ batch + condition' vs '~ batch' 
DataFrame with 6 rows and 6 columns
              baseMean log2FoldChange   lfcSE   stat   pvalue     padj
             <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
Mrgprb1         49.256         0.0888  0.1626 73.424 7.88e-16 1.28e-11
Inhba           44.699        -0.3036  0.1588 65.861 3.28e-14 2.67e-10
Cyp2f2        4972.157         0.1344  0.1353 61.894 2.31e-13 1.26e-09
Tnnt2         5387.456         0.4905  0.1619 56.221 3.77e-12 1.53e-08
Cpa3           238.510        -0.0083  0.1626 51.576 3.69e-11 1.20e-07
mmu-mir-6236  2167.300         0.4790  0.1621 50.839 5.29e-11 1.44e-07

@mikelove

The fact that the base mean over all samples is low is not necessarily a problem. If you have four groups, and 4 out of 16 samples have a count of ~10 and the rest have 0, that gives you a base mean of 2.5, although there is some evidence here of DE. It depends on the distribution of counts, but I'd guess that you have some samples in one group with moderate counts, and 0 across all the rest.
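A minimal sketch of the arithmetic above (a hypothetical count vector, not the poster's data; size factors are taken as 1 for simplicity, so baseMean reduces to the plain row mean):

```r
# Hypothetical gene: counts of ~10 in one group of 4 samples, 0 in the other 12.
counts <- c(rep(10, 4), rep(0, 12))

# baseMean is the mean of normalized counts over all 16 samples;
# with size factors of 1 it is just the row mean.
base.mean <- mean(counts)
base.mean  # 2.5 -- low overall, despite a clear group difference
```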

If you want to perform testing of the shrunken LFCs, we have new functionality for this; you would use:

res <- lfcShrink(dds, coef=..., type="apeglm", svalue=TRUE)

You can choose between two methods, type="apeglm" or type="ashr". We discuss these options in the "Extended section on shrinkage estimators" in the vignette. I'd recommend obtaining DESeq2 v1.20 or the latest, v1.22, as these have the fastest implementations of apeglm. The s-value is an aggregate probability of false estimated sign of the LFC for a set of genes. Since its maximum value is around 0.5 rather than 1, I typically use s-value cutoffs of 0.01 or 0.005.
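Putting this together for the A-vs-B comparison in the question above (a sketch, not a definitive recipe, assuming DESeq2 >= 1.20 and the apeglm package are installed; type="apeglm" works with coef= rather than contrast=, so the reference level is set to B first and the Wald statistics are refit):

```r
library(DESeq2)
library(apeglm)

# apeglm shrinks a model coefficient, so re-level condition so that B is the
# reference and "condition_A_vs_B" appears among the coefficient names,
# then refit the Wald statistics for the re-leveled design
dds$condition <- relevel(dds$condition, ref = "B")
dds <- nbinomWaldTest(dds)
resultsNames(dds)  # confirm the coefficient name before shrinking

res <- lfcShrink(dds, coef = "condition_A_vs_B", type = "apeglm", svalue = TRUE)

# s-values replace padj in the output; smaller cutoffs are appropriate
# since the maximum s-value is ~0.5 rather than 1
sig <- res[which(res$svalue < 0.005), ]
head(sig[order(sig$svalue), ])
```

This returns apeglm-shrunken LFCs together with s-values, so the very-low-count genes with wildly inflated MLE fold changes should no longer dominate the top of the list.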
