Question

Interaction at any time.

dsperley • 0

I have set up my model matrix following example 3.3.4, “interaction at any time”, in the edgeR user guide. My experimental design has 2 strains, KO and WT, and 3 time points: 0h, 12h, and 36h. I’m interested in genes that respond differently to infection between KO and WT at 12h or 36h. When I tested for an interaction at any time (coef=5:6), I found 424 genes that were significant at an FDR of 5%. I then wanted separate lists of DE genes for each time point, so I tested coef=5 (StrainKO.Time12h) and coef=6 (StrainKO.Time36h) separately. For coefficient 5 I found 9 DE genes, and for coefficient 6 I found 39 DE genes. I was expecting the number of DE genes from the joint test of coefficients 5 and 6 to be slightly less than the sum of the individual tests (because some genes might be significantly different at both time points); however, that does not appear to be the case. Why does testing multiple coefficients together result in many more significant genes than the sum of the significant genes obtained by testing the coefficients separately?

#setting up design matrix

Strain<-c(rep("WT",6),rep("KO",4))

Time<-c(rep(c("0h","12h","36h"),each=2),"0h","12h",rep("36h",2))

samples<-rownames(RNA_data$samples)

targets<-data.frame(samples,Strain,Time)

#convert to a factor first so relevel() also works when Strain is stored as character
targets$Strain <- relevel(factor(targets$Strain), ref="WT")

design<-model.matrix(~Strain*Time,data=targets)


fit<-glmFit(RNA_data_filtered,design)


lrt_12h_36h<-glmLRT(fit,coef=5:6)

lrt_12h<-glmLRT(fit,coef=5)

lrt_36h<-glmLRT(fit,coef=6)


lrts<-list(lrt_12h_36h=lrt_12h_36h,lrt_12h=lrt_12h,lrt_36h=lrt_36h)


#extract the results table from the lrt object

Results<-lapply(lrts,topTags,n=Inf)


#subset significant genes

Sig_genes<-lapply(Results,function(x) x$table[x$table$FDR<=0.05,])

edger

updated 8.8 years ago by Aaron Lun • written 8.8 years ago by dsperley

score 2 · Answer 1 · 2015-07-26

The difference in the numbers of DE genes between your comparisons is probably because the joint test pools evidence from both time points, which matters most when the DE is weak. Consider a gene with a weak interaction effect at both time points. Assume that the power of the experiment is such that the effect is not detected as significant in the DE comparison at either time point alone. However, if you test for any difference at either time point with coef=5:6, you may now have enough evidence to reject the null hypothesis for this gene (i.e., the fit of the null model is now sufficiently poor to give a low p-value).
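To make the pooling argument concrete, here is a rough sketch in base R. It is not edgeR output: the likelihood ratio statistic of 3.5 is a made-up value, and it assumes the two interaction terms contribute roughly independently, so that the joint statistic is about the sum of the two single-coefficient statistics (in real data the estimates are correlated, so this will only hold approximately).

```r
# Hypothetical likelihood ratio statistic for one gene at ONE time point.
# This is an illustrative number, not a value from the data in the question.
lr_single <- 3.5

# Single-coefficient test (coef=5 or coef=6): chi-squared on 1 df.
p_single <- pchisq(lr_single, df = 1, lower.tail = FALSE)

# Joint test (coef=5:6): chi-squared on 2 df. ASSUMPTION: the two weak
# effects add up, so the joint statistic is roughly twice the single one.
lr_joint <- 2 * lr_single
p_joint <- pchisq(lr_joint, df = 2, lower.tail = FALSE)

# p_single is roughly 0.06 and p_joint roughly 0.03, so the gene is
# missed at each time point alone but picked up by the joint test.
print(c(single = p_single, joint = p_joint))
```

The joint test pays for the extra degree of freedom, so it only comes out ahead when both time points carry some signal. A gene with an effect at only one time point would tend to get a smaller p-value from the matching single-coefficient test.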

In addition, the nature of the FDR calculation tends to amplify differences in the DE gene numbers. As the number of strongly significant genes increases, the correction penalty for less significant genes becomes softer. This results in more genes passing the 5% threshold and being reported as significantly DE.
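The FDR effect can be seen with a small made-up example, using the Benjamini-Hochberg adjustment from base R's p.adjust() (topTags uses "BH" by default). All p-values below are invented for illustration, and adj_p_for_gene is a hypothetical helper, not an edgeR function. The same gene, with the same raw p-value of 0.001, is adjusted alongside either 5 or 500 strongly significant genes.

```r
n_genes <- 10000   # hypothetical total number of tested genes
p_gene  <- 0.001   # raw p-value of the gene we are following

# Hypothetical helper: BH-adjusted p-value of the gene of interest when
# 'n_strong' other genes have very small p-values and the remaining
# genes have p-values spread evenly between 0.01 and 1.
adj_p_for_gene <- function(n_strong) {
  n_rest <- n_genes - n_strong - 1
  p <- c(rep(1e-8, n_strong), p_gene, seq(0.01, 1, length.out = n_rest))
  p.adjust(p, method = "BH")[n_strong + 1]
}

fdr_few  <- adj_p_for_gene(5)    # few strongly significant genes
fdr_many <- adj_p_for_gene(500)  # many strongly significant genes

# With only 5 strong genes the gene fails the 5% threshold; with 500 it
# passes, even though its raw p-value has not changed.
print(c(few = fdr_few, many = fdr_many))
```

This is why a modest gain in per-gene evidence from the joint test can translate into a much larger jump in the number of genes reported at an FDR of 5%.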