Question: edgeR glmLRT and interaction effect: an example
0
gravatar for David Rengel
3.7 years ago by
David Rengel70
European Union
David Rengel70 wrote:

Hi,

I am analyzing some RNA-Seq data involving three genotypes and 5 time points.

I am particularly interested in those genes for whom time does not present the same effect in all genotypes, that is, genes with a significant interaction genotype:time.

Please find here below a small code summarizing what I have attempted (for info, genotypes are "Col", "KO" and "TAP")

I find the results rather puzzling cause:

1-Only a few genes are significant for the interaction Col:KO or Col:TAP

2-IF I have a closer look at certain genes, I cannot see (a) why either the interaction effect has not been called as significant (e.g. the upper gene in the attached file) or (b) why it is significant in Col (blue line) vs KO (red), but not vs TAP (green) on the bottom gene.

I would be very grateful if you could provide me with any feedback on this matter.

Best,

David Rengel

 

library(edgeR)
dge <- DGEList(counts=counts.F, group=design.factor)
dge <- calcNormFactors(dge)

design=as.data.frame(cbind(rep(c("Col","KO","TAP"),each=15),rep(rep(c("0h","1h","2h","4h","6h"),each=3),3)))
rownames(design)=colnames(counts.F)
colnames(design)=c("gtype","time")
design=model.matrix(~gtype*time,data=design)
rownames(design)=colnames(counts.F)

dge <- estimateGLMCommonDisp(dge,design,verbose=TRUE)
dge <- estimateGLMTrendedDisp(dge,design)
dge <- estimateGLMTagwiseDisp(dge,design)

fit <- glmFit (dge,design)

my.contrasts=as.list(c(1:5))
names(my.contrasts) <- c("KO","TAP","t","it.KO","it.TAP")
my.contrasts[[1]] <- 2 # KO genotype vs Col genotype at t=0
my.contrasts[[2]] <- 3 # TAP genotype vs Col genotype at t=0
my.contrasts[[3]] <- c(4:7) # Time effect, at any time point, on Col genotype
my.contrasts[[4]] <- c(8,10,12,14) # The interaction effect of time in KO gtype, as compared to time in Col genotype.
my.contrasts[[5]] <- c(9,11,13,15) # The interaction effect of time in TAP gtype, as compared to time in Col genotype.

lrt=my.contrasts
for (i in 1:length(my.contrasts))
  lrt[[i]] <- glmLRT(fit, coef=my.contrasts[[i]])

 

 

edger interactions rna-seq glmlrt • 2.1k views
ADD COMMENTlink modified 3.7 years ago by Gordon Smyth39k • written 3.7 years ago by David Rengel70

Hi again,

I guess I did not properly attach the image. Here is the link: http://imgur.com/HIv3sbH

ADD REPLYlink written 3.7 years ago by David Rengel70

For your first question, I don't understand what you mean by a Col:KO or Col:TAP interaction. Col, KO and TAP are levels of the same factor, so you shouldn't get any interaction terms between them, because each sample should only be one or the other. For your second question, you didn't specify which contrast was significant or not significant using for each picture, e.g., for the first picture, is it Col, TAP or KO that isn't significant? Or all of them? What is the "Col vs KO" you refer to in the second picture?

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by Aaron Lun25k

Hi Aaron,

Thanks for your post. I was not clear enoguh: I did not mean interaction between two levels of the same factor, I meant the interaction of time: genotype: i.e. does time play any different role in KO or in TAP, as compared to the role played in Col? Concerning the first picture, none of the effects are significant. What I meant in the second picture is that my test considered the blu and the curves significantly different, but not the blue and the green.

ADD REPLYlink written 3.7 years ago by David Rengel70

Also, if you want to add to your original post, try to use "add comment" rather than adding an answer.

ADD REPLYlink written 3.7 years ago by Aaron Lun25k
Answer: edgeR glmLRT and interaction effect: an example
3
gravatar for Gordon Smyth
3.7 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Well, I can see a few issues.

First, you are plotting cpm values on the unlogged scale, whereas the linear model is being fitted on the log-scale. The interaction for the the top gene AT1G01120 would be far less apparent on the log-scale.

Second, it is impossible to detect an interaction between two treatments if the gene is not at all expressed in one of the treatments. This is because the time effects are indeterminate for that treatment.

Third, I think you may be attributing interpretations to parameters in the linear model that they don't have. In particular, you should not ascribe any meaning to the "main effects" in an interaction model.

As I have explained many times on this support site, I think that classical factorial models are actively misleading and pretty much useless for analyzing gene expression experiments. I think it would be more meaningful to treat your experiment as having 15 groups. Setup a single factor to distinguish the 15 groups, then form contrasts between the groups to test the hypotheses that are of interest to you. This approach is statistically equivalent to the factorial model, but has the advantage that each contrast you form has an unambiguous meaning. For example, if you want to test expression is higher in Col vs KO overall, then you would form the contrast:

(Col0+Col1+Col2+Col4+Col6-KO0-KO1-KO2-KO4-KO6)/5

This would give the average log2FC for Col vs KO, averaged over the 5 times. See also Section 3.3.1 of the edgeR User's Guide.

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Gordon Smyth39k

Hi Gordon,

Thanks for your helpful reply. Actually I had already done the analysis with a single, 15-level factor. But, at some point, I thought that a complete time*genotype model would be easier to handle. Your answer tells me that this might not be the case at all and that I should stick to my single factor. What I did not think about either is the fact that if the gene is not expressed at all in one treatment, the time effect is considered as indeterminate. And I thank you for your example on the "overall" contrast, this will be helpful.

Best,

ADD REPLYlink written 3.7 years ago by David Rengel70

The factorial model does makes it easier to setup the overall F-test for interaction -- that is its main advantage. But I have found that we don't often report this statistic when the biological results are finally published. For everything else the one-way model is clearer.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by Gordon Smyth39k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 210 users visited in the last hour