I am having difficulties interpreting the results from a factorial design experimental setup in limma. I sampled 12 individuals repeatedly, before and after treatment (3hrs, 6hrs, 12hrs, and 24hrs post stimulation) in a stimulated group and a control group. Half of the individuals for each treatment were females, and half males.
In my PCA plots the samples clusters according to treatment as well as sex, and individual and hence I have included both fixed (sex) and random effects (individual) in my model for comparative expression. I followed the protocol “RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR” to convert my RNA-seq data to a format that can be analysed with limma.
Some of the questions I want to answer are:
Q1: Which genes respond differently over time in the stimulated group relative to the control?
Q2: What genes respond differently over time for the stimulant in females and males?
For both questions I would like to know what the foldchange of gene expression is between the groups of interest.
As I have samples before and after treatment in both the control group and the stimulated group I used the setup from section 9.6 (Time Course Experiments) in the limma user guide to answer Which genes respond differently over time in the stimulated group relative to the control?
lev <- c("Control.0", "Control.3", "Control.6", "Control.12", "Control.24", "Stimulated.0", "Stimulated.3", "Stimulated.6", "Stimulated.12", "Stimulated.24") f <- factor(y$samples$group, levels=lev) sex <- colData$sex design <- model.matrix(~0+f+sex) colnames(design) <- gsub("f", "", colnames(design)) cont.dif <- makeContrasts(Dif3hr =(Stimulated.3-Stimulated.0)-(Control.3-Control.0), Dif6hr =(Stimulated.6-Stimulated.0)-(Control.6-Control.0), Dif12hr =(Stimulated.12-Stimulated.0)-(Control.12-Control.0), Dif24hr =(Stimulated.24-Stimulated.0)-(Control.24-Control.0), levels =colnames(design)) v <- voom(y, design, plot=TRUE) corfit <- duplicateCorrelation(v,design,block=y$samples$ind) vfit2 <- lmFit(v, design, block=y$samples$ind, correlation=corfit$consensus) vfit2 <- contrasts.fit(vfit2, contrasts=cont.dif) efit2 <- eBayes(vfit2) results_limma_summarizeOverlaps <- summary(decideTests(efit2))
For this analysis I used the setup from section 9.5.4 (Classic Interaction Models) in the limma used guide to find genes that respond differently to the stimulus in females vs males.
levg <- c("Control.0.F", "Control.3.F", "Control.6.F", "Control.12.F", "Control.24.F", "Stimulated.0.F", "Stimulated.3.F", "Stimulated.6.F", "Stimulated.12.F", "Stimulated.24.F", "Control.0.M", "Control.3.M", "Control.6.M", "Control.12.M", "Control.24.M", "Stimulated.0.M", "Stimulated.3.M", "Stimulated.6.M", "Stimulated.12.M", "Stimulated.24.M") g <- factor(y$samples$group.sex, levels=levg) design <- model.matrix(~0+g) colnames(design) <- gsub("g", "", colnames(design)) cont.dif <- makeContrasts(Dif_F_M_3=(Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M), Dif_F_M_6=(Stimulated.6.F-Stimulated.0.F)-(Stimulated.6.M-Stimulated.0.M), Dif_F_M_12=(Stimulated.12.F-Stimulated.0.F)-(Stimulated.12.M-Stimulated.0.M), Dif_F_M_24=(Stimulated.24.F-Stimulated.0.F)-(Stimulated.24.M-Stimulated.0.M), levels =colnames(design)) v <- voom(y, design, plot=TRUE) corfit <- duplicateCorrelation(v,design,block=y$samples$ind) vfit2 <- lmFit(v, design, block=y$samples$ind, correlation=corfit$consensus) vfit2 <- contrasts.fit(vfit2, contrasts=cont.dif) efit2 <- eBayes(vfit2) results_limma_summarizeOverlaps <- summary(decideTests(efit2))
After finding out what genes responded differentially to the stimulant in females and males I also wanted to visualize the foldchanges for the sexes separately for the genes of interest, and therefore I additionally analysed what genes respond differentially to the treatment for females and males separately.
cont.dif <- makeContrasts(Dif3_F =(Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F), Dif6_F =(Stimulated.6.F-Stimulated.0.F)-(Control.6.F-Control.0.F), Dif12_F =(Stimulated.12.F-Stimulated.0.F)-(Control.12.F-Control.0.F), Dif24_F =(Stimulated.24.F-Stimulated.0.F)-(Control.24.F-Control.0.F), Dif3_M =(Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M), Dif6_M =(Stimulated.6.M-Stimulated.0.M)-(Control.6.M-Control.0.M), Dif12_M =(Stimulated.12.M-Stimulated.0.M)-(Control.12.M-Control.0.M), Dif24_M =(Stimulated.24.M-Stimulated.0.M)-(Control.24.M-Control.0.M), Dif_F_M_3 = (Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M), Dif_F_M_6 = (Stimulated.6.F-Stimulated.0.F)-(Stimulated.6.M-Stimulated.0.M), Dif_F_M_12 = (Stimulated.12.F-Stimulated.0.F)-(Stimulated.12.M-Stimulated.0.M), Dif_F_M_24 = (Stimulated.24.F-Stimulated.0.F)-(Stimulated.24.M-Stimulated.0.M), levels =colnames(design))
(I also tried “Dif_F_M_3_B = ((Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F))-((Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M))” which gave very similar result to “Dif_F_M_3 = (Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M)”).
However, when comparing the results (log foldchanges) from the analysis between the females and males and for males and females separately I realized something strange (?). Sometimes the log foldchange is namely very similar in females and males for a particular gene, but still this gene has a significant value and high log foldchange value in the model comparing the response between the sexes.
In the table below I have listed a couple of the genes with a p.adj value < 0.05 from the comparison between the sexes, and I added the foldchanges values from the analyses for each sex separately to show what I mean:
For example, the log foldchange in GeneX2 is very similar in the males and females, but comes out as significant with a log foldchange of 3.7 when comparing the change in both sexes. This makes me wonder whether it is not the foldchange that is compared between the groups? Or could it be that when a gene is downregulated in both groups, the minuses used in makeContrast will become plus instead (-X)-(-X), hence leading to a high foldchange between the sexes even though the foldchange is very similar in both groups? Or are there other reason for my results?
I would very much appreciate if someone could give me a hint on what is happening and how I should interpret the data. At the moment I feel very confused and unsure about how to explain what the log foldchange value actually show from this analysis.
PS. Even though my question mainly covers Q2 in this post, the same issue (?) applies to Q1 as I am also comparing the change in two groups there.