Question: Interpretation of the log fold change for a time course/factorial design analysis in limma
0
2.0 years ago by
elinorjax0 wrote:

Dear All,

I am having difficulties interpreting the results from a factorial design experimental setup in limma. I sampled 12 individuals repeatedly, before and after treatment (3hrs, 6hrs, 12hrs, and 24hrs post stimulation) in a stimulated group and a control group. Half of the individuals for each treatment were females, and half males.

 Sample Group sex individual 1 Stimulated.0 F 1 2 Stimulated.3 F 1 3 Stimulated.6 F 1 4 Stimulated.12 F 1 5 Stimulated.24 F 1 6 Stimulated.0 M 2 7 Stimulated.3 M 2 8 Stimulated.6 M 2 9 Stimulated.12 M 2 10 Stimulated.24 M 2 … 51 Control.0 F 5 52 Control.3 F 5 53 Control.6 F 5 54 Control.12 F 5 55 Control.24 F 5 56 Control.0 M 6 57 Control.3 M 6 58 Control.6 M 6 59 Control.12 M 6 60 Control.24 M 6

In my PCA plots the samples clusters according to treatment as well as sex, and individual and hence I have included both fixed (sex) and random effects (individual) in my model for comparative expression. I followed the protocol “RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR” to convert my RNA-seq data to a format that can be analysed with limma.

Some of the questions I want to answer are:

Q1: Which genes respond differently over time in the stimulated group relative to the control?

Q2: What genes respond differently over time for the stimulant in females and males?

For both questions I would like to know what the foldchange of gene expression is between the groups of interest.

Q1:

As I have samples before and after treatment in both the control group and the stimulated group I used the setup from section 9.6 (Time Course Experiments) in the limma user guide to answer Which genes respond differently over time in the stimulated group relative to the control?

lev <- c("Control.0", "Control.3", "Control.6", "Control.12", "Control.24", "Stimulated.0", "Stimulated.3", "Stimulated.6", "Stimulated.12", "Stimulated.24")
f <- factor(y$samples$group, levels=lev)
sex <- colData$sex design <- model.matrix(~0+f+sex) colnames(design) <- gsub("f", "", colnames(design)) cont.dif <- makeContrasts(Dif3hr =(Stimulated.3-Stimulated.0)-(Control.3-Control.0), Dif6hr =(Stimulated.6-Stimulated.0)-(Control.6-Control.0), Dif12hr =(Stimulated.12-Stimulated.0)-(Control.12-Control.0), Dif24hr =(Stimulated.24-Stimulated.0)-(Control.24-Control.0), levels =colnames(design)) v <- voom(y, design, plot=TRUE) corfit <- duplicateCorrelation(v,design,block=y$samples$ind) vfit2 <- lmFit(v, design, block=y$samples$ind, correlation=corfit$consensus)
vfit2 <- contrasts.fit(vfit2, contrasts=cont.dif)
efit2 <- eBayes(vfit2)
results_limma_summarizeOverlaps <- summary(decideTests(efit2))

Q2:

For this analysis I used the setup from section 9.5.4 (Classic Interaction Models) in the limma used guide to find genes that respond differently to the stimulus in females vs males.

levg <- c("Control.0.F", "Control.3.F", "Control.6.F", "Control.12.F", "Control.24.F", "Stimulated.0.F", "Stimulated.3.F", "Stimulated.6.F", "Stimulated.12.F", "Stimulated.24.F", "Control.0.M", "Control.3.M", "Control.6.M", "Control.12.M", "Control.24.M", "Stimulated.0.M", "Stimulated.3.M", "Stimulated.6.M", "Stimulated.12.M", "Stimulated.24.M")
g <- factor(y$samples$group.sex, levels=levg)
design <- model.matrix(~0+g)
colnames(design) <- gsub("g", "", colnames(design))
cont.dif <- makeContrasts(Dif_F_M_3=(Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M),
Dif_F_M_6=(Stimulated.6.F-Stimulated.0.F)-(Stimulated.6.M-Stimulated.0.M),
Dif_F_M_12=(Stimulated.12.F-Stimulated.0.F)-(Stimulated.12.M-Stimulated.0.M),
Dif_F_M_24=(Stimulated.24.F-Stimulated.0.F)-(Stimulated.24.M-Stimulated.0.M),
levels =colnames(design))
v <- voom(y, design, plot=TRUE)
corfit <- duplicateCorrelation(v,design,block=y$samples$ind)
vfit2 <- lmFit(v, design, block=y$samples$ind, correlation=corfit\$consensus)
vfit2 <- contrasts.fit(vfit2, contrasts=cont.dif)
efit2 <- eBayes(vfit2)
results_limma_summarizeOverlaps <- summary(decideTests(efit2))

After finding out what genes responded differentially to the stimulant in females and males I also wanted to visualize the foldchanges for the sexes separately for the genes of interest, and therefore I additionally analysed what genes respond differentially to the treatment for females and males separately.

cont.dif <- makeContrasts(Dif3_F =(Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F),
Dif6_F =(Stimulated.6.F-Stimulated.0.F)-(Control.6.F-Control.0.F),
Dif12_F =(Stimulated.12.F-Stimulated.0.F)-(Control.12.F-Control.0.F),
Dif24_F =(Stimulated.24.F-Stimulated.0.F)-(Control.24.F-Control.0.F),
Dif3_M =(Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M),
Dif6_M =(Stimulated.6.M-Stimulated.0.M)-(Control.6.M-Control.0.M),
Dif12_M =(Stimulated.12.M-Stimulated.0.M)-(Control.12.M-Control.0.M),
Dif24_M =(Stimulated.24.M-Stimulated.0.M)-(Control.24.M-Control.0.M),
Dif_F_M_3 = (Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M),
Dif_F_M_6 = (Stimulated.6.F-Stimulated.0.F)-(Stimulated.6.M-Stimulated.0.M),
Dif_F_M_12 = (Stimulated.12.F-Stimulated.0.F)-(Stimulated.12.M-Stimulated.0.M),
Dif_F_M_24 = (Stimulated.24.F-Stimulated.0.F)-(Stimulated.24.M-Stimulated.0.M),
levels =colnames(design))

(I also tried “Dif_F_M_3_B = ((Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F))-((Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M))” which gave very similar result to “Dif_F_M_3 = (Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M)”).

However, when comparing the results (log foldchanges) from the analysis between the females and males and for males and females separately I realized something strange (?). Sometimes the log foldchange is namely very similar in females and males for a particular gene, but still this gene has a significant value and high log foldchange value in the model comparing the response between the sexes.

In the table below I have listed a couple of the genes with a p.adj value < 0.05 from the comparison between the sexes, and I added the foldchanges values from the analyses for each sex separately to show what I mean:

 Sex_Comparison Sex_Comparison Female Male logFC adj.P.Val logFC logFC GeneX1 3.77 0.03 -3.40 -1.66 GeneX2 3.71 0.03 -2.14 -2.02 GeneX3 3.29 0.05 -2.57 -1.84 GeneX4 3.26 0.03 -1.78 -2.07 GeneX5 -2.15 0.04 -0.24 3.03 GeneX6 -2.44 0.05 -0.76 3.30 GeneX7 -5.61 0.03 3.18 4.38 GeneX8 -6.89 0.04 1.33 4.74 GeneX9 0.91 0.05 -0.39 -0.73 GeneX10 -0.94 0.03 -0.14 1.48

For example, the log foldchange in GeneX2 is very similar in the males and females, but comes out as significant with a log foldchange of 3.7 when comparing the change in both sexes. This makes me wonder whether it is not the foldchange that is compared between the groups? Or could it be that when a gene is downregulated in both groups, the minuses used in makeContrast will become plus instead (-X)-(-X), hence leading to a high foldchange between the sexes even though the foldchange is very similar in both groups? Or are there other reason for my results?

I would very much appreciate if someone could give me a hint on what is happening and how I should interpret the data. At the moment I feel very confused and unsure about how to explain what the log foldchange value actually show from this analysis.

Best wishes,

Elinor

PS. Even though my question mainly covers Q2 in this post, the same issue (?) applies to Q1 as I am also comparing the change in two groups there.

modified 2.0 years ago by Aaron Lun25k • written 2.0 years ago by elinorjax0
Answer: Interpretation of the log fold change for a time course/factorial design analysi
0
2.0 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Obviously, they will be different. Your inter-sex comparison is:

(Stimulated.3.F-Stimulated.0.F)-(Stimulated.3.M-Stimulated.0.M)

... while your individual log-fold changes are calculated as:

(Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F) # Female
(Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M) # Male

As you can see, your inter-sex comparison doesn't use the controls, while your individual log-fold changes are calculated with the controls. If you calculate the individual log-fold changes using only the stimulated samples, you will find that the "Sex_Comparison" log-fold change will be equal to the difference between the Female and Male log-fold changes, as suggested by the algebra in the makeContrasts call. You don't show the result of your Dif_F_M_3_B contrast, so I won't comment on that.

Of course, I'm assuming that you combined your log-fold changes correctly across your two sets of contrasts. topTable returns a table sorted on the B-statistic by default, so you need to turn that off before you cbind different tables together.

P.S. If the stimulation response truly varies between sexes, then blocking on sex in your code for Q1 is not sufficient to model this effect. You will end up with inflated variances from a misspecified model, and probably some decrease in power. I don't really see a problem with using your Q2 model for Q1; if you believe that different sexes might respond differently to the stimulant over time (as per Q2), then you should be considering how that will affect your interpretation of the stimulation response in Q1 anyway. For example, a strong positive effect in Q1 might be driven by a very strong positive effect in males and a weak negative effect in females.

Dear Aaron,

Thank you very much for your answer, I really appreciate your feedback! Yes it makes sense that there will be a difference when including the control for one comparison and not the other. So would you say that it is ok to use the following makeContrast?

cont.dif <- makeContrasts(Dif3_F=(Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F),
Dif6_F=(Stimulated.6.F-Stimulated.0.F)-(Control.6.F-Control.0.F),
Dif12_F=(Stimulated.12.F-Stimulated.0.F)-(Control.12.F-Control.0.F),
Dif24_F=(Stimulated.24.F-Stimulated.0.F)-(Control.24.F-Control.0.F),
Dif3_M=(Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M),
Dif6_M=(Stimulated.6.M-Stimulated.0.M)-(Control.6.M-Control.0.M),
Dif12_M=(Stimulated.12.M-Stimulated.0.M)-(Control.12.M-Control.0.M),
Dif24_M=(Stimulated.24.M-Stimulated.0.M)-(Control.24.M-Control.0.M),
Dif_F_M_3B=((Stimulated.3.F-Stimulated.0.F)-(Control.3.F-Control.0.F))-((Stimulated.3.M-Stimulated.0.M)-(Control.3.M-Control.0.M)),
Dif_F_M_6B=((Stimulated.6.F-Stimulated.0.F)-(Control.6.F-Control.0.F))-((Stimulated.6.M-Stimulated.0.M)-(Control.6.M-Control.0.M)),
Dif_F_M_12B=((Stimulated.12.F-Stimulated.0.F)-(Control.12.F-Control.0.F))-((Stimulated.12.M-Stimulated.0.M)-(Control.12.M-Control.0.M))
Dif_F_M_24B=((Stimulated.24.F-Stimulated.0.F)-(Control.24.F-Control.0.F))-((Stimulated.24.M-Stimulated.0.M)-(Control.24.M-Control.0.M)),
levels=colnames(design))

Regarding your comment on Q1: Thank you for this feedback, I see your point. I assumed that including sex as a fixed factor would be enough and thought that having six samples per treatment/time point would increase the power of the analysis rather than the opposite. But in your opinion this is not a good idea then? I would be happy to exclude this model if you do not see this appropriate, and analyse the sexes separately from the beginning.

Best wishes,

Elinor

Regarding your contrast matrix: yes, this looks fine, provided that you are willing and able to interpret differences of differences of differences (i.e., log-fold changes). I would suggest reporting all the individual log-fold changes as well, e.g., for Dif_F_M_3B, you should also show, in the result table, log-fold changes for the following contrasts:

Stimulated.3.F-Stimulated.0.F
Control.3.F-Control.0.F
Stimulated.3.M-Stimulated.0.M
Control.3.M-Control.0.M


You don't need their test statistics, just the log-fold changes. This is necessary to figure out what a positive or negative effect in the reported "log-fold change" for Dif_F_M_3B actually means from a scientific/biological perspective.

Regarding the models: as it stands, the model that you use in Q1 is only properly specified if you think that there is no difference in the stimulation response between sexes. Now, normally, this might be a reasonable assumption/approximation. However, in Q2, your aim is to go and find differences in the stimulation response between sexes. If you should find any, you will violate your assumption for the model of Q1. Power considerations aside, you can avoid this logical inconsistency by just using the same design for both questions. In fact, explicitly considering sex in Q1 may give even better answers, by ensuring that differences in the time response between stimulated/control conditions are consistent between sexes.