EdgeR design and contrasts questions

jjh

EDIT: Sorry, I had completely left out that each group has 3 biological replicates!

Hello,

I am not from a bioinformatics background, so apologies in advance if any of my questions have obvious answers or show a lack of understanding. I have been tasked with analysing RNA-seq data from 4 experimental groups defined by two factors (Treatment and Knockdown).

Samples	Treatment	Knockdown
S1, S2, S3	Control	Control
S4, S5, S6	Control	Knockdown
S7, S8, S9	Treat	Control
S10, S11, S12	Treat	Knockdown
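A minimal sketch of how the table above might be encoded in R, assuming the 12 samples are stored in the order S1 to S12 and that "Control" should be the reference level of both factors. The object name `samples` and the combined `Group` column are hypothetical, for illustration only.

```r
# Hypothetical sample sheet for S1-S12, three biological replicates per group
samples <- data.frame(
  sample    = paste0("S", 1:12),
  Treatment = factor(rep(c("Control", "Treat"), each = 6),
                     levels = c("Control", "Treat")),
  Knockdown = factor(rep(rep(c("Control", "Knockdown"), each = 3), times = 2),
                     levels = c("Control", "Knockdown"))
)

# One combined factor with four levels, one per Treatment/Knockdown combination
samples$Group <- interaction(samples$Treatment, samples$Knockdown, sep = "_")

# Cross-tabulation: every combination should contain three samples
table(samples$Treatment, samples$Knockdown)
```

Setting the `levels` explicitly keeps "Control" as the reference level, which determines what the intercept and the other coefficients mean in any `model.matrix()` built from these factors.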

I am familiar enough with edgeR to perform basic pairwise comparisons between these groups using an additive model. However, I have been told I need to perform a two-way ANOVA to find the combinatorial effect of the treatment and the knockdown.

Questions regarding ANOVA:

I have read through the edgeR manual, and I understand that the "ANOVA-like" method is not the same as a typical ANOVA test on normally distributed data. However, is there a way to specify whether it is one-way or two-way? I have not been able to find any clarification on whether this needs to be set.
Additionally, I understand that the test will detect any differences among the 4 biological groups, but it lacks a post-hoc analysis to find which specific groups differ. Is there a better way to determine the significance between pairs of groups after performing the ANOVA-like analysis, or am I left with using pairwise comparisons?
The owner of the data had discussed the analysis with a third person, and from what was discussed, apparently setting up the design so that the baseline is control+control and the final group is treatment+knockdown makes the ANOVA-like test provide statistics that represent the combinatorial effect. From what I've read this does not seem correct to me, and I believe it is due to miscommunication, but would anyone be able to offer input as to whether this can be done?
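For context on the first and last points: in edgeR, an "ANOVA-like" test is a likelihood ratio test of several coefficients or contrasts at once, so there is no separate one-way/two-way setting; what gets tested is determined by the design and by the `coef` or `contrast` argument passed to `glmLRT()`. The sketch below builds, in base R only, a contrast matrix comparing each group with a control+control baseline under a one-coefficient-per-group design. The group labels are hypothetical, and the edgeR call is shown only as a comment (not run). Whichever group is chosen as the baseline, testing all three contrasts jointly asks whether any of the four groups differ; it does not isolate the interaction.

```r
# Hypothetical group labels, in the column order of a ~0 + group design
groups <- c("con_con", "con_kd", "tr_con", "tr_kd")

# Each column compares one group with the con_con baseline
anova_like <- cbind(
  con_kd_vs_con_con = c(-1, 1, 0, 0),
  tr_con_vs_con_con = c(-1, 0, 1, 0),
  tr_kd_vs_con_con  = c(-1, 0, 0, 1)
)
rownames(anova_like) <- groups
anova_like

# In edgeR (not run here; `y` would be a DGEList and `design` a ~0 + group design),
# the three contrasts would be tested jointly with:
#   fit <- glmFit(y, design)
#   lrt <- glmLRT(fit, contrast = anova_like)
```

Because the three contrasts span all differences among four group means, any other choice of baseline gives the same joint test.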

Regarding the aim of finding a combinatorial effect:

From my understanding, the best way to show a combinatorial effect would be to use an interaction design and perform an ANOVA-like analysis on top of it. Would this be the correct approach? Further to this, would a nested interaction or a full interaction formula be more appropriate? (I ask because I only get 150 genes using the full interaction.)

Nested interaction:

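library(edgeR)
design.NestedInteraction <- model.matrix(~Treatment + Treatment:Knockdown)
# Columns, assuming "Control" is the reference level of both factors:
# '(Intercept)'  'TreatmentTreat'  'TreatmentControl:KnockdownKnockdown'  'TreatmentTreat:KnockdownKnockdown'
fit.NestedInteraction <- glmFit(set_d, design.NestedInteraction)
# Coefficients 3:4 are the knockdown effects within control and within treated samples
lrt.NestedInteraction <- glmLRT(fit.NestedInteraction, coef=3:4)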

Full interaction:

design.FullInteraction <- model.matrix(~Treatment + Knockdown + Treatment:Knockdown)
# Columns: '(Intercept)'  'TreatmentTreat'  'KnockdownKnockdown'  'TreatmentTreat:KnockdownKnockdown'
fit.FullInteraction <- glmFit(set_d, design.FullInteraction)
# Coefficient 4 is the Treatment:Knockdown interaction term
lrt.FullInteraction <- glmLRT(fit.FullInteraction, coef=4)
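To see how the two parametrizations relate, the sketch below fits both designs by ordinary least squares to made-up, noise-free group means (the values 1, 2, 3 and 7 are purely illustrative, not from the data). In the nested design, coefficients 3 and 4 are the knockdown effect within control and within treated samples, so `coef=3:4` tests whether the knockdown has an effect in either treatment. In the full design, coefficient 4 is the difference between those two knockdown effects, i.e. the interaction. Least squares stands in here for edgeR's GLM fit only to make the coefficient meanings visible.

```r
Treatment <- factor(rep(c("Control", "Treat"), each = 6), levels = c("Control", "Treat"))
Knockdown <- factor(rep(rep(c("Control", "Knockdown"), each = 3), 2),
                    levels = c("Control", "Knockdown"))

# Made-up group means (illustrative only): CC = 1, CK = 2, TC = 3, TK = 7
means <- c(Control.Control = 1, Control.Knockdown = 2,
           Treat.Control = 3, Treat.Knockdown = 7)
y <- unname(means[paste(Treatment, Knockdown, sep = ".")])

design.NestedInteraction <- model.matrix(~Treatment + Treatment:Knockdown)
design.FullInteraction   <- model.matrix(~Treatment + Knockdown + Treatment:Knockdown)

# Noise-free data, so least squares recovers the coefficients exactly
b.nested <- qr.solve(design.NestedInteraction, y)
b.full   <- qr.solve(design.FullInteraction, y)
round(b.nested, 10)  # KD effect in control = 1, KD effect in treated = 4
round(b.full, 10)    # interaction = 4 - 1 = 3
```

The full model's interaction coefficient equals the nested model's coefficient 4 minus coefficient 3, which is why testing `coef=3:4` in the nested design answers a different question from testing `coef=4` in the full design.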

My understanding of interactions may be poor, and I may be misunderstanding something in this analysis. If this is completely the wrong approach, I would greatly appreciate any advice anyone could offer!

Thank you all for the help!

rnaseq edger design and contrast matrix

score 3 · Accepted Answer · 2016-10-18

I assume that the "combinatorial effect" refers to the interaction between the knockdown and the treatment. If that's true, then your experiment doesn't have enough samples to model this effect and also estimate the dispersion. This is because the interaction model would have four coefficients for four samples, leaving no residual degrees of freedom for dispersion estimation. The only way around this would be to fit an additive model for dispersion estimation and then fit the interaction model during hypothesis testing. This will allow testing to be performed, but it really just sweeps the problem under the carpet: you still don't have replicates for any of your treatment/knockdown combinations, so how do you know the results are reproducible?
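The degrees-of-freedom argument can be checked directly: the residual degrees of freedom are the number of samples minus the rank of the design matrix. The sketch below (the helper name `residual_df` is hypothetical) compares the full interaction model for one sample per group, as assumed in this answer, with the three replicates per group mentioned in the question's edit.

```r
# Residual df of the full interaction model for a balanced 2 x 2 layout
residual_df <- function(n_per_group) {
  Treatment <- factor(rep(c("Control", "Treat"), each = 2 * n_per_group))
  Knockdown <- factor(rep(rep(c("Control", "Knockdown"), each = n_per_group), 2))
  design <- model.matrix(~Treatment * Knockdown)
  nrow(design) - qr(design)$rank
}

residual_df(1)  # 0: four samples, four coefficients
residual_df(3)  # 8: twelve samples, four coefficients
```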

Anyway, setting that issue aside for now, the advice you've gotten seems unnecessarily confusing. If you must handle each treatment/knockdown combination separately rather than with additive treatment and knockdown effects, then the easiest way to model your experimental design would be with a one-way layout:

grouping <- factor(c("con_con", "con_kd", "tr_con", "tr_kd"))
design <- model.matrix(~0 + grouping)
colnames(design) <- levels(grouping)
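The `grouping` vector above has one entry per sample. With the three biological replicates per group mentioned in the question's edit, and assuming the samples are ordered as in the question's table (S1 to S12), each label would be repeated three times and the rest of the code stays the same:

```r
# Assumes samples S1-S12 are ordered as in the question's table
grouping <- factor(rep(c("con_con", "con_kd", "tr_con", "tr_kd"), each = 3))
design <- model.matrix(~0 + grouping)
colnames(design) <- levels(grouping)

# 12 rows (samples) by 4 columns (groups); each column flags three samples
design
```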

Each coefficient now represents one of the groups (i.e., combinations), which means that you can do pairwise comparisons between groups, ANODEV (analysis of deviance) tests for differences across all four groups, etc. If you want the interaction effect, you can get it with:

con <- makeContrasts((tr_kd - tr_con) - (con_kd - con_con), levels=design)
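Written out, this contrast puts weights +1, -1, -1 and +1 on the con_con, con_kd, tr_con and tr_kd coefficients. The base-R sketch below applies those weights to made-up log-scale group values (illustrative only, not from the data) to show that the contrast is zero when the knockdown shifts expression by the same amount with and without treatment, and non-zero when it does not.

```r
# Weights on the four group coefficients, in the order of levels(grouping)
w <- c(con_con = 1, con_kd = -1, tr_con = -1, tr_kd = 1)

# Made-up log-scale group values (illustrative only)
additive     <- c(con_con = 1, con_kd = 3, tr_con = 2, tr_kd = 4)  # KD adds 2 in both
non_additive <- c(con_con = 1, con_kd = 3, tr_con = 2, tr_kd = 6)  # KD adds 2 vs 4

sum(w * additive)      # 0: same KD effect with and without treatment
sum(w * non_additive)  # 2: KD effect differs by 2 between treated and control
```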

This will identify genes where the KD log-fold change in treated samples is different from the KD log-fold change in control samples, i.e., where there is an interaction between the treatment effect and the knockdown effect.