Hi, I'm interested in performing a DE analysis using the edgeR package. To that end, I've been following the manual and some older related threads posted here ("design matrix for 4 groups in edgeR", "edgeR effects of design on testing main effects and interactions", "Different results in edgeR using simple vs GLM").
In my experimental design, the samples fall into three groups: Control, Risk and Disease. I also have information about the gender of each sample.
My question is: how should I adjust for the effect of gender on gene expression?
Option 1 - Additive linear model
My first thought was trying a model that would use a design matrix like this:
mod_matrix1 <- model.matrix(~0+group+gender)
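To make the parametrization concrete, here is a minimal sketch using made-up factors (the six toy samples and the level order are assumptions for illustration, not the real data). It uses only base R and prints the coefficients this design produces.

```r
# Toy annotation (hypothetical): one sample per group-by-gender cell.
group  <- factor(rep(c("Control", "Risk", "Disease"), each = 2),
                 levels = c("Control", "Risk", "Disease"))
gender <- factor(rep(c("Female", "Male"), times = 3))

# Additive design: one column per group plus a single gender column.
mod_matrix1 <- model.matrix(~0+group+gender)
print(colnames(mod_matrix1))
```

With this coding, the three group columns are the expression levels for the reference gender (Female here), and the single gender column is a male-vs-female shift shared by all groups.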
Option 2 - One-way layout
But then I thought that maybe a one-way layout model could be more useful:
group_gender <- factor(paste(group, gender, sep='_'))
mod_matrix2 <- model.matrix(~0+group_gender)
colnames(mod_matrix2) <- levels(group_gender)
For this second approach, I was thinking of testing the difference between groups averaged over gender, using something like this (an example):
makeContrasts( (Control_Male + Control_Female)/2 - (Risk_Male + Risk_Female)/2,
levels=mod_matrix2)
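The averaged contrast above can also be written out by hand, which shows exactly what is being tested. This is a sketch on toy factors (hypothetical and balanced, two samples per cell). The weights are built in base R rather than with makeContrasts, and the design columns are renamed to the bare level names so that they match the names used in the contrast.

```r
# Toy annotation (hypothetical): 3 groups x 2 genders, two samples per cell.
group  <- factor(rep(c("Control", "Risk", "Disease"), each = 4))
gender <- factor(rep(c("Female", "Male"), times = 6))
group_gender <- factor(paste(group, gender, sep = "_"))

# One-way layout: one column per group-by-gender cell.
mod_matrix2 <- model.matrix(~0+group_gender)
colnames(mod_matrix2) <- levels(group_gender)

# Control minus Risk, each averaged over the two genders.
contrast <- setNames(numeric(ncol(mod_matrix2)), colnames(mod_matrix2))
contrast[c("Control_Male", "Control_Female")] <-  1/2
contrast[c("Risk_Male", "Risk_Female")]       <- -1/2
print(contrast)
```

The Disease cells get weight zero, so this contrast ignores them, and the weights sum to zero as they should for a comparison between groups.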
Option 3 - Full interaction model
Another option I originally considered was:
mod_matrix3 <- model.matrix(~0+group*gender)
Option 4 - Nested interaction model
And also this one:
mod_matrix4 <- model.matrix(~0+gender + gender:group)
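One thing that can be checked numerically is whether Options 2, 3 and 4 are really different models or only different parametrizations. The sketch below (toy factors, hypothetical and balanced) compares the rank of each design and of the designs bound together. If the combined rank is no larger than the individual ranks, the designs span the same space and give the same fitted values.

```r
# Toy annotation (hypothetical): 3 groups x 2 genders, two samples per cell.
group  <- factor(rep(c("Control", "Risk", "Disease"), each = 4),
                 levels = c("Control", "Risk", "Disease"))
gender <- factor(rep(c("Female", "Male"), times = 6))
group_gender <- paste(group, gender, sep = "_")

m1 <- model.matrix(~0+group+gender)          # Option 1
m2 <- model.matrix(~0+group_gender)          # Option 2
m3 <- model.matrix(~0+group*gender)          # Option 3
m4 <- model.matrix(~0+gender + gender:group) # Option 4

rank_of <- function(m) qr(m)$rank
ranks <- c(opt1 = rank_of(m1), opt2 = rank_of(m2),
           opt3 = rank_of(m3), opt4 = rank_of(m4))
print(ranks)

# Rank of Options 2-4 side by side: equal to each one's own rank
# means they describe the same model.
combined_rank <- rank_of(cbind(m2, m3, m4))
print(combined_rank)
```

On a layout like this, Options 2, 3 and 4 come out as the same six-parameter model, so choosing among them is a matter of which coefficients are convenient to test. Option 1 is a genuinely smaller model with four parameters.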
Question
My biggest concern is that I'm not sure which approach is best for adjusting for gender. As I understand it (and I may well be wrong), Option 1 lets me adjust for gender assuming that its effect on gene expression is the same in all groups (Control, Risk and Disease), which might in fact not be true. I think Option 2 may be more accurate, since gender might affect each group differently. As for Options 3 and 4, most threads say that they are the hardest to interpret and are useful only in specific cases.
Can you help me understand which way is better for adjusting my model for gender specific effect?
I am also trying to fit a more complex model, with two factors (Treatment, 4 levels; GlucemicControl, 2 levels) and multiple "covariates" (Gender, 2 levels; Age; BMI). For this case, I was thinking of using a one-way layout that merges the two factors but, depending on the answers to the previous question, it may be better to use a one-way layout that merges Treatment, GlucemicControl and Gender.
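For the larger design, one possible layout (a sketch, not a recommendation) is to merge the two factors of interest into a single factor and add Gender, Age and BMI as additive covariates. All level names (T1 to T4, Good/Poor) and the simulated Age and BMI values are placeholders made up for illustration.

```r
# Hypothetical annotation: 4 treatments x 2 control levels x 2 genders,
# two samples per cell, with simulated Age and BMI.
set.seed(1)
ann <- expand.grid(Treatment = paste0("T", 1:4),
                   GlucemicControl = c("Good", "Poor"),
                   Gender = c("Female", "Male"),
                   rep = 1:2)
ann$Age <- round(rnorm(nrow(ann), mean = 55, sd = 10))
ann$BMI <- round(rnorm(nrow(ann), mean = 28, sd = 4), 1)

# Merge the two factors of interest into one 8-level factor.
ann$trt_gc <- factor(paste(ann$Treatment, ann$GlucemicControl, sep = "_"))

# One column per Treatment-by-GlucemicControl cell, plus additive covariates.
design <- model.matrix(~0 + trt_gc + Gender + Age + BMI, data = ann)
colnames(design) <- sub("^trt_gc", "", colnames(design))
print(colnames(design))
```

Comparisons between Treatment and GlucemicControl cells can then be written as contrasts of the first eight columns, while the Gender, Age and BMI columns are only adjusted for.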
Thank you in advance!
Hi, Gordon!
Thanks for your impressively fast response.
When I say "which way is better for adjusting my model for gender specific effect?", I mean "correcting" rather than "adjusting". My bad here, English is not my first language.
We are not interested in finding gender-specific effects. We want to study genes that are differentially expressed between the groups of interest (Control, Risk, Disease), regardless of gender. We want to correct for the effect of gender on gene expression, as if it were a batch effect.
Approach 1 then.
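For reference, a minimal sketch of how the additive design could be fitted and tested in edgeR. The counts are simulated and the sample annotation is hypothetical. The quasi-likelihood pipeline shown (estimateDisp, glmQLFit, glmQLFTest) is one common choice and an assumption on my part, not something prescribed in this thread. The edgeR part only runs if the package is installed.

```r
# Simulated toy data (hypothetical): 200 genes, 12 samples.
set.seed(1)
group  <- factor(rep(c("Control", "Risk", "Disease"), each = 4),
                 levels = c("Control", "Risk", "Disease"))
gender <- factor(rep(c("Female", "Male"), times = 6))
counts <- matrix(rnbinom(200 * 12, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:12)))

# Additive design; strip the "group" prefix so contrasts read naturally.
design <- model.matrix(~0 + group + gender)
colnames(design) <- sub("^group", "", colnames(design))

if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)  # also attaches limma, which provides makeContrasts
  y <- DGEList(counts = counts, group = group)
  y <- calcNormFactors(y)
  y <- estimateDisp(y, design)
  fit <- glmQLFit(y, design)
  # Risk vs Control, adjusted for gender through the genderMale column.
  con <- makeContrasts(RiskVsControl = Risk - Control, levels = design)
  res <- glmQLFTest(fit, contrast = con)
  print(topTags(res, n = 5))
}
```

The other pairwise comparisons (Disease - Control, Disease - Risk) can be added as further named contrasts in the same makeContrasts call.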