Question

How to correctly adjust for gender in edgeR

Guillermo • 0

@860296bd

Last seen 7 months ago

Spain

Hi, I'm interested in performing DE analysis using the edgeR package. For this purpose, I've been following the manual and some older related posts here ("design matrix for 4 groups in edgeR", "edgeR effects of design on testing main effects and interactions", "Different results in edgeR using simple vs GLM").

For my experimental design, I have samples distributed in three groups: Control, Risk and Disease samples. I also have information about the gender of the samples.

My question is: how should I adjust for the effect of gender on gene expression?

Option 1 - Additive linear model

My first thought was trying a model that would use a design matrix like this:

mod_matrix1 <- model.matrix(~0+group+gender)

Option 2 - One-way layout

But then I thought that maybe a one-way layout model could be more useful:

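group_gender <- factor(paste(group, gender, sep='_'))
mod_matrix2 <- model.matrix(~0+group_gender)
# Drop the "group_gender" prefix so contrasts can use names like Control_Male
colnames(mod_matrix2) <- levels(group_gender)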

For this second approach, I was thinking of testing for the average, using something like this (an example):

makeContrasts( (Control_Male + Control_Female)/2 - (Risk_Male + Risk_Female)/2, 
               levels=mod_matrix2)
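
The averaged contrast above can also be written out as an explicit numeric vector, which makes the weights on each coefficient visible. This is a minimal sketch in base R, assuming toy `group` and `gender` factors (hypothetical data, not from my experiment) and design columns renamed to the plain `Group_Gender` labels:

```r
# Toy sample annotation (hypothetical): 3 groups x 2 genders, 2 replicates each
group  <- factor(rep(c("Control", "Risk", "Disease"), each = 4),
                 levels = c("Control", "Risk", "Disease"))
gender <- factor(rep(c("Male", "Female"), times = 6),
                 levels = c("Male", "Female"))

# One-way layout: one coefficient per group/gender combination
group_gender <- factor(paste(group, gender, sep = "_"))
mod_matrix2  <- model.matrix(~0 + group_gender)
colnames(mod_matrix2) <- levels(group_gender)  # strip the "group_gender" prefix

# Averaged Control vs Risk contrast as a named numeric vector
avg_contrast <- setNames(numeric(ncol(mod_matrix2)), colnames(mod_matrix2))
avg_contrast[c("Control_Male", "Control_Female")] <-  0.5
avg_contrast[c("Risk_Male", "Risk_Female")]       <- -0.5
avg_contrast
```

A vector like this could then be supplied through the `contrast` argument of edgeR's `glmQLFTest` or `glmLRT`.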

Option 3 - Interaction full model

Also, another option I was originally thinking of using was:

mod_matrix3 <- model.matrix(~0+group*gender)

Option 4 - Interaction model

And also this one:

mod_matrix4 <- model.matrix(~0+gender + gender:group)

Question

My biggest concern is that I'm not sure which approach is better for adjusting for gender. As I understand it (and I may well be wrong), Option 1 lets me adjust for gender assuming its effect on gene expression is the same in all the groups (Control, Risk and Disease), which might in fact be wrong. I think Option 2 may be more accurate, since gender might affect each group differently. As for Options 3 and 4, most posts state that they are the hardest to interpret and are useful only in specific cases.

Can you help me understand which way is better for adjusting my model for gender specific effect?

Also, I am trying to fit a more complex model, with two factors (Treatment, 4 levels, and GlucemicControl, 2 levels) and multiple "covariates" (Gender, 2 levels; Age; BMI). For this case, I was thinking of using a one-way layout merging the two factors, but depending on the answers to the previous question it may be better to use a one-way layout merging Treatment, GlucemicControl and Gender.
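
To make that second design concrete, here is a rough sketch of the first variant (Treatment and GlucemicControl merged into one factor, with Gender, Age and BMI as additive covariates). The sample sizes, level labels (`T1` to `T4`, `Good`/`Poor`) and simulated covariate values are all assumptions for illustration only:

```r
set.seed(1)
n <- 32

# Hypothetical sample annotation: 4 treatments x 2 control levels, 4 samples each
Treatment       <- factor(rep(paste0("T", 1:4), each = 8))
GlucemicControl <- factor(rep(rep(c("Good", "Poor"), each = 4), times = 4))
Gender          <- factor(rep(c("Male", "Female"), times = 16))
Age             <- round(runif(n, 30, 70))    # simulated ages
BMI             <- round(rnorm(n, 27, 3), 1)  # simulated BMI values

# One-way layout over the two factors of interest, covariates added additively
trt_gc <- factor(paste(Treatment, GlucemicControl, sep = "_"))
design <- model.matrix(~0 + trt_gc + Gender + Age + BMI)
colnames(design) <- sub("^trt_gc", "", colnames(design))
colnames(design)
```

Merging Gender into the one-way factor as well would double the number of group coefficients from 8 to 16, leaving fewer samples per cell.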

Thank you in advance!

design glm edgeR • 882 views

ADD COMMENT • link updated 10 months ago by Gordon Smyth 52k • written 10 months ago by Guillermo • 0

score 2 · Accepted Answer · 2024-05-15

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Options 2 to 4 are all equivalent. They are just different ways of parametrizing the same model, depending on what hypotheses are most of interest. They all give exactly the same results. The only difference between them is the ease with which particular contrasts are extracted.
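
The equivalence can be checked numerically: the three design matrices have the same number of columns and span the same column space, so they produce the same fitted values. A small sketch in base R, using a made-up balanced sample annotation (the `group` and `gender` vectors here are hypothetical):

```r
# Toy annotation (hypothetical): 3 groups x 2 genders, 2 replicates each
group  <- factor(rep(c("Control", "Risk", "Disease"), each = 4),
                 levels = c("Control", "Risk", "Disease"))
gender <- factor(rep(c("Male", "Female"), times = 6),
                 levels = c("Male", "Female"))
group_gender <- factor(paste(group, gender, sep = "_"))

X2 <- model.matrix(~0 + group_gender)           # Option 2: one-way layout
X3 <- model.matrix(~0 + group * gender)         # Option 3: main effects + interaction
X4 <- model.matrix(~0 + gender + gender:group)  # Option 4: group effects nested in gender

# Same number of coefficients in every parametrization
n_coef <- sapply(list(X2, X3, X4), ncol)

# Same column space: putting all columns side by side does not raise the rank
rank_of   <- function(X) qr(X)$rank
all_ranks <- c(rank_of(X2), rank_of(X3), rank_of(X4), rank_of(cbind(X2, X3, X4)))

n_coef
all_ranks
```

If the combined rank were larger than the individual ranks, at least one parametrization would be fitting a different model.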

You already know the difference between options 1 and 2. Option 1 adjusts for baseline differences between males and females but assumes the relative disease vs control effects are the same for both genders. Option 2 allows you to check for gender-specific disease effects.
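
As a rough illustration of that difference, the additive design has a single gender coefficient shared by all groups, while the one-way layout spends two extra coefficients on letting the group effects differ by gender. The sketch below uses base R only and a hypothetical balanced annotation; the contrast vector is written by hand rather than with `makeContrasts`:

```r
# Toy annotation (hypothetical): 3 groups x 2 genders, 2 replicates each
group  <- factor(rep(c("Control", "Risk", "Disease"), each = 4),
                 levels = c("Control", "Risk", "Disease"))
gender <- factor(rep(c("Male", "Female"), times = 6),
                 levels = c("Male", "Female"))

# Option 1: additive model, one baseline gender shift common to all groups
mod_matrix1 <- model.matrix(~0 + group + gender)
colnames(mod_matrix1)

# Risk vs Control, adjusted for the baseline male/female difference
risk_vs_control <- setNames(numeric(ncol(mod_matrix1)), colnames(mod_matrix1))
risk_vs_control["groupRisk"]    <-  1
risk_vs_control["groupControl"] <- -1

# Option 2: one coefficient per group/gender combination
group_gender <- factor(paste(group, gender, sep = "_"))
mod_matrix2  <- model.matrix(~0 + group_gender)

# The extra coefficients in Option 2 are the gender-specific group effects
extra_coef <- ncol(mod_matrix2) - ncol(mod_matrix1)
extra_coef
```

With Option 1 the gender term plays the same role as a batch term: it absorbs the baseline shift and the group contrast is read off the group coefficients directly.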

Which model is right for you depends on what hypotheses you want to test and the scientific background to your experiment. You ask "which way is better for adjusting my model for gender specific effect?" Obviously that would be option 2 or, even more directly, option 4.

You propose a contrast using averages, but I don't see the point of that if you're interested in gender-specific effects.


Hi, Gordon!

Thanks for your impressively fast response.

When I say "which way is better for adjusting my model for gender specific effect?", I mean "correcting" instead of "adjusting". My bad here, English is not my first language.

We are not interested in searching for gender-specific effects. We want to study differentially expressed genes between the groups of interest (Control, Risk, Disease), regardless of gender. We want to correct for the effect of gender on gene expression, as if it were a batch effect.