Question

LIMMA paired analysis adjusting for two continuous variables - microarray data

0

Entering edit mode

danielle.newby • 0

@daniellenewby-12503

Last seen 8.5 years ago

Hi,

I am analysing a dataset where i have subjects who were tested at baseline and then after 2 weeks of calorie restriction. I would like to see the genes DE between baseline and calorie restriction taking into account the persons age and BMI. As there is a before and after condition is only 2 weeks the age and BMI do not change. I am not sure how to adjust for this in my model.

An example of my dataset is:

SubjectID	bmi	Age	Class
1	26.41	73	T0
1	26.41	73	T2weeks
14	23.31	80	T0
14	23.31	80	T2weeks
16	27.71	85	T0
16	27.71	85	T2weeks
17	25.67	76	T0
17	25.67	76	T2weeks
18	20.99	70	T0
18	20.99	70	T2weeks

For creating the design of limma i was thinking of the following design:

design <- model.matrix(~class:ID + age + BMI)

From this i run a normal limma analysis and extract the results from the class:ID term as this tells me the difference between baseline and after 2 weeks of CR for my genes. Is this correct?

Any input would be greatly appreciated!

Thanks

Danielle

limma limma design matrix limma paired analysis microarray • 3.1k views

ADD COMMENT • link 8.8 years ago danielle.newby • 0

0

Entering edit mode

Hi Gavin,

Thanks for your answer.

I was originally using duplicate correlation to take into account subject correlation but the concern i have is that BMI and age will be the same regardless of being baseline or after 2weeks and that this is not the correct thing to do and i need some interaction term.

Thanks

Danielle

ADD REPLY • link 8.8 years ago danielle.newby • 0

0

Entering edit mode

Don't think there's an issue with the continuous variables being constant within the subjects - does everything work ok if you have a design with of ~bmi + age + class, with a blocking factor of id?

I suppose you could add age:class (and/or bmi:class) interaction if you're worried that a gene's age-gradient might differ between CR states, but it's hard for me to conceptualise what a class:id interaction would add, other than looking at a subject-specific effect of class.

ADD REPLY • link 8.8 years ago Gavin Kelly ▴ 690

score 2 · Answer 1 · 2017-03-10

It sounds like you basically want to treat age and BMI as confounding factors and control for their influence, and you're not interested in the genes that are differentially expressed with respect to either one. If so, the design you probably want is just

design <- model.matrix(~class + ID)

This will allow you to test for differences between T0 and T2w while controlling for all differences between individuals. This encompasses any differences due to age and bmi, as well as any other differences between individuals.

score 1 · Answer 2 · 2017-03-10

1

Entering edit mode

Aaron Lun ★ 29k

@alun

Last seen 1 hour ago

The city by the bay

Your proposed design matrix is far, far more complicated than it needs to be. Simply do this:

ID <- factor(rep(c(1,14,16,17,18), each=2))
class <- factor(rep(c("T0", "T2weeks"), 5))
design <- model.matrix(~ID + class)

The first five coefficients are subject-specific blocking factors that are uninteresting and can be ignored. The last coefficient represents the log-fold change in expression after two weeks on caloric restriction, compared to a zero-time control. There is no need to block on BMI or age, because this is redundant with blocking on ID. Any differences in expression due to BMI or age will be modelled by the ID blocking terms.

ADD COMMENT • link 8.8 years ago Aaron Lun ★ 29k

0

Entering edit mode

Hi Aaron,

So i dont need to take into account using the correlation function that they are same people ?

So you are saying that the above code is the same as the following:

ID <- factor(rep(c(1,14,16,17,18), each=2))
class <- factor(rep(c("T0", "T2weeks"), 5))

design <- model.matrix(~class)

  dc <- duplicateCorrelation( expression_dataset,
                              design = design,
                              block = ID)

  resultmulti1 <- lmFit(  expression_dataset,
                          design = design,
                          block = ID,
                          cor = dc$consensus.correlation )

I have used this correlation function when i am looking at the same subjects or technical replicates but i guess the correlation function it only for the technical replicates (ones that are from the same sample but repeated ) but not for this case?

Thank you very much for your help- its been a long day!

Danielle

ADD REPLY • link 8.8 years ago danielle.newby • 0

0

Entering edit mode

If you block on the subject ID in the design matrix, you don't need to use duplicateCorrelation. This is because any subject-specific effects that might induce correlations in the paired samples are already modelled in the systematic component of the linear model. We generally recommend using duplicateCorrelation in situations where your blocking factor is confounded with the factor of interest. For example, if some subjects were control-only and other subjects were CR-only, then obviously you wouldn't be able to block on the subject in the design matrix, because you can't estimate the blocking factors and the treatment effect at the same time. In such cases, you would block via duplicateCorrelation.

ADD REPLY • link 8.8 years ago Aaron Lun ★ 29k

score 0 · Answer 3 · 2017-03-10

I don't think is quite right - the class:ID term means that a coefficient will be estimated for each unique {ID, class} combination, so you'll get lots of coefficients corresponding to this term (unless you've left ID as a numeric variable - you must convert it to a factor before putting it into the design). You'd then need to drop the intercept to make all the coefficients estimable, and devise a contrast that examined your many coefficients for the class:ID combinations. Also, your dataset has a variable called SubjectID rather than ID, and the capitalisation is inconsistent. You should keep bmi and age as numeric variables, but you may want to mean-centre them so that the coefficients become more obviously interpretable

I'm guessing you want to get a T0 vs T2 comparison, accounting for the paired nature of your data, rather than a per-subject estimate of this effect (which is sort of what your original model is heading towards?). If so, then something like

~ ID + bmi + age + class

will roughly give you what you want, but you might want to investigate the duplicateCorrelation function in limma to take account of within-subject correlation.

score 0 · Answer 4 · 2017-03-13

0

Entering edit mode

danielle.newby • 0

@daniellenewby-12503

Last seen 8.5 years ago

Hi Everyone,

Thank you so much for all your comments it is really helping with my learning of limma! I will try the ~ID + class and do abit more reading of the limma manual

Thanks again,

Danielle

ADD COMMENT • link 8.8 years ago danielle.newby • 0