Question

Twin analysis using limma

0

Entering edit mode

mahes.muniandy • 0

@mahesmuniandy-7955

Last seen 10 days ago

Helsinki

Hello,

My name is Mahes Muniandy and I am a doctoral student working on twin data. I would like to compare the gene expression (Affymetrix HGU133 plus 2) between lean and obese people and see how sex (Male/Female) affects the gene expression. So, I would like to do within twin-pair differences (26 twin pairs) and see which genes are expressed differently in obesity in males and females. Can someone advise me if I am on the right track with my design matrix and limma approach below?

Here is what I have.

LG <- paste(targets$Gender, targets$LeanStatus, sep=".")
LG <- factor(LG, levels=c("F.O","F.L","M.O","M.L"))
Pairs <- factor(c("T1","T1","T2","T2","T3","T3","T4","T4","T5","T5","T6","T6","T7","T8","T8","T7","T9","T9","T10","T10","T11","T11","T12","T12","T13","T13","T14","T14","T15","T15","T16","T16","T17","T17","T18","T18","T19","T19","T20","T20","T21","T21","T22","T22","T23","T23","T24","T24","T25","T26", "T26", "T25"))
design <- model.matrix(~Pairs+LG)
colnames(design) <- levels(LG)
fit <- lmFit(eset, design)

cont.matrix <- makeContrasts(OvsLinF=F.O-F.L, OvsLinM=M.O-M.L, Diff=(M.O-M.L)-(F.O-F.L), levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)

Is this the best design for this kind of comparison?

Many Thanks,

Mahes Muniandy,

Obesity Research Unit,

University of Helsinki

limma limma paired analysis • 1.5k views

ADD COMMENT • link updated 8.3 years ago by Aaron Lun ★ 28k • written 8.3 years ago by mahes.muniandy • 0

score 1 · Answer 1 · 2016-01-04

It's hard to say without looking at the data. You might want to start out with some PCA or MDS plots to see how similar the twins are to each other. Blocking on twins like that is in essence saying that you think that there is a twin-specific mean difference in expression that you want to adjust for. That may well be true for some of the genes, but a more plausible scenario is that the twins are simply more correlated in their gene expression than the unrelated subjects, in which case you would instead use duplicateCorrelation() to estimate the intra-twin correlation and then fit a mixed model.

Of course that has its own set of assumptions, namely that the intra-twin correlation is relatively consistent between the twins, as you will be fitting a block diagonal correlation matrix. I don't think either way is 'wrong', there are just different assumptions being made, and you would do well to try to see which set of assumptions are more plausible.

score 0 · Answer 2 · 2016-01-04

Blocking factors for each twin pair may not be appropriate, depending how the levels of LG are arranged across your 50 samples. For example, if you had a situation where the twins in each pair were always both obese or both lean, then you wouldn't be able to do your first two contrasts. Conversely, if both twins are of the same sex and differ only in their obesity status, then I suspect the design matrix will not be of full rank. Also, this command:

colnames(design) <- levels(LG)

... makes no sense, because you also have additional coefficients from the twin-pair blocking factors.

An alternative approach is to treat the twin pair as a random effect. This is probably reasonable, given that you're testing twins sampled from some larger population of twins. You could then do something like:

design <- model.matrix(~0 + LG)
colnames(design) <- levels(LG)
dupcor <- duplicateCorrelation(eset, design)
fit <- lmFit(eset, design, block=Pairs, correlation=dupcor$consensus)

... and so on, with contrasts.fit and eBayes. This set-up ensures that you can compare between levels of LG, regardless of how they're arranged with respect to Pairs.