Question

contrast matrix to compare 3 groups in limma

0

Entering edit mode

mahes.muniandy • 0

@mahesmuniandy-7955

Last seen 4 months ago

Helsinki

Hi,

My name is Mahes Muniandy and I am a doctoral student working on twin data. I have gene expression (Affymetrix HGU133 plus 2) data for monozygotic twins discordant for obesity. I clustered my gene expression data using heavy/lean ratios for each twin pair and obtained 3 groups and would like to compare all three groups of twins. This is to enable me to see if there are different sub-types to obesity. In other words, I would like to do within twin-pair differences (26 twin pairs) and see which genes are expressed differently in obesity across all 3 groups.
Can someone advise me if I am on the right track with my design matrix and limma approach below?

library(limma)
#twins
Pair <- factor(c("T1","T1","T2","T2","T3","T3","T4","T4","T5","T5","T6","T6","T7","T8","T8","T7","T9","T9","T10","T10","T11","T11","T12","T12","T13","T13","T14","T14","T15","T15","T16","T16","T17","T17","T18","T18","T19","T19","T20","T20","T21","T21","T22","T22","T23","T23","T24","T24","T25","T26", "T26", "T25"))
#obese vs lean
HeavyLean<-factor(c("O","L","L","O","L","O","O","L","O","L","L","O","L","L","O","O","O","L","O","L","L","O","L","O","O","L","O","L","L","O","O","L","O","L","L","O","L","O","L","O","O","L","O","L","O","L","O","L","L","L","O","O"))
#my 3 clusters
Cluster<-factor(c("c2","c2","c2","c2","c2","c2","c3","c3","c2","c2","c2","c2","c3","c2","c2","c3","c2","c2","c2","c2","c2","c2","c2","c2","c2","c2","c2","c2","c3","c3","c2","c2","c2","c2","c1","c1","c2","c2","c3","c3","c2","c2","c3","c3","c2","c2","c2","c2","c1","c2","c2","c1"))

LG <- paste(Cluster, HeavyLean, sep=".")
LG <- factor(LG, levels=c("c1.O","c1.L","c2.O","c2.L","c3.O","c3.L"))

design <- model.matrix(~0 + LG)
colnames(design) <- levels(LG)

dupcor <- duplicateCorrelation(limma_eset, design)
fit <- lmFit(limma_eset, design, block=Pair, correlation=dupcor$consensus)

contrast.matrix <- makeContrasts(C2vsC1inO = "c2.O-c1.O", C2vsC1inL = "c2.L-c1.L", C3vsC1inO = "c3.O-c1.O", C3vsC1inL = "c3.L-c1.L", C3vsC2inO = "c3.O-c2.O", C3vsC2inL = "c3.L-c2.L", C2vsC1 = (c2.O-c2.L)-(c1.O-c1.L), C3vsC1 = (c3.O-c3.L)-(c1.O-c1.L), C3vsC2 = (c3.O-c3.L)-(c2.O-c2.L),levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

I am getting adjusted p values of 1 and wonder if it is because my contrast matrix is not full rank? Can anyone suggest a better approach to me? I tried doing simple anova with Tukey - should I just use that?

Many Thanks,

Mahes Muniandy

Department of Public Health,

University Helsinki

limma • 3.6k views

ADD COMMENT • link updated 9.4 years ago by Aaron Lun ★ 28k • written 9.4 years ago by mahes.muniandy • 0

score 1 · Accepted Answer · 2016-01-28

You should specify block=Pair in the duplicateCorrelation call, otherwise the correlation won't be calculated correctly. In particular, the function will assume that you're calculating correlations between duplicate spots on the same array. This is not the case here, as you want the correlations between paired arrays.

Also, the comparisons between clusters will be more conservative than the comparisons within clusters, as the correlation will increase the standard error of the log-fold change for the former (whereas it decreases it for the latter). So, you might get some more DE genes if you focus on the contrasts between lean and obese in the same cluster. Contrasts like C3vsC1 should also be okay, as the intra-cluster log-fold change is being compared.