Search
Question: multiple comparison using limma
1
gravatar for annabelle.congras
22 months ago by
annabelle.congras10 wrote:

Hi, 

I am using limma to get differentially methylated probes from MethylEPIC data, using M values. I have 9 groups of samples (from groupA to group I) within 3 arrays, containing 2 to 5 samples each. 

I get very different results depending on the number of comparison I perform ie the number of contrasts in my contrast matrix. 

For example I have a number of significant differential probes much lower  in the groupD-groupC comparison when using :  

Group2 <- factor(pheno$Sample_Group2)
design2 <- model.matrix(~0+Group2, data=targets)
colnames(design2) <- levels(Group2)
fit2a <- lmFit(M_flt, design2)
x2<-c("groupA-groupB", "groupA-groupC", "groupD-groupC", "groupE-groupC")
contMatrix2 <- makeContrasts(contrasts=x2, levels=design2)
fit2b <- contrasts.fit(fit2a, contMatrix2)
fit2b <- eBayes(fit2b)

summ_group2 <- summary(decideTests(fit2b, adjust.method="BH", p.value=0.05))

than when I add several other comparisons:  

Group2 <- factor(pheno$Sample_Group2)
design2 <- model.matrix(~0+Group2, data=targets)
colnames(design2) <- levels(Group2)
fit2a <- lmFit(M_flt, design2)
x2<-c("groupA-groupB", "groupA-groupC", "groupD-groupC", "groupE-groupC", "groupD-groupE", "groupF-groupG", "groupH-groupI")
contMatrix2 <- makeContrasts(contrasts=x2, levels=design2)
fit2b <- contrasts.fit(fit2a, contMatrix2)
fit2b <- eBayes(fit2b)

summ_group2 <- summary(decideTests(fit2b, adjust.method="BH", p.value=0.05))

I'm moving from 100000 differentially methylated probes in case 1 to 200000 in case 2! With different logFC values as well. Moreover some probes in the 100000 are not in the 200000 and the top ten most significant are not the same.. 

Is it correct to use all my data in the design matrix and the linear model if I perform comparisons on a subset of group only as in case 1?  Would it be more powerful to remove the unused data before performing the linear model? 

If I keep all data in my design matrix and linear model, which method is the most accurate? I don't know if i'm underestimating significance in case 1 or overestimating it in case 2.. 

Hope you can help, sorry if the question has already been asked by newbies like me! 

ADD COMMENTlink modified 22 months ago • written 22 months ago by annabelle.congras10
1
gravatar for Aaron Lun
22 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

You must have done something wrong here. Let me mock up some pretend data:

set.seed(100)
Group2 <- factor(paste0("group", rep(LETTERS[1:9], each=3)))
group.means <- matrix(rnorm(nlevels(Group2)*100), ncol=nlevels(Group2))
M_flt <- matrix(rnorm(length(Group2)*100, mean=group.means[,Group2]), 
                ncol=length(Group2))

... and run your first block of code (slightly paraphrased);

design2 <- model.matrix(~0+Group2)
colnames(design2) <- levels(Group2)
fit2a <- lmFit(M_flt, design2)
x2 <- c("groupA-groupB", "groupA-groupC", "groupD-groupC", "groupE-groupC")
contMatrix2 <- makeContrasts(contrasts=x2, levels=design2)
fit2b <- contrasts.fit(fit2a, contMatrix2)
fit2b <- eBayes(fit2b)
summ_group2 <- summary(decideTests(fit2b, adjust.method="BH", p.value=0.05))

Looking at summ_group2 gives me:

   groupA-groupB groupA-groupC groupD-groupC groupE-groupC
-1            11            13            11            11
0             78            77            76            80
1             11            10            13             9

... while if we re-run it with the additional contrasts:

x2.alt <- c("groupA-groupB", "groupA-groupC", "groupD-groupC", "groupE-groupC",
            "groupD-groupE", "groupF-groupG", "groupH-groupI")
contMatrix2.alt <- makeContrasts(contrasts=x2.alt, levels=design2)
fit2b.alt <- contrasts.fit(fit2a, contMatrix2.alt)
fit2b.alt <- eBayes(fit2b.alt)
summ_group2.alt <- summary(decideTests(fit2b.alt, adjust.method="BH", p.value=0.05))

... this gives me:

   groupA-groupB groupA-groupC groupD-groupC groupE-groupC groupD-groupE
-1            11            13            11            11            12
0             78            77            76            80            76
1             11            10            13             9            12
   groupF-groupG groupH-groupI
-1             8            13
0             79            75
1             13            12

The output of topTable is also the same for each matching contrast between the two runs above. I can only assume that your data or design matrix is changing between the two runs. Incidentally, your design matrix is being formed from targets$Group2, not Group2. This is probably not what you intended (though it's probably not the cause of your problem either).

ADD COMMENTlink modified 22 months ago • written 22 months ago by Aaron Lun21k
0
gravatar for James W. MacDonald
22 months ago by
United States
James W. MacDonald48k wrote:

Possibly orthogonal to your question, but for methylation array analysis (well, methylation analysis in general), it is a better idea to use something like minfi or DMRcate, DSS, etc to find differentially methylated regions rather than relying on information from individual CpG probes.

The reason for this is that methylation appears to be a regional phenomenon, where you expect contiguous CpGs to be similarly methylated, and measures of CpG methylation tend to be rather noisy. If you borrow information from adjacent CpG probes to estimate the regional methylation status, you are likely to find things that are biologically meaningful (e.g., regions of consistently differently methylated CpGs) and ignore things that may not be (e.g., single CpGs that appear to be highly differentially methylated in a region where no other CpGs agree).

ADD COMMENTlink written 22 months ago by James W. MacDonald48k
0
gravatar for annabelle.congras
22 months ago by
annabelle.congras10 wrote:

Hi, 

Thanks for your answers!

You were right Aaron I might have done something wrong cause it's working fine now after removing all data and re-running my script line by line.. thanks anyway 

James,I actually used minfi for the normalisation/filtering steps, limma for DMP analysis, and I'm now moving to DMRcate for DMR analysis. I might come back soon with questions about that! 

Annabelle

ADD COMMENTlink written 22 months ago by annabelle.congras10

That's good to hear. For future reference, reply to answers with "add comment" rather than "add answer".

ADD REPLYlink written 22 months ago by Aaron Lun21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 343 users visited in the last hour