limma for 450K methylation array data
2
0
Entering edit mode
@andreas-paul-11140
Last seen 7.4 years ago

Dear all,

we want to use the limma package for the analysis of methylation data from the 450K Illumina
array for continuous traits.
We have 50 samples which were processed on the 450k array. The design only consist cases
without any controls. We have more than 5 covariates we want to include into the analysis apart
from age and ethnicity.
Could you give a broad idea/ R commands how to build a model and adjust for the above covariates using limma? We will use normalized and filtered M-values for the statistical analysis.

 

ID GroupAge Ethnicity GroupSurgery GroupSeizures GroupDrug GroupTreatment Group5 Group6 Group7
1 1 CEU 1 2 1 3 .. .. ..
2 1 CEU 2 3 3 3 .. .. ..
3 2 CEU 1 2 2 3 .. .. ..
4 1 CEU 1 2 1 3 .. .. ..
5 3 AFR 3 1 1 2 .. .. ..
6 1 ASN 2 1 2 3 .. .. ..
7 1 ASN 1 2 2 3 .. .. ..
8 1 CEU 3 3 2 2 .. .. ..
9 2 MEX 1 1 1 1 .. .. ..

 

This is how I would do it in the first place. Would be happy if you can have a look at. Im very thankful for any corrections and suggestions for designing the analysis.

samplesheet <- read.table("samplesheet.txt",row.names="Sample_ID")
age <- as.numeric(samplesheet[colnames(m),]$GroupAge)
ethnicity <- as.factor(samplesheet[colnames(m),]$Ethnicity)
group1 <-  as.numeric(samplesheet[colnames(m),]$GroupSurgery)
group2 <- as.numeric(samplesheet[colnames(m),]$GroupSeizures)
group3 <- as.numeric(samplesheet[colnames(m),]$GroupDrug)
group4 <- as.numeric(samplesheet[colnames(m),]$GroupTreatment)

library(limma)
design <- model.matrix(~age+ethnicity+group1+group2+group3+group4)
fit <- lmFit(m.filtered,design)
fit <- eBayes(fit)
topTable(fit, coef="age")

topTable(fit, coef="ethnicity")

toTable(fit, coef="group1")

...


Thank you very much in advance.

limma minfi • 1.3k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

All that as.numeric business will make R think that those things are numeric rather than factor. So unless a surgery of 3 is like 3 times more surgerylicious than a surgery of 1, you should probably not be doing that. The same goes for the seizures, drug, and treatment factors.

Also note that you won't have a coefficient called 'group1' if it is in fact some sort of surgery level. It would probably be easier to just leave it as Surgery, because then you will have a Surgery2 and Surgery3, which will estimate the difference between those surgery 'levels' and the 1 surgery level. If you stick with 'group1', then those coefficients will be 'group12' and 'group13', which are just confusing for no real gain. As a simple example:

> d.f <- data.frame(age = sample(25:75, 10), ethnicity = sample(c("CEU", "AFR","ASN","MEX"), 10, TRUE), Surgery = factor(sample(1:3, 10, TRUE)))
> model.matrix(~age+ethnicity+Surgery, d.f)
   (Intercept) age ethnicityASN ethnicityCEU ethnicityMEX Surgery2 Surgery3
1            1  56            0            0            1        1        0
2            1  65            0            1            0        0        0
3            1  28            1            0            0        0        1
4            1  44            1            0            0        0        0
5            1  33            1            0            0        0        1
6            1  69            0            0            1        0        1
7            1  71            0            0            1        0        0
8            1  61            0            0            0        1        0
9            1  60            0            1            0        1        0
10           1  68            1            0            0        0        1
attr(,"assign")
[1] 0 1 2 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$ethnicity
[1] "contr.treatment"

attr(,"contrasts")$Surgery
[1] "contr.treatment"

> names(d.f)[3] <- "group1"
> model.matrix(~age+ethnicity+ group1, d.f)
   (Intercept) age ethnicityASN ethnicityCEU ethnicityMEX group12 group13
1            1  56            0            0            1       1       0
2            1  65            0            1            0       0       0
3            1  28            1            0            0       0       1
4            1  44            1            0            0       0       0
5            1  33            1            0            0       0       1
6            1  69            0            0            1       0       1
7            1  71            0            0            1       0       0
8            1  61            0            0            0       1       0
9            1  60            0            1            0       1       0
10           1  68            1            0            0       0       1
attr(,"assign")
[1] 0 1 2 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$ethnicity
[1] "contr.treatment"

attr(,"contrasts")$group1
[1] "contr.treatment"

In addition, there is this idea that adjacent CpGs should be correlated, based on the idea that methylation is a regional phenomenon. This is a powerful idea, as you can then borrow information from adjacent CpG probes, to reduce the inherent variability of individual measurements. The minfi package will allow you to run bumphunter on your data, which puts a smooth curve on adjacent CpGs, and then you can use those smoothed estimates to look for differences rather than the individual CpGs. This also reduces multiplicity, so it's probably a win-win.

ADD COMMENT
0
Entering edit mode
@andreas-paul-11140
Last seen 7.4 years ago

Thank you very much! This helps as it is really well explained! Thank you again!

ADD COMMENT

Login before adding your answer.

Traffic: 435 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6