Question

Should I include PCs or MDS dimensions as covariates in a methylation study?

RC

Hi there. I'm analyzing HM450K array methylation data with two groups (group1 vs. group2). I want to compare the two groups to find differentially methylated probes (DMPs), so I ran a limma analysis on this dataset.

However, when I discussed the pipeline with someone who works on methylation data analysis at a bioinformatics company, he told me that he adds MDS1 and MDS2 (the first two dimensions from multidimensional scaling) as covariates when finding DMPs.

We have GWAS data for these samples, and in a PCA plot based on the GWAS data the two groups are mixed together. What puzzles me is that the MDS1 and MDS2 he included as covariates were generated from the methylation data. I thought that if we want to adjust for the potential effect of population stratification, we should use the PCs from the GWAS data. Some previous studies use MDS or PCA plots to assess whether methylation data carry a strong signature of the target condition. So I worry that adding MDS1 and MDS2 as covariates may reduce the difference in DNA methylation between the two groups. I ran the pipeline twice, and the only difference between the two runs is the covariates. Probes were considered differentially methylated if the BH-adjusted P-value was < 0.05.

1st: covariates: age + gender + array; DMPs: 152,688 probes

2nd: covariates: age + gender + array + MDS1 + MDS2 (from methylation data); DMPs: 48,046 probes
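For reference, the BH cutoff used above can be sketched in plain Python as follows. It should agree with R's `p.adjust(p, method = "BH")`, which is also the default `adjust.method` of limma's `topTable`. The function name `bh_adjust` and the example P-values are hypothetical and only illustrate the procedure.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down to the smallest, scaling each by
    # m / rank and keeping a running minimum so the result is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-probe P-values
adj = bh_adjust(pvals)
dmps = [i for i, q in enumerate(adj) if q < 0.05]
print(dmps)  # → [0]
```

Every probe is tested under the same procedure, so the covariates only change the DMP count through the per-probe P-values that feed into it.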

The two results differ greatly: as I expected, the second run gave far fewer DMPs. I used the first pipeline in my analysis, and I'm wondering whether that is correct.

So I'm still confused about this. Any suggestions would be greatly appreciated!

limma methylation R covariate MDS

Answer · 2017-05-08

Aaron Lun

If you have strong differences between groups in your data, the first two MDS dimensions will correspond to the differences between groups. Using them as covariates will obviously reduce power to reject the null hypothesis, because genuine differences between groups are modelled by the covariates under the null model. If you want to empirically account for population structure, you're better off using methods like RUVnormalize or sva. You could also use PCs obtained from the GWAS data, but this will only account for variation due to genotype.
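A small simulation sketch of this point, written in Python with NumPy (an assumption, since the thread itself uses R/limma). All data are simulated and hypothetical: 20 samples, 2,000 probes, and 400 probes that truly differ between groups. MDS1 is computed as classical MDS on Euclidean distances, which is equivalent to PCA of the centred data; this is a simplification, as limma's `plotMDS` computes its distances differently. The test statistics are ordinary least-squares t-statistics, not limma's moderated t, and the helper `group_t_stats` is hypothetical.

```python
import numpy as np

# Hypothetical simulation: n samples (half per group), p probes; only the
# first k probes truly differ between groups, by delta (arbitrary scale).
rng = np.random.default_rng(1)
n, p, k, delta = 20, 2000, 400, 1.5
group = np.repeat([0.0, 1.0], n // 2)
meth = rng.normal(0.0, 1.0, size=(n, p))
meth[:, :k] += delta * group[:, None]

# Classical MDS on Euclidean distances equals PCA of the centred data, so the
# first MDS coordinate is the first left singular vector times its singular value.
centred = meth - meth.mean(axis=0)
u, s, _ = np.linalg.svd(centred, full_matrices=False)
mds1 = u[:, 0] * s[0]
print("|corr(MDS1, group)|:", round(abs(np.corrcoef(mds1, group)[0, 1]), 3))


def group_t_stats(design, y, col=1):
    """Ordinary least-squares t-statistic of design column `col` for every probe."""
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    df = design.shape[0] - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / df
    xtx_inv = np.linalg.inv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[col, col])
    return coef[col] / se


intercept = np.ones(n)
t_without = group_t_stats(np.column_stack([intercept, group]), meth)
t_with = group_t_stats(np.column_stack([intercept, group, mds1]), meth)

# Compare the group t-statistics on the truly differential probes only.
print("median |t|, group only :", round(float(np.median(np.abs(t_without[:k]))), 2))
print("median |t|, group + MDS1:", round(float(np.median(np.abs(t_with[:k]))), 2))
```

Because MDS1 here is almost collinear with the group labels, putting it in the design inflates the standard error of the group coefficient, and the group t-statistics of the truly differential probes shrink, even though no confounder was simulated.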

Hi Aaron. There is no difference in age or gender between the groups. All of our samples are from the same population, and the PCA plot of the GWAS data shows no population stratification. So I thought the first two MDS dimensions may correspond to our phenotype of interest.

I'm going to remove MDS1 and MDS2 from the covariates in my analysis. Thanks for your reply. :)

RC