Hi there. I'm analyzing HM450K array methylation data with two groups (group1 vs. group2). I want to compare the two groups to find differentially methylated probes (DMPs), so I ran a limma analysis on this dataset.
But when I discussed the pipeline with someone who does methylation data analysis at a bioinformatics company, he told me that he adds MDS1 and MDS2 from multidimensional scaling as covariates when finding DMPs.
We have GWAS data for these samples. The two groups were mixed together in the PCA plot based on the GWAS data. What puzzles me is that the MDS1 and MDS2 he included as covariates were generated from the methylation data. I thought that if we want to adjust for the potential effect of population stratification, we should use the PCs from the GWAS data. Some previous studies use MDS or PCA plots to test whether methylation data provide a strong signature of the target condition. So I worry that, if these components partly capture the group difference, adding MDS1 and MDS2 as covariates may reduce the difference in DNA methylation between the two groups.
I ran the pipeline twice; the only difference between the two runs is the covariates. Probes were considered differentially methylated if the BH-adjusted P-value was < 0.05.
1st: covariates: age + gender + array; differentially methylated probes (DMPs): 152,688 probes
2nd: covariates: age + gender + array + MDS1 + MDS2 (from methylation data); DMPs: 48,046 probes
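To be explicit about how the DMP counts in both runs were thresholded, here is a minimal Python sketch of the Benjamini-Hochberg step. It is not my actual limma code (limma does this adjustment internally); the function names `bh_adjust` and `count_dmps` are made up for illustration, and the input is assumed to be a plain list of per-probe raw P-values.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that the
    # adjusted values stay monotone (a cumulative minimum of p * m / rank).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_dmps(pvalues, alpha=0.05):
    """Count probes whose BH-adjusted p-value is below alpha."""
    return sum(q < alpha for q in bh_adjust(pvalues))
```

Calling `count_dmps` on the group-effect P-values from each model would give the kind of count reported for the two runs; the threshold is the same in both, so only the covariates differ.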
The two results differ greatly. We got far fewer DMPs the second time, as I expected. I used the 1st pipeline in my analysis, and I'm wondering whether that is correct.
So I'm still confused about this question. Any suggestions will be greatly appreciated!