I have a question about differentially methylated probes and genes, and I hope somebody can give me some guidance.

I am working on an EPIC DNA methylation dataset with more than 200 samples (half controls, half disease), along with potential confounders: the content of several cell types (cell type 1 to cell type 3) in each sample, and donor characteristics such as age, gender, and BMI. I ran the differential methylation analysis at two levels: the probe level (using the beta value of each EPIC probe) and the gene level (using the average beta value of the probes in the gene's promoter region to represent the gene's methylation level). For both probes and genes, I used limma to find the differential ones, with a regression model like:
probe/gene beta value ~ Disease_status + Cell_type1_content + Cell_type2_content + Cell_type3_content + Disease_status:Cell_type1_content + Disease_status:Cell_type2_content + Disease_status:Cell_type3_content + age + gender + BMI + ... (other confounding factors)
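To make the structure of this model concrete, here is a minimal sketch in Python with NumPy. It is not the limma analysis itself: it fits a single simulated probe by ordinary least squares, without limma's empirical Bayes moderation, and all data, effect sizes, and the helper names `build_design` and `fit_ols` are hypothetical and chosen only for illustration.

```python
import numpy as np

def build_design(disease, cell_props, covariates):
    """Design matrix: intercept, disease, cell contents,
    disease:cell interactions, then the remaining covariates."""
    n = disease.shape[0]
    cols = [np.ones(n), disease]
    names = ["Intercept", "Disease_status"]
    k = cell_props.shape[1]
    for j in range(k):
        cols.append(cell_props[:, j])
        names.append(f"Cell_type{j + 1}_content")
    for j in range(k):
        # Interaction column = product of the two main-effect columns
        cols.append(disease * cell_props[:, j])
        names.append(f"Disease_status:Cell_type{j + 1}_content")
    for name, values in covariates.items():
        cols.append(values)
        names.append(name)
    return np.column_stack(cols), names

def fit_ols(X, y):
    """Ordinary least squares: coefficients and their t-statistics."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Simulated (made-up) data: 100 controls and 100 disease samples
rng = np.random.default_rng(0)
n = 200
disease = np.repeat([0.0, 1.0], n // 2)
cell_props = rng.uniform(0.1, 0.4, size=(n, 3))
covariates = {
    "age": rng.normal(50.0, 10.0, n),
    "gender": rng.integers(0, 2, n).astype(float),
    "BMI": rng.normal(25.0, 4.0, n),
}
X, names = build_design(disease, cell_props, covariates)

# One simulated probe whose slope on cell type 1 differs by 0.4 in disease
cell1 = cell_props[:, 0]
y = (0.5 + 0.02 * disease + 0.3 * cell1
     + 0.4 * disease * cell1 + rng.normal(0.0, 0.02, n))

beta, t_stats = fit_ols(X, y)
idx = names.index("Disease_status:Cell_type1_content")
print(f"{names[idx]}: estimate={beta[idx]:.3f}, t={t_stats[idx]:.1f}")
```

The coefficient on the `Disease_status:Cell_type1_content` column is the difference between the disease and control groups in the slope of methylation on cell type 1 content, which is the quantity the interaction term tests.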
Then I checked the adjusted p-values of the interaction terms, such as Disease_status:Cell_type1_content. A significant interaction term indicates that the association between the probe/gene methylation level and Cell_type1_content differs significantly between the control group and the disease group. What I found weird is that at the gene level, more than 3000 genes were significant for the interaction term Disease_status:Cell_type1_content, while at the probe level fewer than 500 probes were significant. Since the gene methylation values were summarized from the probe values, as described above, it seems unreasonable to me that there are so many more significant genes than significant probes. I don't know whether the problem is in my limma regression design or in the way I summarized the gene beta values. I hope somebody can give me some suggestions. Thank you so much!
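As a point of reference for the comparison above, the following Python/NumPy sketch simulates one hypothetical gene whose promoter probes all carry the same small interaction effect plus independent noise, and compares the interaction t-statistic of each probe with that of their average. The sample sizes, effect size, noise level, and the independence of noise across probes are all assumptions made for illustration; real neighbouring probes are correlated, and the model is reduced to disease status and one cell type. It uses plain least squares, not limma.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_probes = 100, 10  # hypothetical sizes

disease = np.repeat([0.0, 1.0], n_per_group)
cell1 = rng.uniform(0.1, 0.4, size=disease.size)
# Reduced design: intercept, disease, cell type 1, and their interaction
X = np.column_stack([np.ones_like(disease), disease, cell1, disease * cell1])

# Every probe shares the same signal; noise is independent per probe (assumption)
true_interaction = 0.1
signal = 0.5 + 0.2 * cell1 + true_interaction * disease * cell1
noise = rng.normal(0.0, 0.03, size=(disease.size, n_probes))
probes = signal[:, None] + noise

def interaction_t(X, y):
    """t-statistic of the last design column (the interaction term)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1] / se

probe_t = np.array([interaction_t(X, probes[:, j]) for j in range(n_probes)])
gene_t = interaction_t(X, probes.mean(axis=1))  # promoter-average "gene" value

print("median |t| across probes:", np.median(np.abs(probe_t)))
print("t for the averaged gene value:", gene_t)
```

Under these assumptions, averaging reduces the noise in the gene-level value, so the gene can reach significance even when few of its individual probes do. The adjusted p-values are also computed over far fewer tests at the gene level than at the probe level, which changes the multiple-testing correction. Neither point shows that this is what happened in the real data.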