Question: champ.DMP phenotypes and covariates
8 months ago, o.giannakopoulou wrote:

Hello,

I'm using ChAMP for the analysis of my DNA methylation data obtained with the EPIC array. I have samples from a set of participants at two time points, and I would like to identify the differentially methylated positions between the two time points. The values of Sample_Group are "first" and "second" (for the first and second visit). So the appropriate command would be:

myDMP<-champ.DMP(beta = myCombat,pheno = myLoad$pd$Sample_Group,adjPVal = 0.05,adjust.method = "BH",compare.group = NULL,arraytype = "EPIC")

right? Moreover, from the champ.SVD analysis I have seen that some of the control probes and the proportions of cell types are correlated with methylation levels, even after correcting for batch effects (champ.runCombat). On that point, I would like to clarify something: where the champ.SVD section of the ChAMP pipeline says that "the darker the color is, the more significant your deconvoluted components are correlated with your phenotype", the "phenotype" describes the methylation levels, right?

Is there any option to include this information in the champ.DMP command and use these variables as covariates in the statistical model?

Thank you,

Olga

(modified 8 months ago by Yuan Tian • written 8 months ago by o.giannakopoulou)
8 months ago, Yuan Tian (Shanghai Institute for Biology Science, Shanghai, China) wrote:

Hello Olga:

Based on your description, I think you are actually doing a paired analysis: you have paired data (the same participants measured twice) and want to see whether something between the two visits has an effect on them.

champ.DMP() is NOT designed for this analysis, and neither are champ.DMR() and champ.DMB(): they compare the groups as if every sample were independent, so the pairing between visits is ignored. Please don't use them for this kind of analysis.
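To illustrate why the pairing matters, here is a minimal base-R sketch on simulated beta values for a single hypothetical CpG (10 participants, two visits). It uses plain t-tests, not the limma models behind champ.DMP(), so it only shows the principle: when people differ a lot from each other but change consistently between visits, an unpaired test is swamped by between-person variation, while a paired test looks only at within-person differences.

```r
# Simulated beta values for ONE hypothetical CpG: 10 participants, two visits.
set.seed(1)
n <- 10
subject_level <- rnorm(n, mean = 0.5, sd = 0.1)           # large between-person differences
visit1 <- subject_level + rnorm(n, sd = 0.01)
visit2 <- subject_level + 0.02 + rnorm(n, sd = 0.01)      # small, consistent shift at visit 2

# Unpaired (Welch) test: treats the 20 samples as independent, pairing ignored
p_unpaired <- t.test(visit2, visit1)$p.value
# Paired test: works on the within-person differences visit2 - visit1
p_paired <- t.test(visit2, visit1, paired = TRUE)$p.value

print(c(unpaired = p_unpaired, paired = p_paired))
```

With this simulation the paired p-value should be far smaller than the unpaired one; the exact numbers depend on the random seed.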

I personally wrote another set of scripts for this work; you can find them in my GitHub. In that folder there are scripts named champ.PairedDMP(), champ.PairedDMR() ..., which I wrote for paired analysis. I intended to put them into a new version of ChAMP, but I have not finished designing the GUI function yet. So I suggest you use these scripts for paired analysis; you can use them to get paired DMPs/DMRs/DMBs.

Secondly, "phenotype" here means covariates such as age, cancer status, race, plate, array... and control probe values; in other words, the annotation of your data set. When a cell is dark, it means your deconvoluted components show strong correlation with that factor, such as age or cancer status. This is important: if you want to analyse cancer status but find that your top components are correlated with batches, your data may be biased, and most of the DMPs you find may actually be caused by batches instead of your phenotype. Cell fractions being correlated with components is reasonable in most research. For cell fractions, you may include them in your DMP analysis as covariates so that their effect is removed (here you need to write your own script; champ.DMP() is not designed for integrating covariates yet).
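If you write your own script, one simple way to use cell fractions as covariates is to fit a linear model per CpG and test the group term while adjusting for the fractions. The sketch below is a hypothetical illustration on simulated data: `beta`, `pd`, and the columns `Sample_Group`, `CD4T` and `Gran` are stand-ins, not ChAMP objects, and it uses base R `lm()` rather than limma.

```r
# Hypothetical sketch: per-CpG linear models with cell-type fractions as covariates.
# 'beta' (CpGs x samples) and 'pd' (one row per sample) are simulated stand-ins.
set.seed(42)
n_samples <- 20
n_cpgs <- 200
pd <- data.frame(
  Sample_Group = factor(rep(c("first", "second"), each = n_samples / 2)),
  CD4T = runif(n_samples, 0.1, 0.3),   # hypothetical cell fraction
  Gran = runif(n_samples, 0.4, 0.7)    # hypothetical cell fraction
)
beta <- matrix(runif(n_cpgs * n_samples, 0.2, 0.8), nrow = n_cpgs,
               dimnames = list(paste0("cg", seq_len(n_cpgs)), NULL))
# Make the first 20 CpGs track granulocyte fraction (a biological covariate)
beta[1:20, ] <- beta[1:20, ] + outer(rep(0.5, 20), pd$Gran - mean(pd$Gran))

# Fit beta ~ group + cell fractions for one CpG; return the group effect and its p-value
fit_one <- function(y, pd) {
  fit <- lm(y ~ Sample_Group + CD4T + Gran, data = pd)
  coef(summary(fit))["Sample_Groupsecond", c("Estimate", "Pr(>|t|)")]
}
res <- t(apply(beta, 1, fit_one, pd = pd))
res <- data.frame(delta = res[, 1], p = res[, 2])
res$adj.p <- p.adjust(res$p, method = "BH")   # Benjamini-Hochberg, as in adjust.method = "BH"
head(res[order(res$p), ])
```

In real data the fractions would come from a cell-type deconvolution step, and for paired samples the model would also need a term for the participant.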

I don't fully understand what you mean by "phenotype" meaning "methylation level". You can think of "phenotypes" as measurements for each sample: some refer to the sample's race or life status, others to how the experiment performed on that sample. Either way, they are covariates to be analysed, not methylation levels. The beta matrix represents the methylation level.

Best ^_^

Yuan Tian

Thank you again for the really informative reply. Indeed, I have paired data (i.e. data from the same patients at two time points). I was thinking that I could handle that by including patient_id as a covariate in my model, but as I can see from your reply, champ.DMP() does not have this option yet. I'll definitely go through your champ.PairedDMP() / champ.PairedDMR() scripts. Thanks a lot for the link.
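Including patient_id as a covariate is in fact a standard way to encode a paired design in a linear model. With exactly two time points per participant, testing the visit term in `y ~ visit + patient_id` gives the same t-statistic and p-value as a paired t-test. A minimal base-R check on simulated values for one hypothetical CpG (the column names `patient_id` and `visit` are assumptions):

```r
# Simulated beta values for one hypothetical CpG: 8 patients, two visits each.
set.seed(7)
n <- 8
pd <- data.frame(
  patient_id = factor(rep(seq_len(n), times = 2)),            # 1..8, then 1..8 again
  visit = factor(rep(c("first", "second"), each = n))         # first visits, then second visits
)
y <- rep(rnorm(n, 0.5, 0.1), times = 2) +                    # per-patient baseline
  ifelse(pd$visit == "second", 0.03, 0) +                     # visit effect
  rnorm(2 * n, sd = 0.01)                                     # measurement noise

# Linear model with the patient as a fixed effect: the visit term is the paired contrast
fit <- lm(y ~ visit + patient_id, data = pd)
p_lm <- coef(summary(fit))["visitsecond", "Pr(>|t|)"]

# Paired t-test on the same values (rows are aligned by patient)
p_paired <- t.test(y[pd$visit == "second"], y[pd$visit == "first"], paired = TRUE)$p.value

all.equal(p_lm, p_paired)
```

To my understanding, the limma User's Guide describes the same idea for paired samples (a design built with model.matrix() from a pairing factor plus a treatment factor, fitted with lmFit() and eBayes()), which scales this to all CpGs at once.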

Regarding champ.SVD, I'm afraid I'm a bit confused about what "deconvoluted components" means. I thought that a dark color means that the specific factor (e.g. age, control probes, cell counts) is associated with methylation levels and should be included as a covariate. If I got it right, champ.runCombat can correct for the technical variation of the array and the slide, so after the correction these factors no longer need to be included in the model. Am I right?

Thank you again for the valuable help

Best,

Olga

Hello Olga:

"Deconvoluted components" are latent variables that explain variance in your original data set. For example, SVD is a commonly used deconvolution method for matrix data; you may want to read about SVD or PCA (principal component analysis). In short, we treat the original data matrix as a combination of various latent variables mixed together in our data; these variables could be age, race, cancer status, or even array, plate... They are so mixed that we cannot separate them out clearly, nor can we know whether they have an effect on the original data, so we need a method to find out whether such effects exist.
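A minimal base-R sketch of this idea: simulate a beta matrix in which some hypothetical CpGs are driven by age and others by a three-level batch, then decompose the centred matrix with `svd()`. The leading components should capture most of the variance, and their sample scores (`s$v`) are the "deconvoluted components" that can be compared against phenotypes. The data and effect sizes are invented for illustration.

```r
# Simulated beta matrix: 500 CpGs x 30 samples, mixing two hidden sources of variation.
set.seed(3)
n_cpgs <- 500; n_samples <- 30
age <- runif(n_samples, 20, 70)
batch <- factor(rep(c("A", "B", "C"), each = 10))
age_load <- c(rnorm(100, sd = 0.004), rep(0, 400))           # CpGs 1-100 respond to age
batch_shift <- c(rep(0, 100), rnorm(100, sd = 0.05), rep(0, 300))  # CpGs 101-200 respond to batch
beta <- 0.5 +
  outer(age_load, age - mean(age)) +
  outer(batch_shift, c(A = -1, B = 0, C = 1)[as.character(batch)]) +
  matrix(rnorm(n_cpgs * n_samples, sd = 0.02), n_cpgs)

centered <- beta - rowMeans(beta)   # centre each CpG, as PCA does
s <- svd(centered)                  # centered = U %*% diag(d) %*% t(V)
var_explained <- s$d^2 / sum(s$d^2) # fraction of variance per component
round(head(var_explained, 5), 3)
# s$v[, k] holds the sample-level scores of component k
```

In real data you would start from your normalised beta matrix; champ.SVD() performs this decomposition for you.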

Deconvolution methods for matrices include SVD, ICA, NMF... They are all designed to extract the most variable latent factors hidden in the original matrix. So if you are doing research on Cancer/Normal, you surely hope that most of the variance in your data set is caused by the Cancer/Normal phenotype, because then your DMPs and DMRs are genuinely caused by cancer, which would give you fantastic results. However, life is never easy: in most cases the data set is a mixture of age, race... which means the variance in your original matrix is caused not only by Cancer/Normal but by the combined effect of age and race, so your DMPs may not actually be related to cancer.

After SVD, the latent variables represent the variance of the original data, especially the top components, which account for most of the variance in the original matrix. So whether the top components are correlated with your phenotype of interest is quite important, and that's why champ.SVD correlates all top components with all the phenotypes you have. If you find that the top components are correlated only with Cancer/Normal, congratulations. If not, you need to see why: are the top components correlated with age or cell type? That could also be reasonable, because DNA methylation can indeed be heavily influenced by them. In this situation you don't need to run Combat on age or cell type. They are genuine biological factors, not unrelated ones; they just don't happen to be your research interest. For this situation I suggest you add these "biological factors" as covariates and remove their effect when you detect DMPs.

However, if your top components are correlated with array, plate, or even the source of your samples, that is not reasonable: it is most likely caused by something that went wrong during transport or the experiment, and it has no biological meaning. For these factors, I suggest you use champ.runCombat() to remove them.
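The correlation step can be sketched in base R as follows. This is not ChAMP's code; it mirrors, as I understand it, the approach champ.SVD() takes (a linear-model test for numeric phenotypes and a Kruskal-Wallis test for categorical ones), applied to simulated data with hypothetical columns `age`, `Slide` and `Sample_Group`. Small p-values correspond to the dark cells in the champ.SVD() heatmap.

```r
# Simulated data: CpGs 1-100 follow age, CpGs 101-200 follow Slide, Sample_Group has no effect.
set.seed(3)
n_cpgs <- 500; n_samples <- 30
pd <- data.frame(
  age = runif(n_samples, 20, 70),
  Slide = factor(rep(c("S1", "S2", "S3"), each = 10)),
  Sample_Group = factor(rep(c("first", "second"), times = 15))
)
beta <- 0.5 +
  outer(c(rnorm(100, sd = 0.004), rep(0, 400)), pd$age - mean(pd$age)) +
  outer(c(rep(0, 100), rnorm(100, sd = 0.05), rep(0, 300)), as.integer(pd$Slide) - 2) +
  matrix(rnorm(n_cpgs * n_samples, sd = 0.02), n_cpgs)

s <- svd(beta - rowMeans(beta))   # components of the CpG-centred matrix
top_k <- 4
# For each top component, test its sample scores against every phenotype
pvals <- sapply(seq_len(top_k), function(k) {
  score <- s$v[, k]
  sapply(pd, function(ph) {
    if (is.numeric(ph)) summary(lm(score ~ ph))$coefficients[2, 4]  # numeric: regression slope p-value
    else kruskal.test(score ~ ph)$p.value                          # categorical: Kruskal-Wallis
  })
})
colnames(pvals) <- paste0("PC", seq_len(top_k))
signif(pvals, 2)   # rows = phenotypes, columns = components
```

Here age and Slide should show very small p-values on the leading components. In a real study, a pattern like the Slide row points to a technical batch, while an age or cell-type row points to biology that belongs in the model as a covariate.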

Finally, champ.runCombat() has two parameters for this, variablename and batchname: one is the phenotype of interest you are going to study, while the other is the batch that you think makes no biological sense but shows strong correlation with the top components. You decide which one is the batch. So champ.runCombat() does not automatically remove the effect of slide or array; you still need to assign the parameter.
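To show what the two roles mean, here is a deliberately simplified, location-only batch adjustment in base R. It is NOT ComBat: ComBat (which champ.runCombat() wraps) also adjusts batch variances and shrinks the batch estimates with empirical Bayes. The principle is the same, though: the variable of interest stays in the model so that its effect is protected, and only the fitted batch effect is subtracted. The data, `Sample_Group` and `Slide` are simulated stand-ins.

```r
# Simulated data: 100 CpGs x 24 samples; groups are balanced across three slides.
set.seed(11)
n_cpgs <- 100; n_samples <- 24
pd <- data.frame(
  Sample_Group = factor(rep(c("first", "second"), times = 12)),  # variable of interest
  Slide = factor(rep(c("S1", "S2", "S3"), each = 8))             # batch
)
group_eff <- c(rep(0.05, 10), rep(0, 90))          # 10 truly different CpGs
slide_shift <- c(S1 = -0.04, S2 = 0, S3 = 0.04)    # technical shift per slide
beta <- 0.5 +
  outer(group_eff, as.integer(pd$Sample_Group == "second")) +
  outer(rep(1, n_cpgs), slide_shift[as.character(pd$Slide)]) +
  matrix(rnorm(n_cpgs * n_samples, sd = 0.01), n_cpgs)

# Fit every CpG on variable of interest + batch in one least-squares solve
design <- model.matrix(~ Sample_Group + Slide, data = pd)
coefs <- t(qr.coef(qr(design), t(beta)))           # CpGs x coefficients
batch_cols <- grep("^Slide", colnames(design))     # columns for the batch terms only
# Subtract only the fitted batch part; the group effect is left in the data
adjusted <- beta - coefs[, batch_cols] %*% t(design[, batch_cols])
```

Refitting the same model on `adjusted` gives (numerically) zero slide coefficients while the Sample_Group coefficient is unchanged. If Sample_Group were confounded with Slide (e.g. all "first" samples on one slide), the two effects could not be separated, which is one reason to check the design before running any batch correction.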

Best

Yuan Tian