I am using minfi and calculating DMRs using bumphunter(). That gives me a table with ranges and various statistics, but I don't see the beta values. The bumps object is relatively complex with a lot of contents, so I feel like they might be in there. Are they? How can I retrieve them?
I didn't know about the %over% operator. That's very useful. However, that just subsets the beta values for probes overlapping bumps; it doesn't give the beta values summarized for each bump.
OK, then maybe you could be a bit more explicit about what you want. There are any number of things that could be called a 'beta' in this context. I gave you what I thought you wanted, but evidently I misunderstood. What exactly do you mean by 'beta values summarized for each bump'?
Sorry. I guess my initial question was a bit vague. Bumphunter gives you a table with stats for each bump. However, what is the methylation value for each sample for that bump? getBeta gives you a table of beta values with probes as rows. Then bumphunter condenses probes into bumps. Is there a table of beta values with bumps as rows?
You are still being a bit vague with your question. I can interpret 'methylation value for each sample for that bump' numerous ways, and none of them would match up with whatever I would interpret 'beta values with bumps as rows' to mean. Note that there are beta values that estimate the proportion of a CpG site that is methylated, and a statistician often talks about model coefficients using the term beta as well. So it's still not clear if you want model betas, or methylation betas, or the mean of the betas for all samples in a bump, or something else entirely.
Anyway, bumphunter doesn't condense probes into bumps. What happens is that whatever model you specify is fit on each probe, then bumphunter uses the coefficients from the fit on each probe to decide if there is a bump in a particular region or not. So in a simple model of treated vs control or whatever, the coefficients are the average difference between the two groups, for each CpG. If the differences between groups are all in one direction for a reasonable genomic distance, then bumphunter will likely think there is a 'bump' there.
There is a 'coef' item in the bumps object that has the raw coefficients for each CpG, and a 'fitted' item that has the smoothed coefficients, if you said you wanted smoothing. But again, these are the coefficients, and given the way the model is set up, they are the differences between groups. So there isn't a table of these beta values, because it's a vector, with one value per CpG.
There is also the 'value' column in the bumps table, which is the mean of the coefficients for all the CpGs that were in a given bump. So that gives you the average difference between the two groups in that bump.
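To make the distinction concrete, here is a minimal base R sketch of what a per-CpG coefficient and a per-bump 'value' mean in a two-group design. It is not bumphunter's actual code: the toy beta matrix, the group labels, and the choice of probes 2 to 4 as the "bump" are all made up for illustration, and it assumes the model was fit on beta values with no smoothing.

```r
# Toy methylation betas: 4 CpGs (rows) x 4 samples (columns). Hypothetical data.
beta <- matrix(
  c(0.5, 0.5, 0.5, 0.5,
    0.2, 0.3, 0.5, 0.6,
    0.3, 0.3, 0.5, 0.7,
    0.1, 0.3, 0.4, 0.4),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("cg", 1:4), paste0("s", 1:4))
)
group <- c("control", "control", "treated", "treated")

# In a simple two-group model, the coefficient for each CpG is the
# difference between the group means at that CpG.
coef_per_cpg <- rowMeans(beta[, group == "treated"]) -
  rowMeans(beta[, group == "control"])

# Suppose CpGs 2 to 4 form a bump (assumed here, not detected).
# The per-bump summary is then the mean of those coefficients.
bump_value <- mean(coef_per_cpg[2:4])

print(coef_per_cpg)
print(bump_value)
```

Note that both quantities are differences between groups, which is why neither tells you the per-sample methylation level inside the bump.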
Thanks for clarifying. That was very helpful because I haven't been able to find a clear explanation of the bumphunter output.
However, I want to know the methylation level for every bump for every sample. All the measurements seem to be some measure of difference between the two groups. I can tell to what degree the methylation is significantly increasing or decreasing, but not from where to where. I was expecting to find something analogous to cpgCollapse() from minfi:
"This function groups adjacent loci into clusters with a specified maximum gap between CpGs in the cluster, and a specified maximum cluster width. The loci within each cluster are summarized resulting in a single methylation estimate per cluster."
Right. I have shown you how to get the beta values for every CpG in a bump, and you can use the corresponding getM function if you want the M values. All cpgCollapse does is compute the column means of the resulting set of values. You can use colMeans to get that.
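As a rough sketch of the colMeans idea, the following base R code builds a table with bumps as rows and samples as columns. Everything in it is toy data: the beta matrix stands in for what getBeta would return, the probes data frame stands in for the probe positions, and bumps_table mimics only the chr, start and end columns of a bumps table. The overlap test is done by hand here rather than with %over%, purely so the example runs without any Bioconductor packages.

```r
# Toy stand-in for a probes x samples matrix of methylation betas.
beta <- matrix(
  c(0.10, 0.20, 0.30,
    0.20, 0.30, 0.40,
    0.30, 0.40, 0.50,
    0.80, 0.70, 0.60,
    0.90, 0.80, 0.70,
    0.50, 0.50, 0.50),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("cg", 1:6), c("s1", "s2", "s3"))
)

# Toy probe positions, in the same row order as 'beta'.
probes <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  pos = c(100, 150, 200, 500, 550, 900),
  stringsAsFactors = FALSE
)

# Toy bump ranges (hypothetical, mimicking chr/start/end of a bumps table).
bumps_table <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(100, 500),
  end = c(200, 550),
  stringsAsFactors = FALSE
)

# For each bump, keep the probes that fall inside it and average them
# per sample with colMeans. The result has one row per bump.
bump_means <- t(vapply(seq_len(nrow(bumps_table)), function(i) {
  in_bump <- probes$chr == bumps_table$chr[i] &
    probes$pos >= bumps_table$start[i] &
    probes$pos <= bumps_table$end[i]
  colMeans(beta[in_bump, , drop = FALSE])
}, numeric(ncol(beta))))
rownames(bump_means) <- paste0(bumps_table$chr, ":",
                               bumps_table$start, "-", bumps_table$end)

print(bump_means)
```

If you want M values instead, the same averaging applies to the matrix from getM. Keep in mind that the mean of M values is not the logit of the mean of betas, so decide which scale you want before summarizing.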
Does getBeta work like that? I don't see that in the docs and I got an error:
I am giving it GenomicRatioSet and GRanges objects as you suggested.
My bad.