Alternative methylation contexts with MEDIPS?
Chris Fields (United States), 05 Aug 2016

I noticed that MEDIPS doesn't seem to assess alternative methylation contexts such as CHG or CHH. This seems to stem from MEDIPS.getPositions not allowing ambiguous matching. Using something like matchPattern('CHG', subject, fixed=F) does seem to work.

I can fork this on GitHub to add a fix, but I'm not sure how contributions work for Bioconductor packages (particularly since the GitHub repo mirror is read-only).
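To make the ambiguous patterns concrete: with IUPAC codes, H stands for A, C or T, so CHG means C[ACT]G and CHH means C[ACT][ACT]. The base-R sketch below shows which positions such patterns are meant to hit. It is an illustration only: iupac_to_regex() and find_context() are hypothetical helpers, not part of MEDIPS or Biostrings, and they work on plain character strings rather than DNAString objects. Unlike matchPattern(..., fixed=F), they treat the subject literally, so an N in the subject never matches.

```r
# Hypothetical helper: translate an IUPAC DNA pattern into a regular expression.
# Only the codes needed for CG/CHG/CHH (and their reverse complements) are listed.
iupac_to_regex <- function(pattern) {
  codes <- c(A = "A", C = "C", G = "G", T = "T",
             H = "[ACT]", D = "[AGT]", N = "[ACGT]")
  chars <- strsplit(toupper(pattern), "")[[1]]
  stopifnot(all(chars %in% names(codes)))
  paste(codes[chars], collapse = "")
}

# Hypothetical helper: 1-based start positions of all (possibly overlapping)
# matches of an IUPAC pattern in a plain character string. Letters in the
# subject are taken literally, so an N in the subject matches nothing.
find_context <- function(pattern, subject) {
  # A zero-width lookahead makes overlapping matches get reported too.
  rx <- paste0("(?=", iupac_to_regex(pattern), ")")
  hits <- gregexpr(rx, toupper(subject), perl = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

dna <- "ACGTCAGCTTCCAGG"
find_context("CG", dna)   # 2
find_context("CHG", dna)  # 5 12
find_context("CHH", dna)  # 8 11
```

This only scans the forward strand; the minus-strand equivalents would be found with the reverse complements, CDG for CHG and DDG for CHH.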

Lukas Chavez (USA/La Jolla/UCSD)
Dear Chris,

Prompted by Ryan Lister's 2009 Nature paper, which demonstrated a certain amount of DNA methylation at cytosines in non-CpG contexts in ES cells, I experimented with allowing for CHG and CHH contexts in MEDIPS about 5 years ago. If I remember correctly, I concluded that the low resolution of the MeDIP enrichment technology does not allow the sequence context of methylated cytosines to be inferred.

The MEDIPS.couplingVector and MEDIPS.getPositions functions were written some time ago, and I always thought they were somewhat flexible; for example, changing the pattern parameter of MEDIPS.couplingVector to pattern="C" should work. However, this will extract all cytosine positions, i.e. a C followed by any base (A, C, G or T), and does not distinguish CHG from CHH contexts.

If ambiguous matching does not work, please feel free to download the latest version of MEDIPS, change the functions and send them to me. I will be happy to update the package accordingly. Alternatively, I can share my BioC MEDIPS password with you so that you can update the package yourself. Personally, I am certainly willing to have MEDIPS modified, improved and updated by anyone who is interested. I just don't know whether this violates the Bioconductor concept of having a dedicated maintainer.

All the best,
Lukas
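To illustrate the point that pattern="C" returns every cytosine and leaves its context undetermined, here is a base-R sketch of the extra step that would be needed to split those positions into CG, CHG and CHH. classify_c_context() is a hypothetical helper, not a MEDIPS function; it only looks at the forward strand, and it says nothing about whether MeDIP data can actually resolve these contexts, which is the limitation described above.

```r
# Hypothetical helper: label every C on the forward strand of a plain character
# string with its context (CG, CHG or CHH, with H = A, C or T). Cs whose
# downstream bases are N or run past the end of the sequence are left as NA.
classify_c_context <- function(dna) {
  chars <- strsplit(toupper(dna), "")[[1]]
  pos <- which(chars == "C")
  # Bases one and two positions downstream of each C (NA past the end).
  next1 <- chars[pos + 1]
  next2 <- chars[pos + 2]
  h <- c("A", "C", "T")
  context <- ifelse(next1 %in% "G", "CG",
             ifelse(next1 %in% h & next2 %in% "G", "CHG",
             ifelse(next1 %in% h & next2 %in% h, "CHH", NA_character_)))
  data.frame(pos = pos, context = context, stringsAsFactors = FALSE)
}

# Cs at positions 2, 5, 8, 11, 12 are labelled CG, CHG, CHH, CHH, CHG.
classify_c_context("ACGTCAGCTTCCAGG")
```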

I'll have a look at modifying the code fairly soon, taking into account the issues Matthias Lienhard mentions re: ambiguous matches.   Once I have a better idea on what can be done I'll repost here.  Thanks for the prompt reply!

Matthias Lienhard (Max Planck Institute for molecular Gene…)

Hi Chris,

I recently developed qsea, an alternative package for the analysis of enrichment-based methylation data, which is currently in the devel branch of Bioconductor. The package is based on the ideas of MEDIPS, but extends the functionality and makes it easier to use. The focus is on estimating absolute methylation levels, but it also includes a more flexible function to estimate pattern densities, just as you suggested: the addPatternDensity function has a parameter "fixed" that allows for flexible patterns. This is, however, not the only change within this function. Even if you choose to extend MEDIPS, I'd suggest having a look at that function, as there are pitfalls with this approach, mainly due to the Ns in the reference, which unintentionally match any character in the pattern.

Best, Matthias

Hi Matthias,

When CHG is used as a DNAString pattern in matchPattern, it should match C[ACT]G, and CHH should match C[ACT][ACT], as long as fixed=F. I don't believe it will match an N in the reference under strict IUPAC rules, but it's worth confirming.

Hi Chris,

Maybe I wasn't clear; what I meant was the following behavior of matchPattern when setting fixed=F:

>library(BSgenome.Hsapiens.UCSC.hg19)
>library(Biostrings)

>chr_seq=getBSgenome("BSgenome.Hsapiens.UCSC.hg19")[["chr1"]]
>chr_seq
  249250621-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

>pIdx=start(matchPattern(pattern=DNAString("CG"),
            subject=chr_seq, fixed=FALSE))

>head(pIdx)
[1] 1 2 3 4 5 6

A "maskMotif" call fixes this:

>chr_seq<-maskMotif(chr_seq, "N")
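The same effect can be reproduced without a genome. The base-R sketch below mimics symmetric ambiguity matching, in which an IUPAC code in either the pattern or the subject stands for its whole set of bases (the behaviour shown above for fixed=FALSE), and then skips windows containing an N as a rough analogue of masking. compatible() and match_ambiguous() are hypothetical helpers written for this illustration; they are not how Biostrings implements matchPattern or maskMotif.

```r
# Bases represented by the IUPAC letters used in this example.
iupac_sets <- list(A = "A", C = "C", G = "G", T = "T",
                   H = c("A", "C", "T"), N = c("A", "C", "G", "T"))

# Under symmetric ambiguity matching, two letters are compatible when their
# base sets overlap, so a subject N is compatible with every pattern letter.
compatible <- function(a, b) {
  length(intersect(iupac_sets[[a]], iupac_sets[[b]])) > 0
}

# Hypothetical helper: start positions where `pattern` matches `subject`.
# With skip_n = TRUE, windows containing an N are ignored, a crude stand-in
# for masking the N runs before matching.
match_ambiguous <- function(pattern, subject, skip_n = FALSE) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  starts <- seq_len(max(0L, length(s) - length(p) + 1L))
  hit <- vapply(starts, function(i) {
    win <- s[i:(i + length(p) - 1L)]
    if (skip_n && any(win == "N")) return(FALSE)
    all(mapply(compatible, p, win))
  }, logical(1))
  starts[hit]
}

gappy <- "NNNNACGTCCG"
match_ambiguous("CG", gappy)                 # 1 2 3 6 10
match_ambiguous("CG", gappy, skip_n = TRUE)  # 6 10
```

The first call reports spurious hits throughout the N run, just like head(pIdx) above; skipping N-containing windows leaves only the two real CpGs.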

Best, Matthias

Ah, that's good to know, thanks!  This is for a non-model organism with a fair number of gaps, so having the mask will help.  
