Alternative methylation contexts with MEDIPS?
2
0
Entering edit mode
Chris Fields ▴ 60
@chris-fields-4329
Last seen 5.1 years ago

I noticed that MEDIPS doesn't seem to assess alternative methylation contexts like CHG or CHH.  This seems to stem from not allowing ambiguous matching within MEDIPS.getPositions.  Using something like matchPattern('CHG', subject, fixed=F) does seem to work.

I can fork this on Github to add a fix, but I'm not sure how this works for Bioconductor modules (particularly since the Github repo mirror is read-only).

MEDIPS • 639 views
1
Entering edit mode
Lukas Chavez ▴ 570
@lukas-chavez-5781
Last seen 3.7 years ago
USA/La Jolla/UCSD
0
Entering edit mode

I'll have a look at modifying the code fairly soon, taking into account the issues Matthias Lienhard mentions re: ambiguous matches.   Once I have a better idea on what can be done I'll repost here.  Thanks for the prompt reply!

0
Entering edit mode
@matthias-lienhard-6292
Last seen 4 months ago
Max Planck Institute for molecular Gene…

Hi Chris,

I recently developed qsea, an alternative package for the analysis of enrichment based methylation data, which is in the currently in the devel-branch of Bioconductor. The package is based on the ideas of MEDIPS, but extends the functionality and facilitates the usage. The focus is on estimating absolute methylation levels, but it also includes a more flexible function to estimate pattern densities, just as you suggested: the addPatternDensity function has a parameter "fixed", that allows for flexible patterns. This is however not the only change within this function. But even if you chose to extend MEDIPS, I'd suggest having a look at the function, as there are pitfalls with this approach, mainly due to the Ns in the reference, that unintentionally matches any character in the pattern.

Best, Matthias

0
Entering edit mode

Hi Matthias,

CHG if using DNAString in a matchPattern should match C[ATC]G and CHH C[ATC][ATC] as long as fixed=F.  I don't believe it will match reference N by strict IUPAC rules, but it's worth confirming.

0
Entering edit mode

Hi Chris,

maybe I wasn't clear, what I ment was the following behavior of matchPattern, when setting fixed=F

>library(BSgenome.Hsapiens.UCSC.hg19) >library(Biostrings)

>chr_seq=getBSgenome("BSgenome.Hsapiens.UCSC.hg19")[["chr1"]] >chr_seq   249250621-letter "DNAString" instance seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

>pIdx=start(matchPattern(pattern=DNAString("CG"),             subject=chr_seq, fixed=FALSE))

>head(pIdx) [1] 1 2 3 4 5 6

>chr_seq<-maskMotif(chr_seq, "N")

Best, Matthias

0
Entering edit mode

Ah, that's good to know, thanks!  This is for a non-model organism with a fair number of gaps, so having the mask will help.