Question: Compute the CpG content for all chromosome like the percentage of G+C
gravatar for Tiphaine Martin
2.3 years ago by
Tiphaine Martin20 wrote:


I would like to visualize not the GC content but CpG content in a track of Gviz, for example.

Do you have an idea to do that ?



ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Tiphaine Martin20
gravatar for James W. MacDonald
2.3 years ago by
United States
James W. MacDonald43k wrote:
It depends on what exactly you mean by 'CpG content'. If you mean that you want to plot the CpG islands, then please note that the very first example in the Gviz vignette shows just how to do that. Or do you mean something else?
ADD COMMENTlink written 2.3 years ago by James W. MacDonald43k
gravatar for Tiphaine Martin
2.3 years ago by
Tiphaine Martin20 wrote:

I would like not only the CpG island but  the distribution of CpG along the chromosome or in a genomic region.

So it is like the percentage of CpG in window and this window slips a step along the chromosome (1 or more nucleotide). 

ADD COMMENTlink written 2.3 years ago by Tiphaine Martin20

I'm not sure if this is exactly what you want, but this function takes a BSgenome instance, a chromosome, and a tile width, and calculated CpG % in windows across a particular chromosome

CpG <-
    function(bsgenome, chr, tilewidth)
    dna <- bsgenome[[chr]]

    ## CpG on the plus and minus strand (?)
    islands <- matchPDict(DNAStringSet(c("GC", "CG")), dna)
    cvg <- coverage(islands)    # CpG island coverage

    tiles <- tileGenome(seqlengths(bsgenome)[chr], tilewidth=tilewidth,

    ## Average coverage in each tile
    ## Divide by 2 so each CpG counts only once
    v <- Views(cvg, ranges(tiles))
    tiles$CpG <- viewSums(v) / width(v) / 2

This would seem to be a relatively effective way to quickly visualize CpG content, e.g.,

gr <- CpG(BSgenome.Hsapiens.UCSC.hg19, "chr17", 10000)
plot(start(gr) + width(gr) / 2, gr$CpG, pch=".")

Another formulation might slide rather than tile the window across coverage, along the lines of

slidewidth = 10000
diff(cumsum(cvg), lag=slidewidth) / slidewidth / 2

I'm not sure how to visualize this with Gviz; for smaller regions one might use getSeq() to get the DNA sequence of the specific region.

ADD REPLYlink written 2.3 years ago by Martin Morgan ♦♦ 20k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 270 users visited in the last hour