Question: Compute the CpG content along a chromosome, analogous to the percentage of G+C
Tiphaine Martin wrote (3.2 years ago):


I would like to visualize CpG content, rather than GC content, for example in a Gviz track.

Do you have any idea how to do that?



James W. MacDonald wrote (3.2 years ago):
It depends on what exactly you mean by 'CpG content'. If you mean that you want to plot the CpG islands, then please note that the very first example in the Gviz vignette shows just how to do that. Or do you mean something else?
Tiphaine Martin wrote (3.2 years ago):

I would like not only the CpG islands but also the distribution of CpG along the chromosome or within a genomic region.

That is, the percentage of CpG in a window that slides along the chromosome in steps of one or more nucleotides.


I'm not sure if this is exactly what you want, but this function takes a BSgenome instance, a chromosome name, and a tile width, and calculates the CpG content in windows across that chromosome:

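library(Biostrings)       # matchPattern()
library(GenomicRanges)    # tileGenome(); also loads IRanges for Views()

CpG <-
    function(bsgenome, chr, tilewidth)
{
    dna <- bsgenome[[chr]]

    ## "CG" is its own reverse complement, so matching it on the plus
    ## strand finds the CpG dinucleotides of both strands
    sites <- matchPattern("CG", dna)
    cvg <- coverage(sites, width=length(dna))    # CpG dinucleotide coverage

    tiles <- tileGenome(seqlengths(bsgenome)[chr], tilewidth=tilewidth,
                        cut.last.tile.in.chrom=TRUE)

    ## Average coverage in each tile
    ## Divide by 2 because each CpG match covers two bases
    v <- Views(cvg, ranges(tiles))
    tiles$CpG <- viewSums(v) / width(v) / 2
    tiles
}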

This would seem to be a relatively effective way to quickly visualize CpG content, e.g.,

library(BSgenome.Hsapiens.UCSC.hg19)
gr <- CpG(BSgenome.Hsapiens.UCSC.hg19, "chr17", 10000)
## Plot CpG content against the midpoint of each tile
plot(start(gr) + width(gr) / 2, gr$CpG, pch=".")

Another formulation might slide rather than tile the window across coverage, along the lines of

slidewidth = 10000
diff(cumsum(cvg), lag=slidewidth) / slidewidth / 2
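
To make the sliding idea concrete, here is a minimal base R sketch on a toy sequence. The sequence, the window width of 4, and the variable names are made up for illustration. It prepends a 0 to the cumulative sum so that the window starting at position 1 is included, which the one-liner above leaves out.

```r
## Toy sequence (hypothetical); CpG dinucleotides start at 2, 5, 7, 13
dna <- "ACGTCGCGAATTCGGC"
starts <- as.integer(gregexpr("CG", dna, fixed=TRUE)[[1]])

## Per-base coverage: each CpG match covers two bases
cvg <- integer(nchar(dna))
for (s in starts)
    cvg[c(s, s + 1L)] <- 1L

## Sliding-window sums via differences of the cumulative sum
slidewidth <- 4
cs <- c(0, cumsum(cvg))
slide <- diff(cs, lag=slidewidth) / slidewidth / 2

## Brute-force check: sum the coverage in every window directly
brute <- sapply(seq_len(length(cvg) - slidewidth + 1),
                function(i) sum(cvg[i:(i + slidewidth - 1)]))
brute <- brute / slidewidth / 2
```

Each element of slide is the number of CpG dinucleotides per base in the window starting at that position, and it agrees with the brute-force result.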

I'm not sure how to visualize this with Gviz; for smaller regions one might use getSeq() to get the DNA sequence of the specific region.
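
One possibility, offered as an untested sketch rather than a definitive recipe, is to hand the tiled GRanges to a Gviz DataTrack, which uses numeric metadata columns as the data to plot. The tiles and CpG values below are placeholders invented for illustration; in practice one would pass the GRanges returned by CpG() instead.

```r
library(GenomicRanges)
library(Gviz)

## Placeholder tiles and CpG values (hypothetical); in practice use the
## GRanges returned by CpG()
gr <- tileGenome(c(chr17=100000), tilewidth=10000,
                 cut.last.tile.in.chrom=TRUE)
gr$CpG <- seq(0.005, 0.05, length.out=length(gr))

## Numeric metadata columns of the GRanges become the track's data
dtrack <- DataTrack(range=gr, genome="hg19", name="CpG", type="l")
plotTracks(list(GenomeAxisTrack(), dtrack), from=1, to=100000)
```

Other values of type, such as "histogram", may suit coarse tiles better than a line.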

written 3.2 years ago by Martin Morgan