Question: Compute the CpG content for all chromosome like the percentage of G+C
0
4.3 years ago by
France
Tiphaine Martin40 wrote:

Hi,

I would like to visualize not the GC content but CpG content in a track of Gviz, for example.

Do you have an idea to do that ?

Regards,

Tiphaine

cpg • 1.1k views
modified 4.3 years ago • written 4.3 years ago by Tiphaine Martin40
Answer: Compute the CpG content for all chromosome like the percentage of G+C
0
4.3 years ago by
United States
James W. MacDonald50k wrote:
It depends on what exactly you mean by 'CpG content'. If you mean that you want to plot the CpG islands, then please note that the very first example in the Gviz vignette shows just how to do that. Or do you mean something else?
Answer: Compute the CpG content for all chromosome like the percentage of G+C
0
4.3 years ago by
France
Tiphaine Martin40 wrote:

I would like not only the CpG island but  the distribution of CpG along the chromosome or in a genomic region.

So it is like the percentage of CpG in window and this window slips a step along the chromosome (1 or more nucleotide).

I'm not sure if this is exactly what you want, but this function takes a BSgenome instance, a chromosome, and a tile width, and calculated CpG % in windows across a particular chromosome

CpG <-
function(bsgenome, chr, tilewidth)
{
dna <- bsgenome[[chr]]

## CpG on the plus and minus strand (?)
islands <- matchPDict(DNAStringSet(c("GC", "CG")), dna)
cvg <- coverage(islands)    # CpG island coverage

tiles <- tileGenome(seqlengths(bsgenome)[chr], tilewidth=tilewidth,
cut.last.tile.in.chrom=TRUE)

## Average coverage in each tile
## Divide by 2 so each CpG counts only once
v <- Views(cvg, ranges(tiles))
tiles$CpG <- viewSums(v) / width(v) / 2 tiles } This would seem to be a relatively effective way to quickly visualize CpG content, e.g., library(BSgenome.Hsapiens.UCSC.hg19) gr <- CpG(BSgenome.Hsapiens.UCSC.hg19, "chr17", 10000) plot(start(gr) + width(gr) / 2, gr$CpG, pch=".")

Another formulation might slide rather than tile the window across coverage, along the lines of

slidewidth = 10000
diff(cumsum(cvg), lag=slidewidth) / slidewidth / 2

I'm not sure how to visualize this with Gviz; for smaller regions one might use getSeq() to get the DNA sequence of the specific region.