Search
Question: Troubleshooting a histogram DataTrack in Gviz
0
13 months ago by
Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
Russ Fraser0 wrote:

Hello all,

I would like to add a track to my Gviz plot that displays the frequency of variants in a specified region. E.g., I have the structure of Gene A, including exons and introns, and I want to quickly visualize how many variants are present in (arbitrarily defined) 100 bp bins along this gene. Hopefully this will highlight variable regions of gene A, or, conversely, illustrate that variation is pretty similar across the gene.

Progress so far: using VariantAnnotation, I created a DataTrack out of a VCF file as follows:

vcf <- readVcf("my_variants.vcf", genome = genome)
vcf_track <- DataTrack(rowRanges(vcf), name = "Variants", type = "histogram", window = 100)

I then plot the tracks:

plotTracks(c(itrack, bmt, dtrack, vcf_track, strack), from = start_loc, to = end_loc)

Everything looks great except the vcf_track. Instead of plotting the number of entries in the 100 bp window, it instead plots the values from the "QUAL" metadata column - or, more specifically, the mean of the QUAL values for SNPs in the 100 bp window (i.e. the default value of aggregation). What I really want is the number of QUAL values in that window - I don't care what they are - but I am having trouble finding the appropriate function to pass to aggregation. Any suggestions? Or have I totally misinterpreted how Gviz is constructing this histogram?

Thanks,

Russ

modified 13 months ago • written 13 months ago by Russ Fraser0

Although I couldn't find the answer to my question, I was able to perform a workaround.

In short, I imported the VCF as a dataframe into R, keeping only chromosome and position info (cols 1 and 2). I then used the hist function to create breaks and counts of my data. This info can be accessed as follows:

hist_info <- hist(data)
hist_info
hist_info$breaks hist_info$counts

With a bit of massaging, I was able to construct a GRange object using the breaks as start/end positions, and assigning the hist\$count as an mcol. Perhaps not the most elegant solution, but got the job done.

The end product: