Question: VariantTools, Shearwater and reference genome
gravatar for Asma rabe
4.3 years ago by
Asma rabe290
Asma rabe290 wrote:

Hi All,

Below is a part of shearwater manual with example:


3.2 More realistic example

Suppose the bam files are in folder ./bam and the regions of interest are stored in a GRanges() object with
metadata column Gene, indicating which region (typically exons for a pulldown experiment) belongs to
which gene. Also assume that we have a tabix indexed vcf file CosmicCodingMuts_v63_300113.vcf.gz.

The analysis can be parallelized by separately analysing each gene, which is the unit needed to compute
the prior using makePrior.

## Not run

files <- dir("bam", pattern = "*.bam$", full.names = TRUE)
MC_CORES <- getOption("mc.cores", 2L)
vcfList <- list()
for (gene in levels(mcols(regions)$Gene)) {
  rgn <- regions[mcols(regions)$Gene == gene]
  counts <- loadAllData(files, rgn, mc.cores = MC_CORES)

  ## Split into

  BF <- mcChunk("bbb", split = 200, counts, mc.cores = MC_CORES)
  COSMIC <- readVcf("CosmicCodingMuts_v63_300113.vcf.gz", "GRCh37", param =   
    ScanVcfParam(which = rgn))
  prior <- makePrior(COSMIC, rgn, pi.mut = 0.5)
  vcfList[[gene]] <- bf2Vcf(BF = BF, counts = counts, regions = rgn, samples = files,
    cutoff = 0.5, prior = prior)

## Collapse vcfList

vcf <-, vcfList)


In this example no information about reference genome is provided?Can anyone explains why?  is it due to bam files are already aligned files?


Here is another example to call variants using VariantTools where i can not see information for reference genome. Is it also due to that BAM files are aligned files??

bams <- LungCancerLines::LungCancerBamFiles()
  tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
  high_base_quality = 23L,
  which = gmapR::TP53Which())

## simple usage

variants <- callVariants(bams$H1993, tally.param)

## customize

calling.filters <- VariantCallingFilters(p.error = 1/1000)
callVariants(bams$H1993, tally.param, calling.filters)

Thank you

sequencing • 710 views
ADD COMMENTlink modified 4.3 years ago by James W. MacDonald49k • written 4.3 years ago by Asma rabe290
Answer: VariantTools, Shearwater and reference genome
gravatar for James W. MacDonald
4.3 years ago by
United States
James W. MacDonald49k wrote:

The answer to both questions is yes. Bam files are intended to contain information about the read and where (and how well) it aligns to the reference. Any further information about the reference genome would be redundant.

ADD COMMENTlink written 4.3 years ago by James W. MacDonald49k

Thanks James. I ran shearwater using bam files but got an output in which reference nucleotides is different from reference genome for some positions. What might be the reason for this error?

ADD REPLYlink written 4.3 years ago by Asma rabe290

I don't know; having never used deepSNV, nor seeing your code, any guess on my part would be just that. A completely blind guess.

Anyway, are you sure this is an error? How does shearwater determine the reference sequence? Does it use an actual reference, or does it infer from the data in hand? My very limited understanding of deepSNV is that it is intended to be used for experiments with ultra deep coverage, so perhaps 'reference' refers to the most common base at a given position, rather than the sequence from the reference genome.

If you are sure that the reference in this situation actually refers to a particular reference genome, then you might need to contact Moritz Gerstung directly, as I don't see any evidence that he subscribes to this support site.

ADD REPLYlink written 4.3 years ago by James W. MacDonald49k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 300 users visited in the last hour