Question

VariantTools, Shearwater and reference genome

0

Entering edit mode

Asma rabe ▴ 290

@asma-rabe-4697

Last seen 6.2 years ago

Japan

Hi All,

Below is a part of shearwater manual with example:

#-------------

3.2 More realistic example

Suppose the bam files are in folder ./bam and the regions of interest are stored in a GRanges() object with
metadata column Gene, indicating which region (typically exons for a pulldown experiment) belongs to
which gene. Also assume that we have a tabix indexed vcf file CosmicCodingMuts_v63_300113.vcf.gz.

The analysis can be parallelized by separately analysing each gene, which is the unit needed to compute
the prior using makePrior.

## Not run

files <- dir("bam", pattern = "*.bam$", full.names = TRUE)
MC_CORES <- getOption("mc.cores", 2L)
vcfList <- list()
for (gene in levels(mcols(regions)$Gene)) {
  rgn <- regions[mcols(regions)$Gene == gene]
  counts <- loadAllData(files, rgn, mc.cores = MC_CORES)

  ## Split into

  BF <- mcChunk("bbb", split = 200, counts, mc.cores = MC_CORES)
  COSMIC <- readVcf("CosmicCodingMuts_v63_300113.vcf.gz", "GRCh37", param =   
    ScanVcfParam(which = rgn))
  prior <- makePrior(COSMIC, rgn, pi.mut = 0.5)
  vcfList[[gene]] <- bf2Vcf(BF = BF, counts = counts, regions = rgn, samples = files,
    cutoff = 0.5, prior = prior)
}

## Collapse vcfList

vcf <- do.call(rbind, vcfList)

#-----------------------

In this example no information about reference genome is provided?Can anyone explains why? is it due to bam files are already aligned files?

================

Here is another example to call variants using VariantTools where i can not see information for reference genome. Is it also due to that BAM files are aligned files??

bams <- LungCancerLines::LungCancerBamFiles()
  tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
  high_base_quality = 23L,
  which = gmapR::TP53Which())

## simple usage

variants <- callVariants(bams$H1993, tally.param)

## customize

calling.filters <- VariantCallingFilters(p.error = 1/1000)
callVariants(bams$H1993, tally.param, calling.filters)

Thank you

sequencing • 1.3k views

ADD COMMENT • link updated 9.6 years ago by James W. MacDonald 65k • written 9.6 years ago by Asma rabe ▴ 290

score 0 · Answer 1 · 2014-09-29

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 12 hours ago

United States

The answer to both questions is yes. Bam files are intended to contain information about the read and where (and how well) it aligns to the reference. Any further information about the reference genome would be redundant.

ADD COMMENT • link 9.6 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks James. I ran shearwater using bam files but got an output in which reference nucleotides is different from reference genome for some positions. What might be the reason for this error?

ADD REPLY • link 9.6 years ago Asma rabe ▴ 290

0

Entering edit mode

I don't know; having never used deepSNV, nor seeing your code, any guess on my part would be just that. A completely blind guess.

Anyway, are you sure this is an error? How does shearwater determine the reference sequence? Does it use an actual reference, or does it infer from the data in hand? My very limited understanding of deepSNV is that it is intended to be used for experiments with ultra deep coverage, so perhaps 'reference' refers to the most common base at a given position, rather than the sequence from the reference genome.

If you are sure that the reference in this situation actually refers to a particular reference genome, then you might need to contact Moritz Gerstung directly, as I don't see any evidence that he subscribes to this support site.

ADD REPLY • link 9.6 years ago James W. MacDonald 65k