Search
Question: VariantTools, Shearwater and reference genome
0
gravatar for Asma rabe
3.9 years ago by
Asma rabe290
Japan
Asma rabe290 wrote:

Hi All,

Below is a part of shearwater manual with example:

#-------------

3.2 More realistic example

Suppose the bam files are in folder ./bam and the regions of interest are stored in a GRanges() object with
metadata column Gene, indicating which region (typically exons for a pulldown experiment) belongs to
which gene. Also assume that we have a tabix indexed vcf file CosmicCodingMuts_v63_300113.vcf.gz.

The analysis can be parallelized by separately analysing each gene, which is the unit needed to compute
the prior using makePrior.

## Not run

files <- dir("bam", pattern = "*.bam$", full.names = TRUE)
MC_CORES <- getOption("mc.cores", 2L)
vcfList <- list()
for (gene in levels(mcols(regions)$Gene)) {
  rgn <- regions[mcols(regions)$Gene == gene]
  counts <- loadAllData(files, rgn, mc.cores = MC_CORES)

  ## Split into

  BF <- mcChunk("bbb", split = 200, counts, mc.cores = MC_CORES)
  COSMIC <- readVcf("CosmicCodingMuts_v63_300113.vcf.gz", "GRCh37", param =   
    ScanVcfParam(which = rgn))
  prior <- makePrior(COSMIC, rgn, pi.mut = 0.5)
  vcfList[[gene]] <- bf2Vcf(BF = BF, counts = counts, regions = rgn, samples = files,
    cutoff = 0.5, prior = prior)
}

## Collapse vcfList

vcf <- do.call(rbind, vcfList)

#-----------------------

In this example no information about reference genome is provided?Can anyone explains why?  is it due to bam files are already aligned files?

================

Here is another example to call variants using VariantTools where i can not see information for reference genome. Is it also due to that BAM files are aligned files??

bams <- LungCancerLines::LungCancerBamFiles()
  tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
  high_base_quality = 23L,
  which = gmapR::TP53Which())

## simple usage

variants <- callVariants(bams$H1993, tally.param)

## customize

calling.filters <- VariantCallingFilters(p.error = 1/1000)
callVariants(bams$H1993, tally.param, calling.filters)

Thank you

ADD COMMENTlink modified 3.9 years ago by James W. MacDonald47k • written 3.9 years ago by Asma rabe290
0
gravatar for James W. MacDonald
3.9 years ago by
United States
James W. MacDonald47k wrote:

The answer to both questions is yes. Bam files are intended to contain information about the read and where (and how well) it aligns to the reference. Any further information about the reference genome would be redundant.

ADD COMMENTlink written 3.9 years ago by James W. MacDonald47k

Thanks James. I ran shearwater using bam files but got an output in which reference nucleotides is different from reference genome for some positions. What might be the reason for this error?

ADD REPLYlink written 3.9 years ago by Asma rabe290

I don't know; having never used deepSNV, nor seeing your code, any guess on my part would be just that. A completely blind guess.

Anyway, are you sure this is an error? How does shearwater determine the reference sequence? Does it use an actual reference, or does it infer from the data in hand? My very limited understanding of deepSNV is that it is intended to be used for experiments with ultra deep coverage, so perhaps 'reference' refers to the most common base at a given position, rather than the sequence from the reference genome.

If you are sure that the reference in this situation actually refers to a particular reference genome, then you might need to contact Moritz Gerstung directly, as I don't see any evidence that he subscribes to this support site.

ADD REPLYlink written 3.9 years ago by James W. MacDonald47k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 131 users visited in the last hour