VariantTools, Shearwater and reference genome
1
0
Entering edit mode
Asma rabe ▴ 290
@asma-rabe-4697
Last seen 6.2 years ago
Japan

Hi All,

Below is a part of shearwater manual with example:

#-------------

3.2 More realistic example

Suppose the bam files are in folder ./bam and the regions of interest are stored in a GRanges() object with
metadata column Gene, indicating which region (typically exons for a pulldown experiment) belongs to
which gene. Also assume that we have a tabix indexed vcf file CosmicCodingMuts_v63_300113.vcf.gz.

The analysis can be parallelized by separately analysing each gene, which is the unit needed to compute
the prior using makePrior.

## Not run

files <- dir("bam", pattern = "*.bam$", full.names = TRUE)
MC_CORES <- getOption("mc.cores", 2L)
vcfList <- list()
for (gene in levels(mcols(regions)$Gene)) {
  rgn <- regions[mcols(regions)$Gene == gene]
  counts <- loadAllData(files, rgn, mc.cores = MC_CORES)

  ## Split into

  BF <- mcChunk("bbb", split = 200, counts, mc.cores = MC_CORES)
  COSMIC <- readVcf("CosmicCodingMuts_v63_300113.vcf.gz", "GRCh37", param =   
    ScanVcfParam(which = rgn))
  prior <- makePrior(COSMIC, rgn, pi.mut = 0.5)
  vcfList[[gene]] <- bf2Vcf(BF = BF, counts = counts, regions = rgn, samples = files,
    cutoff = 0.5, prior = prior)
}

## Collapse vcfList

vcf <- do.call(rbind, vcfList)

#-----------------------

In this example no information about reference genome is provided?Can anyone explains why?  is it due to bam files are already aligned files?

================

Here is another example to call variants using VariantTools where i can not see information for reference genome. Is it also due to that BAM files are aligned files??

bams <- LungCancerLines::LungCancerBamFiles()
  tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
  high_base_quality = 23L,
  which = gmapR::TP53Which())

## simple usage

variants <- callVariants(bams$H1993, tally.param)

## customize

calling.filters <- VariantCallingFilters(p.error = 1/1000)
callVariants(bams$H1993, tally.param, calling.filters)

Thank you

sequencing • 1.3k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 12 hours ago
United States

The answer to both questions is yes. Bam files are intended to contain information about the read and where (and how well) it aligns to the reference. Any further information about the reference genome would be redundant.

ADD COMMENT
0
Entering edit mode

Thanks James. I ran shearwater using bam files but got an output in which reference nucleotides is different from reference genome for some positions. What might be the reason for this error?

ADD REPLY
0
Entering edit mode

I don't know; having never used deepSNV, nor seeing your code, any guess on my part would be just that. A completely blind guess.

Anyway, are you sure this is an error? How does shearwater determine the reference sequence? Does it use an actual reference, or does it infer from the data in hand? My very limited understanding of deepSNV is that it is intended to be used for experiments with ultra deep coverage, so perhaps 'reference' refers to the most common base at a given position, rather than the sequence from the reference genome.

If you are sure that the reference in this situation actually refers to a particular reference genome, then you might need to contact Moritz Gerstung directly, as I don't see any evidence that he subscribes to this support site.

ADD REPLY

Login before adding your answer.

Traffic: 936 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6