Search
Question: VariantTools, Shearwater and reference genome
0
4.1 years ago by
Asma rabe290
Japan
Asma rabe290 wrote:

Hi All,

Below is a part of shearwater manual with example:

#-------------

3.2 More realistic example

Suppose the bam files are in folder ./bam and the regions of interest are stored in a GRanges() object with
metadata column Gene, indicating which region (typically exons for a pulldown experiment) belongs to
which gene. Also assume that we have a tabix indexed vcf file CosmicCodingMuts_v63_300113.vcf.gz.

The analysis can be parallelized by separately analysing each gene, which is the unit needed to compute
the prior using makePrior.

## Not run

files <- dir("bam", pattern = "*.bam$", full.names = TRUE) MC_CORES <- getOption("mc.cores", 2L) vcfList <- list() for (gene in levels(mcols(regions)$Gene)) {
rgn <- regions[mcols(regions)Gene == gene] counts <- loadAllData(files, rgn, mc.cores = MC_CORES) ## Split into BF <- mcChunk("bbb", split = 200, counts, mc.cores = MC_CORES) COSMIC <- readVcf("CosmicCodingMuts_v63_300113.vcf.gz", "GRCh37", param = ScanVcfParam(which = rgn)) prior <- makePrior(COSMIC, rgn, pi.mut = 0.5) vcfList[[gene]] <- bf2Vcf(BF = BF, counts = counts, regions = rgn, samples = files, cutoff = 0.5, prior = prior) } ## Collapse vcfList vcf <- do.call(rbind, vcfList) #----------------------- In this example no information about reference genome is provided?Can anyone explains why? is it due to bam files are already aligned files? ================ Here is another example to call variants using VariantTools where i can not see information for reference genome. Is it also due to that BAM files are aligned files?? bams <- LungCancerLines::LungCancerBamFiles() tally.param <- TallyVariantsParam(gmapR::TP53Genome(), high_base_quality = 23L, which = gmapR::TP53Which()) ## simple usage variants <- callVariants(bamsH1993, tally.param)

## customize

calling.filters <- VariantCallingFilters(p.error = 1/1000)
callVariants(bams\$H1993, tally.param, calling.filters)

Thank you

modified 4.1 years ago by James W. MacDonald48k • written 4.1 years ago by Asma rabe290
0
4.1 years ago by
United States
James W. MacDonald48k wrote:

The answer to both questions is yes. Bam files are intended to contain information about the read and where (and how well) it aligns to the reference. Any further information about the reference genome would be redundant.

Thanks James. I ran shearwater using bam files but got an output in which reference nucleotides is different from reference genome for some positions. What might be the reason for this error?

I don't know; having never used deepSNV, nor seeing your code, any guess on my part would be just that. A completely blind guess.

Anyway, are you sure this is an error? How does shearwater determine the reference sequence? Does it use an actual reference, or does it infer from the data in hand? My very limited understanding of deepSNV is that it is intended to be used for experiments with ultra deep coverage, so perhaps 'reference' refers to the most common base at a given position, rather than the sequence from the reference genome.

If you are sure that the reference in this situation actually refers to a particular reference genome, then you might need to contact Moritz Gerstung directly, as I don't see any evidence that he subscribes to this support site.