Entering edit mode
Taylor, Sean D
▴
250
@taylor-sean-d-5800
Last seen 10.2 years ago
I am trying to read a specific set of records from a bam file for use
in the VariantTools package. I'm trying to construct a which argument
(a GRanges object) that will pull in a set of records from all reads
that only start at a specified position. (i.e. all reads that start at
position 100). So far I have only been able to specify reads that
overlap position 100, but have not been able to find a way to define
the start site.
#Example code:
> bams <- LungCancerLines::LungCancerBamFiles()
> bam <- bams$H1993
> which<-GRanges(seqnames=c('TP53'), IRanges(1110426, width=1), '+')
> tally.param <- VariantTallyParam(gmapR::TP53Genome(),
+ readlen = 100L,
+ high_base_quality = 23L,
+ which = which)
> raw.variants <- tallyVariants(bam, tally.param)
This code shows all the variants at position 1110426, but not all the
variants from the reads that start at position 1110426.
Ultimately, I am trying to do this for all start positions in my data
set, so I would want something that looks like this pseudocode:
> raw.variants<-lapply (start(bam), function (x){
which<-GRanges(seqnames=c('chrM'), '+', start=x)
tally.param<-VariantTallyParam(gmap, readlen=100L, which=which)
tallyVariants(bamfile, tally.param)
})
Thanks,
Sean
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats graphics grDevices utils
datasets methods base
other attached packages:
[1] RSQLite_0.11.4 DBI_0.2-7
BiocInstaller_1.10.2
[4] LungCancerLines_0.0.8 GenomicFeatures_1.12.1
AnnotationDbi_1.22.5
[7] Biobase_2.20.0 gmapR_1.2.0
latticeExtra_0.6-24
[10] lattice_0.20-15 RColorBrewer_1.0-5 genoPlotR_0.8
[13] ade4_1.5-2 VariantTools_1.2.2
VariantAnnotation_1.6.6
[16] Rsamtools_1.12.3 Biostrings_2.28.0
GenomicRanges_1.12.4
[19] IRanges_1.18.1 BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] biomaRt_2.16.0 bitops_1.0-5
[3] BSgenome_1.28.0
BSgenome.Hsapiens.UCSC.hg19_1.3.19
[5] graph_1.38.2 Matrix_1.0-12
[7] org.Hs.eg.db_2.9.0 RBGL_1.36.2
[9] RCurl_1.95-4.1 rtracklayer_1.20.2
[11] stats4_3.0.1 tools_3.0.1
[13] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 XML_3.96-1.1
[15] zlibbioc_1.6.0
[[alternative HTML version deleted]]