Entering edit mode
Chris Cabanski
▴
30
@chris-cabanski-5801
Last seen 10.3 years ago
Hi,
I have a bam file and I am trying to calculate the coverage at each
exon
position (base) within a gene. First, I find the exon boundaries for
the
gene of interest:
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
genesym <- c("CDKN2A")
geneid <- select(org.Hs.eg.db, keys=genesym,
keytype="SYMBOL",cols="ENTREZID")
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ex <- exonsBy(txdb, "gene")
ex1 <- ex[[geneid$ENTREZID[1]]]
> ex1
GRanges with 6 ranges and 2 metadata columns:
seqnames ranges strand | exon_id exon_name
<rle> <iranges> <rle> | <integer> <character>
[1] chr9 [21967751, 21968241] - | 127318 <na>
[2] chr9 [21968574, 21968770] - | 127319 <na>
[3] chr9 [21970901, 21971207] - | 127320 <na>
[4] chr9 [21974403, 21974826] - | 127321 <na>
[5] chr9 [21974677, 21975132] - | 127322 <na>
[6] chr9 [21994138, 21994490] - | 127323 <na>
---
seqlengths:
chr1 chr2 ...
chrUn_gl000249
249250621 243199373 ...
38502
Next, I would like to find the coverage at each of these positions and
this is where I am getting stuck. I have tried to follow
example(applyPileups) but I get an error and am not sure that I am
calculating coverage at the base-level.
library(Rsamtools)
# create character list of BAM files (only use first file for now)
fls <- scan("SortedBams_tumor.txt",what='character')
calcInfo <- function(x) {
## information at each pile-up position
info <- apply(x[["seq"]], 2, function(y) {
y <- y[c("A", "C", "G", "T"),]
y <- y + 1L
# continuity
cvg <- colSums(y)
p <- y / cvg[col(y)]
h <- -colSums(p * log(p))
ifelse(cvg == 4L, NA, h)
})
list(seqnames=x[["seqnames"]], pos=x[["pos"]], info=info)
}
which <- ex1
param <- PileupParam(which=which, what=c("seq","qual"),
maxDepth=1000L, yieldBy="position", yieldAll=TRUE)
fl <- PileupFiles(fls[1],param=param)
res <- applyPileups(fl,calcInfo,param=param)
> Error: applyPileups: 'x' must be an array of at least two
dimensions
Is this correct route that I should be taking, or should I be going
about
this differently?
Thanks in advance,
Chris
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsamtools_1.10.2
[2] Biostrings_2.26.3
[3] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
[4] GenomicFeatures_1.10.1
[5] GenomicRanges_1.10.6
[6] IRanges_1.16.4
[7] org.Hs.eg.db_2.8.0
[8] RSQLite_0.9-4
[9] DBI_0.2-5
[10] AnnotationDbi_1.20.3
[11] Biobase_2.18.0
[12] BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] BSgenome_1.26.1 RCurl_1.5-0 XML_3.2-0
biomaRt_2.14.0
[5] bitops_1.0-4.1 parallel_2.15.2 rtracklayer_1.18.2
stats4_2.15.2
[9] tools_2.15.2 zlibbioc_1.4.0
--
Christopher Cabanski, PhD
Postdoctoral Research Associate
Department of Medicine, Oncology Division / The Genome Institute
Washington University in St. Louis