Question: pileup coverage from BAM file
0
gravatar for Chris Cabanski
5.9 years ago by
Chris Cabanski30 wrote:
Hi, I have a bam file and I am trying to calculate the coverage at each exon position (base) within a gene. First, I find the exon boundaries for the gene of interest: library(org.Hs.eg.db) library(TxDb.Hsapiens.UCSC.hg19.knownGene) genesym <- c("CDKN2A") geneid <- select(org.Hs.eg.db, keys=genesym, keytype="SYMBOL",cols="ENTREZID") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene ex <- exonsBy(txdb, "gene") ex1 <- ex[[geneid$ENTREZID[1]]] > ex1 GRanges with 6 ranges and 2 metadata columns: seqnames ranges strand | exon_id exon_name <rle> <iranges> <rle> | <integer> <character> [1] chr9 [21967751, 21968241] - | 127318 <na> [2] chr9 [21968574, 21968770] - | 127319 <na> [3] chr9 [21970901, 21971207] - | 127320 <na> [4] chr9 [21974403, 21974826] - | 127321 <na> [5] chr9 [21974677, 21975132] - | 127322 <na> [6] chr9 [21994138, 21994490] - | 127323 <na> --- seqlengths: chr1 chr2 ... chrUn_gl000249 249250621 243199373 ... 38502 Next, I would like to find the coverage at each of these positions and this is where I am getting stuck. I have tried to follow example(applyPileups) but I get an error and am not sure that I am calculating coverage at the base-level. library(Rsamtools) # create character list of BAM files (only use first file for now) fls <- scan("SortedBams_tumor.txt",what='character') calcInfo <- function(x) { ## information at each pile-up position info <- apply(x[["seq"]], 2, function(y) { y <- y[c("A", "C", "G", "T"),] y <- y + 1L # continuity cvg <- colSums(y) p <- y / cvg[col(y)] h <- -colSums(p * log(p)) ifelse(cvg == 4L, NA, h) }) list(seqnames=x[["seqnames"]], pos=x[["pos"]], info=info) } which <- ex1 param <- PileupParam(which=which, what=c("seq","qual"), maxDepth=1000L, yieldBy="position", yieldAll=TRUE) fl <- PileupFiles(fls[1],param=param) res <- applyPileups(fl,calcInfo,param=param) > Error: applyPileups: 'x' must be an array of at least two dimensions Is this correct route that I should be taking, or should I be going about this differently? Thanks in advance, Chris > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Rsamtools_1.10.2 [2] Biostrings_2.26.3 [3] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0 [4] GenomicFeatures_1.10.1 [5] GenomicRanges_1.10.6 [6] IRanges_1.16.4 [7] org.Hs.eg.db_2.8.0 [8] RSQLite_0.9-4 [9] DBI_0.2-5 [10] AnnotationDbi_1.20.3 [11] Biobase_2.18.0 [12] BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] BSgenome_1.26.1 RCurl_1.5-0 XML_3.2-0 biomaRt_2.14.0 [5] bitops_1.0-4.1 parallel_2.15.2 rtracklayer_1.18.2 stats4_2.15.2 [9] tools_2.15.2 zlibbioc_1.4.0 -- Christopher Cabanski, PhD Postdoctoral Research Associate Department of Medicine, Oncology Division / The Genome Institute Washington University in St. Louis
coverage • 949 views
ADD COMMENTlink written 5.9 years ago by Chris Cabanski30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 330 users visited in the last hour