pileup coverage from BAM file
0
0
Entering edit mode
@chris-cabanski-5801
Last seen 10.3 years ago
Hi, I have a bam file and I am trying to calculate the coverage at each exon position (base) within a gene. First, I find the exon boundaries for the gene of interest: library(org.Hs.eg.db) library(TxDb.Hsapiens.UCSC.hg19.knownGene) genesym <- c("CDKN2A") geneid <- select(org.Hs.eg.db, keys=genesym, keytype="SYMBOL",cols="ENTREZID") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene ex <- exonsBy(txdb, "gene") ex1 <- ex[[geneid$ENTREZID[1]]] > ex1 GRanges with 6 ranges and 2 metadata columns: seqnames ranges strand | exon_id exon_name <rle> <iranges> <rle> | <integer> <character> [1] chr9 [21967751, 21968241] - | 127318 <na> [2] chr9 [21968574, 21968770] - | 127319 <na> [3] chr9 [21970901, 21971207] - | 127320 <na> [4] chr9 [21974403, 21974826] - | 127321 <na> [5] chr9 [21974677, 21975132] - | 127322 <na> [6] chr9 [21994138, 21994490] - | 127323 <na> --- seqlengths: chr1 chr2 ... chrUn_gl000249 249250621 243199373 ... 38502 Next, I would like to find the coverage at each of these positions and this is where I am getting stuck. I have tried to follow example(applyPileups) but I get an error and am not sure that I am calculating coverage at the base-level. library(Rsamtools) # create character list of BAM files (only use first file for now) fls <- scan("SortedBams_tumor.txt",what='character') calcInfo <- function(x) { ## information at each pile-up position info <- apply(x[["seq"]], 2, function(y) { y <- y[c("A", "C", "G", "T"),] y <- y + 1L # continuity cvg <- colSums(y) p <- y / cvg[col(y)] h <- -colSums(p * log(p)) ifelse(cvg == 4L, NA, h) }) list(seqnames=x[["seqnames"]], pos=x[["pos"]], info=info) } which <- ex1 param <- PileupParam(which=which, what=c("seq","qual"), maxDepth=1000L, yieldBy="position", yieldAll=TRUE) fl <- PileupFiles(fls[1],param=param) res <- applyPileups(fl,calcInfo,param=param) > Error: applyPileups: 'x' must be an array of at least two dimensions Is this correct route that I should be taking, or should I be going about this differently? Thanks in advance, Chris > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Rsamtools_1.10.2 [2] Biostrings_2.26.3 [3] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0 [4] GenomicFeatures_1.10.1 [5] GenomicRanges_1.10.6 [6] IRanges_1.16.4 [7] org.Hs.eg.db_2.8.0 [8] RSQLite_0.9-4 [9] DBI_0.2-5 [10] AnnotationDbi_1.20.3 [11] Biobase_2.18.0 [12] BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] BSgenome_1.26.1 RCurl_1.5-0 XML_3.2-0 biomaRt_2.14.0 [5] bitops_1.0-4.1 parallel_2.15.2 rtracklayer_1.18.2 stats4_2.15.2 [9] tools_2.15.2 zlibbioc_1.4.0 -- Christopher Cabanski, PhD Postdoctoral Research Associate Department of Medicine, Oncology Division / The Genome Institute Washington University in St. Louis
Coverage Coverage • 1.6k views
ADD COMMENT

Login before adding your answer.

Traffic: 1342 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6