Question

pileup coverage from BAM file

0

Entering edit mode

Chris Cabanski ▴ 30

@chris-cabanski-5801

Last seen 9.6 years ago

Hi, I have a bam file and I am trying to calculate the coverage at each exon position (base) within a gene. First, I find the exon boundaries for the gene of interest: library(org.Hs.eg.db) library(TxDb.Hsapiens.UCSC.hg19.knownGene) genesym <- c("CDKN2A") geneid <- select(org.Hs.eg.db, keys=genesym, keytype="SYMBOL",cols="ENTREZID") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene ex <- exonsBy(txdb, "gene") ex1 <- ex[[geneid$ENTREZID[1]]] > ex1 GRanges with 6 ranges and 2 metadata columns: seqnames ranges strand | exon_id exon_name <rle> <iranges> <rle> | <integer> <character> [1] chr9 [21967751, 21968241] - | 127318 <na> [2] chr9 [21968574, 21968770] - | 127319 <na> [3] chr9 [21970901, 21971207] - | 127320 <na> [4] chr9 [21974403, 21974826] - | 127321 <na> [5] chr9 [21974677, 21975132] - | 127322 <na> [6] chr9 [21994138, 21994490] - | 127323 <na> --- seqlengths: chr1 chr2 ... chrUn_gl000249 249250621 243199373 ... 38502 Next, I would like to find the coverage at each of these positions and this is where I am getting stuck. I have tried to follow example(applyPileups) but I get an error and am not sure that I am calculating coverage at the base-level. library(Rsamtools) # create character list of BAM files (only use first file for now) fls <- scan("SortedBams_tumor.txt",what='character') calcInfo <- function(x) { ## information at each pile-up position info <- apply(x[["seq"]], 2, function(y) { y <- y[c("A", "C", "G", "T"),] y <- y + 1L # continuity cvg <- colSums(y) p <- y / cvg[col(y)] h <- -colSums(p * log(p)) ifelse(cvg == 4L, NA, h) }) list(seqnames=x[["seqnames"]], pos=x[["pos"]], info=info) } which <- ex1 param <- PileupParam(which=which, what=c("seq","qual"), maxDepth=1000L, yieldBy="position", yieldAll=TRUE) fl <- PileupFiles(fls[1],param=param) res <- applyPileups(fl,calcInfo,param=param) > Error: applyPileups: 'x' must be an array of at least two dimensions Is this correct route that I should be taking, or should I be going about this differently? Thanks in advance, Chris > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Rsamtools_1.10.2 [2] Biostrings_2.26.3 [3] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0 [4] GenomicFeatures_1.10.1 [5] GenomicRanges_1.10.6 [6] IRanges_1.16.4 [7] org.Hs.eg.db_2.8.0 [8] RSQLite_0.9-4 [9] DBI_0.2-5 [10] AnnotationDbi_1.20.3 [11] Biobase_2.18.0 [12] BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] BSgenome_1.26.1 RCurl_1.5-0 XML_3.2-0 biomaRt_2.14.0 [5] bitops_1.0-4.1 parallel_2.15.2 rtracklayer_1.18.2 stats4_2.15.2 [9] tools_2.15.2 zlibbioc_1.4.0 -- Christopher Cabanski, PhD Postdoctoral Research Associate Department of Medicine, Oncology Division / The Genome Institute Washington University in St. Louis

Coverage Coverage • 1.5k views

ADD COMMENT • link 11.1 years ago Chris Cabanski ▴ 30