Re(surrecting): [Bioc-sig-seq] Rsamtools countBam labeling
0
0
Entering edit mode
@martin-morgan-1513
Last seen 3 days ago
United States
On 03/31/2011 12:49 PM, Cook, Malcolm wrote: > Martin, > > Hmmm, Thanks .... I think I'm getting it.... > > Following your lead, I can directly re-order cnt in the OriginalOrder as: > >> cnt[sort(unlist(split(values(which2), seqnames(which2)))$OriginalOrder,index.return=TRUE)$ix,] > space start end width file records nucleotides > 2 seq2 100 1000 901 ex1.bam 1169 41235 > 1 seq1 1000 2000 1001 ex1.bam 612 21549 > 3 seq2 1000 2000 1001 ex1.bam 642 22640 > > > ... which can usefully(?) be abstracted as > > countBamWhich<- function (file,which,index=file,...) { > ### wrapper for countBam that reorders its results to aggree with > ### 'which', a required ScanBamParam. ... params are additional > ### ScanBamParam options. > ### > ### ASSUMES: internal implementation detail of ScanBamParam. c.f. > ### http://permalink.gmane.org/gmane.science.biology.informatics.con ductor/34208) > param<- ScanBamParam(which=which,...) > values(which)[['OriginalOrder']]<- 1:length(which) > CBW = countBam(file,index,param=ScanBamParam(which=which)) > CBW[sort(unlist(split(values(which), > seqnames(which)))$OriginalOrder,index.return=TRUE)$ix,] > } Hi Malcolm -- maybe the return value is a GRanges with counts? It's probably better to split a simple vector and use that to impose order on the result, rather than splitting values(). The return value could then be cbind'ed as needed... countBam_GRanges <- function(file, index=file, ..., param=GRanges()) { if (0L == length(param)) stop("'param' must have length() != 0") ## FIXME: not sure about 'drop' argument to split? splt <- split(seq_len(length(param)), as.factor(seqnames(param))) o <- order(unlist(splt, use.names=FALSE)) cnt <- countBam(file, index, ..., param=ScanBamParam(which=param)) df <- as(cnt[o, c("records", "nucleotides")], "DataFrame") values(param) <- df param } counts <- countBam_GRanges(fl, param=which1) values(which1) <- cbind(values(which1), values(counts)) Martin > > allowing me to write > >> countBamWhich(fl, which2) > space start end width file records nucleotides > 2 seq2 100 1000 901 ex1.bam 1169 41235 > 1 seq1 1000 2000 1001 ex1.bam 612 21549 > 3 seq2 1000 2000 1001 ex1.bam 642 22640 > > All in favor? > > ~ Malcolm > > > > >> -----Original Message----- >> From: Martin Morgan [mailto:mtmorgan at fhcrc.org] >> Sent: Thursday, March 31, 2011 11:41 AM >> To: Cook, Malcolm >> Cc: 'myoung at wehi.EDU.AU'; 'bioconductor at r-project.org'; >> 'Bioc-sig-sequencing at r-project.org' >> Subject: Re: [BioC] Re(surrecting): [Bioc-sig-seq] Rsamtools >> countBam labeling >> >> On 03/31/2011 08:32 AM, Cook, Malcolm wrote: >>> Matt et. al., >>> >>> I wonder if a satisfactory resolution to the issue of "the order >>> changes between the GRanges object and the countBam data.frame >>> >>> >> http://www.mail-archive.com/bioc-sig-sequencing at r-project.org/msg01144 >>> .html >>> >>> I am presented with the same issue and poised to tackle it >> but wondr >>> if a generic solution emerged from you inquiries& efforts. >> >> Hi Malcolm -- >> >> For a reproducible example, >> >> library(Rsamtools) >> example(countBam) >> which1<- as(which, "GRanges") >> ## which2 might be where your data actually starts >> which2<- which1[c(2,1,3)] >> values(which2)[["OriginalOrder"]]<- 1:3 >> param<- ScanBamParam(which=which2) >> cnt<- countBam(fl, param=param) >> >> What happens is that ScanBamParam converts its argument to an >> IRangesList, using split(ranges(which2), seqnames(which2)). >> So do the same for the values and unlist >> >> cntVals<- unlist(split(values(which2), seqnames(which2))) >> >> then cbind coerced values >> >> cbind(cnt, as.data.frame(cntVals)) >> >> with >> >> > which2 >> GRanges with 3 ranges and 1 elementMetadata value >> seqnames ranges strand | OriginalOrder >> <rle> <iranges> <rle> |<integer> >> [1] seq2 [ 100, 1000] * | 1 >> [2] seq1 [1000, 2000] * | 2 >> [3] seq2 [1000, 2000] * | 3 >> >> seqlengths >> seq1 seq2 >> NA NA >> > cbind(cnt, as.data.frame(cntVals)) >> space start end width file records nucleotides OriginalOrder >> 1 seq1 1000 2000 1001 ex1.bam 612 21549 2 >> 2 seq2 100 1000 901 ex1.bam 1169 41235 1 >> 3 seq2 1000 2000 1001 ex1.bam 642 22640 3 >> >> Martin >> >>> >>> Thanks, >>> >>> Malcolm Cook Stowers Institute for Medical Research - >> Bioinformatics >>> Kansas City, Missouri USA >>> >>> >>> _______________________________________________ >> Bioconductor mailing >>> list Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the >>> archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 >> Telephone: 206 667-2793 -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
Cancer Cancer • 805 views
ADD COMMENT

Login before adding your answer.

Traffic: 575 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6