reducing hits from countGenomicOverlaps()
0
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States
On Wed, Oct 26, 2011 at 3:07 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 10/25/2011 05:35 PM, Robert Castelo wrote: > >> dear list, >> >> the following three lines allow one to count overlaps of aligned >> short-reads with annotations: >> >> aln <- readGappedAlignments("**somebamfile.bam") >> txdb <- makeTranscriptFromUCSC(genome=**"hg19", tablename="ensGene") >> ensGenes <- exonsBy(txdb, by="gene") >> ov <- countGenomicOverlaps(aln, ensGenes) >> >> then i want to get read-counts per gene and the first thing that comes >> to my head is doing: >> >> counts <- sapply(ov, function(x) sum(values(x)[["hits"]])) >> >> which goes through every gene and adds up the "hits" of its exons. >> however, this latter step of "just adding" takes longer than the actual >> calculation of the hits with countGenomicOverlaps() and i guess that >> there are more efficient ways to approach this, probably something >> around "reducing the hits value column". i've been looking at rdapply() >> and reduce() and googled too, but couldn't find anything, so i look >> forward to your suggestions. >> > > Hi Robert -- A strategy here is along the lines (untested!) of > > hits <- values(unlist(ov)))[["hits"]] > At least in devel, we can now do this: counts <- sum(relist(hits, ov)) > genes <- rep(names(ov), elementLengths(ov)) > counts <- sapply(split(hits, genes), sum) > > but you'll want to make sure that this is a conceptually sensible way of > counting hits per gene. > > The countGenomicOverlaps function has been heavily updated to > summarizeOverlaps in 'devel'; you might want to check that out, or at least > the vignette for summarizeOverlaps at > > http://bioconductor.org/**packages/devel/bioc/html/**GenomicRanges.h tml<http: bioconductor.org="" packages="" devel="" bioc="" html="" genomicranges.htm="" l=""> > > Martin > > > >> thanks!! >> robert. >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: >> http://news.gmane.org/gmane.**science.biology.informatics.**conduct or<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]
Cancer Cancer • 517 views
ADD COMMENT

Login before adding your answer.

Traffic: 485 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6