Entering edit mode
Mike and Alejandro,
disjointExons() is now in GenomicFeatures 1.13.18. It's documented on
the same man page as ?transcripts, ?exons, ?cds ,etc.
The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a
few performance modifications. Primary changes were a helper function
with a more vectorized approach to create 'geneNames', 'transcripts'
and
reducing the number of times metadata columns were added to the
GRanges.
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
user system elapsed
8.205 0.028 8.397
> system.time(disjointExons(txdb, TRUE, TRUE))
user system elapsed
5.596 0.032 5.764
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE))
user system elapsed
37.286 0.228 37.954
> system.time(disjointExons(txdb, TRUE, TRUE))
user system elapsed
28.130 0.228 28.822
The multiple requests on the mailing list for a function of this type
made me think it should be included in one of the infrastructure
packages. The goal of adding this to GenomicFeatures was to increase
visibility and make it more readily available to packages that don't
depend on DEXSeq. I hope having the function in two places doesn't add
a
layer of confusion. I have a suite of unit tests to ensure
disjointExons
and prepareAnnotationForDEXSeq are in sync.
Let me know if you have any questions or would like any changes made
to
the man page.
Thanks,
Valerie
On 06/11/13 12:30, Valerie Obenchain wrote:
> Great. I'll go ahead and put this in GenomicFeatures with credit to
you
> and Alejandro.
>
> Thanks.
> Valerie
>
> On 06/11/2013 09:39 AM, Michael Love wrote:
>> hi Valerie,
>>
>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain
>> <vobencha at="" fhcrc.org=""> wrote:
>>> Hi Mike (Love),
>>>
>>> Would you be interested in contributing a disjointExons() to
>>> GenomicFeatures? I think this extraction would be useful to many.
>>> Maybe we
>>> also want a disjointExonsBy() but the only 'by' would be genes
...?
>>>
>>> Valerie
>>>
>>
>> I'd be happy to contribute any of this code in parathyroidSE.
>> Alejandro Reyes has formalized this part of the code into the
>> following function in DEXSeq >= 1.6.0:
>>
>>> prepareAnnotationForDEXSeq
>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts
=
>> TRUE)
>> {
>> stopifnot(is(transcriptDb, "TranscriptDb"))
>> exonsByGene <- exonsBy(transcriptDb, by = "gene")
>> exonicParts <- disjoin(unlist(exonsByGene))
>> if (!aggregateGenes) {
>> overlaps <- findOverlaps(exonicParts, exonsByGene)
>> geneNames <- names(exonsByGene)[subjectHits(overlaps)]
>> .....
>>
>> best,
>>
>> Mike
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor