using a gtf file to map reads
1
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Mike and Alejandro, disjointExons() is now in GenomicFeatures 1.13.18. It's documented on the same man page as ?transcripts, ?exons, ?cds ,etc. The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a few performance modifications. Primary changes were a helper function with a more vectorized approach to create 'geneNames', 'transcripts' and reducing the number of times metadata columns were added to the GRanges. > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) user system elapsed 8.205 0.028 8.397 > system.time(disjointExons(txdb, TRUE, TRUE)) user system elapsed 5.596 0.032 5.764 > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) user system elapsed 37.286 0.228 37.954 > system.time(disjointExons(txdb, TRUE, TRUE)) user system elapsed 28.130 0.228 28.822 The multiple requests on the mailing list for a function of this type made me think it should be included in one of the infrastructure packages. The goal of adding this to GenomicFeatures was to increase visibility and make it more readily available to packages that don't depend on DEXSeq. I hope having the function in two places doesn't add a layer of confusion. I have a suite of unit tests to ensure disjointExons and prepareAnnotationForDEXSeq are in sync. Let me know if you have any questions or would like any changes made to the man page. Thanks, Valerie On 06/11/13 12:30, Valerie Obenchain wrote: > Great. I'll go ahead and put this in GenomicFeatures with credit to you > and Alejandro. > > Thanks. > Valerie > > On 06/11/2013 09:39 AM, Michael Love wrote: >> hi Valerie, >> >> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain >> <vobencha at="" fhcrc.org=""> wrote: >>> Hi Mike (Love), >>> >>> Would you be interested in contributing a disjointExons() to >>> GenomicFeatures? I think this extraction would be useful to many. >>> Maybe we >>> also want a disjointExonsBy() but the only 'by' would be genes ...? >>> >>> Valerie >>> >> >> I'd be happy to contribute any of this code in parathyroidSE. >> Alejandro Reyes has formalized this part of the code into the >> following function in DEXSeq >= 1.6.0: >> >>> prepareAnnotationForDEXSeq >> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts = >> TRUE) >> { >> stopifnot(is(transcriptDb, "TranscriptDb")) >> exonsByGene <- exonsBy(transcriptDb, by = "gene") >> exonicParts <- disjoin(unlist(exonsByGene)) >> if (!aggregateGenes) { >> overlaps <- findOverlaps(exonicParts, exonsByGene) >> geneNames <- names(exonsByGene)[subjectHits(overlaps)] >> ..... >> >> best, >> >> Mike >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
GO GenomicFeatures DEXSeq GO GenomicFeatures DEXSeq • 963 views
ADD COMMENT
0
Entering edit mode
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 5 months ago
Novartis Institutes for BioMedical Reseā€¦
Thanks Valerie! I will make sure to change all the documentation to use "disjointExons" instead. Best regards, Alejandro > Mike and Alejandro, > > disjointExons() is now in GenomicFeatures 1.13.18. It's documented on > the same man page as ?transcripts, ?exons, ?cds ,etc. > > The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a > few performance modifications. Primary changes were a helper function > with a more vectorized approach to create 'geneNames', 'transcripts' > and reducing the number of times metadata columns were added to the > GRanges. > > > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene > > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) > user system elapsed > 8.205 0.028 8.397 > > system.time(disjointExons(txdb, TRUE, TRUE)) > user system elapsed > 5.596 0.032 5.764 > > > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene > > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) > user system elapsed > 37.286 0.228 37.954 > > system.time(disjointExons(txdb, TRUE, TRUE)) > user system elapsed > 28.130 0.228 28.822 > > The multiple requests on the mailing list for a function of this type > made me think it should be included in one of the infrastructure > packages. The goal of adding this to GenomicFeatures was to increase > visibility and make it more readily available to packages that don't > depend on DEXSeq. I hope having the function in two places doesn't add > a layer of confusion. I have a suite of unit tests to ensure > disjointExons and prepareAnnotationForDEXSeq are in sync. > > Let me know if you have any questions or would like any changes made > to the man page. > > Thanks, > Valerie > > > > On 06/11/13 12:30, Valerie Obenchain wrote: >> Great. I'll go ahead and put this in GenomicFeatures with credit to you >> and Alejandro. >> >> Thanks. >> Valerie >> >> On 06/11/2013 09:39 AM, Michael Love wrote: >>> hi Valerie, >>> >>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain >>> <vobencha at="" fhcrc.org=""> wrote: >>>> Hi Mike (Love), >>>> >>>> Would you be interested in contributing a disjointExons() to >>>> GenomicFeatures? I think this extraction would be useful to many. >>>> Maybe we >>>> also want a disjointExonsBy() but the only 'by' would be genes ...? >>>> >>>> Valerie >>>> >>> >>> I'd be happy to contribute any of this code in parathyroidSE. >>> Alejandro Reyes has formalized this part of the code into the >>> following function in DEXSeq >= 1.6.0: >>> >>>> prepareAnnotationForDEXSeq >>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts = >>> TRUE) >>> { >>> stopifnot(is(transcriptDb, "TranscriptDb")) >>> exonsByGene <- exonsBy(transcriptDb, by = "gene") >>> exonicParts <- disjoin(unlist(exonsByGene)) >>> if (!aggregateGenes) { >>> overlaps <- findOverlaps(exonicParts, exonsByGene) >>> geneNames <- names(exonsByGene)[subjectHits(overlaps)] >>> ..... >>> >>> best, >>> >>> Mike >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode
hi Valerie, That's great, thanks. After the next release, I will change the parathyroidSE vignette to use disjointExons() and the new arguments for summarizeOverlaps(): 1. use inter.feature=FALSE for the exon counting rather than the custom counting mode 2. use fragments=TRUE rather than counting pairs and singletons separately best, Mike On Mon, Jul 1, 2013 at 10:20 AM, Alejandro Reyes <alejandro.reyes at="" embl.de=""> wrote: > Thanks Valerie! > > I will make sure to change all the documentation to use "disjointExons" > instead. > > Best regards, > Alejandro > > > >> Mike and Alejandro, >> >> disjointExons() is now in GenomicFeatures 1.13.18. It's documented on the >> same man page as ?transcripts, ?exons, ?cds ,etc. >> >> The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a few >> performance modifications. Primary changes were a helper function with a >> more vectorized approach to create 'geneNames', 'transcripts' and reducing >> the number of times metadata columns were added to the GRanges. >> >> > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene >> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) >> user system elapsed >> 8.205 0.028 8.397 >> > system.time(disjointExons(txdb, TRUE, TRUE)) >> user system elapsed >> 5.596 0.032 5.764 >> >> > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene >> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) >> user system elapsed >> 37.286 0.228 37.954 >> > system.time(disjointExons(txdb, TRUE, TRUE)) >> user system elapsed >> 28.130 0.228 28.822 >> >> The multiple requests on the mailing list for a function of this type made >> me think it should be included in one of the infrastructure packages. The >> goal of adding this to GenomicFeatures was to increase visibility and make >> it more readily available to packages that don't depend on DEXSeq. I hope >> having the function in two places doesn't add a layer of confusion. I have a >> suite of unit tests to ensure disjointExons and prepareAnnotationForDEXSeq >> are in sync. >> >> Let me know if you have any questions or would like any changes made to >> the man page. >> >> Thanks, >> Valerie >> >> >> >> On 06/11/13 12:30, Valerie Obenchain wrote: >>> >>> Great. I'll go ahead and put this in GenomicFeatures with credit to you >>> and Alejandro. >>> >>> Thanks. >>> Valerie >>> >>> On 06/11/2013 09:39 AM, Michael Love wrote: >>>> >>>> hi Valerie, >>>> >>>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain >>>> <vobencha at="" fhcrc.org=""> wrote: >>>>> >>>>> Hi Mike (Love), >>>>> >>>>> Would you be interested in contributing a disjointExons() to >>>>> GenomicFeatures? I think this extraction would be useful to many. >>>>> Maybe we >>>>> also want a disjointExonsBy() but the only 'by' would be genes ...? >>>>> >>>>> Valerie >>>>> >>>> >>>> I'd be happy to contribute any of this code in parathyroidSE. >>>> Alejandro Reyes has formalized this part of the code into the >>>> following function in DEXSeq >= 1.6.0: >>>> >>>>> prepareAnnotationForDEXSeq >>>> >>>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts = >>>> TRUE) >>>> { >>>> stopifnot(is(transcriptDb, "TranscriptDb")) >>>> exonsByGene <- exonsBy(transcriptDb, by = "gene") >>>> exonicParts <- disjoin(unlist(exonsByGene)) >>>> if (!aggregateGenes) { >>>> overlaps <- findOverlaps(exonicParts, exonsByGene) >>>> geneNames <- names(exonsByGene)[subjectHits(overlaps)] >>>> ..... >>>> >>>> best, >>>> >>>> Mike >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >
ADD REPLY

Login before adding your answer.

Traffic: 825 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6