using a gtf file to map reads

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.9 years ago

United States

Mike and Alejandro, disjointExons() is now in GenomicFeatures 1.13.18. It's documented on the same man page as ?transcripts, ?exons, ?cds ,etc. The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a few performance modifications. Primary changes were a helper function with a more vectorized approach to create 'geneNames', 'transcripts' and reducing the number of times metadata columns were added to the GRanges. > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) user system elapsed 8.205 0.028 8.397 > system.time(disjointExons(txdb, TRUE, TRUE)) user system elapsed 5.596 0.032 5.764 > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) user system elapsed 37.286 0.228 37.954 > system.time(disjointExons(txdb, TRUE, TRUE)) user system elapsed 28.130 0.228 28.822 The multiple requests on the mailing list for a function of this type made me think it should be included in one of the infrastructure packages. The goal of adding this to GenomicFeatures was to increase visibility and make it more readily available to packages that don't depend on DEXSeq. I hope having the function in two places doesn't add a layer of confusion. I have a suite of unit tests to ensure disjointExons and prepareAnnotationForDEXSeq are in sync. Let me know if you have any questions or would like any changes made to the man page. Thanks, Valerie On 06/11/13 12:30, Valerie Obenchain wrote: > Great. I'll go ahead and put this in GenomicFeatures with credit to you > and Alejandro. > > Thanks. > Valerie > > On 06/11/2013 09:39 AM, Michael Love wrote: >> hi Valerie, >> >> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain >> <vobencha at="" fhcrc.org=""> wrote: >>> Hi Mike (Love), >>> >>> Would you be interested in contributing a disjointExons() to >>> GenomicFeatures? I think this extraction would be useful to many. >>> Maybe we >>> also want a disjointExonsBy() but the only 'by' would be genes ...? >>> >>> Valerie >>> >> >> I'd be happy to contribute any of this code in parathyroidSE. >> Alejandro Reyes has formalized this part of the code into the >> following function in DEXSeq >= 1.6.0: >> >>> prepareAnnotationForDEXSeq >> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts = >> TRUE) >> { >> stopifnot(is(transcriptDb, "TranscriptDb")) >> exonsByGene <- exonsBy(transcriptDb, by = "gene") >> exonicParts <- disjoin(unlist(exonsByGene)) >> if (!aggregateGenes) { >> overlaps <- findOverlaps(exonicParts, exonsByGene) >> geneNames <- names(exonsByGene)[subjectHits(overlaps)] >> ..... >> >> best, >> >> Mike >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

GO GenomicFeatures DEXSeq GO GenomicFeatures DEXSeq • 962 views

ADD COMMENT • link updated 11.5 years ago by Alejandro Reyes ★ 1.9k • written 11.5 years ago by Valerie Obenchain ★ 6.8k

0

Entering edit mode

Alejandro Reyes ★ 1.9k

@alejandro-reyes-5124

Last seen 5 months ago

Novartis Institutes for BioMedical Rese…

Thanks Valerie! I will make sure to change all the documentation to use "disjointExons" instead. Best regards, Alejandro > Mike and Alejandro, > > disjointExons() is now in GenomicFeatures 1.13.18. It's documented on > the same man page as ?transcripts, ?exons, ?cds ,etc. > > The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a > few performance modifications. Primary changes were a helper function > with a more vectorized approach to create 'geneNames', 'transcripts' > and reducing the number of times metadata columns were added to the > GRanges. > > > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene > > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) > user system elapsed > 8.205 0.028 8.397 > > system.time(disjointExons(txdb, TRUE, TRUE)) > user system elapsed > 5.596 0.032 5.764 > > > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene > > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) > user system elapsed > 37.286 0.228 37.954 > > system.time(disjointExons(txdb, TRUE, TRUE)) > user system elapsed > 28.130 0.228 28.822 > > The multiple requests on the mailing list for a function of this type > made me think it should be included in one of the infrastructure > packages. The goal of adding this to GenomicFeatures was to increase > visibility and make it more readily available to packages that don't > depend on DEXSeq. I hope having the function in two places doesn't add > a layer of confusion. I have a suite of unit tests to ensure > disjointExons and prepareAnnotationForDEXSeq are in sync. > > Let me know if you have any questions or would like any changes made > to the man page. > > Thanks, > Valerie > > > > On 06/11/13 12:30, Valerie Obenchain wrote: >> Great. I'll go ahead and put this in GenomicFeatures with credit to you >> and Alejandro. >> >> Thanks. >> Valerie >> >> On 06/11/2013 09:39 AM, Michael Love wrote: >>> hi Valerie, >>> >>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain >>> <vobencha at="" fhcrc.org=""> wrote: >>>> Hi Mike (Love), >>>> >>>> Would you be interested in contributing a disjointExons() to >>>> GenomicFeatures? I think this extraction would be useful to many. >>>> Maybe we >>>> also want a disjointExonsBy() but the only 'by' would be genes ...? >>>> >>>> Valerie >>>> >>> >>> I'd be happy to contribute any of this code in parathyroidSE. >>> Alejandro Reyes has formalized this part of the code into the >>> following function in DEXSeq >= 1.6.0: >>> >>>> prepareAnnotationForDEXSeq >>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts = >>> TRUE) >>> { >>> stopifnot(is(transcriptDb, "TranscriptDb")) >>> exonsByGene <- exonsBy(transcriptDb, by = "gene") >>> exonicParts <- disjoin(unlist(exonsByGene)) >>> if (!aggregateGenes) { >>> overlaps <- findOverlaps(exonicParts, exonsByGene) >>> geneNames <- names(exonsByGene)[subjectHits(overlaps)] >>> ..... >>> >>> best, >>> >>> Mike >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 11.5 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

hi Valerie, That's great, thanks. After the next release, I will change the parathyroidSE vignette to use disjointExons() and the new arguments for summarizeOverlaps(): 1. use inter.feature=FALSE for the exon counting rather than the custom counting mode 2. use fragments=TRUE rather than counting pairs and singletons separately best, Mike On Mon, Jul 1, 2013 at 10:20 AM, Alejandro Reyes <alejandro.reyes at="" embl.de=""> wrote: > Thanks Valerie! > > I will make sure to change all the documentation to use "disjointExons" > instead. > > Best regards, > Alejandro > > > >> Mike and Alejandro, >> >> disjointExons() is now in GenomicFeatures 1.13.18. It's documented on the >> same man page as ?transcripts, ?exons, ?cds ,etc. >> >> The function is the same as DEXSeq::prepareAnnotationForDEXSeq with a few >> performance modifications. Primary changes were a helper function with a >> more vectorized approach to create 'geneNames', 'transcripts' and reducing >> the number of times metadata columns were added to the GRanges. >> >> > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene >> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) >> user system elapsed >> 8.205 0.028 8.397 >> > system.time(disjointExons(txdb, TRUE, TRUE)) >> user system elapsed >> 5.596 0.032 5.764 >> >> > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene >> > system.time(prepareAnnotationForDEXSeq(txdb, TRUE, TRUE)) >> user system elapsed >> 37.286 0.228 37.954 >> > system.time(disjointExons(txdb, TRUE, TRUE)) >> user system elapsed >> 28.130 0.228 28.822 >> >> The multiple requests on the mailing list for a function of this type made >> me think it should be included in one of the infrastructure packages. The >> goal of adding this to GenomicFeatures was to increase visibility and make >> it more readily available to packages that don't depend on DEXSeq. I hope >> having the function in two places doesn't add a layer of confusion. I have a >> suite of unit tests to ensure disjointExons and prepareAnnotationForDEXSeq >> are in sync. >> >> Let me know if you have any questions or would like any changes made to >> the man page. >> >> Thanks, >> Valerie >> >> >> >> On 06/11/13 12:30, Valerie Obenchain wrote: >>> >>> Great. I'll go ahead and put this in GenomicFeatures with credit to you >>> and Alejandro. >>> >>> Thanks. >>> Valerie >>> >>> On 06/11/2013 09:39 AM, Michael Love wrote: >>>> >>>> hi Valerie, >>>> >>>> On Tue, Jun 11, 2013 at 6:13 PM, Valerie Obenchain >>>> <vobencha at="" fhcrc.org=""> wrote: >>>>> >>>>> Hi Mike (Love), >>>>> >>>>> Would you be interested in contributing a disjointExons() to >>>>> GenomicFeatures? I think this extraction would be useful to many. >>>>> Maybe we >>>>> also want a disjointExonsBy() but the only 'by' would be genes ...? >>>>> >>>>> Valerie >>>>> >>>> >>>> I'd be happy to contribute any of this code in parathyroidSE. >>>> Alejandro Reyes has formalized this part of the code into the >>>> following function in DEXSeq >= 1.6.0: >>>> >>>>> prepareAnnotationForDEXSeq >>>> >>>> function (transcriptDb, aggregateGenes = FALSE, includeTranscripts = >>>> TRUE) >>>> { >>>> stopifnot(is(transcriptDb, "TranscriptDb")) >>>> exonsByGene <- exonsBy(transcriptDb, by = "gene") >>>> exonicParts <- disjoin(unlist(exonsByGene)) >>>> if (!aggregateGenes) { >>>> overlaps <- findOverlaps(exonicParts, exonsByGene) >>>> geneNames <- names(exonsByGene)[subjectHits(overlaps)] >>>> ..... >>>> >>>> best, >>>> >>>> Mike >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >

ADD REPLY • link 11.5 years ago Michael Love 43k

Login before adding your answer.