autoplot transcriptDb error with some regions


Malcolm Cook ★ 1.6k

@malcolm-cook-6293

Last seen 11 months ago

United States

Tengfei & Herve,

I too am afflicted with this error and hope that the following reproducible example will hasten a patch.

I am unsure, but I speculate that this error is raised for the same transcripts for which makeTranscriptDbFromUCSC issues this warning:

In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
  UCSC data anomaly in 434 transcript(s): the cds cumulative length is not
  a multiple of 3 for transcripts 'HRA1' 'tP(UGG)A' 'snR18' 'tA(UGC)A'
  'tL(CAA)A' 'tS(AGA)A' 'YAR061W' 'YAR062W' 'tP(UGG)Q' '15S_rRNA' 'tW(UCA)Q'
  'tE(UUC)Q' 'tS(UGA)Q2' '21S_rRNA' 'tT(UGU)Q1' 'tC(GCA)Q' 'tH(GUG)Q'
  'tL(UAA)Q' 'tQ(UUG)Q' 'tK(UUU)Q' 'tR(UCU)Q1' 'tG(UCC)Q' 'tD(GUC)Q'
  'tS(GCU)Q1' 'tR(ACG)Q2' 'tA(UGC)Q' 'tI(GAU)Q' 'tY(GUA)Q' 'tN(GUU)Q'
  'tM(CAU)Q1' 'tF(GAA)Q' 'tT(XXX)Q2' 'tV(UAC)Q' 'tM(CAU)Q2' 'RPM1' 'snR80'
  'snR67' 'snR53' 'tG(GCC)E' 'tS(AGA)E' 'tM(CAU)E' 'RPR1' 'tQ(UUG)E2'
  'tK(CUU)E1' 'tR(UCU)E' 'snR14' 'tE(UUC)E1' 'tH(GUG)E1' 'tQ(UUG)E1'
  'tS(UGA)E' 'tA(UGC)E' 'SRG1' 'tE(UUC)E2' 'snR4' 'snR52' 'tH(GUG)E2'
  'tK(CUU)E2' 'tV(AAC)E1' 'SCR1' 'tI(AAU)E1' 'tV(AAC)E2' ' [... truncated]

Regarding which, the following threads may be of interest:

https://stat.ethz.ch/pipermail/bioconductor/2010-July/034568.html
https://stat.ethz.ch/pipermail/bioconductor/2012-March/044214.html
http://permalink.gmane.org/gmane.science.biology.informatics.conductor/30105

In the last thread, Herve, you wonder:

> Should we allow
> the user to filter CDSs based on this status? Or should we import only
> complete CDSs? Or we import all the CDSs but we store in the metadata
> table of the TranscriptDb object (and then display this in the show
> method) the fact that not all the CDSs are complete?

In my case, a great workaround would be an option to drop (with a warning) the incomplete ones. Alternatively, some way to interrogate the tr.db for the transcripts that have this problem, so that I may drop them myself.

Tengfei, it would be great if any fix that works in the development version could be ported to the release branch as well.
Cheers,

~Thanks,

Malcolm

library(ggbio)
library(GenomicFeatures)

tr.db <- makeTranscriptDbFromUCSC(
  genome='sacCer3',
  tablename='ensGene'
)

tr.by.gn.grl <- transcriptsBy(tr.db, 'gene')

gn.gr <- unlist(range(tr.by.gn.grl), use.names=TRUE)

a2 <- geom_alignment(tr.db, which=gn.gr[2])    # this works!

geom_alignment(tr.db, which=gn.gr['HRA1'])     # this breaks

gn.gr[1]
sessionInfo()

## whose output is:

In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
  UCSC data anomaly in 434 transcript(s): the cds cumulative length is not
  a multiple of 3 for transcripts 'HRA1' 'tP(UGG)A' 'snR18' 'tA(UGC)A'
  'tL(CAA)A' 'tS(AGA)A' 'YAR061W' 'YAR062W' 'tP(UGG)Q' '15S_rRNA' 'tW(UCA)Q'
  'tE(UUC)Q' 'tS(UGA)Q2' '21S_rRNA' 'tT(UGU)Q1' 'tC(GCA)Q' 'tH(GUG)Q'
  'tL(UAA)Q' 'tQ(UUG)Q' 'tK(UUU)Q' 'tR(UCU)Q1' 'tG(UCC)Q' 'tD(GUC)Q'
  'tS(GCU)Q1' 'tR(ACG)Q2' 'tA(UGC)Q' 'tI(GAU)Q' 'tY(GUA)Q' 'tN(GUU)Q'
  'tM(CAU)Q1' 'tF(GAA)Q' 'tT(XXX)Q2' 'tV(UAC)Q' 'tM(CAU)Q2' 'RPM1' 'snR80'
  'snR67' 'snR53' 'tG(GCC)E' 'tS(AGA)E' 'tM(CAU)E' 'RPR1' 'tQ(UUG)E2'
  'tK(CUU)E1' 'tR(UCU)E' 'snR14' 'tE(UUC)E1' 'tH(GUG)E1' 'tQ(UUG)E1'
  'tS(UGA)E' 'tA(UGC)E' 'SRG1' 'tE(UUC)E2' 'snR4' 'snR52' 'tH(GUG)E2'
  'tK(CUU)E2' 'tV(AAC)E1' 'SCR1' 'tI(AAU)E1' 'tV(AAC)E2' ' [... truncated]

Aggregating TranscriptDb...
Parsing exons...
Parsing cds...
Parsing transcripts...
Parsing utrs and aggregating...
Done
Constructing graphics...

Aggregating TranscriptDb...
Parsing exons...
Parsing cds...
Parsing transcripts...
Parsing utrs and aggregating...
Error in data.frame(tx_id = .nms, tx_name = .tx.nms, gene_id = .gid.nms, :
  arguments imply differing number of rows: 0, 1

GRanges with 1 range and 0 metadata columns:
           seqnames       ranges strand
              <Rle>    <IRanges>  <Rle>
  15S_rRNA     chrM [6546, 8194]      +
  ---
  seqlengths:
      chrI   chrII  chrIII   chrIV    chrV   chrVI  chrVII chrVIII   chrIX
    230218  813184  316620 1531933  576874  270161 1090940  562643  439888
      chrX   chrXI  chrXII chrXIII  chrXIV   chrXV  chrXVI    chrM
    745751  666816 1078177  924431  784333 1091291  948066   85779

R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
     LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
     LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
     LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
     LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] parallel stats graphics grDevices datasets utils methods base

other attached packages:
 [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0
     GenomicRanges_1.14.3 XVector_0.2.0 IRanges_1.20.4
     BiocGenerics_0.8.0 ggbio_1.10.0 ggplot2_0.9.3.1

loaded via a namespace (and not attached):
 [1] biomaRt_2.18.0 Biostrings_2.30.0 biovizBase_1.10.0
     bitops_1.0-6 BSgenome_1.30.0 cluster_1.14.4
     colorspace_1.2-4 compiler_3.0.2 DBI_0.2-7
     dichromat_2.0-0 digest_0.6.3 grid_3.0.2
     gridExtra_0.9.1 gtable_0.1.2 Hmisc_3.12-2
     labeling_0.2 lattice_0.20-24 MASS_7.3-29
     munsell_0.4.2 plyr_1.8 proto_0.3-10
     RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2
[25] rpart_4.1-3 Rsamtools_1.14.1 RSQLite_0.11.4
     rtracklayer_1.22.0 scales_0.2.3 stats4_3.0.2
     stringr_0.6.2 tools_3.0.2 VariantAnnotation_1.8.2
     XML_3.98-1.1 zlibbioc_1.8.0

>-----Original Message-----
>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Tengfei Yin
>Sent: Friday, October 18, 2013 12:05 PM
>To: Alejandro Reyes
>Cc: bioconductor at r-project.org
>Subject: Re: [BioC] autoplot transcriptDb error with some regions
>
>Hi Alejandro,
>
>Thanks for reporting, I believe that's a bug caused by my recent
>modification in biovizBase package, I am working on that now, will keep you
>updated.
>
>Best
>
>Tengfei
>
>
>On Fri, Oct 18, 2013 at 12:43 PM, Alejandro Reyes
><alejandro.reyes at embl.de> wrote:
>
>> Dear Tengfei Yin,
>>
>> Firstly, thanks for developing ggbio, it has been very useful for me!
>>
>> I am getting an error when using autoplot with some specific genomic
>> regions in transcriptDb objects, here is an example:
>>
>> > suppressMessages( library(ggbio) )
>> > suppressMessages(library(GenomicFeatures))
>> > tx <- makeTranscriptDbFromBiomart()
>>
>> Aggregating TranscriptDb...
>> Parsing exons...
>> Parsing cds...
>> Parsing transcripts...
>> Parsing utrs and aggregating...
>> Done
>> Constructing graphics...
>>
>> prueba <- GRanges( 16, IRanges( start=69598997, 69718569 ) )
>> autoplot( tx, prueba, group.selfish=TRUE, names.expr="")
>>
>> Aggregating TranscriptDb...
>> Parsing exons...
>> Parsing cds...
>> Parsing transcripts...
>> Parsing utrs and aggregating...
>> Done
>> Constructing graphics...
>>
>> So far, excellent, however, when I look into a smaller region I get an
>> error message:
>>
>> > prueba <- GRanges( "16", IRanges(start=69718724, end=69720078 ))
>> > autoplot( tx, prueba, group.selfish=TRUE, names.expr="")
>> Aggregating TranscriptDb...
>> Parsing exons...
>> Parsing cds...
>> Parsing transcripts...
>> Parsing utrs and aggregating...
>> Error in DataFrame(...) : different row counts implied by arguments
>>
>> I believe it has to do with recent modifications of ggbio, since I do not
>> get the error message with older versions, e.g. 1.9.7.
>>
>> > sessionInfo()
>> R Under development (unstable) (2013-07-01 r63121)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>  [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>>  [1] ggbio_1.11.0 ggplot2_0.9.3.1 GenomicFeatures_1.15.0
>>  [4] AnnotationDbi_1.23.28 Biobase_2.21.7 GenomicRanges_1.13.56
>>  [7] XVector_0.1.4 IRanges_1.19.40 BiocGenerics_0.7.8
>> [10] BiocInstaller_1.13.1
>>
>> loaded via a namespace (and not attached):
>>  [1] biomaRt_2.17.3 Biostrings_2.29.19 biovizBase_1.9.4
>>  [4] bitops_1.0-6 BSgenome_1.29.1 cluster_1.14.4
>>  [7] colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0
>> [10] digest_0.6.3 grid_3.1.0 gridExtra_0.9.1
>> [13] gtable_0.1.2 Hmisc_3.12-2 labeling_0.2
>> [16] lattice_0.20-24 MASS_7.3-29 munsell_0.4.2
>> [19] plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5
>> [22] RCurl_1.95-4.1 reshape2_1.2.2 rpart_4.1-3
>> [25] Rsamtools_1.13.53 RSQLite_0.11.4 rtracklayer_1.21.14
>> [28] scales_0.2.3 stats4_3.1.0 stringr_0.6.2
>> [31] tools_3.1.0 VariantAnnotation_1.7.57 XML_3.98-1.1
>> [34] zlibbioc_1.7.0
>>
>> Best regards,
>> Alejandro Reyes
>>
>
>--
>Tengfei Yin, PhD
>Seven Bridges Genomics
>sbgenomics.com
>625 Mt. Auburn St. Suite #208
>Cambridge, MA 02138
>(617) 866-0446
>
> [[alternative HTML version deleted]]
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
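The post asks for a way to find the transcripts whose CDS length is not a multiple of 3, so that they can be dropped by hand. One possible approach, as a rough sketch only: `incomplete_cds` is a hypothetical helper (not part of GenomicFeatures), the toy lengths are made up, and the guarded section assumes a `tr.db` object like the one built in the post is present in the session.

```r
# Hypothetical helper: return the names of transcripts whose cumulative
# CDS length is not a multiple of 3.
incomplete_cds <- function(cds_lengths) {
  # cds_lengths: named integer vector, one cumulative CDS length per transcript
  names(cds_lengths)[cds_lengths %% 3L != 0L]
}

# Toy input standing in for real data (names and values are invented).
toy_lengths <- c(txA = 300L, txB = 301L, txC = 0L, txD = 89L)
flagged <- incomplete_cds(toy_lengths)
print(flagged)

# With a real TranscriptDb, the lengths could be derived from cdsBy().
# This part only runs if tr.db exists and GenomicFeatures is installed.
if (exists("tr.db") && requireNamespace("GenomicFeatures", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GenomicFeatures))
  cds.by.tx <- cdsBy(tr.db, by = "tx", use.names = TRUE)  # GRangesList, one element per transcript
  cds.len <- sum(width(cds.by.tx))                        # cumulative CDS length per transcript
  print(head(incomplete_cds(cds.len)))
}
```

The flagged names could then be used to subset `gn.gr` or to skip those genes before plotting. Whether this set matches the transcripts listed in the warning has not been verified here.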

TranscriptDb biovizBase ggbio • 2.1k views

updated 12.2 years ago by Tengfei Yin ▴ 490 • written 12.2 years ago by Malcolm Cook ★ 1.6k


Tengfei Yin ▴ 490

@tengfei-yin-6162

Last seen 11.4 years ago

Hi Malcolm,

Thanks for reporting the issue and providing the attached example. I am looking into this and will keep you updated.

Cheers,

Tengfei

On Mon, Nov 4, 2013 at 11:36 AM, Cook, Malcolm <mec@stowers.org> wrote:

> Tengfei & Herve,
>
> I too am afflicted with this error and hoping that the following
> reproducible example will hasten a patch.
>
> [...]

12.2 years ago by Tengfei Yin ▴ 490


Hi Malcolm,

It's fixed in the release branch, version 1.10.3. Please update and try again in two days. The bug you reported is not really related to CDS parsing; it is a bug in how biovizBase handles the empty object exception.

Thanks,

Tengfei

On Mon, Nov 4, 2013 at 5:02 PM, Tengfei Yin <tengfei.yin@sbgenomics.com> wrote:

> Hi Malcolm,
>
> Thanks for reporting the issue and provide the attached example, I am
> looking into this and will keep you updated.
>
> [...]
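To illustrate the class of failure described in this reply, here is a minimal base-R sketch. It is not the biovizBase code or the actual patch. It only shows how combining a zero-length vector with a length-one vector in data.frame() fails, and one possible guard; `safe_tx_frame` is a hypothetical name invented for this example.

```r
# An empty result, e.g. no transcript pieces found for the requested region.
ids <- character(0)

# Mixing a zero-length column with a length-one column makes data.frame() fail.
res <- tryCatch(
  data.frame(tx_id = ids, gene_id = "HRA1"),
  error = function(e) conditionMessage(e)
)
print(res)

# Hypothetical guard: return an empty, correctly shaped data.frame instead of
# passing mismatched lengths on to data.frame().
safe_tx_frame <- function(tx_id, gene_id) {
  if (length(tx_id) == 0L) {
    return(data.frame(tx_id = character(0), gene_id = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(tx_id = tx_id, gene_id = gene_id, stringsAsFactors = FALSE)
}

empty <- safe_tx_frame(ids, "HRA1")
one <- safe_tx_frame("tx1", "HRA1")
print(nrow(empty))
print(nrow(one))
```

Returning an empty frame with the expected columns lets downstream code handle "nothing to draw" as a normal case rather than an error. How the released fix handles it may differ.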

12.2 years ago by Tengfei Yin ▴ 490
