Tengfei & Herve,
I too am afflicted with this error and hoping that the following
reproducible example will hasten a patch.
I am unsure, but I speculate that this error is raised for the same
transcripts for which makeTranscriptDbFromUCSC issues this warning:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
UCSC data anomaly in 434 transcript(s): the cds cumulative length is
not a multiple of 3 for transcripts 'HRA1' 'tP(UGG)A' 'snR18'
'tA(UGC)A' 'tL(CAA)A' 'tS(AGA)A' 'YAR061W' 'YAR062W' 'tP(UGG)Q'
'15S_rRNA' 'tW(UCA)Q' 'tE(UUC)Q' 'tS(UGA)Q2' '21S_rRNA' 'tT(UGU)Q1'
'tC(GCA)Q' 'tH(GUG)Q' 'tL(UAA)Q' 'tQ(UUG)Q' 'tK(UUU)Q' 'tR(UCU)Q1'
'tG(UCC)Q' 'tD(GUC)Q' 'tS(GCU)Q1' 'tR(ACG)Q2' 'tA(UGC)Q' 'tI(GAU)Q'
'tY(GUA)Q' 'tN(GUU)Q' 'tM(CAU)Q1' 'tF(GAA)Q' 'tT(XXX)Q2' 'tV(UAC)Q'
'tM(CAU)Q2' 'RPM1' 'snR80' 'snR67' 'snR53' 'tG(GCC)E' 'tS(AGA)E'
'tM(CAU)E' 'RPR1' 'tQ(UUG)E2' 'tK(CUU)E1' 'tR(UCU)E' 'snR14'
'tE(UUC)E1' 'tH(GUG)E1' 'tQ(UUG)E1' 'tS(UGA)E' 'tA(UGC)E' 'SRG1'
'tE(UUC)E2' 'snR4' 'snR52' 'tH(GUG)E2' 'tK(CUU)E2' 'tV(AAC)E1' 'SCR1'
'tI(AAU)E1' 'tV(AAC)E2' ' [... truncated]
Regarding which, the following threads may be of interest:
https://stat.ethz.ch/pipermail/bioconductor/2010-July/034568.html
https://stat.ethz.ch/pipermail/bioconductor/2012-March/044214.html
http://permalink.gmane.org/gmane.science.biology.informatics.conductor/30105
In the last thread, Herve, you wonder:
> Should we allow the user to filter CDSs based on this status? Or
> should we import only complete CDSs? Or we import all the CDSs but
> we store in the metadata table of the TranscriptDb object (and then
> display this in the show method) the fact that not all the CDSs are
> complete?
In my case, a great workaround would be an option to drop (with a
warning) the incomplete ones. Or some way to interrogate the tr.db for
which transcripts have this problem, so that I may drop them myself.
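To make the second option concrete, here is a minimal sketch of the
check I have in mind. The helper name incomplete_cds_tx and the toy
widths are hypothetical, and the commented lines at the end assume that
cdsBy() on tr.db still returns the anomalous CDSs, which I have not
verified.

```r
# Hypothetical helper (not part of GenomicFeatures): given one entry per
# CDS segment, return the names of transcripts whose cumulative CDS
# width is not a multiple of 3.
incomplete_cds_tx <- function(tx_name, cds_width) {
  total <- tapply(cds_width, tx_name, sum)  # cumulative width per transcript
  names(total)[total %% 3L != 0L]
}

# Toy data standing in for a real CDS table (names and widths are made up).
toy <- data.frame(
  tx_name   = c("txA", "txA", "txB", "txC", "txC"),
  cds_width = c(30L, 60L, 100L, 7L, 2L),
  stringsAsFactors = FALSE
)
bad <- incomplete_cds_tx(toy$tx_name, toy$cds_width)
print(bad)

# Against a real TranscriptDb the same test might be written as
# (assumption: the anomalous CDSs are actually stored in tr.db):
#   cds.by.tx <- cdsBy(tr.db, by = "tx", use.names = TRUE)
#   bad.tx    <- names(cds.by.tx)[sum(width(cds.by.tx)) %% 3L != 0L]
```

With a vector like bad.tx in hand, I could exclude those transcripts
from the regions I pass to geom_alignment.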
Tengfei, it would be great if any fix that works in the development
version could be ported to the release branch as well.
Cheers,
~Thanks,
Malcolm
library(ggbio)
library(GenomicFeatures)
tr.db <- makeTranscriptDbFromUCSC(
  genome='sacCer3',
  tablename='ensGene'
)
tr.by.gn.grl<-transcriptsBy(tr.db,'gene')
gn.gr<-unlist(range(tr.by.gn.grl),use.names=TRUE)
a2<-geom_alignment(tr.db,which=gn.gr[2]) # this works!
geom_alignment(tr.db,which=gn.gr['HRA1']) # this breaks
gn.gr[1]
sessionInfo()
## whose output is:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable, exon_locs) :
UCSC data anomaly in 434 transcript(s): the cds cumulative length is
not a multiple of 3 for transcripts 'HRA1' 'tP(UGG)A' 'snR18'
'tA(UGC)A' 'tL(CAA)A' 'tS(AGA)A' 'YAR061W' 'YAR062W' 'tP(UGG)Q'
'15S_rRNA' 'tW(UCA)Q' 'tE(UUC)Q' 'tS(UGA)Q2' '21S_rRNA' 'tT(UGU)Q1'
'tC(GCA)Q' 'tH(GUG)Q' 'tL(UAA)Q' 'tQ(UUG)Q' 'tK(UUU)Q' 'tR(UCU)Q1'
'tG(UCC)Q' 'tD(GUC)Q' 'tS(GCU)Q1' 'tR(ACG)Q2' 'tA(UGC)Q' 'tI(GAU)Q'
'tY(GUA)Q' 'tN(GUU)Q' 'tM(CAU)Q1' 'tF(GAA)Q' 'tT(XXX)Q2' 'tV(UAC)Q'
'tM(CAU)Q2' 'RPM1' 'snR80' 'snR67' 'snR53' 'tG(GCC)E' 'tS(AGA)E'
'tM(CAU)E' 'RPR1' 'tQ(UUG)E2' 'tK(CUU)E1' 'tR(UCU)E' 'snR14'
'tE(UUC)E1' 'tH(GUG)E1' 'tQ(UUG)E1' 'tS(UGA)E' 'tA(UGC)E' 'SRG1'
'tE(UUC)E2' 'snR4' 'snR52' 'tH(GUG)E2' 'tK(CUU)E2' 'tV(AAC)E1' 'SCR1'
'tI(AAU)E1' 'tV(AAC)E2' ' [... truncated]
Aggregating TranscriptDb...
Parsing exons...
Parsing cds...
Parsing transcripts...
Parsing utrs and aggregating...
Done
Constructing graphics...
> >
Aggregating TranscriptDb...
Parsing exons...
Parsing cds...
Parsing transcripts...
Parsing utrs and aggregating...
Error in data.frame(tx_id = .nms, tx_name = .tx.nms, gene_id =
.gid.nms, :
arguments imply differing number of rows: 0, 1
> > >
GRanges with 1 range and 0 metadata columns:
seqnames ranges strand
<rle> <iranges> <rle>
15S_rRNA chrM [6546, 8194] +
---
seqlengths:
chrI chrII chrIII chrIV chrV chrVI chrVII chrVIII
chrIX chrX chrXI chrXII chrXIII chrXIV chrXV chrXVI chrM
230218 813184 316620 1531933 576874 270161 1090940 562643
439888 745751 666816 1078177 924431 784333 1091291 948066 85779
>
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices datasets utils
methods base
other attached packages:
[1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0
GenomicRanges_1.14.3 XVector_0.2.0 IRanges_1.20.4
BiocGenerics_0.8.0 ggbio_1.10.0 ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] biomaRt_2.18.0 Biostrings_2.30.0 biovizBase_1.10.0
bitops_1.0-6 BSgenome_1.30.0 cluster_1.14.4
colorspace_1.2-4 compiler_3.0.2 DBI_0.2-7
dichromat_2.0-0 digest_0.6.3 grid_3.0.2
gridExtra_0.9.1 gtable_0.1.2 Hmisc_3.12-2
labeling_0.2 lattice_0.20-24 MASS_7.3-29
munsell_0.4.2 plyr_1.8 proto_0.3-10
RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2
[25] rpart_4.1-3 Rsamtools_1.14.1 RSQLite_0.11.4
rtracklayer_1.22.0 scales_0.2.3 stats4_3.0.2
stringr_0.6.2 tools_3.0.2
VariantAnnotation_1.8.2 XML_3.98-1.1 zlibbioc_1.8.0
>
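For what it is worth, the error text itself is what base R's
data.frame() emits when one of its column vectors is empty and another
has length one. The following sketch only reproduces the message; the
column names are copied from the error above, the values are
placeholders, and I have not traced where in ggbio or biovizBase the
empty vector arises.

```r
# Reproduce the message with base R only: one zero-length column and
# one length-one column cannot be combined into a data.frame.
msg <- tryCatch(
  data.frame(tx_id = character(0), tx_name = "some_tx"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

That would be consistent with a lookup returning nothing for the
requested region, but that is speculation on my part.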
>-----Original Message-----
>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Tengfei Yin
>Sent: Friday, October 18, 2013 12:05 PM
>To: Alejandro Reyes
>Cc: bioconductor at r-project.org
>Subject: Re: [BioC] autoplot transcriptDb error with some regions
>
>Hi Alejandro,
>
>Thanks for reporting. I believe that's a bug caused by my recent
>modification in the biovizBase package. I am working on that now and
>will keep you updated.
>
>Best
>
>Tengfei
>
>
>On Fri, Oct 18, 2013 at 12:43 PM, Alejandro Reyes
><alejandro.reyes at embl.de> wrote:
>
>> Dear Tengfei Yin,
>>
>> Firstly, thanks for developing ggbio, it has been very useful for me!
>>
>> I am getting an error when using autoplot with some specific genomic
>> regions in transcriptDb objects, here is an example:
>>
>> > suppressMessages( library(ggbio) )
>> > suppressMessages( library(GenomicFeatures) )
>> > tx <- makeTranscriptDbFromBiomart()
>>
>> Aggregating TranscriptDb...
>> Parsing exons...
>> Parsing cds...
>> Parsing transcripts...
>> Parsing utrs and aggregating...
>> Done
>> Constructing graphics...
>>
>> prueba <- GRanges( 16, IRanges( start=69598997, 69718569 ) )
>> autoplot( tx, prueba, group.selfish=TRUE, names.expr="")
>>
>> Aggregating TranscriptDb...
>> Parsing exons...
>> Parsing cds...
>> Parsing transcripts...
>> Parsing utrs and aggregating...
>> Done
>> Constructing graphics...
>>
>> So far, excellent, however, when I look into a smaller region I get
>> an error message:
>>
>> > prueba <- GRanges( "16", IRanges(start=69718724, end=69720078 ))
>> > autoplot( tx, prueba, group.selfish=TRUE, names.expr="")
>> Aggregating TranscriptDb...
>> Parsing exons...
>> Parsing cds...
>> Parsing transcripts...
>> Parsing utrs and aggregating...
>> Error in DataFrame(...) : different row counts implied by arguments
>>
>> I believe it has to do with recent modifications of ggbio, since I do
>> not get the error message with older versions, e.g. 1.9.7.
>>
>> > sessionInfo()
>> R Under development (unstable) (2013-07-01 r63121)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] ggbio_1.11.0 ggplot2_0.9.3.1 GenomicFeatures_1.15.0
>> [4] AnnotationDbi_1.23.28 Biobase_2.21.7 GenomicRanges_1.13.56
>> [7] XVector_0.1.4 IRanges_1.19.40 BiocGenerics_0.7.8
>> [10] BiocInstaller_1.13.1
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.17.3 Biostrings_2.29.19 biovizBase_1.9.4
>> [4] bitops_1.0-6 BSgenome_1.29.1 cluster_1.14.4
>> [7] colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0
>> [10] digest_0.6.3 grid_3.1.0 gridExtra_0.9.1
>> [13] gtable_0.1.2 Hmisc_3.12-2 labeling_0.2
>> [16] lattice_0.20-24 MASS_7.3-29 munsell_0.4.2
>> [19] plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5
>> [22] RCurl_1.95-4.1 reshape2_1.2.2 rpart_4.1-3
>> [25] Rsamtools_1.13.53 RSQLite_0.11.4 rtracklayer_1.21.14
>> [28] scales_0.2.3 stats4_3.1.0 stringr_0.6.2
>> [31] tools_3.1.0 VariantAnnotation_1.7.57 XML_3.98-1.1
>> [34] zlibbioc_1.7.0
>>
>> Best regards,
>> Alejandro Reyes
>>
>
>
>
>--
>Tengfei Yin, PhD
>Seven Bridges Genomics
>sbgenomics.com
>625 Mt. Auburn St. Suite #208
>Cambridge, MA 02138
>(617) 866-0446