Question: GenomicFeatures makeTranscriptDbFromBiomart failure
Tim Rayner wrote (7.5 years ago):
Hi,

I'm trying to make a TranscriptDb from the Ensembl human Biomart, but I've run into a problem. As shown below, the equivalent operation for the mouse Biomart works fine:

> # Mouse TranscriptDb created without a hitch:
> txdb.Mm <- makeTranscriptDbFromBiomart(biomart='ensembl', dataset='mmusculus_gene_ensembl')
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK

> # Here's the problem:
> txdb.Hs <- makeTranscriptDbFromBiomart(biomart='ensembl', dataset='hsapiens_gene_ensembl')
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
Download and preprocess the 'splicings' data frame ... Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 800380 did not have 11 elements

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicFeatures_1.6.1 AnnotationDbi_1.16.0  Biobase_2.14.0
[4] GenomicRanges_1.6.1   IRanges_1.12.1

loaded via a namespace (and not attached):
 [1] BSgenome_1.22.0    Biostrings_2.22.0  DBI_0.2-5          RCurl_1.6-10
 [5] RSQLite_0.10.0     XML_3.4-3          biomaRt_2.10.0     rtracklayer_1.14.1
 [9] tools_2.14.0       zlibbioc_1.0.0

I don't know if this is an issue with the Biomart instance or the GenomicFeatures package. I was wondering if anyone had any suggestions as to how I might work around this?

On a related note, would it be possible to add the ability to point makeTranscriptDbFromBiomart() at alternate Biomart hosts (as one would, for example, by calling biomaRt::useMart(host='www.ensembl.org', ...))? It would probably be good to be able to pass through the 'archive' argument to useMart as well.

Many thanks,

Tim Rayner
--
Bioinformatician
Smith Lab, CIMR
University of Cambridge
Tags: transcriptdb, biomart
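The error in the human case is raised by base R's scan(), the reader underneath read.table(), which expects every record of a downloaded tab-delimited table to carry the same number of fields (11 in Tim's traceback). A single short or truncated line is enough to abort the parse. The sketch below reproduces the message on a toy three-column table (hypothetical data, not actual BioMart output) and shows how count.fields() can locate the offending lines in a saved copy of a download. It assumes the rows are read with multi.line = FALSE, as read.table() does.

```r
# Toy tab-delimited text standing in for a BioMart result (hypothetical data);
# the second record is missing its third field, as a truncated transfer might be.
txt <- "ENST0001\tchr1\t100\nENST0002\tchr1\nENST0003\tchrX\t300"

# scan() expects 3 fields per record; with multi.line = FALSE (as read.table()
# uses) it stops at the short line instead of continuing onto the next one.
err <- tryCatch(
  scan(text = txt, what = list("", "", ""), sep = "\t",
       multi.line = FALSE, quiet = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)

# count.fields() reports the number of fields on each line, so records
# that do not match the expected width can be found before parsing.
con <- textConnection(txt)
n_fields <- count.fields(con, sep = "\t")
close(con)
bad_lines <- which(n_fields != 3)
print(bad_lines)
```

On a real download the same count.fields() check, with 11 as the expected width, would show whether the short line is a one-off truncation or a systematic change in the table's layout.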
Answer: GenomicFeatures makeTranscriptDbFromBiomart failure
Michael Lawrence wrote (7.5 years ago):
On Tue, Nov 8, 2011 at 3:19 AM, Tim Rayner <tfrayner@gmail.com> wrote:
> On a related note, would it be possible to add the ability to point makeTranscriptDbFromBiomart() at alternate Biomart hosts (as one would, for example, by calling biomaRt::useMart(host='www.ensembl.org', ...))?

We've submitted a patch that does just this, as well as supporting an attribute prefix string for selecting alternative gene models.

> It would probably be good to be able to pass through the 'archive' argument to useMart as well.
Hi Tim,

Last week there was a small bug in this method, caused by Ensembl's decision to start supporting pseudoautosomal regions. It was fixed last week, so it should already be fixed in the version of GenomicFeatures reported here. I ran your code locally just 4 minutes ago and it still works here. The only difference I can see is that my GenomicRanges package is one version higher than yours (GenomicRanges_1.6.2). Please update that package, run your code again, and see whether you have better luck with Ensembl.

The patch that Michael mentioned actually arrived at the exact moment I was testing the bug fix above, so it has some conflicts I will have to resolve, but it should be added to devel very soon.

Marc
Reply written 7.5 years ago by Marc Carlson
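Marc's suggestion comes down to checking whether the installed GenomicRanges is at least 1.6.2. A minimal sketch in base R, assuming that threshold; `has_min_version()` is a hypothetical helper written for illustration, not part of GenomicFeatures or any other Bioconductor package:

```r
# Hypothetical helper: TRUE if `pkg` is installed at version `min_version`
# or later, FALSE if it is missing or older.
has_min_version <- function(pkg, min_version) {
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  # packageVersion() returns a package_version object, which compares
  # correctly against a version string such as "1.6.2".
  utils::packageVersion(pkg) >= min_version
}

# Is the GenomicRanges release Marc tested with (or newer) installed?
has_min_version("GenomicRanges", "1.6.2")
```

Reading the version off sessionInfo() by eye works just as well; the helper only makes the check scriptable, for example at the top of an analysis script.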
Hi Marc,

Thanks very much for looking into this, and also to Michael for providing the patch. I've upgraded my GenomicRanges package and the code now runs, with a few warnings:

> txdb.Hs2 <- makeTranscriptDbFromBiomart(biomart='ensembl', dataset='hsapiens_gene_ensembl')
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning messages:
1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels,  :
  duplicated levels will not be allowed in factors anymore
2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels,  :
  duplicated levels will not be allowed in factors anymore
3: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
  chromosome lengths and circularity flags are not available for this TranscriptDb object

So I think the problem is basically fixed. I wonder whether the issue was caused by truncated data transfers; I observed several similar failures earlier yesterday afternoon, but each time the problem seemed to occur at a different point in the process.

Thanks again,

Tim
Reply written 7.5 years ago by Tim Rayner
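If Tim's guess about truncated transfers is right, a transient failure might simply succeed on a later attempt. A minimal sketch of a generic retry wrapper in base R; `with_retries()` and its `max_tries` and `wait` arguments are hypothetical, and the wrapper cannot help with a failure that reproduces on every run:

```r
# Hypothetical helper: call `f` up to `max_tries` times, pausing `wait`
# seconds between attempts, and re-signal the last error if every attempt fails.
with_retries <- function(f, max_tries = 3, wait = 5) {
  for (i in seq_len(max_tries)) {
    # Capture an error as a value instead of aborting immediately.
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message(sprintf("Attempt %d of %d failed: %s",
                    i, max_tries, conditionMessage(result)))
    if (i < max_tries) Sys.sleep(wait)
  }
  # All attempts failed: re-raise the last error condition.
  stop(result)
}

# Usage (not run here; needs GenomicFeatures and network access):
# txdb.Hs <- with_retries(function()
#   makeTranscriptDbFromBiomart(biomart = "ensembl",
#                               dataset = "hsapiens_gene_ensembl"))
```

Passing a function rather than an expression keeps each attempt a fresh call. A retry addresses only intermittent download problems; it would not explain the consistently failing 'chrominfo' step or the factor warnings that Hervé raises in the next reply.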
Hi,

On 11-11-09 03:33 AM, Tim Rayner wrote:
> Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
> [...]
> Warning messages:
> 1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels,  :
>   duplicated levels will not be allowed in factors anymore
> 2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels,  :
>   duplicated levels will not be allowed in factors anymore
> 3: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
>   chromosome lengths and circularity flags are not available for this TranscriptDb object

The first two warnings, together with the failure to download the chrominfo data frame, don't look good. It didn't use to be like that. We'll investigate on our side and report back later.

Cheers,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Reply written 7.5 years ago by Hervé Pagès