GenomicFeatures makeTranscriptDbFromBiomart failure
Malcolm Cook ★ 1.6k
@malcolm-cook-6293
United States
Hi,

> The patch that Michael mentioned actually arrived at the exact moment
> that I was testing the bug fix above which means that it has some
> conflicts I will have to resolve, but it should be added to devel very soon.

I am inquiring after the status of the patch that allows a 'host' argument to
makeTranscriptDbFromBiomart, which I would like to call as

    makeTranscriptDbFromBiomart(biomart = "fungi_mart_12",
                                host = "fungi.ensembl.org",
                                dataset = "spombe_eg_gene")

Do you have any suggestions for how to proceed with my aim?

Thanks!

Malcolm Cook - Stowers Institute
ADD COMMENT
Cory Barr ▴ 60
@cory-barr-4429
The patch needs some minor fixing. I'll send a corrected version to Marc today.

-Cory

On Tuesday, January 3, 2012, Malcolm Cook <mec@stowers.org> wrote:
> I am inquiring after the status of the patch that allows 'host' argument to
> makeTranscriptDbFromBiomart [...]
ADD COMMENT
@herve-pages-1542
Seattle, WA, United States
Hi Tim,

On 11/09/2011 10:27 AM, Hervé Pagès wrote:
> Hi,
>
> On 11-11-09 03:33 AM, Tim Rayner wrote:
>> Hi Marc,
>>
>> Thanks very much for looking into this, and also to Michael for
>> providing the patch. I've upgraded my GRanges package and the code now
>> runs with a couple of warnings:
>>
>> > txdb.Hs2 <- makeTranscriptDbFromBiomart(biomart='ensembl',
>>       dataset='hsapiens_gene_ensembl')
>> Download and preprocess the 'transcripts' data frame ... OK
>> Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
>> Download and preprocess the 'splicings' data frame ... OK
>> Download and preprocess the 'genes' data frame ... OK
>> Prepare the 'metadata' data frame ... OK
>> Make the TranscriptDb object ... OK
>> Warning messages:
>> 1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels, :
>>   duplicated levels will not be allowed in factors anymore
>> 2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste(labels, :
>>   duplicated levels will not be allowed in factors anymore
>> 3: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
>>   chromosome lengths and circularity flags are not available for this
>>   TranscriptDb object
>
> The 2 first warnings + the fact that downloading the chrominfo failed
> is not looking good. Didn't use to be like that. We'll investigate on
> our side and report later.

The problem that was preventing makeTranscriptDbFromBiomart() from
fetching the 'chrominfo' data frame (i.e. chromosome lengths) from
Ensembl has been fixed. Make sure you update to the latest version
of GenomicFeatures (v 1.6.5 in BioC release, v 1.7.8 in BioC
devel). Available via biocLite().

The warnings about duplicated levels still need to be investigated.

Cheers,
H.

>
> Cheers,
> H.
>
>> So I think the problem is basically fixed. I wonder if perhaps the
>> issue was caused by truncated data transfers; I observed several
>> similar failures earlier yesterday afternoon, but in each case the
>> problem seemed to occur at a different point in the process.
>>
>> Thanks again,
>>
>> Tim
>>
>> On 8 November 2011 20:16, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>> Hi Tim,
>>>
>>> There was a small bug last week for this method caused by a decision at
>>> ensembl to start supporting pseudoautosomal regions, but it was fixed
>>> last week and should be fixed with the version of GenomicFeatures
>>> reported here. I just ran your code locally 4 minutes ago and it still
>>> works here. The only difference I can see is that my GRanges package is
>>> one version higher than yours (GenomicRanges_1.6.2). Please update that
>>> package and then run it again and see if you have better luck with
>>> ensembl.
>>>
>>> The patch that Michael mentioned actually arrived at the exact moment
>>> that I was testing the bug fix above which means that it has some
>>> conflicts I will have to resolve, but it should be added to devel very
>>> soon.
>>>
>>> Marc
>>>
>>> On 11/08/2011 03:55 AM, Michael Lawrence wrote:
>>>> On Tue, Nov 8, 2011 at 3:19 AM, Tim Rayner <tfrayner at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to make a TranscriptDb from the Ensembl human Biomart, but
>>>>> I've run into a problem. As shown below, the equivalent operation for
>>>>> the mouse Biomart works fine:
>>>>>
>>>>> > # Mouse TranscriptDb created without a hitch:
>>>>> > txdb.Mm <- makeTranscriptDbFromBiomart(biomart='ensembl',
>>>>>       dataset='mmusculus_gene_ensembl')
>>>>> Download and preprocess the 'transcripts' data frame ... OK
>>>>> Download and preprocess the 'chrominfo' data frame ... OK
>>>>> Download and preprocess the 'splicings' data frame ... OK
>>>>> Download and preprocess the 'genes' data frame ... OK
>>>>> Prepare the 'metadata' data frame ... OK
>>>>> Make the TranscriptDb object ... OK
>>>>>
>>>>> > # Here's the problem:
>>>>> > txdb.Hs <- makeTranscriptDbFromBiomart(biomart='ensembl',
>>>>>       dataset='hsapiens_gene_ensembl')
>>>>> Download and preprocess the 'transcripts' data frame ... OK
>>>>> Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
>>>>> Download and preprocess the 'splicings' data frame ... Error in
>>>>> scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>>>   line 800380 did not have 11 elements
>>>>>
>>>>> > sessionInfo()
>>>>> R version 2.14.0 (2011-10-31)
>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] GenomicFeatures_1.6.1 AnnotationDbi_1.16.0 Biobase_2.14.0
>>>>> [4] GenomicRanges_1.6.1 IRanges_1.12.1
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] BSgenome_1.22.0 Biostrings_2.22.0 DBI_0.2-5 RCurl_1.6-10
>>>>> [5] RSQLite_0.10.0 XML_3.4-3 biomaRt_2.10.0 rtracklayer_1.14.1
>>>>> [9] tools_2.14.0 zlibbioc_1.0.0
>>>>>
>>>>> I don't know if this is an issue with the Biomart instance or the
>>>>> GenomicFeatures package. I was wondering if anyone had any suggestions
>>>>> as to how I might work around this?
>>>>>
>>>>> On a related note, would it be possible to add the ability to point
>>>>> makeTranscriptDbFromBiomart() at alternate Biomart hosts (as one
>>>>> would, for example, by calling
>>>>> biomaRt::useMart(host='www.ensembl.org', ...))?
>>>>
>>>> We've submitted a patch that does just this, as well as supporting an
>>>> attribute prefix string for selecting alternative gene models.
>>>>
>>>>> It would probably be
>>>>> good to be able to pass through the 'archive' argument to useMart as
>>>>> well.
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> Tim Rayner
>>>>>
>>>>> --
>>>>> Bioinformatician
>>>>> Smith Lab, CIMR
>>>>> University of Cambridge
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
ADD COMMENT
Tim, the latest version of makeTranscriptDbFromBiomart should let you specify the host argument.

-Cory

2012/1/4 Hervé Pagès <hpages at fhcrc.org>:
> The problem that was preventing makeTranscriptDbFromBiomart() to
> fetch the 'chrominfo' data frame (i.e. chromosome lengths) from
> Ensembl has been fixed. [...]
ADD REPLY
Hi Hervé,

Thanks very much for fixing this. I can confirm that GenomicFeatures
1.6.5 works on our Linux server. Interestingly, the warnings about
duplicated levels have also now disappeared in that case.

I initially ran into a new problem with GenomicFeatures 1.6.6 (see the
bug report and session info below); however, when I reinstalled
GenomicFeatures using type='source' the error went away.

Cheers,

Tim

## Successfully running on Linux:
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.6.5 AnnotationDbi_1.16.10 Biobase_2.14.0
[4] GenomicRanges_1.6.4 IRanges_1.12.5

loaded via a namespace (and not attached):
[1] biomaRt_2.10.0 Biostrings_2.22.0 BSgenome_1.22.0 DBI_0.2-5
[5] RCurl_1.8-0 RSQLite_0.11.1 rtracklayer_1.14.4 tools_2.14.1
[9] XML_3.6-2 zlibbioc_1.0.0

## Strange problem with the pre-built package on Mac OS X?
(GenomicFeatures 1.6.6)
> library(GenomicFeatures)
Loading required package: IRanges

Attaching package: 'IRanges'

The following object(s) are masked from 'package:base':

    cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
    pmin, pmin.int, rbind, rep.int, setdiff, table, union

Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'browseVignettes()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation("pkgname")'.

Attaching package: 'Biobase'

The following object(s) are masked from 'package:IRanges':

    updateObject

Warning message:
package 'GenomicFeatures' was built under R version 2.14.1
> makeTranscriptDbFromBiomart(
biomart= circ_seqs= dataset= transcript_ids=
> txdb <- makeTranscriptDbFromBiomart(biomart='ensembl',
      dataset='hsapiens_gene_ensembl')
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... Error in callSuper(...) :
  could not find function "initRefFields"
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.6.6 AnnotationDbi_1.16.10 Biobase_2.14.0
[4] GenomicRanges_1.6.4 IRanges_1.12.5

loaded via a namespace (and not attached):
[1] biomaRt_2.10.0 Biostrings_2.22.0 BSgenome_1.22.0 DBI_0.2-5
[5] RCurl_1.8-0 RSQLite_0.11.1 rtracklayer_1.14.4 tools_2.14.0
[9] XML_3.6-2 zlibbioc_1.0.0

2012/1/5 Hervé Pagès <hpages at fhcrc.org>:
> The problem that was preventing makeTranscriptDbFromBiomart() to
> fetch the 'chrominfo' data frame (i.e. chromosome lengths) from
> Ensembl has been fixed. [...]
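The workaround reported in this post was to reinstall GenomicFeatures with type='source' rather than use the pre-built binary. A minimal sketch of that step is below. It makes two assumptions that are not from the thread: it uses BiocManager, which has since replaced the biocLite() installer mentioned earlier, and the helper name `reinstall_from_source` is hypothetical.

```r
# Hypothetical helper (not part of any package): reinstall a Bioconductor
# package from source instead of using a pre-built binary.
reinstall_from_source <- function(pkg) {
  # Assumption: BiocManager, the current successor to biocLite(), is installed.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    stop("Please install BiocManager first: install.packages('BiocManager')")
  }
  # type = "source" asks install.packages() to build the package locally
  # rather than fetch a pre-built binary.
  install.packages(pkg,
                   repos = BiocManager::repositories(),
                   type = "source")
}

# Usage (needs network access and a working build toolchain):
# reinstall_from_source("GenomicFeatures")
```

The session info above shows a binary built under R 2.14.1 loaded into R 2.14.0, which may be why a local source build helped, although the thread does not confirm the root cause.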
ADD REPLY
FWIW: I too just needed to reinstall GenomicFeatures using source to squash the error:

> Make the TranscriptDb object ... Error in callSuper(...) : could not
> find function "initRefFields"

~Malcolm

> -----Original Message-----
> From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Tim Rayner
> Sent: Monday, January 09, 2012 7:59 AM
> To: Hervé Pagès
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] GenomicFeatures makeTranscriptDbFromBiomart failure
>
> [...]
Error in > >>>>>> scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : > >>>>>> line 800380 did not have 11 elements > >>>>>> > >>>>>>> sessionInfo() > >>>>>> > >>>>>> > >>>>>> R version 2.14.0 (2011-10-31) > >>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > >>>>>> > >>>>>> locale: > >>>>>> [1] C > >>>>>> > >>>>>> attached base packages: > >>>>>> [1] stats graphics grDevices utils datasets methods base > >>>>>> > >>>>>> other attached packages: > >>>>>> [1] GenomicFeatures_1.6.1 AnnotationDbi_1.16.0 Biobase_2.14.0 > >>>>>> [4] GenomicRanges_1.6.1 IRanges_1.12.1 > >>>>>> > >>>>>> loaded via a namespace (and not attached): > >>>>>> [1] BSgenome_1.22.0 Biostrings_2.22.0 DBI_0.2-5 > >>>>>> RCurl_1.6-10 > >>>>>> [5] RSQLite_0.10.0 XML_3.4-3 biomaRt_2.10.0 > >>>>>> rtracklayer_1.14.1 > >>>>>> [9] tools_2.14.0 zlibbioc_1.0.0 > >>>>>> > >>>>>> I don't know if this is an issue with the Biomart instance or the > >>>>>> GenomicFeatures package. I was wondering if anyone had any > suggestions > >>>>>> as to how I might work around this? > >>>>>> > >>>>>> On a related note, would it be possible to add the ability to point > >>>>>> makeTranscriptDbFromBiomart() at alternate Biomart hosts (as one > >>>>>> would, for example, by calling > >>>>>> biomaRt::useMart(host='www.ensembl.org', ...))? > >>>>> > >>>>> > >>>>> We've submitted a patch that does just this, as well as supporting an > >>>>> attribute prefix string for selecting alternative gene models. > >>>>> > >>>>> > >>>>>> It would probably be > >>>>>> good to be able to pass through the 'archive' argument to useMart > as > >>>>>> well. 
> >>>>>> > >>>>>> Many thanks, > >>>>>> > >>>>>> Tim Rayner > >>>>>> > >>>>>> -- > >>>>>> Bioinformatician > >>>>>> Smith Lab, CIMR > >>>>>> University of Cambridge > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioconductor mailing list > >>>>>> Bioconductor at r-project.org > >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>>>> Search the archives: > >>>>>> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>>>> > >>>>> [[alternative HTML version deleted]] > >>>>> > >>>>> _______________________________________________ > >>>>> Bioconductor mailing list > >>>>> Bioconductor at r-project.org > >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>>> Search the archives: > >>>>> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>> > >>>> > >>>> _______________________________________________ > >>>> Bioconductor mailing list > >>>> Bioconductor at r-project.org > >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>> Search the archives: > >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>> > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor at r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> > >> > > > > > > -- > > Hervé Pagès > > > > Program in Computational Biology > > Division of Public Health Sciences > > Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N, M1-B514 > > P.O. Box 19024 > > Seattle, WA 98109-1024 > > > > E-mail: hpages at fhcrc.org > > Phone: ?(206) 667-5791 > > Fax: ? 
?(206) 667-1319 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
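One detail in the quoted session stands out: GenomicFeatures is reported as "built under R version 2.14.1" while sessionInfo() shows R 2.14.0. Whether that mismatch is what causes the missing "initRefFields" is an assumption, not something confirmed in this thread, but it is cheap to check. A minimal sketch follows; the helper names `parse_built_version` and `built_under_newer_r` are made up for illustration and are not part of any package.

```r
# Extract the R version from a DESCRIPTION "Built" field,
# e.g. "R 2.14.1; ; 2012-01-04 10:00:00 UTC; unix" gives "2.14.1".
parse_built_version <- function(built) {
  sub("^R ([0-9]+(\\.[0-9]+)*);.*$", "\\1", built)
}

# TRUE when package `pkg` was built under a newer R than `running`.
# (Hypothetical helper; `running` defaults to the current R version.)
built_under_newer_r <- function(pkg, running = as.character(getRversion())) {
  built <- utils::packageDescription(pkg, fields = "Built")
  if (is.na(built)) return(FALSE)  # no "Built" field recorded: nothing to compare
  utils::compareVersion(parse_built_version(built), running) > 0
}

# Flag every package loaded in the current session that was built
# under a newer R than the one that is running.
pkgs <- loadedNamespaces()
flagged <- pkgs[vapply(pkgs, built_under_newer_r, logical(1))]
print(flagged)
```

If GenomicFeatures (or one of its dependencies) shows up in `flagged`, updating R to the version the binaries were built under, or reinstalling the packages from source under the running R, would be the obvious things to try before digging further.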