biomaRt 3'UTR coordinates
@iain-gallagher-2532
Last seen 8.7 years ago
United Kingdom
Hello list.

I'm using the following script to try and retrieve the 3'UTR start and end coordinates from Ensembl.

    rm(list=ls())
    library(biomaRt)

    #read in probes called present on affy array (CPH in this script)
    present <- read.table('cph_present_probes.txt', header=F, sep='\t')
    present<-as.character(present[,1])
    #present is a set of transcript ids

    #get DB connection to retrieve required info
    ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl")

    #get 3'utr coords
    utr_coords<-getBM(attributes=c('ensembl_gene_id', 'sequence_3utr_start', 'sequence_3utr_end'),
                      filters='ensembl_transcript_id', values=present, mart=ensmart)

Running the script gives the following error.

    V1
    1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
    Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start", :
      Number of columns in the query result doesn't equal number of attributes in query. This is probably an internal error, please report.

Presumably some transcripts have more than 1 3'UTR (hence the number of columns difference described above).

Can anyone suggest a solution? Either a way to retrieve the start and end coords of the 3'UTRs or the length of the 3'UTRs (my real objective).

I have a separate script which will download the 3'UTR sequences and then count the characters but the datasets are large and that process seems somewhat laborious if the information is directly available.

Thanks

Iain

    > sessionInfo()
    R version 2.8.0 (2008-10-20)
    x86_64-pc-linux-gnu

    locale:
    LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] biomaRt_1.16.0

    loaded via a namespace (and not attached):
    [1] RCurl_0.91-0 XML_1.95-3
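The server message "Attribute 3utr_start NOT FOUND" suggests the mart may simply not offer the requested attribute, which can be checked before calling getBM. The sketch below is only an illustration: `missing_attributes` is a hypothetical helper, and the `available` vector is a toy stand-in for the names that `listAttributes()` would return from a live mart. Attribute names differ between Ensembl releases, so the real list should be inspected rather than assumed.

```r
# Hypothetical helper: which requested attributes does the mart not offer?
missing_attributes <- function(wanted, available) {
  setdiff(wanted, available)
}

# Toy stand-in for listAttributes(ensmart)$name (the real list comes from the server).
available <- c("ensembl_gene_id", "ensembl_transcript_id", "chromosome_name")

wanted <- c("ensembl_gene_id", "sequence_3utr_start", "sequence_3utr_end")
not_offered <- missing_attributes(wanted, available)
print(not_offered)

# With a live connection (not run here):
# library(biomaRt)
# ensmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# available <- listAttributes(ensmart)$name
# grep("utr", available, value = TRUE, ignore.case = TRUE)  # look for UTR-related names
```

If `not_offered` is non-empty, the query will fail no matter which transcript IDs are supplied.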
@wolfgang-huber-3550
Last seen 10 days ago
EMBL European Molecular Biology Laborat…
Dear Iain

thank you for providing this feedback! In order to do something about it, can you provide us with a reproducible example?

You could do this, for example, by defining the content of your vector "present" in the script, rather than reading a file from your file system that nobody else can see, or by putting it on a webserver and using a file connection to its URL in your call to read.table.

Best wishes
Wolfgang

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
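As a rough illustration of the two options described in this reply, the sketch below defines `present` directly in the script and, alternatively, parses the same one-column layout from text. The three IDs are the first ones Iain lists elsewhere in this thread; the URL in the commented-out line is a made-up placeholder, not a real location.

```r
# Option 1: define the transcript IDs inline so anyone can rerun the script.
present <- c("ENST00000000233", "ENST00000000412", "ENST00000000442")

# Same one-column layout parsed from text instead of a local file.
txt <- "ENST00000000233\nENST00000000412\nENST00000000442"
from_text <- read.table(text = txt, header = FALSE, sep = "\t",
                        stringsAsFactors = FALSE)
present_from_text <- as.character(from_text[, 1])

# Option 2: read from a web server (hypothetical URL, not run):
# present <- as.character(
#   read.table(url("http://example.org/cph_present_probes.txt"),
#              header = FALSE, sep = "\t")[, 1])
```

Either way, the getBM call can then be run by others without access to the original file.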
@iain-gallagher-2532
Last seen 8.7 years ago
United Kingdom
Hi Wolfgang.

Sorry. I should have enclosed a portion of the file, or at least a clearer explanation of what it contained. It is simply a list of ENST ids, the first few of which I have detailed below.

    ENST00000000233
    ENST00000000412
    ENST00000000442
    ENST00000001008
    ENST00000002125
    ENST00000002165
    ENST00000002501
    ENST00000002829
    ENST00000003100
    ENST00000003302
    ENST00000003583
    ENST00000003607
    ENST00000003912

For your information, here is the reply I received from the Ensembl help desk regarding this problem.

"Currently these attributes are not available from BioMart. They have been dropped when we moved to an automated Mart building process a few months ago. However, as many people have asked for these attributes, they have been added again to our v52 release which, if everything goes according to plan, should go live coming week."

So hopefully at some point next week I'll be able to carry out the query.

Thanks

Iain

--- On Sat, 6/12/08, Wolfgang Huber <huber at ebi.ac.uk> wrote:
> [quoted message trimmed]
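Once start and end coordinates can be retrieved, the 3'UTR length (the stated objective) follows from them. The sketch below assumes the query returns one row per UTR segment, so a transcript whose 3'UTR spans several exons appears more than once and its segments must be summed. The column names, the `ENST_A`-style IDs, and all coordinates are made up for illustration and are not real Ensembl data.

```r
# Toy table in the assumed shape of a UTR-coordinate result:
# one row per UTR segment; NA where no 3'UTR is annotated.
utr <- data.frame(
  ensembl_transcript_id = c("ENST_A", "ENST_A", "ENST_B", "ENST_C"),
  utr3_start = c(100, 500, 2000, NA),
  utr3_end   = c(149, 599, 2299, NA),
  stringsAsFactors = FALSE
)

# Drop rows without an annotated 3'UTR segment.
utr <- utr[!is.na(utr$utr3_start) & !is.na(utr$utr3_end), ]

# Coordinates are inclusive at both ends, hence the + 1.
utr$segment_length <- utr$utr3_end - utr$utr3_start + 1

# Sum the segment lengths per transcript.
utr_length <- aggregate(segment_length ~ ensembl_transcript_id,
                        data = utr, FUN = sum)
print(utr_length)
```

Transcripts with no annotated 3'UTR drop out of the result here; whether they should instead be reported with length 0 depends on the downstream analysis.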