help with biomaRt bioconductor - Filter upstream_flank NOT FOUND problem
Tom Hait ▴ 10
@tom-hait-5441
Hello,

I'm a bioinformatics student at Tel Aviv University. I'm using your biomaRt API to automate the download of FASTA sequences. I ran into a problem; here is my code:

    #load the biomaRt library
    library(biomaRt)
    #connect to the Ensembl mart and open the human data set
    ensembl = useMart("ensembl")
    human = useDataset("hsapiens_gene_ensembl",mart=ensembl)
    #select the attributes that we want from the data set
    attr<-c('ensembl_gene_id','ensembl_transcript_id',
            'external_gene_id','chromosome_name','strand','transcript_start')
    #download the map between transcript id and transcript name
    tmpgene<-getBM(attr, 'biotype', values = 'protein_coding', human)
    #save as a space-delimited text file
    write.table(tmpgene,"Z:/tomhait/organisms/human/transcript_names.txt",
                row.names=FALSE, quote=FALSE)
    #collect all sequences with a 3000-base upstream flank, based on the
    #second column (ensembl_transcript_id) of tmpgene
    i<-1
    for(id1 in tmpgene[,2]){
      #retrieve sequence
      sequence<-getSequence(id=id1, type='ensembl_transcript_id',
                            seqType='transcript_flank',upstream = 3000,
                            mart = human)
      #check if sequence was retrieved
      sLengths <- with(sequence, nchar(as.character(transcript_flank)))
      #write to "Z:/tomhait/organisms/human/mart_export_new.txt"
      #you can change it to "mart_export_new.txt" and it will create a new
      #file in the R working directory
      if(length(sLengths) > 0){
        x<-sequence[,1]
        #split the sequence into lines of 60 characters
        y<-strsplit(gsub("([[:alnum:]]{60})", "\\1 ", x), " ")[[1]]
        title<-paste(paste(">",tmpgene[i,1],sep=""),tmpgene[i,2],tmpgene[i,3],
                     tmpgene[i,4],tmpgene[i,5],tmpgene[i,6], sep="|")
        write(title,file="Z:/tomhait/organisms/human/mart_export_new.txt",
              ncolumns = 1, append=TRUE,sep="")
        write(y,file="Z:/tomhait/organisms/human/mart_export_new.txt",
              ncolumns = 1, append=TRUE,sep="\n")
        write("\n",file="Z:/tomhait/organisms/human/mart_export_new.txt",
              ncolumns = 1, append=TRUE,sep="\n")
      }
      i<-i+1
    }

I got the message:

    Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
      Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND

Could you please help me solve this problem?

Best regards,
Tom Hait
biomaRt biomaRt • 2.0k views
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Dear Steffen / List,

below is a more compact code example that reproduces Tom's problem. I am rather confused by the fact that the problem seems to occur stochastically!

-------------------
library(biomaRt)
options(error=recover)
ensembl = useMart("ensembl")
human = useDataset("hsapiens_gene_ensembl",mart=ensembl)
attr = c('ensembl_gene_id','ensembl_transcript_id',
         'external_gene_id','chromosome_name','strand','transcript_start')
bmres = getBM(attr, 'biotype', values = 'protein_coding', human)

for(id in bmres[,"ensembl_transcript_id"]){
  sequence = getSequence(id=id, type='ensembl_transcript_id',
                         seqType='transcript_flank',upstream = 3000,
                         mart = human)
  sl = with(sequence, nchar(as.character(transcript_flank)))
  cat(id, sl, "\n")
}
-------------------

On running this once, I got

...(lots of lines)
ENST00000520540 3000
ENST00000519310 3000
ENST00000442920 3000
Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
  Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND

The next time, the same error occurred already in the very first iteration of the for-loop, for id="ENST00000539570". The time after that, it occurred in the third iteration, for id="ENST00000510508".

Any idea what is going on here?

Further comments:
- for *Steffen*: The documentation and the code of 'getSequence' do not seem to match each other (e.g. the description of argument 'seqType'), and MySQL mode is mentioned but, as far as I understand, is no longer supported. Perhaps some maintenance would be helpful to users.
- for *Tom*: Making these queries (such as getSequence) within a for-loop is bad practice, since it needlessly clogs the network and the BioMart webservers. Please use R's vector capabilities, e.g.

------------------------
sequence = getSequence(id=bmres[,"ensembl_transcript_id"],
                       type='ensembl_transcript_id', seqType='transcript_flank',
                       upstream = 3000, mart = human)
sl = with(sequence, nchar(as.character(transcript_flank)))
-------------------------

Best wishes
Wolfgang

Tom Hait scripsit 08/06/2012 12:37 PM:
> I got the message:
> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
> Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND
>
> Could you please help me to solve this problem?

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
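Once all IDs are queried in a single call, as recommended above, the FASTA records can be assembled from the returned data frame instead of inside a per-ID loop. The following is only a sketch under stated assumptions: `wrap_sequence` and `to_fasta` are hypothetical helper names (not part of biomaRt), the toy data frames stand in for the results of `getBM` and `getSequence`, and the column layout (sequence column named after `seqType`, ID column named after `type`) is inferred from the code and error message in this thread.

```r
# Hypothetical helpers (not part of biomaRt) that turn a vectorized
# getSequence()-style result into FASTA lines.

# Split one sequence string into pieces of at most `width` characters.
wrap_sequence <- function(s, width = 60) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = width)
  substring(s, starts, pmin(starts + width - 1, n))
}

# Build FASTA lines: one ">"-prefixed, "|"-separated header per record,
# followed by the wrapped sequence.
to_fasta <- function(seqs, annot,
                     id_col = "ensembl_transcript_id",
                     seq_col = "transcript_flank",
                     width = 60) {
  # Match sequences to annotation rows by ID rather than by position,
  # because the rows need not come back in the order they were requested.
  idx <- match(seqs[[id_col]], annot[[id_col]])
  out <- character(0)
  for (k in seq_len(nrow(seqs))) {
    fields <- vapply(annot[idx[k], , drop = FALSE], as.character, character(1))
    header <- paste0(">", paste(fields, collapse = "|"))
    out <- c(out, header,
             wrap_sequence(as.character(seqs[[seq_col]][k]), width))
  }
  out
}

# Toy data with made-up identifiers (assumption: not real Ensembl IDs).
annot <- data.frame(
  ensembl_gene_id = c("G1", "G2"),
  ensembl_transcript_id = c("T1", "T2"),
  strand = c(1, -1),
  stringsAsFactors = FALSE
)
seqs <- data.frame(
  transcript_flank = c(strrep("A", 70), "ACGT"),
  ensembl_transcript_id = c("T2", "T1"),
  stringsAsFactors = FALSE
)

fasta <- to_fasta(seqs, annot)
writeLines(fasta)
```

In real use, `annot` would be the `getBM` result and `seqs` the result of the single vectorized `getSequence` call; `writeLines(fasta, "mart_export_new.txt")` would then write the whole file in one step.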
Oops, I forgot sessionInfo() for my previous post, here it is:

R Under development (unstable) (2012-08-07 r60182)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=la_AU.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.13.2 fortunes_1.5-0

loaded via a namespace (and not attached):
[1] RCurl_1.91-1 XML_3.9-4

Wolfgang Huber scripsit 08/07/2012 11:08 AM:
> Dear Steffen / List,
> below is a more compact code example that reproduces Tom's problem.

--
Best wishes
Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
Thanks for the code example, Wolfgang.

The stochasticity suggests the problem is on the BioMart server side. I'll contact them to see if they can look into it.

Regards,
Steffen

On Tue, Aug 7, 2012 at 2:08 AM, Wolfgang Huber <whuber@embl.de> wrote:
> Dear Steffen / List,
> below is a more compact code example that reproduces Tom's problem. I am
> rather confused by the fact that the problem seemed to occur stochastically!
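If the failure really is intermittent and server-side, a client-side stopgap (not something proposed in this thread) is to send the IDs in moderately sized batches and retry a failed batch a few times before giving up. The sketch below is an illustration under assumptions: `chunk_ids` and `with_retry` are hypothetical helpers, the batch size and wait time are arbitrary, and `flaky_fetch` merely simulates a call such as `getSequence(id = ids, ...)` that fails twice before succeeding.

```r
# Hypothetical helpers (not part of biomaRt).

# Split a vector of IDs into batches of at most `size` elements.
chunk_ids <- function(ids, size = 500) {
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Call f(); on error, wait and try again, up to `max_tries` attempts.
with_retry <- function(f, max_tries = 3, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Stand-in for a flaky server call: fails on the first two calls,
# then returns one row per requested ID.
calls <- 0
flaky_fetch <- function(ids) {
  calls <<- calls + 1
  if (calls <= 2) stop("Filter upstream_flank NOT FOUND")
  data.frame(id = ids, stringsAsFactors = FALSE)
}

ids <- paste0("ID", 1:5)  # toy IDs, not real Ensembl identifiers
batches <- chunk_ids(ids, size = 2)
results <- lapply(batches, function(b) {
  with_retry(function() flaky_fetch(b), wait = 0)
})
combined <- do.call(rbind, results)
print(combined)
```

In real use, `flaky_fetch(b)` would be replaced by the `getSequence` call from the thread with `id = b`. Retrying only works around a transient error; it does not address whatever causes it on the server.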