help with biomaRt bioconductor - Filter upstream_flank NOT FOUND problem
@stefan-kroeger-5517
2012/8/9 Steffen Durinck <durinck.steffen at gene.com>:
> Thanks for the code example, Wolfgang.
>
> The stochasticity suggests the problem is on the BioMart server side. I'll
> contact them to see if they can look into it.

Has anybody been able to fix the problem, or received a response from the helpdesk?

Best,
Stefan

> On Tue, Aug 7, 2012 at 2:08 AM, Wolfgang Huber <whuber at embl.de> wrote:
>
>> Dear Steffen / List,
>>
>> below is a more compact code example that reproduces Tom's problem. I am
>> rather confused by the fact that the problem seemed to occur stochastically!
>>
>> -------------------
>> library(biomaRt)
>> options(error=recover)
>> ensembl = useMart("ensembl")
>>
>> human = useDataset("hsapiens_gene_ensembl", mart=ensembl)
>> attr = c('ensembl_gene_id', 'ensembl_transcript_id',
>>          'external_gene_id', 'chromosome_name', 'strand',
>>          'transcript_start')
>> bmres = getBM(attr, 'biotype', values = 'protein_coding', human)
>>
>> for(id in bmres[,"ensembl_transcript_id"]){
>>   sequence = getSequence(id=id, type='ensembl_transcript_id',
>>                          seqType='transcript_flank', upstream = 3000,
>>                          mart = human)
>>   sl = with(sequence, nchar(as.character(transcript_flank)))
>>   cat(id, sl, "\n")
>> }
>> -------------------
>>
>> On running this once, I got
>> ...(lots of lines)
>> ENST00000520540 3000
>> ENST00000519310 3000
>> ENST00000442920 3000
>>
>> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
>>   Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT
>>   FOUND
>>
>> The next time, the same error already occurred in the very first iteration
>> of the for-loop, for id="ENST00000539570". The time after that, it occurred
>> in the third iteration, for id="ENST00000510508".
>>
>> Any idea what is going on here?
>>
>> Further comments:
>> - for *Steffen*: The documentation and the code of 'getSequence' do not
>>   seem to match each other (e.g. the description of argument 'seqType'), and
>>   MySQL mode is mentioned but AFAIU is not supported any more -> perhaps some
>>   maintenance would be helpful for users.
>> - for *Tom*: Making these queries (such as getSequence) within a for-loop
>>   is bad practice, since it needlessly clogs the network and the BioMart
>>   webservers. Please use R's vector capabilities, e.g.
>>
>> ------------------------
>> sequence = getSequence(id=bmres[,"ensembl_transcript_id"],
>>                        type='ensembl_transcript_id', seqType='transcript_flank',
>>                        upstream = 3000, mart = human)
>> sl = with(sequence, nchar(as.character(transcript_flank)))
>> ------------------------
>>
>> Best wishes
>> Wolfgang
>>
>> Tom Hait wrote on 08/06/2012 12:37 PM:
>>
>>> Hello,
>>>
>>> I'm a student in bioinformatics at Tel Aviv University.
>>> I'm working with your biomaRt API in order to automate the download of
>>> FASTA sequences.
>>> I ran into a problem; here is my code:
>>>
>>> # load the biomaRt library
>>> library(biomaRt)
>>> # connect to the Ensembl BioMart
>>> ensembl = useMart("ensembl")
>>> # open the human data set
>>> human = useDataset("hsapiens_gene_ensembl", mart=ensembl)
>>> # select the attributes that we want from the data set
>>> attr <- c('ensembl_gene_id', 'ensembl_transcript_id',
>>>           'external_gene_id', 'chromosome_name', 'strand', 'transcript_start')
>>> # download the map between transcript id and transcript name
>>> tmpgene <- getBM(attr, 'biotype', values = 'protein_coding', human)
>>> # save in a TSV format (the file is saved as txt)
>>> write.table(tmpgene, "Z:/tomhait/organisms/human/transcript_names.txt",
>>>             row.names=FALSE, quote=FALSE)
>>> # collect all sequences with an upstream flank of 3000 bases, based on the
>>> # second column (ensembl_transcript_id) of tmpgene
>>> i <- 1
>>> for(id1 in tmpgene[,2]){
>>>   # retrieve sequence
>>>   sequence <- getSequence(id=id1, type='ensembl_transcript_id',
>>>                           seqType='transcript_flank', upstream = 3000,
>>>                           mart = human)
>>>   # check if sequence was retrieved
>>>   sLengths <- with(sequence, nchar(as.character(transcript_flank)))
>>>
>>>   # write to a new file, "Z:/tomhait/organisms/human/mart_export_new.txt"
>>>   # you can change it to "mart_export_new.txt" and it will create a new file
>>>   # in the R working directory
>>>   if(length(sLengths) > 0){
>>>     x <- sequence[,1]
>>>     # break the sequence into lines of 60 characters
>>>     y <- strsplit(gsub("([[:alnum:]]{60})", "\\1 ", x), " ")[[1]]
>>>     title <- paste(paste(">", tmpgene[i,1], sep=""), tmpgene[i,2],
>>>                    tmpgene[i,3], tmpgene[i,4], tmpgene[i,5], tmpgene[i,6],
>>>                    sep="|")
>>>     write(title, file="Z:/tomhait/organisms/human/mart_export_new.txt",
>>>           ncolumns = 1, append=TRUE, sep="")
>>>     write(y, file="Z:/tomhait/organisms/human/mart_export_new.txt",
>>>           ncolumns = 1, append=TRUE, sep="\n")
>>>     write("\n", file="Z:/tomhait/organisms/human/mart_export_new.txt",
>>>           ncolumns = 1, append=TRUE, sep="\n")
>>>   }
>>>   i <- i+1
>>> }
>>>
>>> I got the message:
>>> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
>>>   Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank
>>>   NOT FOUND
>>>
>>> Could you please help me to solve this problem?
>>>
>>> Best Regards,
>>>
>>> Tom Hait.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> Best wishes
>> Wolfgang
>>
>> Wolfgang Huber
>> EMBL
>> http://www.embl.de/research/units/genome_biology/huber
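For anyone hitting the same intermittent error, here is a rough sketch of how Wolfgang's advice (send vectors of IDs rather than one query per ID) could be combined with a simple retry. Nobody in this thread confirmed that retrying gets around the "Filter upstream_flank NOT FOUND" error, so treat it as an assumption. The helper names (split_into_batches, with_retry, fetch_flanks), the batch size of 200 and the 5-second wait are made up for illustration and are not part of biomaRt; only getSequence and useMart are real biomaRt functions.

```r
# Hypothetical helpers, not part of biomaRt.

# Split a vector of IDs into consecutive batches of at most `batch_size`.
split_into_batches <- function(ids, batch_size = 200) {
  stopifnot(batch_size >= 1)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Call fun(...); if it signals an error, wait and try again, up to
# `max_attempts` times. The last error is re-signalled if all attempts fail.
with_retry <- function(fun, ..., max_attempts = 3, wait_seconds = 5) {
  stopifnot(max_attempts >= 1)
  last_error <- NULL
  for (attempt in seq_len(max_attempts)) {
    outcome <- tryCatch(
      list(ok = TRUE, value = fun(...)),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (outcome$ok) return(outcome$value)
    last_error <- outcome$error
    message("Attempt ", attempt, " failed: ", conditionMessage(last_error))
    if (attempt < max_attempts) Sys.sleep(wait_seconds)
  }
  stop(last_error)
}

# Fetch 3000 bp upstream flanks batch by batch (assumes biomaRt is installed;
# it is only needed when this function is actually called).
fetch_flanks <- function(ids, mart, batch_size = 200) {
  batches <- split_into_batches(ids, batch_size)
  results <- lapply(batches, function(batch) {
    with_retry(biomaRt::getSequence,
               id = batch, type = "ensembl_transcript_id",
               seqType = "transcript_flank", upstream = 3000, mart = mart)
  })
  do.call(rbind, results)
}

# Usage (needs network access, so left commented out):
# human  <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# flanks <- fetch_flanks(bmres[, "ensembl_transcript_id"], human)
```

Batching keeps the number of requests low, and a failed batch only costs a re-download of that batch rather than the whole set. If the error persists across retries, the problem is more likely one for the BioMart helpdesk, as Steffen suggested.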
Network biomaRt • 1.2k views