
Help with biomaRt (Bioconductor): Filter upstream_flank NOT FOUND problem


Stefan Kroeger

@stefan-kroeger-5517


2012/8/9 Steffen Durinck <durinck.steffen at gene.com>:
> Thanks for the code example Wolfgang,
>
> The stochasticity suggests the problem is on the BioMart server side, I'll
> contact them to see if they can look into it.

Has anybody fixed the problem, or received a response from the helpdesk?

Best
Stefan

>
> On Tue, Aug 7, 2012 at 2:08 AM, Wolfgang Huber <whuber at embl.de> wrote:
>
>> Dear Steffen / List,
>> below is a more compact code example that reproduces Tom's problem. I am
>> rather confused by the fact that the problem seemed to occur stochastically!
>>
>> -------------------
>> library(biomaRt)
>> options(error=recover)
>> ensembl = useMart("ensembl")
>>
>> human = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>> attr = c('ensembl_gene_id','ensembl_transcript_id',
>>          'external_gene_id','chromosome_name','strand',
>>          'transcript_start')
>> bmres = getBM(attr, 'biotype', values = 'protein_coding', human)
>>
>> for(id in bmres[,"ensembl_transcript_id"]){
>>   sequence = getSequence(id=id, type='ensembl_transcript_id',
>>                          seqType='transcript_flank', upstream = 3000,
>>                          mart = human)
>>   sl = with(sequence, nchar(as.character(transcript_flank)))
>>   cat(id, sl, "\n")
>> }
>> -------------------
>>
>> On running this once, I got
>> ...(lots of lines)
>> ENST00000520540 3000
>> ENST00000519310 3000
>> ENST00000442920 3000
>>
>> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
>>   Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND
>>
>> The next time, the same error already occurred in the very first iteration
>> of the for-loop, for id="ENST00000539570". The time after that, it occurred
>> in the third iteration, for id="ENST00000510508".
>>
>> Any idea what is going on here?
>>
>> Further comments:
>> - for *Steffen*: The documentation and the code of 'getSequence' do not
>>   seem to match each other (e.g. the description of argument 'seqType'),
>>   and MySQL mode is mentioned but afaIu is not supported any more -> perhaps
>>   some maintenance would help users.
>> - for *Tom*: Making these queries (such as getSequence) within a for-loop
>>   is bad practice, since it needlessly clogs the network and the BioMart
>>   webservers. Please use R's vector capabilities, e.g.
>>
>> ------------------------
>> sequence = getSequence(id=bmres[,"ensembl_transcript_id"],
>>                        type='ensembl_transcript_id', seqType='transcript_flank',
>>                        upstream = 3000, mart = human)
>> sl = with(sequence, nchar(as.character(transcript_flank)))
>> ------------------------
>>
>> Best wishes
>> Wolfgang
>>
>>
>> Tom Hait scripsit 08/06/2012 12:37 PM:
>>
>>> Hello,
>>>
>>> I'm a bioinformatics student at Tel Aviv University.
>>> I'm using your biomaRt API to automate downloading FASTA sequences.
>>> I ran into a problem; here is my code:
>>>
>>> #load the biomaRt library
>>> library(biomaRt)
>>> #connect to the Ensembl BioMart
>>> ensembl <- useMart("ensembl")
>>> #open the human data set
>>> human = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>>> #select the attributes that we want from the data set
>>> attr<-c('ensembl_gene_id','ensembl_transcript_id',
>>>         'external_gene_id','chromosome_name','strand','transcript_start')
>>> #download the map between transcript id and transcript name
>>> tmpgene<-getBM(attr, 'biotype', values = 'protein_coding', human)
>>> #save in TSV format (tab-separated, in a .txt file)
>>> write.table(tmpgene,"Z:/tomhait/organisms/human/transcript_names.txt",
>>>             sep="\t", row.names=FALSE, quote=FALSE)
>>> #collect the 3000-base upstream flank of every transcript in the second
>>> #column (ensembl_transcript_id) of tmpgene
>>> i<-1
>>> for(id1 in tmpgene[,2]){
>>>   #retrieve sequence
>>>   sequence<-getSequence(id=id1,
>>>                         type='ensembl_transcript_id',seqType='transcript_flank',
>>>                         upstream = 3000, mart = human)
>>>   #check if sequence was retrieved
>>>   sLengths <- with(sequence, nchar(as.character(transcript_flank)))
>>>
>>>   #write to a new file "Z:/tomhait/organisms/human/mart_export_new.txt"
>>>   #you can change it to "mart_export_new.txt" and it will create a new file
>>>   #in the R working directory
>>>   if(length(sLengths) > 0){
>>>     x<-sequence[,1]
>>>     #wrap the sequence into lines of 60 characters
>>>     y<-strsplit(gsub("([[:alnum:]]{60})", "\\1 ", x), " ")[[1]]
>>>     #FASTA header: >gene_id|transcript_id|gene_name|chromosome|strand|start
>>>     title<-paste(paste(">",tmpgene[i,1],sep=""),tmpgene[i,2],tmpgene[i,3],
>>>                  tmpgene[i,4],tmpgene[i,5],tmpgene[i,6], sep="|")
>>>     write(title,file="Z:/tomhait/organisms/human/mart_export_new.txt",
>>>           ncolumns = 1, append=TRUE,sep="")
>>>     write(y,file="Z:/tomhait/organisms/human/mart_export_new.txt",
>>>           ncolumns = 1, append=TRUE,sep="\n")
>>>     write("\n",file="Z:/tomhait/organisms/human/mart_export_new.txt",
>>>           ncolumns = 1, append=TRUE,sep="\n")
>>>   }
>>>   i<-i+1
>>> }
>>>
>>> I got the message:
>>> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
>>>   Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank
>>>   NOT FOUND
>>>
>>> Could you please help me solve this problem?
>>>
>>> Best Regards,
>>>
>>> Tom Hait.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> Best wishes
>> Wolfgang
>>
>> Wolfgang Huber
>> EMBL
>> http://www.embl.de/research/units/genome_biology/huber
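
Following Wolfgang's advice to query with a vector of IDs rather than one ID per request, and assuming (as Steffen suggested) that the intermittent "Filter upstream_flank NOT FOUND" error is a transient server-side failure, one possible workaround is to send the IDs in batches and retry a batch that fails. This is a minimal plain-R sketch, not a fix from the thread: `fetch_in_chunks`, `chunk_size`, `max_tries` and `wait_seconds` are hypothetical names, and the batch size and retry policy are untested guesses rather than anything BioMart documents.

```r
# Hypothetical helper (not part of biomaRt): run `query_fun` on batches of IDs
# and retry a batch a few times if it throws an error. This assumes the
# BioMart error is transient; a persistent error is re-raised after max_tries.
fetch_in_chunks <- function(ids, query_fun, chunk_size = 500, max_tries = 3,
                            wait_seconds = 5) {
  # Split the ID vector into consecutive batches of at most chunk_size IDs
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- vector("list", length(chunks))
  for (k in seq_along(chunks)) {
    for (attempt in seq_len(max_tries)) {
      # Capture an error as a condition object instead of aborting
      res <- tryCatch(query_fun(chunks[[k]]), error = function(e) e)
      if (!inherits(res, "error")) break
      message("batch ", k, ", attempt ", attempt, " failed: ",
              conditionMessage(res))
      # Give up on this batch after the last attempt
      if (attempt == max_tries) stop(res)
      # Pause before retrying so the server is not hammered
      Sys.sleep(wait_seconds)
    }
    results[[k]] <- res
  }
  # Stack the per-batch data frames into one data frame
  do.call(rbind, results)
}

# Possible use with the objects from Wolfgang's example (not run here, since
# it needs the BioMart server):
# seqs <- fetch_in_chunks(bmres[, "ensembl_transcript_id"], function(ids)
#   getSequence(id = ids, type = "ensembl_transcript_id",
#               seqType = "transcript_flank", upstream = 3000, mart = human))
```

Passing the query as a function keeps the batching and retry logic independent of biomaRt, so it can be checked offline with a stand-in function before being pointed at the real server.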

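Tom's script wraps each sequence at 60 characters with gsub() and strsplit() and writes the header, the sequence lines and a separator with three separate write() calls. A possibly simpler base-R sketch of the same FASTA formatting cuts the sequence into fixed-width lines with substring(). `wrap_sequence`, `fasta_record` and `out_file` are hypothetical names, and the "|"-joined header simply mirrors the fields Tom's script writes.

```r
# Hypothetical helper: split one sequence string into lines of at most
# `width` characters (intended to give the same lines as Tom's
# gsub()/strsplit() idiom for plain nucleotide strings).
wrap_sequence <- function(x, width = 60) {
  x <- as.character(x)  # getSequence() results may hold factors
  n <- nchar(x)
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = width)
  substring(x, starts, pmin(starts + width - 1, n))
}

# Hypothetical helper: build the lines of one FASTA record whose header
# joins the annotation fields with "|", as in Tom's script.
fasta_record <- function(fields, sequence, width = 60) {
  # Convert each field (e.g. a data frame row, possibly with factors) to text
  fields <- vapply(fields, as.character, character(1))
  c(paste0(">", paste(fields, collapse = "|")), wrap_sequence(sequence, width))
}

# Possible use inside Tom's loop (not run here; `out_file` is a placeholder):
# con <- file(out_file, open = "a")
# writeLines(c(fasta_record(tmpgene[i, 1:6], sequence[, 1]), ""), con)
# close(con)
```

Opening the file once in append mode and calling writeLines() keeps the newline handling explicit, instead of relying on the `sep` and `ncolumns` arguments of write().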
Tags: Network, biomaRt

