help with biomaRt bioconductor - Filter upstream_flank NOT FOUND problem

Tom Hait ▴ 10

@tom-hait-5441

Last seen 11.4 years ago

Hello,

I'm a bioinformatics student at Tel Aviv University. I'm using your biomaRt API to automate the download of FASTA sequences, and I have run into a problem. Here is my code:

    # load the biomaRt library
    library(biomaRt)
    # connect to the Ensembl mart (must exist before useDataset is called)
    ensembl = useMart("ensembl")
    # open the human data set
    human = useDataset("hsapiens_gene_ensembl", mart=ensembl)
    # select the attributes that we want from the data set
    attr <- c('ensembl_gene_id','ensembl_transcript_id',
              'external_gene_id','chromosome_name','strand','transcript_start')
    # download the map between transcript id and transcript name
    tmpgene <- getBM(attr, 'biotype', values = 'protein_coding', human)
    # save in TSV format (the file is saved as .txt)
    write.table(tmpgene, "Z:/tomhait/organisms/human/transcript_names.txt",
                row.names=FALSE, quote=FALSE, sep="\t")
    # collect the 3000-base upstream flank of every transcript, using the
    # second column (ensembl_transcript_id) of tmpgene
    i <- 1
    for(id1 in tmpgene[,2]){
      # retrieve sequence
      sequence <- getSequence(id=id1, type='ensembl_transcript_id',
                              seqType='transcript_flank', upstream = 3000,
                              mart = human)
      # check if sequence was retrieved
      sLengths <- with(sequence, nchar(as.character(transcript_flank)))
      # write to "Z:/tomhait/organisms/human/mart_export_new.txt"
      # (change it to "mart_export_new.txt" to create the file in R's working directory)
      if(length(sLengths) > 0){
        x <- sequence[,1]
        # break the sequence into lines of 60 characters
        y <- strsplit(gsub("([[:alnum:]]{60})", "\\1 ", x), " ")[[1]]
        title <- paste(paste(">",tmpgene[i,1],sep=""), tmpgene[i,2], tmpgene[i,3],
                       tmpgene[i,4], tmpgene[i,5], tmpgene[i,6], sep="|")
        write(title, file="Z:/tomhait/organisms/human/mart_export_new.txt",
              ncolumns = 1, append=TRUE, sep="")
        write(y, file="Z:/tomhait/organisms/human/mart_export_new.txt",
              ncolumns = 1, append=TRUE, sep="\n")
        write("\n", file="Z:/tomhait/organisms/human/mart_export_new.txt",
              ncolumns = 1, append=TRUE, sep="\n")
      }
      i <- i+1
    }

I got the message:

    Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
      Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND

Could you please help me solve this problem?

Best regards,
Tom Hait

biomaRt biomaRt • 2.4k views

updated 13.5 years ago by Wolfgang Huber ★ 13k • written 13.5 years ago by Tom Hait ▴ 10

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 4 months ago

EMBL European Molecular Biology Laborat…

Dear Steffen / List,

below is a more compact code example that reproduces Tom's problem. I am rather confused by the fact that the problem seems to occur stochastically!

-------------------
library(biomaRt)
options(error=recover)
ensembl = useMart("ensembl")
human = useDataset("hsapiens_gene_ensembl",mart=ensembl)
attr = c('ensembl_gene_id','ensembl_transcript_id',
         'external_gene_id','chromosome_name','strand','transcript_start')
bmres = getBM(attr, 'biotype', values = 'protein_coding', human)

for(id in bmres[,"ensembl_transcript_id"]){
  sequence = getSequence(id=id, type='ensembl_transcript_id',
                         seqType='transcript_flank',upstream = 3000,
                         mart = human)
  sl = with(sequence, nchar(as.character(transcript_flank)))
  cat(id, sl, "\n")
}
-------------------

On running this once, I got

...(lots of lines)
ENST00000520540 3000
ENST00000519310 3000
ENST00000442920 3000
Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"), :
  Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND

The next time, the same error already occurred in the very first iteration of the for-loop, for id="ENST00000539570". The time after that, it occurred in the third iteration, for id="ENST00000510508".

Any idea what is going on here?

Further comments:
- for *Steffen*: The documentation and the code of 'getSequence' do not seem to match each other (e.g. the description of argument 'seqType'). MySQL mode is mentioned but, as far as I understand, is not supported any more. Perhaps some maintenance would be nice for users.
- for *Tom*: Making these queries (such as getSequence) within a for-loop is bad practice, since it needlessly clogs the network and the BioMart webservers. Please use R's vector capabilities, e.g.

------------------------
sequence = getSequence(id=bmres[,"ensembl_transcript_id"],
                       type='ensembl_transcript_id', seqType='transcript_flank',
                       upstream = 3000, mart = human)
sl = with(sequence, nchar(as.character(transcript_flank)))
-------------------------

Best wishes
Wolfgang

Tom Hait scripsit 08/06/2012 12:37 PM:
> [...]

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
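A vectorised getSequence call returns one data frame with a sequence column and an ID column, so the FASTA file can be written without a loop as well. The following is a minimal sketch in base R, not part of biomaRt: `format_fasta` is a hypothetical helper, and the two toy records (made-up IDs and sequences) stand in for a real getSequence result so that the example runs offline.

```r
# Hypothetical helper (not a biomaRt function): turn headers and sequences
# into FASTA lines, wrapping each sequence at `width` characters.
format_fasta <- function(headers, seqs, width = 60) {
  stopifnot(length(headers) == length(seqs))
  records <- mapply(function(h, s) {
    # start position of every line of the wrapped sequence
    starts <- seq(1, max(nchar(s), 1), by = width)
    chunks <- substring(s, starts, pmin(starts + width - 1, nchar(s)))
    c(paste0(">", h), chunks)
  }, headers, seqs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  unlist(records)
}

# Toy stand-in for a getSequence() result; the column names follow the
# seqType and type used in the thread, the values are invented.
toy <- data.frame(
  transcript_flank      = c("ACGTACGTAC", "GGCC"),
  ensembl_transcript_id = c("tx1", "tx2"),
  stringsAsFactors      = FALSE
)

fasta_lines <- format_fasta(toy$ensembl_transcript_id,
                            toy$transcript_flank, width = 4)
print(fasta_lines)
```

With a real result, something like `writeLines(format_fasta(sequence$ensembl_transcript_id, sequence$transcript_flank), "mart_export_new.txt")` would write the whole file in one go. biomaRt also provides `exportFASTA()`, which may already be enough if its default header is acceptable.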

written 13.5 years ago by Wolfgang Huber ★ 13k

Oops, I forgot sessionInfo() for my previous post, here it is:

R Under development (unstable) (2012-08-07 r60182)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=la_AU.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.13.2 fortunes_1.5-0

loaded via a namespace (and not attached):
[1] RCurl_1.91-1 XML_3.9-4

Wolfgang Huber scripsit 08/07/2012 11:08 AM:
> [...]

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

written 13.5 years ago by Wolfgang Huber ★ 13k

Thanks for the code example, Wolfgang.

The stochasticity suggests the problem is on the BioMart server side. I'll contact them to see if they can look into it.

Regards,
Steffen

On Tue, Aug 7, 2012 at 2:08 AM, Wolfgang Huber <whuber@embl.de> wrote:
> [...]
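If the error really is intermittent and server-side, as suggested in this thread, one possible stopgap until the server is fixed is to repeat a failed query a few times before giving up. The sketch below is an assumption, not something proposed by the posters: `with_retry` is a hypothetical base-R helper, and `flaky` merely simulates a query that fails twice with the thread's error message and then succeeds.

```r
# Hypothetical helper: call f() up to `max_tries` times, pausing `wait`
# seconds between attempts; re-raise the last error if every attempt fails.
with_retry <- function(f, max_tries = 3, wait = 1) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop(last_error)
}

# Simulated intermittent failure (stands in for a real biomaRt query).
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Filter upstream_flank NOT FOUND")
  "ok"
}

out <- with_retry(flaky, max_tries = 5, wait = 0)
print(out)

# With biomaRt, the vectorised query from the thread could be wrapped as:
# sequence <- with_retry(function() getSequence(
#   id = bmres[, "ensembl_transcript_id"], type = 'ensembl_transcript_id',
#   seqType = 'transcript_flank', upstream = 3000, mart = human))
```

Retrying only hides the symptom and adds load on the server, so a small `max_tries` and a non-zero `wait` seem the considerate choice for real queries.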

written 13.5 years ago by Steffen Durinck ▴ 540
