BiomaRt error: ncol(result) == length(attributes) is not TRUE


Quin Wills ▴ 100

@quin-wills-2709

Last seen 11.4 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080327/ 18da1096/attachment.pl

• 2.5k views

ADD COMMENT • link 17.8 years ago Quin Wills ▴ 100


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080327/ 1251b386/attachment.pl

ADD COMMENT • link 17.8 years ago Quin Wills ▴ 100


Hi Quin,

How long is your list of identifiers? Running a query like this in a loop is not recommended, as it causes the web service to go out of sync at some point during the loop. biomaRt is designed to perform batch queries, so I would recommend running your query as follows:

genes <- getGene(id=ID, type="refseq_dna", mart=ensembl)

This will give you a data frame with the info for all the genes. If needed, you can then loop over the result. If you feel you really need to loop, you could add Sys.sleep(1) inside the loop.

Cheers,
Steffen

Quin Wills wrote:
> Hello all
>
> I'm running the most up-to-date R and biomaRt.
>
> I get the following error:
>
> Error: ncol(result) == length(attributes) is not TRUE
>
> for the following loop:
>
> # 'ID' is a character vector of refseq IDs
> # 'gene', for the purposes of the argument here, is a list storing the output
>
> ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
> for (i in 1:length(ID)) {
>     gene[[i]] <- getGene(id=ID[i], type="refseq_dna", mart=ensembl)
> }
>
> The problem is not dependent on the get function used or the id type used. I didn't have this problem yesterday on the same script. The error also occurs randomly, breaking the loop at any particular point, sometimes allowing thousands of loops to run.
>
> Could this be a problem with the server I'm pulling the information from? It just seems too random to be my coding - especially considering I didn't have this problem yesterday.
>
> I've had this before, ages ago, and would like to get to the bottom of it. Any wisdom? Thanks.
>
> Quin Wills
> DPhil candidate
> Department of Statistics
> University of Oxford
> 1 South Parks Road
> Oxford OX1 3TG
> United Kingdom
> 01865 285 394

--
Steffen Durinck, PhD
Division of Biostatistics, University of California, Berkeley
& Life Sciences Department, Lawrence Berkeley National Laboratory
1 Cyclotron Rd, Berkeley CA, 94720, USA
Tel: +1-510-486-5202

ADD REPLY • link 17.8 years ago Steffen ▴ 500
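
As a minimal sketch of the batch-then-split approach described in the reply above: one call returns a single data frame, which can then be split into a per-identifier list like the one the original loop was building. To keep it runnable without network access, `fake_batch_query` is a hypothetical stand-in for the real biomaRt call, and the column name `refseq_dna` and the example identifiers are assumptions, not taken from the thread.

```r
# Hypothetical stand-in for a batch query such as
# getGene(id = ID, type = "refseq_dna", mart = ensembl).
# It returns one data frame covering all identifiers at once.
fake_batch_query <- function(ids) {
  # Pretend the first identifier maps to two genes and the rest to one each.
  hits <- c(ids[1], ids)
  data.frame(refseq_dna = hits,
             symbol = paste0("GENE", seq_along(hits)),
             stringsAsFactors = FALSE)
}

# Assumed example identifiers.
ID <- c("NM_000001", "NM_000002", "NM_000003")

# One batch call instead of one call per identifier.
genes <- fake_batch_query(ID)

# Split the single result into a list with one data frame per identifier,
# keeping the order of ID; identifiers without hits get a zero-row data frame.
gene <- split(genes, factor(genes$refseq_dna, levels = ID))

print(sapply(gene, nrow))
```

Using `factor(..., levels = ID)` keeps an entry for every requested identifier, so identifiers that returned nothing are still visible in the list.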


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080327/ 297bd78f/attachment.pl

ADD REPLY • link 17.8 years ago Quin Wills ▴ 100


Hi Quin,

It can deal with a long vector of identifiers, e.g. 30000 ids should work in one query and should be fast.

Cheers,
Steffen

Quin Wills wrote:
> Thank you Steffen for the really quick reply.
>
> Out of interest, I tried using Sys.sleep() but I still get the same problem. www.ensembl.org is also running quite slowly on this side for queries, and I wonder if that might be somehow related.
>
> Sorry if this is in the manual - I didn't spot it. How many queries in a batch do you think one should avoid going over? I've a fairly long list of identifiers.
>
> Quin

ADD REPLY • link 17.8 years ago Steffen ▴ 500
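
According to the reply above, a single query should cope with around 30000 identifiers, so splitting is normally unnecessary. If a very large request did prove too much, one possible fallback is a handful of large chunks with a pause between them, rather than one call per identifier. The sketch below assumes a user-supplied `query_fn` that takes a character vector and returns a data frame; `query_in_chunks`, `chunk_size`, `pause` and the stub `fake_query` are all hypothetical names chosen for illustration, and the default chunk size is arbitrary.

```r
# Run query_fn over ids in a few large chunks, pausing between calls.
query_in_chunks <- function(ids, query_fn, chunk_size = 10000, pause = 1) {
  # Group consecutive identifiers into chunks of at most chunk_size.
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- vector("list", length(chunks))
  for (k in seq_along(chunks)) {
    results[[k]] <- query_fn(chunks[[k]])
    # Pause between requests, but not after the last one.
    if (k < length(chunks)) Sys.sleep(pause)
  }
  # Stack the per-chunk data frames into one result.
  do.call(rbind, results)
}

# Hypothetical stub in place of a real biomaRt query, so no network is needed.
fake_query <- function(ids) {
  data.frame(id = ids, stringsAsFactors = FALSE)
}

out <- query_in_chunks(as.character(1:25), fake_query,
                       chunk_size = 10, pause = 0)
print(nrow(out))
```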


Dear Quin,

In general it is sufficient to send the same mail only once to this list; there is no added benefit in looping over the send button, and it might indeed collect you bad karma from all the people who have to clean their mailboxes.

The manual page of the "getGene" function says

Usage:
    getGene(id, type, mart)

Arguments:
    id: *vector* of gene identifiers one wants to annotate

I am not sure how this could be more clear. R is a well-developed and powerful language and many people have benefited from reading introductions such as this one (on CRAN): http://www.stats.bris.ac.uk/R/doc/manuals/R-intro.html

Best wishes
Wolfgang

--
Wolfgang Huber
EBI/EMBL Cambridge UK
http://www.ebi.ac.uk/huber

On 27/03/2008 18:19, Quin Wills wrote:
> Thank you Steffen for the really quick reply.
>
> Out of interest, I tried using Sys.sleep() but I still get the same problem. www.ensembl.org is also running quite slowly on this side for queries, and I wonder if that might be somehow related.
>
> Sorry if this is in the manual - I didn't spot it. How many queries in a batch do you think one should avoid going over? I've a fairly long list of identifiers.
>
> Quin

ADD REPLY • link 17.8 years ago Wolfgang Huber ★ 13k


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080328/ d0aa9ed6/attachment.pl

ADD REPLY • link 17.8 years ago Quin Wills ▴ 100


Dear Quin,

I'm surprised that you sometimes get different amounts of annotation, and sometimes clearly incorrect results, with biomaRt. Could you give examples so that we can investigate further?

There is no maximum on the number of identifiers in a query, which is why this is not in the manual. Only if you retrieve a really large amount of data, such as all cDNA sequences for all genes at once, might you see your query fail because the download is too big, and in that case it should return nothing at all.

Best,
Steffen

----- Original Message -----
From: Quin Wills
Date: Friday, March 28, 2008 3:15 am
Subject: Re: [BioC] BiomaRt error: ncol(result) == length(attributes) is not TRUE

> Thank you Steffen and Wolfgang
>
> I'm fairly au fait with R, and realise how tedious loops can be, thanks. For my needs, because of (i) the uncertain amount of annotation returned for some identifiers and (ii) sometimes clearly incorrect annotation returned, it's sometimes been easier to run loops.
>
> Apologies if I wasn't clear... my last uncertainty was not lack of clarity on single/vector queries, but rather how large a single query one can send, and that I can't find that info. Knowing that I can run 3000 queries in one go without server retributions is useful, thanks so much Steffen. I'm sorry if it is written somewhere - but perhaps an indication of what size query goes beyond sensible could go in the help/manual?
>
> Apologies for the multiple emails - they were a result of me receiving a 'failed to send' message on this side - apparently it had sent.
>
> Quin

ADD REPLY • link 17.8 years ago Steffen ▴ 500
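
The concern about an uncertain amount of annotation per identifier can be checked after a single batch query, without looping over the server. The following is a minimal sketch, assuming the result is a data frame with one column holding the queried identifier; `summarise_hits`, the column name `refseq_dna` and the toy data are hypothetical and only illustrate the idea.

```r
# Count how many result rows came back for each requested identifier.
summarise_hits <- function(ids, result, id_col) {
  # Using the requested ids as factor levels keeps zero counts for
  # identifiers that returned no annotation at all.
  counts <- table(factor(result[[id_col]], levels = unique(ids)))
  data.frame(id = names(counts),
             n_rows = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy data standing in for a batch query result (assumed column name).
ids <- c("a", "b", "c")
result <- data.frame(refseq_dna = c("a", "a", "c"),
                     stringsAsFactors = FALSE)

hits <- summarise_hits(ids, result, "refseq_dna")
print(hits)

# Identifiers with no annotation, and those with more than one row.
missing_ids <- hits$id[hits$n_rows == 0]
multi_ids <- hits$id[hits$n_rows > 1]
```

Identifiers in `missing_ids` or `multi_ids` could then be inspected individually, which limits any one-at-a-time querying to the handful of cases that need it.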


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080328/ 8cd3b082/attachment.pl

ADD REPLY • link 17.8 years ago Quin Wills ▴ 100


Dear colleagues,

We are pleased to invite you to the Symposium on

Computational and Systems Biology: Potentials and Challenges

that will be held in Vienna, Austria on Thursday, April 24th 2008. The Symposium is organized by the "Bioinformatics Integration Network", a project of the Austrian Genome Research Initiative GEN-AU.

The following renowned bioinformaticians will share their view on the potentials and current challenges in Computational and Systems Biology:

Ruedi Aebersold, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Peer Bork, Structural and Computational Biology Programme, EMBL, Germany
Frank Eisenhaber, Bioinformatics Institute, A*Star, Singapore
Dmitrij Frishman, Department of Genome-Oriented Bioinformatics, TUM, Germany
Jan Gorodkin, Genetics and Bioinformatics, University of Copenhagen, Denmark
Carolin Kosiol, Computational and Biological Statistics, Cornell University, USA
Martin J. Lercher, Department of Bioinformatics, University Düsseldorf, Germany
David H. Mathews, Biostatistics & Comp. Biology, University of Rochester, USA
Giulio Superti-Furga, Research Center for Molecular Medicine, Vienna, Austria

Attendance is free, registration is required via http://bin.tugraz.at/Symposium2008

On behalf of the organizing committee
Gerhard Thallinger

--
Dr. Gerhard Thallinger
Institute for Genomics and Bioinformatics
Graz University of Technology
Petersgasse 14/V, 8010 Graz, Austria
E-mail: Gerhard.Thallinger at tugraz.at
Web: http://genome.tugraz.at
Tel: +43 316 873 5343
Fax: +43 316 873 105343
Map: http://genome.tugraz.at/Loc.html

ADD REPLY • link 17.8 years ago Gerhard Thallinger ▴ 180


Dear all,

in the context of the "Bioinformatics Integration Network", which is coordinated by our institute, we are organizing a Symposium on

Computational and Systems Biology: Potentials and Challenges

The following renowned bioinformaticians will share their view on the potentials and current challenges in Computational and Systems Biology:

Ruedi Aebersold, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Peer Bork, Structural and Computational Biology Programme, EMBL, Germany
Frank Eisenhaber, Bioinformatics Institute, A*Star, Singapore
Dmitrij Frishman, Department of Genome-Oriented Bioinformatics, TUM, Germany
Jan Gorodkin, Genetics and Bioinformatics, University of Copenhagen, Denmark
Carolin Kosiol, Computational and Biological Statistics, Cornell University, USA
Martin J. Lercher, Department of Bioinformatics, University Düsseldorf, Germany
David H. Mathews, Biostatistics & Comp. Biology, University of Rochester, USA
Giulio Superti-Furga, Research Center for Molecular Medicine, Vienna, Austria

Attendance is free, registration is required via http://bin.tugraz.at/Symposium2008

Best regards,
Gerhard

--
Dr. Gerhard Thallinger
Institute for Genomics and Bioinformatics
Graz University of Technology
Petersgasse 14/V, 8010 Graz, Austria
E-mail: Gerhard.Thallinger at tugraz.at
Web: http://genome.tugraz.at
Tel: +43 316 873 5343
Fax: +43 316 873 105343
Map: http://genome.tugraz.at/Loc.html

ADD REPLY • link 17.8 years ago Gerhard Thallinger ▴ 180


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080327/ 6e8913fc/attachment.pl

ADD COMMENT • link 17.8 years ago Quin Wills ▴ 100


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080327/ c2d21e48/attachment.pl

ADD COMMENT • link 17.8 years ago Quin Wills ▴ 100
