error/problem with biomaRt gene symbol query
@sergii-ivakhno-2537
Last seen 10.2 years ago
Dear All,

I want to retrieve gene symbols for microarray probes such that I receive some output even if no gene spans the probe. I am using position information in biomaRt for this:

    genes = getBM(attributes = c("hgnc_symbol"),
                  filters = c("chromosome_name", "start", "end"),
                  values = list(rep(i, length(posnew)), posnew, posnew + 10),
                  mart = ensembl)

Unfortunately, it seems that biomaRt does not provide NULL output for probes outside genes, so it is not possible to assign the resulting gene names back to probes.

    posnew = positions of array probes
    length(posnew) = 24760
    length(genes) = 336 (only!)

I tried a few tricks:

1) Explicitly specifying na.value = "no gene";
2) Also trying to retrieve "chromosome_name", as this is bound to provide output for every value in posnew.

    genes = getBM(attributes = c("chromosome_location", "hgnc_symbol"),
                  filters = c("chromosome_name", "start", "end"),
                  values = list(rep(i, length(posnew)), posnew, posnew + 10),
                  mart = ensembl, na.value = "no gene")

The query returns an error:

    1 Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple attribute pages are not allowed
    Error in getBM(attributes = c("chromosome_location", "hgnc_symbol"), filters = c("chromosome_name", :
      Number of columns in the query result doesn't equal number of attributes in query. This is probably an internal error, please report.

I would be grateful for suggestions. I realise that you can call biomaRt within an lapply loop to query one position at a time, but this proves too time-consuming when you have 1 million probes.

Many thanks!
Sergii

> sessionInfo()
R version 2.7.0 (2008-04-22)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] splines   grid      tools     stats     graphics  grDevices utils
[8] datasets  methods   base

other attached packages:
 [1] biomaRt_1.14.1       RCurl_0.9-4          snapCGH_1.8.0
 [4] aCGH_1.14.0          sma_0.5.15           multtest_1.20.0
 [7] cluster_1.11.10      GLAD_1.16.0          DNAcopy_1.14.0
[10] tilingArray_1.18.0   pixmap_0.4-7         geneplotter_1.18.0
[13] annotate_1.18.0      xtable_1.5-2         AnnotationDbi_1.2.1
[16] RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0
[19] survival_2.34-1      vsn_3.6.0            lattice_0.17-6
[22] strucchange_1.3-3    sandwich_2.1-0       zoo_1.5-3
[25] RColorBrewer_1.0-2   affy_1.18.1          preprocessCore_1.2.0
[28] affyio_1.8.0         Biobase_2.0.1        limma_2.14.5

loaded via a namespace (and not attached):
[1] KernSmooth_2.22-22 XML_1.96-0

----------------------------------------------
Sergii Ivakhno
PhD student
Computational Biology Group
Cancer Research UK Cambridge Research Institute
http://www.compbio.group.cam.ac.uk
Microarray Cancer probe biomaRt ASSIGN • 1.4k views
Steffen ▴ 500
@steffen-2351
Last seen 10.2 years ago
Hi Sergii,

biomaRt will only return a result when there is a mapping for a probe to a gene. Can you do:

    genes = getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "chromosome_name",
                                 "start_position", "end_position"),
                  mart = ensembl)

and then loop over this result to map your probes to genes (assigning NA when no gene is associated with a probe)?

Cheers,
Steffen

On Mon, Jul 5, 2010 at 2:20 AM, Sergii Ivakhno wrote:
> I want to retrieve gene symbols for microarray probes such that I
> receive some output even if no gene spans the probe. I am using position
> information in biomaRt for this:
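The approach Steffen describes (fetch the gene table once, then map probes locally) could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the thread: the `genes` data frame is a toy stand-in for what a single getBM() call would return, with column names taken from the attributes in the reply; the probe positions are invented; and `map_probes()` is a hypothetical helper. Each probe is assumed to cover pos to pos + 10, as in the original query.

```r
# Toy stand-in for the table returned by ONE getBM() call (assumed columns).
genes <- data.frame(
  hgnc_symbol     = c("GENE_A", "GENE_B", "GENE_C"),
  chromosome_name = c("1", "1", "2"),
  start_position  = c(100, 5000, 300),
  end_position    = c(900, 7000, 800),
  stringsAsFactors = FALSE
)

# Invented probe positions; each probe covers [pos, pos + 10].
probes <- data.frame(
  chromosome_name = c("1", "1", "1", "2", "2"),
  pos             = c(150, 2000, 6990, 295, 1000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one symbol per probe, NA when no gene overlaps.
# If several genes overlap a probe, only the first match is reported.
map_probes <- function(probes, genes, width = 10) {
  vapply(seq_len(nrow(probes)), function(k) {
    hit <- genes$chromosome_name == probes$chromosome_name[k] &
      genes$start_position <= probes$pos[k] + width &
      genes$end_position >= probes$pos[k]
    if (any(hit)) genes$hgnc_symbol[which(hit)[1]] else NA_character_
  }, character(1))
}

probes$hgnc_symbol <- map_probes(probes, genes)
print(probes)
```

Because the lookup runs entirely on the local table, every probe gets a row, and probes outside any gene come back as NA. A row-by-row loop like this may still be slow for a million probes; an interval-overlap tool such as findOverlaps() from the GenomicRanges package is one option that would likely scale better.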
@james-w-macdonald-5106
Last seen 2 days ago
United States
Hi Sergii,

On 7/5/2010 5:20 AM, Sergii Ivakhno wrote:
> I want to retrieve gene symbols for microarray probes such that I
> receive some output even if no gene spans the probe. I am using position
> information in biomaRt for this:
>
> genes=getBM(attributes = c("hgnc_symbol"), filters=
> c("chromosome_name","start","end"), values =
> list(rep(i,length(posnew)),posnew,posnew+10), mart = ensembl)
>
> Unfortunately, it seems that biomaRt does not provide NULL output for
> probes outside genes, so that it is not possible to assign resulting
> probes to gene names.

Rather than trying to get biomaRt to churn out NULL results, why don't you just get back the positions that match gene positions, and then merge() them with your original position data? See ?merge, in particular the all.x argument.

> Would be grateful for suggestions - I realise that you can biomaRt
> within lapply loop to query one position at a time, but this proves to
> be too time consuming when you have 1 million probes.

Yes, and repeatedly hitting online resources in a tight loop is an optimized strategy for getting your IP banned.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
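The merge() suggestion can be illustrated with a small sketch. The data here are invented, and `matched` is a toy stand-in for whatever table of hits a query returns; it is assumed to carry the same key columns (chromosome and probe position) as the probe table, which may need to be arranged separately in practice.

```r
# Invented probe positions for illustration.
probes <- data.frame(
  chromosome_name = c("1", "1", "2"),
  pos             = c(150, 2000, 295),
  stringsAsFactors = FALSE
)

# Toy stand-in for the query result: only probes that hit a gene appear.
matched <- data.frame(
  chromosome_name = c("1", "2"),
  pos             = c(150, 295),
  hgnc_symbol     = c("GENE_A", "GENE_C"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every probe; unmatched probes get NA in hgnc_symbol.
annotated <- merge(probes, matched,
                   by = c("chromosome_name", "pos"), all.x = TRUE)

# Optionally replace NA with an explicit label.
annotated$hgnc_symbol[is.na(annotated$hgnc_symbol)] <- "no gene"
print(annotated)
```

With all.x = TRUE the result has one row per probe, so the "no gene" label is filled in locally rather than requested from the server.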