Simple annotate question: difference in getSYMBOL and lookUp
@merja-matilainen-2593
Hi!

Can someone explain to me the difference between the following attempts to add annotation data to my genes? (I have a dataset produced with Illumina arrays.)

This one works:

> geneSymbol=getSYMBOL(fit2$genes$ID, 'lumiHumanV2')
> fit2$genes=data.frame(fit2$genes, geneSymbol=geneSymbol)

Here I get an error:

> geneEntrez=lookUp(fit2$genes$ID, 'lumiHumanV2', 'ENTREZID')
> fit2$genes=data.frame(fit2$genes, geneEntrezID=geneEntrez)
Error in data.frame(fit2$genes, geneEntrezID = geneEntrez) :
  arguments imply differing number of rows: 48701, 1

I get the same error if I try to look up, for example, gene function.

I assume the answer lies in the type of data structure these two functions return. If I understood the vignette, getSYMBOL gives me a vector and lookUp gives me a list. (The help topic says 'Either a vector or a list depending on whether multiple values per input are possible'.) Unfortunately I am not that familiar with R data structures yet. Could you tell me how I can add the Entrez result to fit2$genes? And perhaps explain why the match to the gene symbol does not give multiple values if the description does?

Thanks for your help!

Merja
@james-w-macdonald-5106
Hi Merja,

merja matilainen wrote:
> Here I get an error:
>> geneEntrez=lookUp(fit2$genes$ID, 'lumiHumanV2', 'ENTREZID')
>> fit2$genes=data.frame(fit2$genes, geneEntrezID=geneEntrez)
> Error in data.frame(fit2$genes, geneEntrezID = geneEntrez) :
>   arguments imply differing number of rows: 48701, 1

In this case you could answer the question yourself:

> getSYMBOL
function (x, data)
{
    unlist(lookUp(x, data, "SYMBOL"))
}
<environment: namespace:annotate>

> lookUp
function (x, data, what, load = FALSE)
{
    if (length(x) < 1) {
        stop("No keys provided")
    }
    mget(x, envir = getAnnMap(what, chip = data, load = load),
        ifnotfound = NA)
}
<environment: namespace:annotate>

So getSYMBOL() is the same as lookUp(), only wrapped in a call to unlist(). unlist() takes a list and turns it into a vector, and mget() returns a list, so if you wrap your call to lookUp() in unlist(), you should get a result that can be added to a data.frame.

Note, however, that this will not always work so cleanly. If any of the Illumina IDs map to more than one Entrez Gene ID (I don't think they should), then the resulting vector will be longer than the number of probes and you won't be able to make a data.frame. You can always check this first with something like:

table(sapply(geneEntrez, length))

You might also want to extract the results from your fit2 object into a new data.frame rather than overwriting the existing object (you are making copies regardless).

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
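The unlist() approach described in the reply can be sketched in base R. This is only an illustration: it uses a toy environment in place of the real lumiHumanV2 annotation map so that it runs without any Bioconductor packages, and the probe IDs, Entrez values, and the names annMap, genes, multi, and firstOnly are all made up for the example.

```r
# Toy stand-in for an annotation map (hypothetical IDs and values).
annMap <- new.env()
assign("probe_1", "1001", envir = annMap)
assign("probe_2", "1002", envir = annMap)
assign("probe_3", c("1003", "1004"), envir = annMap)  # one probe, two IDs

ids <- c("probe_1", "probe_2", "probe_4")  # probe_4 is not in the map

# mget() returns a named list with one element per key, which is
# what lookUp() hands back.
asList <- mget(ids, envir = annMap, ifnotfound = NA)

# Check how many values each key maps to before flattening.
lens <- table(sapply(asList, length))

# Every element has length 1 here, so unlist() yields a vector with
# exactly one entry per probe.
asVector <- unlist(asList)

# Add the result to a new data.frame instead of overwriting the original.
genes <- data.frame(ID = ids, stringsAsFactors = FALSE)
genes$EntrezID <- unname(asVector)

# If a key maps to several values, unlist() returns more entries than keys.
multi <- mget(c("probe_1", "probe_3"), envir = annMap, ifnotfound = NA)
tooLong <- unlist(multi)

# One possible workaround (an assumption, not from the reply): keep only
# the first value for each key.
firstOnly <- sapply(multi, function(v) v[1])
```

With the real data, the equivalent call would be unlist(lookUp(fit2$genes$ID, 'lumiHumanV2', 'ENTREZID')), after confirming with the table() check that every probe maps to a single value.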