Annotation for a GEO data set
1
1
Entering edit mode
Joern Grame
@joern-grame-3973
Dear Bioconductor,

I have a question concerning a GEO data set. I have downloaded the data into R using the GEOquery package. I'm trying to map the Affymetrix probe IDs onto gene symbols, but I can't find the appropriate annotation data. Following some of the tutorials, the annotate package should help, but what I get from the function annotation is the GEO platform identifier:

> library(GEOquery)
> library(annotate)
> data <- getGEO(GEO='GSE13639')[[1]]
> annotation(data)
[1] "GPL570"

I'd like to use functions like getSYMBOL, but I don't know which mapping package to install. Help will be much appreciated.

Yours,
Joern
@sean-davis-490
On Thu, Mar 4, 2010 at 12:30 PM, Joern Grame <gjormac at googlemail.com> wrote:
> Dear Bioconductor,
>
> I have a question concerning a GEO data set. I have downloaded the data into
> R using the GEOquery package. I'm trying to map the Affymetrix probe ids
> onto gene symbols, but can't find the appropriate annotation data. Following
> some of the tutorials, using the annotate package should help, but what I
> get from the function annotation is the GEO platform identifier:
>
>> library(GEOquery)
>> library(annotate)
>> data <- getGEO(GEO='GSE13639')[[1]]
>> annotation(data)
> [1] "GPL570"
>
> I'd like to use functions like getSYMBOL, but I don't know which mapping
> package to install. Help will be much appreciated.

Hi, Joern.

GPL570 is represented in Bioconductor as hgu133plus2.db. You can get this the old-fashioned way by looking up GPL570 in GEO and then going to the Bioconductor website to find the right package by hand. Alternatively, you may use the GEOmetadb package to get the information directly:

> library(GEOmetadb)
> sqlfile = getSQLiteFile()
> con = dbConnect(SQLite(), sqlfile)
> dbGetQuery(con,"select gpl,title,bioc_package from gpl where gpl='GPL570'")

Then, you are off to the races....

> BiocManager::install('hgu133plus2.db')

will get you the correct package. However, your "data" object already has the annotation information from NCBI GEO in it:

> colnames(fData(data))
 [1] "ID"                               "GB_ACC"
 [3] "SPOT_ID"                          "Species.Scientific.Name"
 [5] "Annotation.Date"                  "Sequence.Type"
 [7] "Sequence.Source"                  "Target.Description"
 [9] "Representative.Public.ID"         "Gene.Title"
[11] "Gene.Symbol"                      "ENTREZ_GENE_ID"
[13] "RefSeq.Transcript.ID"             "Gene.Ontology.Biological.Process"
[15] "Gene.Ontology.Cellular.Component" "Gene.Ontology.Molecular.Function"
> fData(data)$Gene.Symbol[1:10]
 [1] DDR1   RFC2   HSPA6  PAX8   GUCA1A UBA7   THRA   PTPN21 CCL5   CYP2E1
20828 Levels: ADAM32 AFG3L1 ALG10 ARMCX4 ATP6V1E2 BEST4 C15orf40 ... FAM86B1
> fData(data)["1007_s_at",]$Gene.Symbol
[1] DDR1
20828 Levels: ADAM32 AFG3L1 ALG10 ARMCX4 ATP6V1E2 BEST4 C15orf40 ... FAM86B1

Hope that helps.

Sean
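
For the getSYMBOL route asked about in the question, the lookup might look roughly like the sketch below. It is only an illustration under a few assumptions: that annotate and hgu133plus2.db are installable through BiocManager (the current Bioconductor installer), and that a single probe ID from the thread is enough to show the idea. The variable names probes, symbols, and symbols2 are made up for this example.

```r
# Sketch only: assumes the annotate and hgu133plus2.db packages can be
# installed from Bioconductor with BiocManager.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
for (pkg in c("annotate", "hgu133plus2.db")) {
  if (!requireNamespace(pkg, quietly = TRUE)) BiocManager::install(pkg)
}

library(annotate)
library(hgu133plus2.db)

# Illustrative probe ID taken from the thread; for the full data set,
# featureNames(data) would supply every probe ID on the array.
probes <- c("1007_s_at")

# getSYMBOL returns a named character vector: names are probe IDs,
# values are gene symbols (NA where no symbol is mapped).
symbols <- getSYMBOL(probes, "hgu133plus2.db")
print(symbols)

# An equivalent lookup through AnnotationDbi, keeping the first symbol
# when a probe maps to several genes.
symbols2 <- mapIds(hgu133plus2.db, keys = probes, column = "SYMBOL",
                   keytype = "PROBEID", multiVals = "first")
print(symbols2)
```

The symbols from the Bioconductor package can differ slightly from the Gene.Symbol column in fData(data), since the two come from annotation snapshots taken at different times; which one to prefer depends on whether you want to match GEO's record or the current Bioconductor release.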