Question

AnnotationData Packages for GPL2507 Sentrix Human-6 Expression BeadChip

kankejia0703 • 0

Dear all,

Do you know which package can be used to annotate the GPL2507 Sentrix Human-6 Expression BeadChip? I can't find the right one. Thanks!

microarray annotation • 1.9k views

updated 6.7 years ago by James W. MacDonald • written 6.7 years ago by kankejia0703

score 2 · Answer 1 · 2019-03-22

Let's say you are interested in GSE3188, which has some data from that array.

> library(GEOquery)
> z <- getGEO("GSE3188")
< stuff happens>
> z[1]
$`GSE3188-GPL2507_series_matrix.txt.gz`
ExpressionSet (storageMode: lockedEnvironment)
assayData: 47293 features, 18 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM71605 GSM71607 ... GSM71670 (18 total)
  varLabels: title geo_accession ... data_row_count (34 total)
  varMetadata: labelDescription
featureData
  featureNames: GI_10047089-S GI_10047091-S ... trpF (47293 total)
  fvarLabels: ID SequenceSource ... SPOT_ID (5 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 16565084 
Annotation: GPL2507
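When a GEO series covers more than one platform, getGEO() returns a list with one ExpressionSet per platform, named after the series matrix file, so picking an element by position depends on list order. A minimal base-R sketch of selecting the element by platform ID instead; the toy list is a hypothetical stand-in for the real getGEO() result, and its second name (GPLXXXX) is invented:

```r
# Hypothetical stand-in for the list returned by getGEO("GSE3188"): in a real
# session each element is an ExpressionSet, and the second name is invented.
z <- list(
  "GSE3188-GPL2507_series_matrix.txt.gz" = "ExpressionSet for GPL2507",
  "GSE3188-GPLXXXX_series_matrix.txt.gz" = "ExpressionSet for another platform"
)

# Return the single element whose name contains "-<gpl>_", so that e.g.
# "GPL250" cannot accidentally match "GPL2507".
pick_platform <- function(gse_list, gpl) {
  hit <- grep(paste0("-", gpl, "_"), names(gse_list), fixed = TRUE)
  if (length(hit) != 1) stop("expected exactly one element for ", gpl)
  gse_list[[hit]]
}

eset <- pick_platform(z, "GPL2507")
print(eset)
```

With the real list you would then work with fData(eset) rather than fData(z[[1]]).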

So the first ExpressionSet holds the GPL2507 data. Note that it already comes with some annotation by default:

> head(fData(z[[1]]))
                         ID SequenceSource      GB_ACC Annotation Date SPOT_ID
GI_10047089-S GI_10047089-S         RefSeq NM_014332.1              NA      NA
GI_10047091-S GI_10047091-S         RefSeq NM_013259.1              NA      NA
GI_10047093-S GI_10047093-S         RefSeq NM_016299.1              NA      NA
GI_10047099-S GI_10047099-S         RefSeq NM_016303.1              NA      NA
GI_10047103-S GI_10047103-S         RefSeq NM_016305.1              NA      NA
GI_10047105-S GI_10047105-S         RefSeq NM_016352.1              NA      NA
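The GB_ACC values above carry a version suffix (the ".1" in NM_014332.1), which the answer drops with strsplit() before using the accessions as ACCNUM keys. A minimal base-R sketch of the same step with a single regular expression, run on a small hypothetical vector modelled on that column (with an NA, since the real column contains some):

```r
# Hypothetical sample of GB_ACC values, modelled on the fData() output above.
acc <- c("NM_014332.1", "NM_013259.1", NA, "NM_016299.1")

# Remove a trailing ".<digits>" version suffix; sub() passes NA through as NA.
strip_version <- function(x) sub("\\.[0-9]+$", "", x)

ids <- strip_version(acc)

# For these inputs this matches the strsplit()/sapply() approach in the answer.
ids_split <- sapply(strsplit(acc, "\\."), "[", 1)
stopifnot(identical(ids, ids_split))

# As in the answer, replace NA keys with the string "NA", which simply
# fails to match when looked up.
ids[is.na(ids)] <- "NA"
print(ids)
```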

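So fData() gives you the probe ID and the RefSeq accession (GB_ACC column), which we can use to map to Entrez Gene IDs, symbols and gene names once the version suffix is stripped: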

> library(org.Hs.eg.db)
> ids <- sapply(strsplit(as.character(fData(z[[1]])$GB_ACC), "\\."), "[", 1)
> head(ids)
[1] "NM_014332" "NM_013259" "NM_016299" "NM_016303" "NM_016305" "NM_016352"
## some GB_ACC entries are NA; replace them with the string "NA" so every key passed to mapIds() is a non-missing character value
> ids[is.na(ids)] <- "NA"
> annot <- lapply(c("ENTREZID","SYMBOL","GENENAME"), function(x) mapIds(org.Hs.eg.db, ids, x, "ACCNUM"))
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
> annotdf <- data.frame(PROBEID = fData(z[[1]])$ID, ACCNUM = ids, ENTREZID = annot[[1]], SYMBOL = annot[[2]], GENENAME = annot[[3]])
> head(annotdf)
        PROBEID    ACCNUM ENTREZID SYMBOL
1 GI_10047089-S NM_014332    23676   SMPX
2 GI_10047091-S NM_013259    29114 TAGLN3
3 GI_10047093-S NM_016299    51182 HSPA14
4 GI_10047099-S NM_016303    51186 TCEAL9
5 GI_10047103-S NM_016305    51188 SS18L2
6 GI_10047105-S NM_016352    51200   CPA4
                                       GENENAME
1                 small muscle protein X-linked
2                                  transgelin 3
3 heat shock protein family A (Hsp70) member 14
4      transcription elongation factor A like 9
5                                   SS18 like 2
6                           carboxypeptidase A4
> fData(z[[1]]) <- annotdf

The featureData slot of the ExpressionSet now holds this annotation. If you analyze the data with limma (which you probably should), the topTable() output will be annotated with the columns we just put in the ExpressionSet.
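The annotation columns only make sense if each row of annotdf describes the same probe as the corresponding row of the expression matrix. mapIds() helps here because it returns one value per key, in key order, with NA where nothing matched. A rough base-R analogue of that behaviour using match() against a tiny hypothetical lookup table (values copied from the head(annotdf) output above; the third probe ID is invented to show a non-matching key), with row names set to the probe IDs as an extra safeguard:

```r
# Hypothetical lookup table standing in for org.Hs.eg.db.
lookup <- data.frame(
  ACCNUM   = c("NM_014332", "NM_013259"),
  ENTREZID = c("23676", "29114"),
  SYMBOL   = c("SMPX", "TAGLN3"),
  stringsAsFactors = FALSE
)

# One result per key, in key order, NA when the key is absent:
# roughly what mapIds(..., multiVals = "first") returns.
map_ids <- function(keys, column, table = lookup) {
  out <- table[[column]][match(keys, table$ACCNUM)]
  names(out) <- keys
  out
}

probe_ids <- c("GI_10047089-S", "GI_10047091-S", "HYPOTHETICAL_PROBE")
ids       <- c("NM_014332", "NM_013259", "NA")

annotdf <- data.frame(
  PROBEID   = probe_ids,
  ACCNUM    = ids,
  ENTREZID  = map_ids(ids, "ENTREZID"),
  SYMBOL    = map_ids(ids, "SYMBOL"),
  row.names = probe_ids,  # row names equal to the probe (feature) IDs
  stringsAsFactors = FALSE
)
print(annotdf)
```

With the real objects, stopifnot(identical(as.character(annotdf$PROBEID), featureNames(z[[1]]))) before the fData() assignment is a quick way to confirm the rows line up.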