hgu133plus2.db: NAs when using getSYMBOL while the up-to-date NetAffx database gives gene names
@benoittessoulin-12350

Hi,

I've been working with Affy files for a while. Until now I annotated my raw Affy files with data taken directly from NetAffx (merging expression data with annotation data by probe_id).

I recently switched to a more straightforward method using the annotate package and hgu133plus2.db (which corresponds to my data). However, some probesets that are associated with genes in NetAffx (both online and in the downloaded database) and in hgu133plus2.db come back without gene names.
 

For instance, I can use two methods to get gene names:

biocLite("hgu133plus2.db")
biocLite("annotate")
library(hgu133plus2.db)
library(annotate)

r=rownames(df_rma)
head(r)
[1] "1053_at"   "117_at"    "121_at"    "1255_g_at" "1316_at"   "1320_at" 

symb_ID=getSYMBOL(r,"hgu133plus2.db") 
head(symb_ID)
  1053_at    117_at    121_at 1255_g_at   1316_at   1320_at 
   "RFC2"   "HSPA6"    "PAX8"  "GUCA1A"    "THRA"  "PTPN21" 

table(is.na(symb_ID))
FALSE
42358

eligibles=hgu133plus2SYMBOL[r]
annots=toTable(eligibles)
table(is.na(annots$symbol))

FALSE
42358

This is OK (we start from 54675 rownames, so 12317 probesets aren't annotated), but the problem appears when I look for a particular probeset of a gene of interest (for instance BBC3, for which Affy lists a probeset):

grep("BBC",annots$symbol)
integer(0)

grep("211692_s_at",annots$probe_id)
integer(0)

This gene isn't annotated by either method, yet it is correctly annotated in hgu133plus2.db:

 grep("211692_s_at",(keys(hgu133plus2.db)))

[1] 21014

grep("BBC3",(keys(hgu133plus2.db,keytype="SYMBOL")))
[1] 10487

s=select(hgu133plus2.db,keys="211692_s_at",columns="SYMBOL")
'select()' returned 1:many mapping between keys and columns
s
      PROBEID  SYMBOL
1 211692_s_at    BBC3
2 211692_s_at MIR3191
3 211692_s_at MIR3190
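
For reference, a 1:many result like the one above can be collapsed to one row per probeset in base R. This is only a sketch: the data frame is retyped by hand from the output above rather than queried from the package, and the names `collapsed` and `annot` as well as the ";" separator are arbitrary choices for illustration.

```r
# Rows copied by hand from the select() output above (not queried live)
s <- data.frame(
  PROBEID = rep("211692_s_at", 3),
  SYMBOL  = c("BBC3", "MIR3191", "MIR3190"),
  stringsAsFactors = FALSE
)

# Paste all symbols belonging to each probeset into a single string
collapsed <- tapply(s$SYMBOL, s$PROBEID, paste, collapse = ";")

# Back to a two-column data frame, one row per probeset
annot <- data.frame(
  PROBEID = names(collapsed),
  SYMBOL  = as.vector(collapsed),
  stringsAsFactors = FALSE
)
print(annot)
```

A table in this shape has one row per probe_id, so it can be merged with expression data without duplicating rows.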

So, is it that probesets that match several transcripts are "discarded"? This is not stated explicitly in Marc Carlson's otherwise very good documentation.

 

Thank you!

Annotation hgu133plus2 • 799 views
@james-w-macdonald-5106

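You have essentially answered your own question. The old-style functions like getSYMBOL should be avoided. Use the more modern functions select or mapIds instead: select returns every mapping for a probeset, and mapIds lets you choose how multiple matches are handled, so probesets that map to more than one gene are not silently dropped.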
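
As a rough illustration of the mapIds route, the sketch below looks up two of the probesets from the question. It assumes hgu133plus2.db is installed; the variable names are made up for the example, and the symbols returned depend on the annotation release, so no output is shown.

```r
library(AnnotationDbi)
library(hgu133plus2.db)

# Two probesets taken from the question; the second maps to several symbols
probes <- c("1053_at", "211692_s_at")

# One symbol per probeset: keep only the first match
first_symbol <- mapIds(hgu133plus2.db, keys = probes, column = "SYMBOL",
                       keytype = "PROBEID", multiVals = "first")

# Every symbol per probeset, returned as a named list
all_symbols <- mapIds(hgu133plus2.db, keys = probes, column = "SYMBOL",
                      keytype = "PROBEID", multiVals = "list")

# Treat probesets with more than one match as NA
unique_only <- mapIds(hgu133plus2.db, keys = probes, column = "SYMBOL",
                      keytype = "PROBEID", multiVals = "asNA")
```

Which multiVals setting is appropriate depends on the analysis: "first" is convenient but arbitrary, "list" keeps everything, and "asNA" makes the ambiguous probesets explicit.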


Thanks, was unaware of the mapIds method.
