Question

Affymetrix hgu133plus2.db: How to best derive an expression value for genes that map to multiple probe ids

Matthias Munz ▴ 20

@matmu

Germany

I want to map probe IDs of the Affymetrix HG-U133_Plus_2 array to Ensembl gene IDs using the package hgu133plus2.db. Many genes have multiple probe IDs assigned to them, and some probe IDs map to multiple Ensembl gene IDs (I am currently removing those). What is the best way to select a single expression value that represents the expression of a gene? Or is aggregating the probes by their mean the better way to go? I guess one could do the selection using the probe ID suffixes.

Suffixes included in hgu133plus2.db: "s_at" "at" "g_at" "i_at" "f_at" "a_at" "x_at" "r_at" "3_at" "5_at" "M_at" "MA_at" "MB_at" "alu_at"

library(hgu133plus2.db)
library(stringi)

anno = AnnotationDbi::select(hgu133plus2.db,
                             keys = keys(hgu133plus2.db, keytype = "PROBEID"),
                             keytype = "PROBEID",
                             columns = c("ENSEMBL"))

# Split each probe id at the first underscore and keep the remainder as its suffix
suffixes = unique(stringi::stri_split_fixed(str = anno$PROBEID, pattern = "_", n = 2, simplify = TRUE)[, 2])

affydata hgu133plus2.db AffymetrixChip • 750 views

score 0 · Answer 1 · 2023-07-24

ATpoint ★ 4.5k

@atpoint-13662

Germany

Hello, this has been discussed quite extensively before; many links are collected in this thread: "How to combine multiple probes representing a single gene?"
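For illustration only, the two options raised in the question (averaging all probes of a gene versus keeping a single probe) could be sketched in base R roughly as follows. The matrix `expr`, the mapping `anno`, and all probe and gene ids are made-up toy data, and "keep the probe with the highest mean expression" is just one possible selection heuristic assumed here, not a recommendation taken from the linked thread.

```r
# Toy log2 expression matrix: probes in rows, samples in columns (made-up values)
expr <- matrix(c(5, 6,
                 7, 8,
                 2, 3,
                 9, 9),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("p1_at", "p2_s_at", "p3_x_at", "p4_at"),
                               c("sampleA", "sampleB")))

# Toy probe-to-gene mapping (hypothetical ids); probes hitting several genes
# are assumed to have been removed already, as described in the question
anno <- data.frame(PROBEID = rownames(expr),
                   ENSEMBL = c("geneA", "geneA", "geneA", "geneB"),
                   stringsAsFactors = FALSE)

# Gene id for every row of the expression matrix
genes <- anno$ENSEMBL[match(rownames(expr), anno$PROBEID)]

# Option 1: average all probes that map to the same gene
gene_sums <- rowsum(expr, group = genes)
gene_mean <- gene_sums / as.vector(table(genes)[rownames(gene_sums)])

# Option 2: keep, per gene, the probe with the highest mean expression
probe_avg  <- rowMeans(expr)
best_probe <- vapply(split(names(probe_avg), genes),
                     function(ids) ids[which.max(probe_avg[ids])],
                     character(1))
gene_best <- expr[best_probe, , drop = FALSE]
rownames(gene_best) <- names(best_probe)

print(gene_mean)
print(gene_best)
```

Both results are gene-by-sample matrices, so either can replace the probe-level matrix in downstream steps. Which option is appropriate depends on the analysis; the sketch only shows the mechanics.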
