Question

What to do with microarray probes that catch multiple genes?



ravelarvargas (@ravelarvargas-17116)


I am looking at differential gene expression data that another lab uploaded to GEO (GSE48761). I have two issues I don't know how to deal with:

1. How can I salvage data for a gene of interest when one probe maps to multiple genes?

For example, I'm interested in the expression of a gene called TMSB4X. However, the only probe mapping to TMSB4X is the following:

TMSB4Y///TMSB4XP6///TMSB4XP2///TMSB4XP1///TMSB4X

How can I analyze TMSB4X expression if the probe also picks up all of these other genes?

2. How do I do statistics on gene expression when so many probes map to multiple genes?

I am trying to build a contingency table of differential expression for a family of genes in this dataset. To do this, one cell of the table needs to be 'every gene expressed in Werner's syndrome that isn't differentially expressed in Werner's syndrome and isn't one of my genes of interest'. Again, though, many of these genes map to probes that contain multiple genes. How do I deal with this problem?
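One common workaround, offered here only as a sketch and not as the thread's answer, is to drop probesets whose annotation lists more than one gene before counting. The base-R snippet below assumes the platform annotation joins symbols with `///`, as in the TMSB4X example; the probeset IDs, gene names (`GENEA` and so on), DE flags and the `family` vector are all hypothetical placeholders, and the "expressed" filter from the question is not modelled.

```r
# Hypothetical annotation and DE calls; symbols are joined by "///" as in the
# GEO platform annotation. IDs, gene names and DE flags are placeholders.
annot <- data.frame(
  probeset = paste0("p", 1:7),
  symbol = c("TMSB4Y///TMSB4XP6///TMSB4XP2///TMSB4XP1///TMSB4X",
             "GENEA", "GENEB", "GENEC///GENED", "GENEE", "GENEF", "GENEG"),
  is_de = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Count how many gene symbols each probeset is annotated to
annot$n_genes <- lengths(strsplit(annot$symbol, "///", fixed = TRUE))

# Keep only probesets annotated to exactly one gene
uniq <- annot[annot$n_genes == 1, ]

# Hypothetical gene family of interest
family <- c("GENEA", "GENEB")

# Force both FALSE and TRUE levels so the table is always 2x2
in_family <- factor(uniq$symbol %in% family, levels = c(FALSE, TRUE))
de <- factor(uniq$is_de, levels = c(FALSE, TRUE))

# 2x2 table: family membership vs. differential expression
tab <- table(in_family = in_family, DE = de)
tab
fisher.test(tab)
```

Dropping multi-gene probesets shrinks the gene universe, and a gene with several single-gene probesets would still need collapsing to one row (for example, calling it DE if any of its probesets is). That is a separate judgment call not shown here.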

Happy to clarify anything.

affymetrix microarrays microarray probe statistics

Written by ravelarvargas; link updated by James W. MacDonald.

score 1 · Answer 1 · 2018-08-30

There is a combination of probesets that are assumed to interrogate those two genes (TMSB4X and TMSB4Y) and three pseudogenes, which may not actually be expressed:

> table(select(hugene10sttranscriptcluster.db, c("TMSB4X","TMSB4Y","TMSB4XP6","TMSB4XP1","TMSB4XP2"), "PROBEID","ALIAS"))
'select()' returned 1:many mapping between keys and columns
          PROBEID
ALIAS      8050089 8067007 8101774 8158240 8166072 8176644
  TMSB4X         0       1       1       1       1       0
  TMSB4XP1       0       1       1       1       1       0
  TMSB4XP2       1       1       1       1       1       0
  TMSB4XP6       0       1       1       1       1       0
  TMSB4Y         0       1       0       0       1       1

You could assume that anything measuring the X-linked gene as well as the X-linked pseudogenes is good, and ignore the probesets that also interrogate the Y-linked gene. The folks at MBNI (the Brainarray custom CDF group) take all the probes and build probesets only from the probes that map uniquely to a given gene. Here is what their annotation shows:
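To make the "ignore the Y-linked probesets" rule concrete, here is a small base-R sketch. It hard-codes the 0/1 table printed above instead of calling `select()`, so it runs without the annotation package; with `hugene10sttranscriptcluster.db` installed, the result of the `table(select(...))` call above should work in place of `m`.

```r
# Alias-by-probeset matrix transcribed from the hugene10sttranscriptcluster.db
# output above; 1 means the probeset is annotated to that alias.
m <- rbind(
  TMSB4X   = c(0, 1, 1, 1, 1, 0),
  TMSB4XP1 = c(0, 1, 1, 1, 1, 0),
  TMSB4XP2 = c(1, 1, 1, 1, 1, 0),
  TMSB4XP6 = c(0, 1, 1, 1, 1, 0),
  TMSB4Y   = c(0, 1, 0, 0, 1, 1)
)
colnames(m) <- c("8050089", "8067007", "8101774", "8158240", "8166072", "8176644")

# Probesets annotated to TMSB4X but not to the Y-linked TMSB4Y
keep <- colnames(m)[m["TMSB4X", ] == 1 & m["TMSB4Y", ] == 0]
keep  # "8101774" "8158240"
```

Both remaining probesets are still annotated to all three X-linked pseudogenes, so this filter removes the Y-linked signal but not the pseudogene confounding.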

> table(select(hugene10sthsentrezg.db, c("TMSB4X","TMSB4Y","TMSB4XP6","TMSB4XP1","TMSB4XP2"), "PROBEID","ALIAS"))
'select()' returned 1:1 mapping between keys and columns
          PROBEID
ALIAS      7116_at 7120_at 9087_at
  TMSB4X         0       0       0
  TMSB4XP1       0       0       0
  TMSB4XP2       1       0       0
  TMSB4XP6       0       1       0
  TMSB4Y         0       0       1

It appears that the probes aligning to the X-linked gene are all complementary to the pseudogenes as well (the MBNI annotation has no probeset for TMSB4X itself), so the best you can do is say that your measurements are confounded between the X-linked gene and its pseudogenes.
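The same check can be run on the MBNI table: any alias whose row is all zeros has no probeset built from probes unique to it. Again, this sketch transcribes the printed output rather than querying `hugene10sthsentrezg.db`.

```r
# Alias-by-probeset matrix transcribed from the hugene10sthsentrezg.db output above
mbni <- rbind(
  TMSB4X   = c(0, 0, 0),
  TMSB4XP1 = c(0, 0, 0),
  TMSB4XP2 = c(1, 0, 0),
  TMSB4XP6 = c(0, 1, 0),
  TMSB4Y   = c(0, 0, 1)
)
colnames(mbni) <- c("7116_at", "7120_at", "9087_at")

# Aliases with no uniquely mapping probeset
no_unique <- rownames(mbni)[rowSums(mbni) == 0]
no_unique  # "TMSB4X" "TMSB4XP1"
```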