What to do with microarray probes that catch multiple genes?
@ravelarvargas-17116

I am currently looking at existing differential gene expression data another lab has uploaded to GEO (GSE48761). I have a few issues I don't know how to deal with:

1. How to salvage gene of interest data when one probe grabs multiple genes?

For example, I'm interested in the expression of a gene called TMSB4X. However, the only probe mapping to TMSB4X is the following:

TMSB4Y///TMSB4XP6///TMSB4XP2///TMSB4XP1///TMSB4X

How can I analyze expression of TMSB4X if the probe is catching all these other genes as well?

2. How to do statistics on gene expression when so many probes are grabbing multiple genes?

I am trying to build a contingency table looking at the differential expression of a family of genes within this dataset. To do this, one cell of the table needs to be 'every gene expressed in Werner's syndrome that isn't differentially expressed in Werner's syndrome and isn't one of my genes of interest', but again, many of these genes are mapped to probes that contain multiple genes. How do I deal with this problem?

Happy to clarify anything.

@james-w-macdonald-5106

There is a combination of probesets that are assumed to interrogate the two genes, TMSB4X and TMSB4Y (and three pseudogenes, which may not actually be expressed):

> library(hugene10sttranscriptcluster.db)
> table(select(hugene10sttranscriptcluster.db, c("TMSB4X","TMSB4Y","TMSB4XP6","TMSB4XP1","TMSB4XP2"), "PROBEID","ALIAS"))
'select()' returned 1:many mapping between keys and columns
PROBEID
ALIAS      8050089 8067007 8101774 8158240 8166072 8176644
TMSB4X         0       1       1       1       1       0
TMSB4XP1       0       1       1       1       1       0
TMSB4XP2       1       1       1       1       1       0
TMSB4XP6       0       1       1       1       1       0
TMSB4Y         0       1       0       0       1       1

You could assume that any probeset that measures the X-linked gene, as well as the X-linked pseudogenes, is good, and ignore those probesets that also interrogate the Y-linked gene. The folks at MBNI take all the probes and build probesets from only those probes that map uniquely to a given gene. If we look at what their annotation says:

> library(hugene10sthsentrezg.db)
> table(select(hugene10sthsentrezg.db, c("TMSB4X","TMSB4Y","TMSB4XP6","TMSB4XP1","TMSB4XP2"), "PROBEID","ALIAS"))
'select()' returned 1:1 mapping between keys and columns
PROBEID
ALIAS      7116_at 7120_at 9087_at
TMSB4X         0       0       0
TMSB4XP1       0       0       0
TMSB4XP2       1       0       0
TMSB4XP6       0       1       0
TMSB4Y         0       0       1

It appears that the probes that align to the X-linked gene are all complementary to the pseudogenes as well, so it looks like the best you can do is say that the measurements you have will be confounded between the X-linked gene and its pseudogenes.
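
As a rough sketch of the first option mentioned earlier (keep the probesets that hit the X-linked gene, drop any that also hit TMSB4Y), the incidence table from the hugene10sttranscriptcluster.db output can be typed in by hand and filtered in base R. The object names `hits`, `keep` and `x_only` are just illustrative, and the matrix is transcribed from the printed table rather than queried from the annotation package.

```r
# Symbol-by-probeset incidence matrix, transcribed from the
# hugene10sttranscriptcluster.db output shown earlier in this answer.
hits <- matrix(
  c(0, 1, 1, 1, 1, 0,   # TMSB4X
    0, 1, 1, 1, 1, 0,   # TMSB4XP1
    1, 1, 1, 1, 1, 0,   # TMSB4XP2
    0, 1, 1, 1, 1, 0,   # TMSB4XP6
    0, 1, 0, 0, 1, 1),  # TMSB4Y
  nrow = 5, byrow = TRUE,
  dimnames = list(
    ALIAS   = c("TMSB4X", "TMSB4XP1", "TMSB4XP2", "TMSB4XP6", "TMSB4Y"),
    PROBEID = c("8050089", "8067007", "8101774", "8158240", "8166072", "8176644")
  )
)

# Keep probesets annotated to TMSB4X but not to the Y-linked gene.
keep <- hits["TMSB4X", ] == 1 & hits["TMSB4Y", ] == 0
x_only <- colnames(hits)[keep]
print(x_only)
```

Note that the probesets that survive this filter are still annotated to the three pseudogenes, so the confounding described above remains.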


To clarify a bit more about the MBNI re-mapped data: they take all the probes on the array and BLAST them against the current genome, selecting only those probes that map uniquely to a single genomic position. They then collect those probes into probesets based on where they map. All of the probes that map to TMSB4X also map to other positions in the genome, which is why there isn't a probeset that interrogates that gene. Of the probes considered here, the only ones that map uniquely to the genome are found in two of the pseudogenes and the Y-linked gene.
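
To make the re-mapping logic concrete, here is a toy sketch in base R. The alignment table `aln` is entirely made up for illustration (hypothetical probe IDs, and gene labels standing in for genomic positions); it is not the real MBNI alignment data or pipeline, only the same idea of "drop multi-mapping probes, then group what is left".

```r
# Hypothetical alignments: one row per (probe, place it aligns to).
aln <- data.frame(
  probe = c("pr1", "pr1", "pr2", "pr3", "pr4", "pr4", "pr5"),
  gene  = c("TMSB4X", "TMSB4XP1", "TMSB4XP2", "TMSB4Y",
            "TMSB4X", "TMSB4XP6", "TMSB4XP6"),
  stringsAsFactors = FALSE
)

# Count alignments per probe and keep probes with exactly one hit.
n_hits <- table(aln$probe)
unique_probes <- names(n_hits)[n_hits == 1]
uniq <- aln[aln$probe %in% unique_probes, ]

# Group the surviving probes into probesets by target.
probesets <- split(uniq$probe, uniq$gene)
print(probesets)
```

In this toy example every probe that hits TMSB4X also hits a pseudogene, so no TMSB4X probeset is formed, which mirrors the situation described above.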

This isn't to say that you cannot measure TMSB4X; if the pseudogenes aren't actually expressed, then you should be able to get a clean signal. But that's contingent upon the pseudogenes not being expressed, which may or may not be realistic.
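
For the contingency-table part of the question, one possible workaround (an assumption on my part, not something the annotation forces on you) is to count probesets rather than genes, and to call a probeset "in the family" if any of the symbols in its `///`-separated annotation belongs to the family. A minimal sketch in base R, with made-up probesets, placeholder gene names and made-up differential expression calls:

```r
# Hypothetical probeset-level results; 'symbols' uses the GEO "///" separator.
res <- data.frame(
  probeset = c("p1", "p2", "p3", "p4", "p5", "p6"),
  symbols  = c("TMSB4Y///TMSB4XP6///TMSB4XP2///TMSB4XP1///TMSB4X",
               "GENE_A", "GENE_B///GENE_C", "GENE_D", "GENE_E", "GENE_F"),
  de       = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
family <- c("TMSB4X", "GENE_B")  # hypothetical gene family of interest

# A probeset counts as "in the family" if any of its symbols is in the family.
symbol_list <- strsplit(res$symbols, "///", fixed = TRUE)
res$in_family <- vapply(symbol_list, function(s) any(s %in% family), logical(1))

# 2 x 2 table of family membership against differential expression.
tab <- table(
  in_family = factor(res$in_family, levels = c(TRUE, FALSE)),
  de        = factor(res$de, levels = c(TRUE, FALSE))
)
print(tab)
ft <- fisher.test(tab)
```

Whether "any symbol matches" is the right rule depends on how much you trust the multi-gene probesets; a stricter alternative is to drop them before building the table.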