I am currently looking at existing differential gene expression data another lab has uploaded to GEO (GSE48761). I have a few issues I don't know how to deal with:
1. How to salvage gene of interest data when one probe grabs multiple genes?
For example, I'm interested in the expression of a gene called TMSB4X. However, the only probe mapping to TMSB4X is annotated as follows:
TMSB4Y///TMSB4XP6///TMSB4XP2///TMSB4XP1///TMSB4X
How can I analyze expression of TMSB4X if the probe is catching all these other genes as well?
2. How to do statistics on gene expression when so many probes are grabbing multiple genes?
I am trying to build a contingency table looking at the differential expression of a family of genes within this dataset. To do this, part of my contingency table needs to be 'every gene expressed in Werner's syndrome that isn't differentially expressed in Werner's syndrome and isn't one of my genes of interest', but again many of these genes are mapped to probes that cover multiple genes. How do I deal with this problem?
Happy to clarify anything.
To clarify a bit more about the MBNI re-mapped data: they take all the probes on the array and BLAST them against the current genome, keeping only those probes that map uniquely to a single genomic position. They then group those probes into probesets based on where they map. All of the probes that map to TMSB4X also map to other positions in the genome, which is why there isn't a probeset that interrogates that gene. Among this group of genes, the only probes that map uniquely are found in two of the pseudogenes and the Y-linked gene.
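The filtering logic described above can be sketched roughly as follows. This is only an illustration, not the MBNI pipeline itself: the probe IDs, the locus labels, and the `build_probesets` helper are all made up, and the alignment step is assumed to have been run already.

```python
from collections import defaultdict

# Hypothetical alignment results: probe ID -> list of (gene, locus) hits.
# Probe IDs and locus labels are invented for illustration only.
alignments = {
    "probe_a": [("TMSB4X", "locus_1"), ("TMSB4XP1", "locus_2")],
    "probe_b": [("TMSB4Y", "locus_3")],
    "probe_c": [("TMSB4XP2", "locus_4")],
    "probe_d": [("TMSB4X", "locus_1"), ("TMSB4XP6", "locus_5"), ("TMSB4Y", "locus_3")],
    "probe_e": [("TMSB4Y", "locus_3")],
}


def build_probesets(alignments):
    """Keep probes with exactly one genomic hit and group them by gene."""
    probesets = defaultdict(list)
    for probe_id, hits in alignments.items():
        if len(hits) != 1:
            continue  # probe maps to several positions, so it is discarded
        gene, _locus = hits[0]
        probesets[gene].append(probe_id)
    return dict(probesets)


probesets = build_probesets(alignments)
for gene, probes in probesets.items():
    print(gene, probes)
```

In this toy input every probe that hits TMSB4X also hits somewhere else, so no TMSB4X probeset comes out, which mirrors the situation described above.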
This isn't to say that you cannot measure TMSB4X; if the pseudogenes aren't actually expressed, then you should be able to get a clean signal. But that's contingent upon the pseudogenes not being expressed, which may or may not be realistic.
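For the contingency-table part of the question, one conservative option (an assumption on my part, not something the dataset dictates) is to count only probes annotated to a single gene and to report how many ambiguous ones were set aside. A minimal sketch, where the `GENE_*` symbols, the differential-expression flags, and the helper names are all hypothetical:

```python
def split_symbols(annotation):
    """Split a '///'-delimited annotation string into gene symbols."""
    return [s.strip() for s in annotation.split("///") if s.strip()]


def contingency_table(probes, genes_of_interest):
    """Build a 2x2 table from (annotation, is_de) pairs.

    Rows: gene of interest / other gene. Columns: DE / not DE.
    Probes annotated to more than one gene are skipped and counted.
    """
    table = [[0, 0], [0, 0]]
    skipped = 0
    for annotation, is_de in probes:
        symbols = split_symbols(annotation)
        if len(symbols) != 1:
            skipped += 1  # ambiguous probe: assumption is to leave it out
            continue
        row = 0 if symbols[0] in genes_of_interest else 1
        col = 0 if is_de else 1
        table[row][col] += 1
    return table, skipped


# Hypothetical input; the DE calls here are invented, not taken from GSE48761.
probes = [
    ("GENE_A", True),
    ("GENE_B", False),
    ("GENE_C", True),
    ("GENE_D", False),
    ("GENE_E", False),
    ("TMSB4Y///TMSB4XP6///TMSB4XP2///TMSB4XP1///TMSB4X", True),
]
table, skipped = contingency_table(probes, genes_of_interest={"GENE_A", "GENE_B"})
print(table, "skipped:", skipped)
```

Note that this counts probes rather than genes, so a gene covered by several unambiguous probes would need collapsing to one entry first, and dropping ambiguous probes shrinks the background set, which is worth stating alongside any result.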