I am currently working on a meta-analysis involving two GEO datasets: GSE68183 and GSE80178. Both datasets include CEL files, and I aim to process them to ensure consistent gene annotations across both studies. However, I have encountered several challenges:
- The gene identifiers in the two datasets appear to differ, making it difficult to align them for comparative analysis.
- I have attempted to process the CEL files using various R packages, including affy, affyio, oligo, and oligoClasses. Despite these efforts, I have been unable to generate consistent gene annotations.
I am seeking guidance on the following:
- What are the recommended approaches to standardize gene identifiers between these two datasets?
- Which tools or packages are best suited for processing CEL files from these specific GEO datasets to achieve consistent gene annotations?
Any insights, suggestions, or references to relevant resources would be greatly appreciated.
Best regards,
My recommendation would be to first translate the probe identifiers of each platform to Ensembl gene IDs, for example with biomaRt, and then take the intersection. You can then proceed with the meta-analysis on this "common universe" of genes. The problem is that gene annotations change over time: a probe that was assigned to geneA ten years ago may today be deprecated and considered an artifact, or its annotation may have changed. Hence, imho, it is important to restrict the analysis to genes that are consistently annotated and have a stable Ensembl ID on all platforms.
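The "map to Ensembl, then intersect" idea can be sketched as follows. This is a minimal Python illustration, assuming the probe-to-Ensembl mapping for each platform has already been exported (e.g. from a biomaRt query) into a dict of probe ID to a set of Ensembl gene IDs. The function names, the toy probe IDs and the `ENSG_TOY_*` identifiers are hypothetical. Dropping probes that map to zero or several genes, and averaging multiple probes per gene, are assumptions on my part; other collapsing rules (e.g. keeping the probe with the highest mean) are also common.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_ensembl(expr, probe_to_ensembl):
    """Collapse probe-level values to Ensembl gene level.

    expr: dict of probe ID -> list of per-sample values
    probe_to_ensembl: dict of probe ID -> set of Ensembl gene IDs
    Probes with no gene or with more than one gene are dropped;
    several probes for the same gene are averaged sample-wise.
    """
    by_gene = defaultdict(list)
    for probe, values in expr.items():
        genes = probe_to_ensembl.get(probe, set())
        if len(genes) != 1:
            continue  # unannotated or ambiguous probe: skip it
        (gene,) = genes
        by_gene[gene].append(values)
    # zip(*rows) groups the values of all probes per sample
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in by_gene.items()}


def common_universe(*gene_tables):
    """Return the sorted Ensembl IDs present in every dataset."""
    shared = set(gene_tables[0])
    for table in gene_tables[1:]:
        shared &= set(table)
    return sorted(shared)


# Toy inputs (hypothetical probe IDs, gene IDs and expression values)
gse_a_expr = {"p1": [5.0, 6.0], "p2": [7.0, 8.0], "p3": [1.0, 1.0]}
gse_a_map = {"p1": {"ENSG_TOY_1"}, "p2": {"ENSG_TOY_1"}, "p3": {"ENSG_TOY_2", "ENSG_TOY_3"}}
gse_b_expr = {"q1": [2.0, 4.0], "q2": [3.0, 3.0]}
gse_b_map = {"q1": {"ENSG_TOY_1"}, "q2": {"ENSG_TOY_4"}}

genes_a = collapse_to_ensembl(gse_a_expr, gse_a_map)
genes_b = collapse_to_ensembl(gse_b_expr, gse_b_map)
universe = common_universe(genes_a, genes_b)
print(universe)               # ['ENSG_TOY_1']
print(genes_a["ENSG_TOY_1"])  # [6.0, 7.0]
```

Here probe `p3` is discarded because it maps to two genes, and `ENSG_TOY_4` is excluded from the universe because it is only measured in the second dataset. In R the same logic would typically be applied to the biomaRt result table before subsetting both expression matrices to the shared IDs.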