Harmonizing Gene Annotations for Meta-Analysis of GSE68183 and GSE80178
Dhite • 0
@1d6e7eb8
Indonesia

I am currently working on a meta-analysis involving two GEO datasets: GSE68183 and GSE80178. Both datasets include CEL files, and I aim to process them to ensure consistent gene annotations across both studies. However, I have encountered several challenges:

The gene identifiers in the two datasets appear to differ, making it difficult to align them for comparative analysis.

I have attempted to process the CEL files using various R packages, including affy, affyio, oligo, and oligoClasses. Despite these efforts, I have been unable to generate consistent gene annotations.

I am seeking guidance on the following:

  1. What are the recommended approaches to standardize gene identifiers between these two datasets?
  2. Which tools or packages are best suited for processing CEL files from these specific GEO datasets to achieve consistent gene annotations?

Any insights, suggestions, or references to relevant resources would be greatly appreciated.

Best regards,

GEOquery • 193 views

My recommendation would be to first translate the probe identifiers to Ensembl gene IDs, for example with biomaRt, and then take the intersection. You can then proceed from these "common universe" genes. The problem is that gene annotations change over time: a probe that captured geneA 10 years ago may be deprecated today and considered an artifact, or its annotation may have changed. Hence, I would find it important to look only at genes that are consistently annotated and have a stable Ensembl ID on all platforms, imho.
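As a rough illustration of the "common universe" idea, here is a minimal base R sketch. The two mapping tables, their column names, the placeholder IDs, and the `common_genes()` helper are all hypothetical; in practice the tables would come from an annotation query (for example via biomaRt or an annotation package), which is not shown here.

```r
# Hypothetical probe-to-gene maps with placeholder IDs (not real identifiers).
map_a <- data.frame(
  probe_id = c("p1", "p2", "p3", "p4"),
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_B", NA),
  stringsAsFactors = FALSE
)
map_b <- data.frame(
  probe_id = c("q1", "q2", "q3"),
  ensembl_gene_id = c("ENSG_B", "ENSG_C", "ENSG_A"),
  stringsAsFactors = FALSE
)

# Return the gene IDs that are present (and non-missing) in both maps.
common_genes <- function(map_x, map_y, id_col = "ensembl_gene_id") {
  ids_x <- map_x[[id_col]]
  ids_y <- map_y[[id_col]]
  ids_x <- unique(ids_x[!is.na(ids_x)])
  ids_y <- unique(ids_y[!is.na(ids_y)])
  intersect(ids_x, ids_y)
}

universe <- common_genes(map_a, map_b)

# Restrict each map to the shared universe before any downstream comparison.
map_a_common <- map_a[map_a$ensembl_gene_id %in% universe, ]
map_b_common <- map_b[map_b$ensembl_gene_id %in% universe, ]
```

Probes without a gene ID drop out automatically, and genes seen on only one platform are excluded from both tables.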

@james-w-macdonald-5106
United States

Those are both HuGene-2.0 ST arrays, so there should be no differences in the identifiers. Once you have the CEL files, annotating them is a simple process.

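# Install the required packages (one-time setup)
library(BiocManager)
install(c("GEOquery", "oligo", "pd.hugene.2.0.st", "hugene20sttranscriptcluster.db", "affycoretools"))
# Download the raw CEL archives; each one lands in a directory named after its accession
library(GEOquery)
getGEOSuppFiles("GSE80178")
getGEOSuppFiles("GSE68183")
# Unpack each archive into its own directory
untar("GSE68183/GSE68183_RAW.tar", exdir = "GSE68183")
untar("GSE80178/GSE80178_RAW.tar", exdir = "GSE80178")
# Read the CEL files and summarize them with RMA
library(oligo)
gse68 <- rma(read.celfiles(filenames = dir("GSE68183", "CEL", full.names = TRUE)))
gse80 <- rma(read.celfiles(filenames = dir("GSE80178", "CEL", full.names = TRUE)))
# Attach the annotation columns to each ExpressionSet
library(affycoretools)
library(hugene20sttranscriptcluster.db)
gse68 <- annotateEset(gse68, hugene20sttranscriptcluster.db)
gse80 <- annotateEset(gse80, hugene20sttranscriptcluster.db)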
> all.equal(fData(gse68), fData(gse80))
[1] TRUE
> head(fData(gse68))
          PROBEID ENTREZID SYMBOL
16650001 16650001     <NA>   <NA>
16650003 16650003     <NA>   <NA>
16650005 16650005     <NA>   <NA>
16650007 16650007     <NA>   <NA>
16650009 16650009     <NA>   <NA>
16650011 16650011     <NA>   <NA>
         GENENAME
16650001     <NA>
16650003     <NA>
16650005     <NA>
16650007     <NA>
16650009     <NA>
16650011     <NA>

And do note that the first rows of the featureData object are control probes, which aren't annotated; that is why they show NA in every annotation column.
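If you want to drop those unannotated rows before combining the two studies, something along these lines might work. This is only a sketch on toy stand-ins for the fData() table and the exprs() matrices: the expression values, sample names, and the "id_x"/"id_y" probesets are made up. On the real ExpressionSet objects the equivalent subsetting would be gse68[keep, ].

```r
# Toy feature table shaped like the fData() output: two control probes with
# no annotation, plus two made-up annotated probesets.
feat <- data.frame(
  PROBEID = c("16650001", "16650003", "id_x", "id_y"),
  SYMBOL = c(NA, NA, "GENE_X", "GENE_Y"),
  stringsAsFactors = FALSE
)

# Toy expression matrices (made-up values), one per study, sharing the same rows.
expr68 <- matrix(seq_len(8), nrow = 4,
                 dimnames = list(feat$PROBEID, c("a1", "a2")))
expr80 <- matrix(seq_len(12), nrow = 4,
                 dimnames = list(feat$PROBEID, c("b1", "b2", "b3")))

# Keep only the probesets that have a gene symbol.
keep <- !is.na(feat$SYMBOL)
expr68_annot <- expr68[keep, , drop = FALSE]
expr80_annot <- expr80[keep, , drop = FALSE]

# Both studies use the same array, so the rows should line up exactly.
stopifnot(identical(rownames(expr68_annot), rownames(expr80_annot)))
combined <- cbind(expr68_annot, expr80_annot)
```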
