Get Gene ID from GEO microarray dataset with fData
@guillaume-robert-18902
France/Nantes/Inovarion

Hi all,

I'm trying to get gene IDs for a GEO Affymetrix microarray dataset and haven't found a solution yet. My understanding is that I should use the fData function to get the annotation, but it gives me an empty table:

library(GEOquery)
gset <- getGEO("GSE75214", GSEMatrix=TRUE, AnnotGPL=TRUE)[[1]]
head(fData(gset))
             ID Gene title Gene symbol Gene ID UniGene title UniGene symbol UniGene ID Nucleotide Title
7892501 7892501                                                                                        
7892502 7892502                                                                                        
7892503 7892503                                                                                        
7892504 7892504                                                                                        
7892505 7892505                                                                                        
7892506 7892506                                                                                        
        GI GenBank Accession Platform_CLONEID Platform_ORF Platform_SPOTID Chromosome location
7892501                                    NA           NA         control                    
7892502                                    NA           NA         control                    
7892503                                    NA           NA         control                    
7892504                                    NA           NA         control                    
7892505                                    NA           NA         control                    
7892506                                    NA           NA         control                    
        Chromosome annotation GO:Function GO:Process GO:Component GO:Function ID GO:Process ID
7892501                                                                                       
7892502                                                                                       
7892503                                                                                       
7892504                                                                                       
7892505                                                                                       
7892506                                                                                       
        GO:Component ID
7892501                
7892502                
7892503                
7892504                
7892505                
7892506

Am I doing something wrong? Would anyone know how I could access the probe annotations for this dataset?

Thanks

@james-w-macdonald-5106
United States

Yes, you are doing something wrong: you are assuming that the first few rows of the fData slot should have entries, without checking how many of the rows actually do have entries.

The issue is that the first six entries are control probes that Affymetrix puts on the array, so by definition they won't have any Gene IDs, or any other annotation for that matter. And do note that this shouldn't be news to you, given that the Platform_SPOTID column says 'control' for all six rows!
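To look past the control rows and see probes that do carry annotation, one option is to drop rows whose Platform_SPOTID is 'control' and keep only rows with a non-empty Gene ID. The sketch below runs on a small made-up data.frame standing in for fData(gset): the probe IDs, gene values and the non-control SPOTID value "probe" are hypothetical placeholders, and only the column names follow the annotation shown above. With the real data you would start from fd <- fData(gset).

```r
# Hypothetical stand-in for fData(gset): all values are made up; only the
# column names mirror the GPL6244 annotation columns shown above.
fd <- data.frame(
  ID              = c("ctrl_1", "ctrl_2", "ctrl_3", "probe_a", "probe_b"),
  "Gene symbol"   = c("", "", "", "GENE_A", "GENE_B"),
  "Gene ID"       = c("", "", "", "111", "222"),
  Platform_SPOTID = c("control", "control", "control", "probe", "probe"),
  check.names      = FALSE,  # keep the spaces in "Gene symbol" / "Gene ID"
  stringsAsFactors = FALSE
)

# Rows not flagged as controls.
not_control <- !(fd$Platform_SPOTID %in% "control")

# Rows whose Gene ID is neither an empty string nor NA.
has_gene_id <- !(fd[["Gene ID"]] %in% c("", NA))

# Keep annotated, non-control probes and a few identifying columns.
annotated <- fd[not_control & has_gene_id, c("ID", "Gene symbol", "Gene ID")]
print(annotated)
```

Filtering on Gene ID alone would already remove the controls here; the extra SPOTID condition just makes the intent explicit.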

> library(GEOquery)

> gset <- getGEO("GSE75214", GSEMatrix=TRUE, AnnotGPL=TRUE)[[1]]
Found 1 file(s)
GSE75214_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE75nnn/GSE75214/matrix/GSE75214_series_matrix.txt.gz'
Content type 'application/x-gzip' length 34902052 bytes (33.3 MB)
downloaded 33.3 MB

Parsed with column specification:
cols(
  .default = col_double()
)
See spec(...) for full column specifications.
|=================================================================| 100%   73 MB
File stored at: 
C:\Users\Public\Documents\Wondershare\CreatorTemp\RtmpWQcbfe/GPL6244.annot.gz

## Missing data
> apply(fData(gset), 2, function(x) sum(x %in% ""))
                   ID            Gene title           Gene symbol 
                    0                 11057                 11057 
              Gene ID         UniGene title        UniGene symbol 
                11057                 32448                 32650 
           UniGene ID      Nucleotide Title                    GI 
                32448                 10612                 10612 
    GenBank Accession      Platform_CLONEID          Platform_ORF 
                10612                     0                     0 
      Platform_SPOTID   Chromosome location Chromosome annotation 
                    0                 11159                 11057 
          GO:Function            GO:Process          GO:Component 
                15371                 15576                 14413 
       GO:Function ID         GO:Process ID       GO:Component ID 
                15371                 15576                 14413 

## Non-missing data
> apply(fData(gset), 2, function(x) sum(!x %in% ""))
                   ID            Gene title           Gene symbol 
                33252                 22195                 22195 
              Gene ID         UniGene title        UniGene symbol 
                22195                   804                   602 
           UniGene ID      Nucleotide Title                    GI 
                  804                 22640                 22640 
    GenBank Accession      Platform_CLONEID          Platform_ORF 
                22640                 33252                 33252 
      Platform_SPOTID   Chromosome location Chromosome annotation 
                33252                 22093                 22195 
          GO:Function            GO:Process          GO:Component 
                17881                 17676                 18839 
       GO:Function ID         GO:Process ID       GO:Component ID 
                17881                 17676                 18839 
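One detail in these counts: x %in% "" only matches empty strings, not NA, so Platform_CLONEID and Platform_ORF (which print as NA in the head() output from the question) are reported with 0 missing entries. A slightly stricter check that treats both "" and NA as missing, and reports the fraction of populated rows per column, might look like the sketch below. It uses a small made-up data.frame (hypothetical values); with the real data you would pass fData(gset) instead of fd. Using sapply() over the columns also avoids apply() first coercing the whole data.frame to a character matrix.

```r
# Hypothetical stand-in for fData(gset); values are made up.
fd <- data.frame(
  ID               = c("ctrl_1", "ctrl_2", "probe_a", "probe_b"),
  "Gene ID"        = c("", "", "111", "222"),
  Platform_CLONEID = c(NA, NA, NA, NA),
  check.names      = FALSE,
  stringsAsFactors = FALSE
)

# Treat both empty strings and NA as missing.
is_missing <- function(x) is.na(x) | x %in% ""

# Fraction of rows with a usable value, column by column.
frac_present <- sapply(fd, function(x) mean(!is_missing(x)))
print(round(frac_present, 3))
```

Read this way, the Gene ID counts above mean that 22,195 of the 33,252 probes, roughly two thirds, have a Gene ID.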
