Question

GEO series matrix empty

1

gs.123 ▴ 20

@36e5f9f3

Last seen 2.2 years ago

Canada

How can I load in the data from GSE108497 into R?

I tried using GEOquery and the expression data is empty:

library(GEOquery)
gse108497 <- getGEO('GSE108497', GSEMatrix = TRUE)
show(gse108497)

$GSE108497_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 512 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM2901826 GSM2901827 ... GSM2902337 (512 total)
  varLabels: title geo_accession ... tp:ch1 (74 total)
  varMetadata: labelDescription
featureData
  featureNames:
  fvarLabels: ID Species ... GB_ACC (30 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL10558

I've also tried using getGEOSuppFiles(), but could not find a solution. Thank you.

GEOquery GEO • 2.1k views

ADD COMMENT • link 2.2 years ago gs.123 ▴ 20

1

The submitters did not actually submit quantification results for the samples as part of the sample records. Therefore, GEOquery cannot load them automatically. You'll have to use getGEOSuppFiles() and then merge the raw data with the annotation data that you got with getGEO. Unfortunately, there is no standard approach for merging the supplied raw data with the metadata from GEOquery; each dataset will vary somewhat.

ADD REPLY • link 2.2 years ago Sean Davis 21k

0

Thank you for your reply. I don't think the submitters submitted anything at all.

ADD REPLY • link 2.2 years ago gs.123 ▴ 20

1

The supplementary files (if you want to look programmatically, see getGEOSuppFiles) contain what are described as "non-normalized" and "normalized" data that you can read using standard R tab-delimited text file readers. You'll then have to manipulate those data into a form that you can use in R/Bioconductor. But the data are submitted.
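As a minimal sketch of the reading step (not necessarily how anyone in this thread did it), the snippet below writes a tiny stand-in file shaped like an Illumina-style table and reads it with read.delim. The file contents, the name `toy_file`, the `ID_REF` header, and the assumption that each signal column is followed by its own "Detection Pval" column are all illustrative; check the header of the real supplementary file before relying on this.

```r
# Toy stand-in for a supplementary file; the layout is an assumption
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "ID_REF\t9269325021_A\tDetection Pval\t9269325021_B\tDetection Pval",
  "ILMN_1708238\t-3.3708\t0.6234\t-3.1623\t0.6364",
  "ILMN_1711886\t146.8245\t0.0000\t194.8972\t0.0000",
  "ILMN_1759828\t9.6286\t0.1208\t16.0719\t0.0260"
), toy_file)

# check.names = FALSE keeps the header exactly as written
# (no "X" prefix on names starting with a digit, no ".1" suffixes)
raw <- read.delim(toy_file, row.names = 1, check.names = FALSE)

# Split signal columns from detection p-value columns by header text
is_pval <- grepl("Detection", colnames(raw))
expr <- as.matrix(raw[, !is_pval, drop = FALSE])
pval <- as.matrix(raw[, is_pval, drop = FALSE])

# Assumes each p-value column immediately follows its sample column
colnames(pval) <- colnames(expr)

expr
```

Keeping the signal and p-value columns in two matrices with identical dimnames makes it easier to filter probes by detection later, but that split is a design choice rather than a requirement.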

ADD REPLY • link 2.2 years ago Sean Davis 21k

1

When I read in "GSE108497_normalized_data.txt", I get this:

> GSE[1:3, 1:4]
             X9269325021_A Detection.Pval X9269325021_B Detection.Pval.1
ILMN_1708238       -3.3708         0.6234       -3.1623           0.6364
ILMN_1711886      146.8245         0.0000      194.8972           0.0000
ILMN_1759828        9.6286         0.1208       16.0719           0.0260

However, I can't find the file that maps the sample IDs (i.e., GSMxxxxx) to the columns of this table. Do you know where I can find this?

ADD REPLY • link 2.2 years ago gs.123 ▴ 20

0

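library(GEOquery)
g <- getGEO('GSE108497')[[1]]
# the "description" column holds the IDs that appear as column names in the .txt file
pData(g)$description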

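Note that R prepends an "X" to the column names when it reads the .txt file, because names that begin with a digit are not syntactically valid R names (this is the default check.names = TRUE behavior of read.table and read.delim). If you look at the "description" column, its values match the column names apart from that "X". And, yes, you have to just poke around. This approach works for this dataset, but the next one will differ (columns, naming, etc.).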
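The matching step can be sketched with toy data. The GSM accessions and their pairing with beadchip IDs below are invented placeholders, not the real pairing from GSE108497; `pheno` is a hypothetical stand-in for pData(g), and the sketch assumes the description column contains exactly the beadchip ID with no extra text.

```r
# Stand-in for pData(g); accessions and pairing are made-up placeholders
pheno <- data.frame(
  geo_accession = c("GSM0000001", "GSM0000002"),
  description   = c("9269325021_B", "9269325021_A"),
  stringsAsFactors = FALSE
)

# Stand-in for the expression table as read with the default check.names = TRUE
expr <- matrix(c(-3.3708, 146.8245, -3.1623, 194.8972), nrow = 2,
               dimnames = list(c("ILMN_1708238", "ILMN_1711886"),
                               c("X9269325021_A", "X9269325021_B")))

# Remove the leading "X" that R added, then look up each column in description
chip_id <- sub("^X", "", colnames(expr))
idx <- match(chip_id, pheno$description)
stopifnot(!anyNA(idx))  # stop if any column has no matching sample record

# Relabel columns with GSM accessions and reorder to follow the phenotype table
colnames(expr) <- pheno$geo_accession[idx]
expr <- expr[, pheno$geo_accession, drop = FALSE]
expr
```

Using match() rather than assuming the two tables share an order guards against silently mislabeling samples when the file's columns are arranged differently from the sample records.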

ADD REPLY • link 2.2 years ago Sean Davis 21k

0

Thank you so much!! I've been working on this for days. Much appreciated.

ADD REPLY • link 2.2 years ago gs.123 ▴ 20

score 1 · Answer 1 · 2022-02-20

1

Gordon Smyth 50k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

It seems to me that there is a serious problem with this dataset. The GEO series lists 512 samples but the supplementary data files contain expression values for only 510 beadchips. I think one would have to write to the submitters.
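One way to see which sample records lack a matching data column is a set difference between the IDs in the sample metadata and the column names of the supplementary file. This is only a sketch with invented IDs: `meta_ids` and `file_ids` are hypothetical stand-ins for the IDs taken from the series metadata and from the file header, respectively.

```r
# Invented IDs for illustration only
meta_ids <- c("chip_1", "chip_2", "chip_3", "chip_4")  # from the sample records
file_ids <- c("chip_1", "chip_3")                      # from the file's columns

# Samples listed in the series but absent from the data file
missing_from_file <- setdiff(meta_ids, file_ids)

# Columns in the data file that no sample record describes
missing_from_meta <- setdiff(file_ids, meta_ids)

missing_from_file
missing_from_meta
```

Checking both directions matters: a count mismatch alone does not reveal whether samples are missing, mislabeled, or both.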

ADD COMMENT • link 2.2 years ago Gordon Smyth 50k