Question: getGEO Error: Duplicate identifiers for rows
umahajan wrote (20 months ago):

Hi,

I am trying to download the GEO dataset GSE71989, but I get the following error.

gset <- getGEO("GSE71989")
Found 1 file(s)
GSE71989_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE71nnn/GSE71989/matrix/GSE71989_series_matrix.txt.gz'
Content type 'application/x-gzip' length 4240450 bytes (4.0 MB)
==================================================
downloaded 4.0 MB

Error: Duplicate identifiers for rows (75, 83), (76, 84), (77, 85), (78, 86), (79, 87), (80, 88), (81, 89), (82, 90)

Could you please suggest a solution?


This is a bug due to a "feature" of this particular dataset in GEO (the same metadata key is used more than once per sample). I'll get a fix out in the next day or two. 

Comment by Sean Davis, 20 months ago

Should be fixed.

Comment by Sean Davis, 20 months ago
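
For readers wondering what "the same metadata key is used more than once per sample" means in practice, here is a minimal sketch in base R. It is not the GEOquery fix itself. The sample IDs, keys, and values are made up for illustration, and `make.unique()` is just one possible way to disambiguate repeated keys.

```r
# Toy long-format metadata: one row per (sample, key, value).
# All names and values here are hypothetical.
meta <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2", "S2"),
  key    = c("tissue", "treatment", "treatment",
             "tissue", "treatment", "treatment"),
  value  = c("pancreas", "drugA", "drugB",
             "pancreas", "drugA", "none"),
  stringsAsFactors = FALSE
)

# Flag every row whose (sample, key) pair occurs more than once.
# Such rows cannot be spread into one column per key without ambiguity.
pairs <- meta[c("sample", "key")]
dup <- duplicated(pairs) | duplicated(pairs, fromLast = TRUE)
meta[dup, ]

# One workaround: make the keys unique within each sample
# ("treatment", "treatment.1", ...).
meta$key_unique <- ave(meta$key, meta$sample, FUN = make.unique)

# Each (sample, key_unique) pair now identifies exactly one value,
# so the table can be reshaped to one row per sample.
wide <- reshape(meta[c("sample", "key_unique", "value")],
                idvar = "sample", timevar = "key_unique",
                direction = "wide")
wide
```

Whether the duplicated keys should be renamed, merged, or dropped depends on the dataset, so treat the renaming step as an assumption rather than a recommendation.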
Answer: getGEO Error: Duplicate identifiers for rows
James W. MacDonald (United States) wrote (20 months ago):

I don't know why you get the error - there's a bunch of tidyverse blahblah in the code for GEOquery and I'm not cool enough to grok that stuff. Anyway, you don't have to use the GSE series matrix data; you can just get the CEL files and process them yourself.

> getGEOSuppFiles("GSE71989")
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE71nnn/GSE71989/suppl//GSE71989_RAW.tar?tool=geoquery'
Content type 'application/x-tar' length 122869760 bytes (117.2 MB)
downloaded 117.2 MB

> setwd("GSE71989/")
> untar("GSE71989_RAW.tar")
> library(oligo)

> fls <- dir(".", "CEL.gz")
> dat <- rma(read.celfiles(fls))

Loading required package: pd.hg.u133.plus.2
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : GSM1849335_JJ-1.CEL.gz
Reading in : GSM1849336_JJ-2.CEL.gz
Reading in : GSM1849337_JJ-3.CEL.gz
Reading in : GSM1849338_JJ-4.CEL.gz
Reading in : GSM1849339_JJ-5.CEL.gz
Reading in : GSM1849340_JJ-6.CEL.gz
Reading in : GSM1849341_JJ-7.CEL.gz
Reading in : GSM1849342_JJ-8.CEL.gz
Reading in : GSM1849343_JJ-26.CEL.gz
Reading in : GSM1849344_JJ-27.CEL.gz
Reading in : GSM1849345_JJ-29.CEL.gz
Reading in : GSM1849346_JJ-31.CEL.gz
Reading in : GSM1849347_JJ-32.CEL.gz
Reading in : GSM1849348_JJ-34.CEL.gz
Reading in : GSM1849349_JJ-39.CEL.gz
Reading in : GSM1849350_JJ-43.CEL.gz
Reading in : GSM1849351_JJ-44.CEL.gz
Reading in : GSM1849352_JJ-45.CEL.gz
Reading in : GSM1849353_JJ-46.CEL.gz
Reading in : GSM1849354_JJ-47.CEL.gz
Reading in : GSM1849355_JJ-49.CEL.gz
Reading in : GSM1849356_JJ-50.CEL.gz
Background correcting
Normalizing
Calculating Expression

> dat
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 22 samples
  element names: exprs
protocolData
  rowNames: GSM1849335_JJ-1.CEL.gz GSM1849336_JJ-2.CEL.gz ...
    GSM1849356_JJ-50.CEL.gz (22 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM1849335_JJ-1.CEL.gz GSM1849336_JJ-2.CEL.gz ...
    GSM1849356_JJ-50.CEL.gz (22 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hg.u133.plus.2
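
Note that the sample names in the resulting ExpressionSet are the CEL file names, not GEO sample accessions. If you want to match them against GEO metadata later, something like the following sketch might help. It assumes every file name starts with the GSM accession followed by an underscore, which holds for the files listed in the log above but is not guaranteed in general.

```r
# File names copied from the read.celfiles() log above (first three only).
fls <- c("GSM1849335_JJ-1.CEL.gz",
         "GSM1849336_JJ-2.CEL.gz",
         "GSM1849337_JJ-3.CEL.gz")

# Drop everything from the first underscore onward, leaving the accession.
gsm <- sub("_.*$", "", fls)
gsm

# With the ExpressionSet from above, the same substitution could be applied
# to the sample names (requires Biobase, which oligo loads):
# sampleNames(dat) <- sub("_.*$", "", sampleNames(dat))
```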
Answer: getGEO Error: Duplicate identifiers for rows
Sean Davis (United States) wrote (20 months ago):

This was a bug. The fix should appear in the next GEOquery builds: 2.47.18 in Bioc-Devel and 2.46.15 in Bioc-Release. If you need it sooner, you can install from the GEOquery GitHub repository.
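
To check whether an installed GEOquery already includes the fix, a rough sketch like the one below could be used. The helper `has_fix()` is hypothetical (not part of GEOquery); it only compares against the two version numbers quoted above and assumes that anything in the 2.47 series or later is judged against 2.47.18. The repository path in the commented-out install line is also an assumption.

```r
# Hypothetical helper: TRUE if version `v` is at or past the fixed builds
# mentioned above (2.46.15 on the release branch, 2.47.18 on devel).
has_fix <- function(v) {
  v <- as.package_version(v)
  if (v >= "2.47.0") v >= "2.47.18" else v >= "2.46.15"
}

has_fix("2.46.14")  # a release build from before the fix
has_fix("2.46.15")  # the fixed release build

# Check the locally installed copy, if there is one.
if (requireNamespace("GEOquery", quietly = TRUE)) {
  print(has_fix(packageVersion("GEOquery")))
}

# To update through Bioconductor:
# BiocManager::install("GEOquery")
# Or from GitHub (repository path assumed, check before use):
# remotes::install_github("seandavi/GEOquery")
```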
