Error while reading ArrayExpress dataset
@srinivasa-r-rao-8182
United Kingdom

Hi,

I am trying to access a dataset from ArrayExpress with the ID "E-GEOD-33675", but I keep getting an error right after it tries to read pheno data from the sdrf file. Is this because I am doing something wrong (I am new to Bioconductor) or is it an issue with the dataset itself (and if so, how can I fix this)? The error message is "Error in .subset2(x, i, exact = exact) : subscript out of bounds", shown below in context. I seem to be able to fetch a few other datasets that I tried without any problem. Any help/suggestions/comments much appreciated. Thanks for your time.

> AEdata <- ArrayExpress("E-GEOD-33675")
trying URL 'http://www.ebi.ac.uk/arrayexpress/files/A-GEOD-14799/A-GEOD-14799.adf.txt'
Content type 'text/plain' length 24270 bytes (23 KB)
downloaded 23 KB

trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-33675/E-GEOD-33675.sdrf.txt'
Content type 'text/plain' length 20615 bytes (20 KB)
downloaded 20 KB

trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-33675/E-GEOD-33675.idf.txt'
Content type 'text/plain' length 6945 bytes
downloaded 6945 bytes

Copying raw data files

trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-33675/E-GEOD-33675.raw.1.zip'
Content type 'application/zip' length 2769721 bytes (2.6 MB)
downloaded 2.6 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in .subset2(x, i, exact = exact) : subscript out of bounds

--

Srinivasa Rao

@james-w-macdonald-5106
United States

The error occurs in readPhenoData(), which doesn't seem to account correctly for the fact that these are two-color arrays when it tries to assign row.names to the phenoData object.
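One way to check the two-color explanation is to look at the SDRF file directly. In a two-color MAGE-TAB SDRF, each array typically appears on two rows, one per dye, so there are more rows than raw data files. The sketch below is only an illustration in base R: summarise_sdrf() is a hypothetical helper, the column names "Array Data File" and "Label" are assumed from the MAGE-TAB convention, and the toy table stands in for the real file. It is not the code that ArrayExpress runs internally.

```r
# Summarise how the rows of an SDRF table map onto raw data files.
# The default column names are assumptions (MAGE-TAB convention) and are
# checked before use, so a differently laid out file fails with a clear message.
summarise_sdrf <- function(sdrf,
                           file_col = "Array Data File",
                           label_col = "Label") {
  missing_cols <- setdiff(c(file_col, label_col), colnames(sdrf))
  if (length(missing_cols) > 0) {
    stop("SDRF is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  files <- as.character(sdrf[[file_col]])
  list(
    n_rows        = nrow(sdrf),                  # one row per sample/channel
    n_files       = length(unique(files)),       # one file per array
    labels        = sort(unique(as.character(sdrf[[label_col]]))),
    rows_per_file = table(files)                 # 2 per file suggests two-color
  )
}

# Toy table standing in for a two-color SDRF: two rows (Cy3, Cy5) per file.
toy <- data.frame(
  "Source Name"     = c("s1", "s2", "s3", "s4"),
  "Label"           = c("Cy3", "Cy5", "Cy3", "Cy5"),
  "Array Data File" = c("a1.txt", "a1.txt", "a2.txt", "a2.txt"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

res <- summarise_sdrf(toy)
res$n_rows   # 4
res$n_files  # 2
```

To try this on the real file from the question, something like read.delim("E-GEOD-33675.sdrf.txt", check.names = FALSE, stringsAsFactors = FALSE) should give a table that can be passed to summarise_sdrf(), assuming the file is in the working directory.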

Anyway, this dataset is on GEO as well (as GSE33675), so you can get it with GEOquery instead.

> library(GEOquery)
Setting options('download.file.method.GEOquery'='curl')
> dat <- getGEO("GSE33675")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE33nnn/GSE33675/matrix/
Found 1 file(s)
GSE33675_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34040  100 34040    0     0  23156      0  0:00:01  0:00:01 --:--:-- 23156
File stored at:
/data3/tmp/Rtmp8fBcuN/GPL14799.soft
> dat[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 905 features, 28 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM832644 GSM832645 ... GSM832671 (28 total)
  varLabels: title geo_accession ... data_row_count (30 total)
  varMetadata: labelDescription
featureData
  featureNames: a-PUC2MM2d a-PUC2PM ... PUC2PM-20B (905 total)
  fvarLabels: ID miRNA_ID SPOT_ID
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL14799

If you would rather process the data yourself, you can download the raw files with either getGEOSuppFiles() from GEOquery or getAE() from ArrayExpress, and then use, e.g., limma to read in and process the data by hand.
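A rough sketch of the by-hand route might look like the following. Treat it as a starting point rather than a tested pipeline for this dataset: list_raw_files() and read_two_color() are hypothetical helpers, the file-name pattern is a guess, and the source = "genepix" default is only a placeholder, since the right value for read.maimages() depends on the image-analysis software that produced the raw files. The normexp background correction and loess normalization are common limma defaults for two-color data, not something the original analysis is known to have used.

```r
# List candidate raw intensity files in a directory, skipping archives.
# The pattern is an assumption; adjust it to the actual file names.
list_raw_files <- function(dir, pattern = "\\.(txt|gpr)(\\.gz)?$") {
  sort(list.files(dir, pattern = pattern, ignore.case = TRUE))
}

# Read two-color intensity files with limma, then apply a basic background
# correction and within-array normalization.
# `source` must match the software that wrote the files; "genepix" is a
# placeholder assumption, not a known property of this dataset.
read_two_color <- function(dir, files = list_raw_files(dir), source = "genepix") {
  rg <- limma::read.maimages(files, source = source, path = dir)
  rg <- limma::backgroundCorrect(rg, method = "normexp")
  limma::normalizeWithinArrays(rg, method = "loess")
}

# Not run automatically: this part needs network access and the Bioconductor
# packages ArrayExpress and limma to be installed.
if (FALSE) {
  # getAE() downloads and unpacks the files without building an R object,
  # so it avoids the SDRF parsing step that fails in ArrayExpress().
  ae <- ArrayExpress::getAE("E-GEOD-33675", path = tempdir(), type = "raw")
  # Pass the raw file names explicitly so that the SDRF/IDF/ADF text files
  # in the same directory are not picked up by the file-name pattern.
  ma <- read_two_color(ae$path, files = ae$rawFiles)
}
```

The GEO route would be similar: GEOquery::getGEOSuppFiles("GSE33675") downloads the supplementary files into a directory named after the accession, and any archive found there would need to be unpacked before calling read_two_color() on the extracted files.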

 

 

 
