Question: Error while reading ArrayExpress dataset
Asked 3.9 years ago by Srinivasa R. Rao (United Kingdom):

Hi,

I am trying to access a dataset from ArrayExpress with the ID "E-GEOD-33675", but I keep getting an error right after it tries to read pheno data from the sdrf file. Is this because I am doing something wrong (I am new to Bioconductor) or is it an issue with the dataset itself (and if so, how can I fix this)? The error message is "Error in .subset2(x, i, exact = exact) : subscript out of bounds", shown below in context. I seem to be able to fetch a few other datasets that I tried without any problem. Any help/suggestions/comments much appreciated. Thanks for your time.

> AEdata <- ArrayExpress("E-GEOD-33675")
trying URL 'http://www.ebi.ac.uk/arrayexpress/files/A-GEOD-14799/A-GEOD-14799.adf.txt'
Content type 'text/plain' length 24270 bytes (23 KB)
downloaded 23 KB

trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-33675/E-GEOD-33675.sdrf.txt'
Content type 'text/plain' length 20615 bytes (20 KB)
downloaded 20 KB

trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-33675/E-GEOD-33675.idf.txt'
Content type 'text/plain' length 6945 bytes
downloaded 6945 bytes

Copying raw data files

trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-33675/E-GEOD-33675.raw.1.zip'
Content type 'application/zip' length 2769721 bytes (2.6 MB)
downloaded 2.6 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in .subset2(x, i, exact = exact) : subscript out of bounds

--

Srinivasa Rao

arrayexpress • 1.0k views
Answer: Error while reading ArrayExpress dataset
Answered 3.9 years ago by James W. MacDonald (United States):

The error occurs in the function readPhenoData(), which doesn't seem to account for the fact that these are two-color arrays when it tries to assign row.names to the phenoData object.

In any case, the same dataset is also on GEO (as GSE33675), so you can fetch it with GEOquery instead:

> library(GEOquery)
Setting options('download.file.method.GEOquery'='curl')
> dat <- getGEO("GSE33675")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE33nnn/GSE33675/matrix/
Found 1 file(s)
GSE33675_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34040  100 34040    0     0  23156      0  0:00:01  0:00:01 --:--:-- 23156
File stored at:
/data3/tmp/Rtmp8fBcuN/GPL14799.soft
> dat[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 905 features, 28 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM832644 GSM832645 ... GSM832671 (28 total)
  varLabels: title geo_accession ... data_row_count (30 total)
  varMetadata: labelDescription
featureData
  featureNames: a-PUC2MM2d a-PUC2PM ... PUC2PM-20B (905 total)
  fvarLabels: ID miRNA_ID SPOT_ID
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL14799

If you would rather process the raw data yourself, you can download the files with either getGEOSuppFiles() from GEOquery or getAE() from ArrayExpress, and then read and process them by hand with, e.g., limma.
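As a rough sketch of that by-hand route (not a tested recipe for this particular dataset), the two steps could look something like the following. The helper names fetch_raw_files() and read_two_colour() are made up for illustration. The file-name pattern and the `source` value passed to read.maimages() are assumptions you need to fill in after looking at the unpacked files, since the thread does not say which image-analysis format the raw files are in.

```r
# Sketch only: helper names are hypothetical, and nothing here has been
# verified against E-GEOD-33675 itself.

fetch_raw_files <- function(accession, dest = tempdir(), pattern = "\\.txt$") {
  # getAE() only downloads and unpacks the files; it does not try to build
  # an R object, so the SDRF parsing that fails in ArrayExpress() is skipped.
  ArrayExpress::getAE(accession, path = dest, type = "raw")
  # 'pattern' is an assumption: adjust it to match the actual raw file names
  # (note that it will also match the SDRF/IDF/ADF text files as written).
  list.files(dest, pattern = pattern, full.names = TRUE)
}

read_two_colour <- function(files, source) {
  # 'source' must name the image-analysis format (see ?read.maimages);
  # it is deliberately left without a default because it is not known here.
  rg <- limma::read.maimages(files, source = source)
  # Background correction and within-array loess normalisation are one
  # common choice for two-colour data, not the only reasonable one.
  rg <- limma::backgroundCorrect(rg, method = "normexp", offset = 50)
  limma::normalizeWithinArrays(rg, method = "loess")
}

# Example usage (needs network access and the Bioconductor packages):
# files <- fetch_raw_files("E-GEOD-33675")
# ma    <- read_two_colour(files, source = "genepix")  # "genepix" is a guess
```

The GEO route would be analogous: getGEOSuppFiles("GSE33675") downloads the supplementary archive, which you would then unpack and pass to the same reading step.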

 

 

 
