Question: ArrayExpress unable to find raw data: "Experiment has no raw files available"
Keith Hughitt (United States) wrote, 2.9 years ago:

Hello,

I just tried using the ArrayExpress library for the first time to retrieve some RNA-Seq samples through the EBI ArrayExpress database.

When I attempt to call the ArrayExpress function, however, I run into the following error:

> library(ArrayExpress)                                                                                   
> acc <- 'E-MTAB-3312'                                                                                    
> ArrayExpress(acc)                                                                                       
trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-MTAB-3312/E-MTAB-3312.sdrf.txt'                     
Content type 'text/plain' length 20793 bytes (20 KB)                                                      
==================================================                                                        
downloaded 20 KB                                                                                          
                                                                                                          
trying URL 'http://www.ebi.ac.uk/arrayexpress/files/E-MTAB-3312/E-MTAB-3312.idf.txt'                      
Content type 'text/plain' length 4837 bytes                                                               
==================================================                                                        
downloaded 4837 bytes                                                                                     
                                                                                                          
Unpacking data files                                                                                      
Error in ae2bioc(mageFiles = expFiles, dataCols = dataCols, drop = drop) :                                
  ArrayExpress: Experiment has no raw files available. Consider using processed data instead by following 
procedure in the vignette                                                                                 
NULL     

A little digging revealed that the issue lies in the `getAE` function from the ArrayExpress package.

The function retrieves an XML file associated with the experiment (in this case, http://www.ebi.ac.uk/arrayexpress/xml/v2/files/E-MTAB-3312). When it doesn't find any "file" elements whose "kind" child is "raw", ArrayExpress assumes that there is no raw data available for the experiment.

Looking at the SDRF file associated with the same experiment, however, shows a "Comment[FASTQ_URI]" column with links to FTP-hosted fastq.gz files for the data.
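
As a rough sketch of what I mean (this is not something the ArrayExpress package does itself), the links could be pulled out of the SDRF with a few lines of base R. The helper name `extract_fastq_uris` is hypothetical, and I'm assuming the SDRF is a tab-delimited file whose column is named exactly "Comment[FASTQ_URI]":

```r
# Hypothetical helper (not part of the ArrayExpress package):
# read a tab-delimited SDRF file and return the unique FASTQ links it lists.
extract_fastq_uris <- function(sdrf_path, column = "Comment[FASTQ_URI]") {
  # check.names = FALSE keeps the brackets in "Comment[FASTQ_URI]" intact
  sdrf <- read.delim(sdrf_path, check.names = FALSE, stringsAsFactors = FALSE)

  # No such column: report that no links were found
  if (!column %in% colnames(sdrf)) {
    return(character(0))
  }

  # Drop missing and empty entries, then de-duplicate
  uris <- as.character(sdrf[[column]])
  unique(uris[!is.na(uris) & nzchar(uris)])
}

# Possible usage with the SDRF downloaded above (left commented out, since it
# needs network access and assumes the URL is still being served):
# fastq <- extract_fastq_uris(
#   "http://www.ebi.ac.uk/arrayexpress/files/E-MTAB-3312/E-MTAB-3312.sdrf.txt")
```

The function returns an empty character vector when the column is absent, so a caller could treat that as "no sequencing data links found".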

This looks like a link to the raw reads associated with the experiment, but since I don't have a lot of experience working with ArrayExpress, I'm not really sure whether this is expected or whether this particular experiment is somehow abnormal.

Any thoughts?

If this is a reasonable place to expect to find the raw data, then perhaps getAE and related functions should be modified to check the sdrf.txt files for data URIs, even when there are no raw- or processed-specific files linked to in the experiment XML file?

Version info:

  • R SVN (Nov 20, 2016)
  • Bioconductor 3.5
  • ArrayExpress 1.34.0

sessionInfo():

> sessionInfo()
R Under development (unstable) (2016-11-20 r71670)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ArrayExpress_1.34.0 Biobase_2.34.0      BiocGenerics_0.20.0 setwidth_1.0-4      colorout_1.1-2     

loaded via a namespace (and not attached):
 [1] affxparser_1.46.0          XVector_0.14.0             splines_3.4.0              GenomicRanges_1.26.1       zlibbioc_1.20.0            IRanges_2.8.1              bit_1.1-12                
 [8] lattice_0.20-34            foreach_1.4.3              GenomeInfoDb_1.10.1        SummarizedExperiment_1.4.0 grid_3.4.0                 ff_2.2-13                  DBI_0.5-1                 
[15] iterators_1.0.8            oligoClasses_1.36.0        preprocessCore_1.36.0      oligo_1.38.0               affyio_1.44.0              Matrix_1.2-7.1             S4Vectors_0.12.0          
[22] codetools_0.2-15           RSQLite_1.0.0              limma_3.30.4               compiler_3.4.0             BiocInstaller_1.24.0       Biostrings_2.42.0          stats4_3.4.0              
[29] XML_3.98-1.5              

 

Answer: ArrayExpress unable to find raw data: "Experiment has no raw files available"
ugis (United Kingdom) wrote, 2.9 years ago:

Hi Keith,

At this point the ArrayExpress package is only useful for microarray data, so the "Experiment has no raw files available" message is correct.

Best,

Ugis

 


Thanks for the clarification, Ugis. Do you know why that is the case, or if it is stated anywhere? I did not see any note in the package vignette / manual regarding lack of support for RNA-Seq data.

Reply written 2.9 years ago by Keith Hughitt
Answer: ArrayExpress unable to find raw data: "Experiment has no raw files available"
ugis (United Kingdom) wrote, 2.9 years ago:

 

Keith - the package was written about 10 years ago, and the short description is "Access the ArrayExpress Microarray Database at EBI and build Bioconductor data structures: ExpressionSet, AffyBatch, NChannelSet". We haven't had the resources to bring it into the sequencing data world yet.

Best,

Ugis
