Question: Downloading two-channel microarray data from ArrayExpress
Andy91 (Netherlands) wrote, 27 days ago:

Dear Bioconductor,

I am currently running a study on several microarray datasets, some of which are two-channel arrays. For consistency, I would like to download the raw two-channel data, so I turned to the “ArrayExpress” R package. However, I have had considerable difficulty downloading these data; most of the time I get the following error:

> mtab5095.eset <- ArrayExpress("E-MTAB-5095")
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/A-MEXP-2104/A-MEXP-2104.adf.txt'
Content type 'text/plain' length 5941699 bytes (5.7 MB)
==================================================
downloaded 5.7 MB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.sdrf.txt'
Content type 'text/plain' length 13791 bytes (13 KB)
==================================================
downloaded 13 KB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.idf.txt'
Content type 'text/plain' length 5589 bytes
==================================================
downloaded 5589 bytes

Copying raw data files

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.raw.1.zip'
Content type 'application/zip' length 113238265 bytes (108.0 MB)
==================================================
downloaded 108.0 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `row.names<-.data.frame`(`*tmp*`, value = c("US10020348_252800421889_S03_GE2_107_Sep09_1_1.txt",  :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘US10020348_252800419789_S01_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800420180_S01_GE2_107_Sep09_1_4.txt’, ‘US10020348_252800421147_S01_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800421889_S03_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800421889_S03_GE2_107_Sep09_1_3.txt’

Based on the output, it seems that the row names are not unique. Does that mean that the submitter made a mistake when uploading the .sdrf file? Interestingly, when I try the example from the article associated with the ArrayExpress package (Kauffmann et al. 2009), I get exactly the same error:

> AEset <- ArrayExpress("E-ATMX-18")
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/A-ATMX-8/A-ATMX-8.adf.txt'
Content type 'text/plain' length 3743536 bytes (3.6 MB)
==================================================
downloaded 3.6 MB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.sdrf.txt'
Content type 'text/plain' length 21142 bytes (20 KB)
==================================================
downloaded 20 KB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.idf.txt'
Content type 'text/plain' length 6889 bytes
==================================================
downloaded 6889 bytes

Copying raw data files

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.raw.1.zip'
Content type 'application/zip' length 28842045 bytes (27.5 MB)
==================================================
downloaded 27.5 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `row.names<-.data.frame`(`*tmp*`, value = c("4.txt", "4.txt",  :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘10.txt’, ‘11.txt’, ‘12.txt’, ‘2.txt’, ‘3.txt’, ‘4.txt’, ‘5.txt’, ‘6.txt’, ‘7.txt’, ‘8.txt’, ‘9.txt’
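
A possible explanation, not verified against these two accessions: in a two-channel SDRF each array is usually described by two rows, one per dye, so the same raw file name appears twice. Using the file-name column as row names would then fail in this way. The sketch below reproduces the failure on made-up data; the column names mimic what read.delim would produce from SDRF headers such as "Array Data File", and the sample and file names are invented for illustration.

```r
# Toy stand-in for a two-channel SDRF: one row per labelled extract,
# so every array file is listed twice (once for Cy3, once for Cy5).
# All sample and file names are hypothetical.
sdrf <- data.frame(
  Source.Name      = c("s1", "s2", "s3", "s4"),
  Label            = c("Cy3", "Cy5", "Cy3", "Cy5"),
  Array.Data.File  = c("1.txt", "1.txt", "2.txt", "2.txt"),
  stringsAsFactors = FALSE
)

# File names that occur more than once in the SDRF.
dup_files <- unique(sdrf$Array.Data.File[duplicated(sdrf$Array.Data.File)])
print(dup_files)

# Using that column as row names raises an error; the accompanying
# warning is suppressed here so that the error itself is captured.
err <- tryCatch(
  suppressWarnings({
    rownames(sdrf) <- sdrf$Array.Data.File
    NULL
  }),
  error = function(e) e
)
print(conditionMessage(err))
```

Running the same duplicated() check on the "Array Data File" column of the downloaded .sdrf.txt would show whether this is what happens for a given accession.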

Does anybody else have the same issue with two-channel microarrays from ArrayExpress, and has anybody figured out how to fix it?
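
One workaround that might be worth trying, sketched here on made-up data rather than on the real SDRF files: collapse the sample table to one row per raw file, with one column per dye, so that the file names become unique and can serve as row names. The data frame and its sample names are hypothetical, and the sketch assumes that duplicates arise only from having one row per channel.

```r
# Hypothetical two-channel sample table: two rows (Cy3 and Cy5) per array file.
sdrf <- data.frame(
  Source.Name      = c("ctrl_1", "treated_1", "treated_2", "ctrl_2"),
  Label            = c("Cy3", "Cy5", "Cy3", "Cy5"),
  Array.Data.File  = c("1.txt", "1.txt", "2.txt", "2.txt"),
  stringsAsFactors = FALSE
)

# Reshape from long (one row per channel) to wide (one row per array file).
# Each remaining column is split into a .Cy3 and a .Cy5 version.
targets <- reshape(
  sdrf,
  idvar     = "Array.Data.File",
  timevar   = "Label",
  direction = "wide"
)

# File names are now unique, so they can be used as row names.
rownames(targets) <- targets$Array.Data.File
print(targets)
```

A table of this shape resembles the targets frame used in limma's two-colour workflow. In principle the raw files could be downloaded separately with getAE() from the ArrayExpress package and read with limma's read.maimages(), but that step is not shown and has not been tested here.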

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pd.u133.x3p_3.12.0         pd.hg.u133a.2_3.12.0       convert_1.52.0             marray_1.54.0              ArrayExpress_1.36.1       
 [6] limma_3.32.10              primeviewcdf_2.18.0        pd.huex.1.0.st.v2_3.14.1   hthgu133pluspmcdf_2.18.0   hgu133plus2cdf_2.18.0     
[11] hgu133acdf_2.18.0          pd.hugene.1.0.st.v1_3.14.1 DBI_0.7                    RSQLite_2.0                affy_1.54.0               
[16] oligo_1.40.2               Biostrings_2.44.2          XVector_0.16.0             IRanges_2.10.5             S4Vectors_0.14.7          
[21] oligoClasses_1.38.0        BiocInstaller_1.26.1       GEOquery_2.42.0            Biobase_2.36.2             BiocGenerics_0.22.1       
[26] rafalib_1.0.0             

loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.6.5 splines_3.4.2              lattice_0.20-35            blob_1.1.0                 XML_3.98-1.9              
 [6] rlang_0.1.2                bit64_0.9-7                RColorBrewer_1.1-2         affyio_1.46.0              matrixStats_0.52.2        
[11] GenomeInfoDbData_0.99.0    foreach_1.4.3              zlibbioc_1.22.0            codetools_0.2-15           memoise_1.1.0             
[16] knitr_1.17                 ff_2.2-13                  GenomeInfoDb_1.12.3        AnnotationDbi_1.38.2       preprocessCore_1.38.1     
[21] Rcpp_0.12.13               DelayedArray_0.2.7         affxparser_1.48.0          bit_1.1-12                 digest_0.6.12             
[26] GenomicRanges_1.28.6       grid_3.4.2                 tools_3.4.2                bitops_1.0-6               RCurl_1.95-4.8            
[31] tibble_1.3.4               pkgconfig_2.0.1            Matrix_1.2-11              httr_1.3.1                 iterators_1.0.8           

 
