Question

Downloading two-channel microarray data from ArrayExpress

Andy91 ▴ 60

@andy91-8905

Last seen 3.6 years ago

Netherlands

Dear Bioconductor,

I am currently studying several microarray datasets, some of which are two-channel arrays. For consistency, I would like to download the raw two-channel data, so I turned to the “ArrayExpress” R package. These downloads have been difficult, however, and most of the time I get the following error:

> mtab5095.eset <- ArrayExpress("E-MTAB-5095")
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/A-MEXP-2104/A-MEXP-2104.adf.txt'
Content type 'text/plain' length 5941699 bytes (5.7 MB)
==================================================
downloaded 5.7 MB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.sdrf.txt'
Content type 'text/plain' length 13791 bytes (13 KB)
==================================================
downloaded 13 KB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.idf.txt'
Content type 'text/plain' length 5589 bytes
==================================================
downloaded 5589 bytes

Copying raw data files

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.raw.1.zip'
Content type 'application/zip' length 113238265 bytes (108.0 MB)
==================================================
downloaded 108.0 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `row.names<-.data.frame`(`*tmp*`, value = c("US10020348_252800421889_S03_GE2_107_Sep09_1_1.txt",  :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘US10020348_252800419789_S01_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800420180_S01_GE2_107_Sep09_1_4.txt’, ‘US10020348_252800421147_S01_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800421889_S03_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800421889_S03_GE2_107_Sep09_1_3.txt’

Based on the output, it seems that the row names are not unique. Does that mean the submitter made a mistake when uploading the .sdrf file? Interestingly, when I try the example from the article describing the ArrayExpress package (Kauffmann et al. 2009), I get exactly the same error:

> AEset <- ArrayExpress("E-ATMX-18")
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/A-ATMX-8/A-ATMX-8.adf.txt'
Content type 'text/plain' length 3743536 bytes (3.6 MB)
==================================================
downloaded 3.6 MB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.sdrf.txt'
Content type 'text/plain' length 21142 bytes (20 KB)
==================================================
downloaded 20 KB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.idf.txt'
Content type 'text/plain' length 6889 bytes
==================================================
downloaded 6889 bytes

Copying raw data files

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.raw.1.zip'
Content type 'application/zip' length 28842045 bytes (27.5 MB)
==================================================
downloaded 27.5 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `row.names<-.data.frame`(`*tmp*`, value = c("4.txt", "4.txt",  :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘10.txt’, ‘11.txt’, ‘12.txt’, ‘2.txt’, ‘3.txt’, ‘4.txt’, ‘5.txt’, ‘6.txt’, ‘7.txt’, ‘8.txt’, ‘9.txt’
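One possible explanation, which I have not confirmed for these two experiments, is that a two-channel SDRF lists each array file once per dye, so the file names repeat when they are used as row names. The sketch below reproduces that situation on a small made-up table without downloading anything. The data frame, its column names (Source.Name, Label, Array.Data.File) and the file names are hypothetical stand-ins, not the real SDRF contents. Merging the two dye rows into one row per array is only one way to get unique row names, not what the ArrayExpress package does internally.

```r
# Hypothetical stand-in for a two-channel SDRF: one row per labelled extract,
# so every scanned array file appears twice (once for Cy3, once for Cy5).
sdrf <- data.frame(
  Source.Name     = c("s1", "s2", "s3", "s4"),
  Label           = c("Cy3", "Cy5", "Cy3", "Cy5"),
  Array.Data.File = c("4.txt", "4.txt", "5.txt", "5.txt"),
  stringsAsFactors = FALSE
)

# List the file names that occur more than once.
dup_files <- unique(sdrf$Array.Data.File[duplicated(sdrf$Array.Data.File)])
print(dup_files)

# Using the file column as row names fails; capture the error message.
# suppressWarnings() hides the accompanying "non-unique values" warning.
err <- suppressWarnings(tryCatch(
  {
    rownames(sdrf) <- sdrf$Array.Data.File
    NULL
  },
  error = function(e) conditionMessage(e)
))
print(err)

# One possible workaround: put the Cy3 and Cy5 rows side by side so that
# each array file has exactly one row, then use the file name as row name.
cy3 <- sdrf[sdrf$Label == "Cy3", ]
cy5 <- sdrf[sdrf$Label == "Cy5", ]
per_array <- merge(cy3, cy5, by = "Array.Data.File",
                   suffixes = c(".Cy3", ".Cy5"))
rownames(per_array) <- per_array$Array.Data.File
print(per_array)
```

On a real experiment the same duplicated() check could be run on the downloaded .sdrf.txt file after reading it with read.delim(), assuming the file column is named as above.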

Does anybody else have the same issue with two-channel microarrays from ArrayExpress, and has anybody figured out how to fix it?

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pd.u133.x3p_3.12.0         pd.hg.u133a.2_3.12.0       convert_1.52.0             marray_1.54.0              ArrayExpress_1.36.1       
 [6] limma_3.32.10              primeviewcdf_2.18.0        pd.huex.1.0.st.v2_3.14.1   hthgu133pluspmcdf_2.18.0   hgu133plus2cdf_2.18.0     
[11] hgu133acdf_2.18.0          pd.hugene.1.0.st.v1_3.14.1 DBI_0.7                    RSQLite_2.0                affy_1.54.0               
[16] oligo_1.40.2               Biostrings_2.44.2          XVector_0.16.0             IRanges_2.10.5             S4Vectors_0.14.7          
[21] oligoClasses_1.38.0        BiocInstaller_1.26.1       GEOquery_2.42.0            Biobase_2.36.2             BiocGenerics_0.22.1       
[26] rafalib_1.0.0             

loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.6.5 splines_3.4.2              lattice_0.20-35            blob_1.1.0                 XML_3.98-1.9              
 [6] rlang_0.1.2                bit64_0.9-7                RColorBrewer_1.1-2         affyio_1.46.0              matrixStats_0.52.2        
[11] GenomeInfoDbData_0.99.0    foreach_1.4.3              zlibbioc_1.22.0            codetools_0.2-15           memoise_1.1.0             
[16] knitr_1.17                 ff_2.2-13                  GenomeInfoDb_1.12.3        AnnotationDbi_1.38.2       preprocessCore_1.38.1     
[21] Rcpp_0.12.13               DelayedArray_0.2.7         affxparser_1.48.0          bit_1.1-12                 digest_0.6.12             
[26] GenomicRanges_1.28.6       grid_3.4.2                 tools_3.4.2                bitops_1.0-6               RCurl_1.95-4.8            
[31] tibble_1.3.4               pkgconfig_2.0.1            Matrix_1.2-11              httr_1.3.1                 iterators_1.0.8

arrayexpress twochannel microarray • 1.8k views

7.7 years ago, Andy91

Hi. I am getting the same error. Did you solve it?

6.3 years ago, jorgeklz