Dear Bioconductor,
I am currently working on a study of several microarray datasets, some of which are two-channel arrays. For consistency, I would like to download the raw two-channel data, so I turned to the "ArrayExpress" R package. However, I have had considerable difficulty downloading these data, because most of the time I get the following error:
```
> mtab5095.eset <- ArrayExpress("E-MTAB-5095")
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/A-MEXP-2104/A-MEXP-2104.adf.txt'
Content type 'text/plain' length 5941699 bytes (5.7 MB)
==================================================
downloaded 5.7 MB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.sdrf.txt'
Content type 'text/plain' length 13791 bytes (13 KB)
==================================================
downloaded 13 KB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.idf.txt'
Content type 'text/plain' length 5589 bytes
==================================================
downloaded 5589 bytes

Copying raw data files
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-5095/E-MTAB-5095.raw.1.zip'
Content type 'application/zip' length 113238265 bytes (108.0 MB)
==================================================
downloaded 108.0 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `row.names<-.data.frame`(`*tmp*`, value = c("US10020348_252800421889_S03_GE2_107_Sep09_1_1.txt", :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘US10020348_252800419789_S01_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800420180_S01_GE2_107_Sep09_1_4.txt’, ‘US10020348_252800421147_S01_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800421889_S03_GE2_107_Sep09_1_1.txt’, ‘US10020348_252800421889_S03_GE2_107_Sep09_1_3.txt’
```
Judging from the output, the data file names that are used as row names for the phenotype data are not unique. Does this mean that the submitter made a mistake when preparing the .sdrf file? Interestingly, when I run the example from the article describing the ArrayExpress package (Kauffmann et al. 2009), I get exactly the same error:
```
> AEset <- ArrayExpress("E-ATMX-18")
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/A-ATMX-8/A-ATMX-8.adf.txt'
Content type 'text/plain' length 3743536 bytes (3.6 MB)
==================================================
downloaded 3.6 MB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.sdrf.txt'
Content type 'text/plain' length 21142 bytes (20 KB)
==================================================
downloaded 20 KB

trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.idf.txt'
Content type 'text/plain' length 6889 bytes
==================================================
downloaded 6889 bytes

Copying raw data files
trying URL 'https://www.ebi.ac.uk/arrayexpress/files/E-ATMX-18/E-ATMX-18.raw.1.zip'
Content type 'application/zip' length 28842045 bytes (27.5 MB)
==================================================
downloaded 27.5 MB

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `row.names<-.data.frame`(`*tmp*`, value = c("4.txt", "4.txt", :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘10.txt’, ‘11.txt’, ‘12.txt’, ‘2.txt’, ‘3.txt’, ‘4.txt’, ‘5.txt’, ‘6.txt’, ‘7.txt’, ‘8.txt’, ‘9.txt’
```
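One plausible reading of both tracebacks (an assumption, not something confirmed by the package authors): in MAGE-TAB, a two-channel hybridization is usually described by two SDRF rows, one per labeled extract (Cy3 and Cy5), and both rows point to the same `Array Data File`. If that column is used as row names for the phenotype data, every file appears twice and `row.names<-` fails, which would match the `"4.txt", "4.txt"` pair in the second error. The small SDRF below is invented for illustration; its column headers follow MAGE-TAB naming, but all values are hypothetical.

```r
# Toy SDRF laid out the way two-channel MAGE-TAB files usually are
# (hypothetical values): one row per labeled extract, so each array data
# file is listed twice, once for Cy3 and once for Cy5.
sdrf <- data.frame(
  "Source Name"     = c("ctrl_1", "treat_1", "ctrl_2", "treat_2"),
  "Label"           = c("Cy3", "Cy5", "Cy3", "Cy5"),
  "Array Data File" = c("4.txt", "4.txt", "5.txt", "5.txt"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

files <- sdrf[["Array Data File"]]

# Which file names occur more than once?
dup_files <- unique(files[duplicated(files)])
print(dup_files)

# Using the file column as row names reproduces the error from the traceback.
msg <- tryCatch({
  pheno <- sdrf
  suppressWarnings(rownames(pheno) <- files)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

Running `duplicated()` on the `Array Data File` column of the real SDRF is a quick way to check whether a given experiment follows this one-row-per-channel layout.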
Has anybody else run into this issue with two-channel microarray data from ArrayExpress, and has anybody found a way to fix it?
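A possible workaround, sketched under assumptions: download the files with `getAE()` from the ArrayExpress package, read the SDRF yourself, and collapse it to one row per array with separate Cy3 and Cy5 columns. That is the targets layout (`FileName`, `Cy3`, `Cy5`) that limma's `read.maimages()` and `modelMatrix()` work with. The runnable part only shows the reshaping step on a hypothetical in-memory SDRF; the helper name `sdrf_to_targets` is made up for this example, and the `Source Name`, `Label` and `Array Data File` headers and the `Cy3`/`Cy5` label values are assumed to match your SDRF.

```r
# Hypothetical helper: collapse a per-channel SDRF (two rows per array)
# into a limma-style targets frame with one row per array.
sdrf_to_targets <- function(sdrf,
                            file_col = "Array Data File",
                            label_col = "Label",
                            sample_col = "Source Name") {
  by_file <- split(sdrf, sdrf[[file_col]])
  rows <- lapply(names(by_file), function(f) {
    d <- by_file[[f]]
    # Sample hybridized in a given channel; NA if that channel is absent
    # or ambiguous for this file.
    pick <- function(dye) {
      hit <- d[[sample_col]][d[[label_col]] == dye]
      if (length(hit) == 1) hit else NA_character_
    }
    data.frame(FileName = f, Cy3 = pick("Cy3"), Cy5 = pick("Cy5"),
               stringsAsFactors = FALSE)
  })
  targets <- do.call(rbind, rows)
  rownames(targets) <- targets$FileName  # unique now: one row per file
  targets
}

# Hypothetical per-channel SDRF used to exercise the helper offline.
sdrf <- data.frame(
  "Source Name"     = c("ctrl_1", "treat_1", "ctrl_2", "treat_2"),
  "Label"           = c("Cy3", "Cy5", "Cy3", "Cy5"),
  "Array Data File" = c("4.txt", "4.txt", "5.txt", "5.txt"),
  check.names = FALSE, stringsAsFactors = FALSE
)
targets <- sdrf_to_targets(sdrf)
print(targets)

# Not run: with network access, something along these lines could follow.
# The getAE() return elements (path, sdrf) and source = "agilent" (guessed
# from the E-MTAB-5095 file names) are assumptions to check locally.
# library(ArrayExpress); library(limma)
# ae      <- getAE("E-MTAB-5095", type = "full")
# sdrf    <- read.delim(file.path(ae$path, ae$sdrf), check.names = FALSE,
#                       stringsAsFactors = FALSE)
# targets <- sdrf_to_targets(sdrf)
# RG      <- read.maimages(targets, source = "agilent", path = ae$path)
```

Keeping one row per physical array, rather than per channel, gives unique row names and keeps the channel-to-sample mapping explicit for downstream two-color analysis.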
```
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] pd.u133.x3p_3.12.0 pd.hg.u133a.2_3.12.0 convert_1.52.0 marray_1.54.0 ArrayExpress_1.36.1
 [6] limma_3.32.10 primeviewcdf_2.18.0 pd.huex.1.0.st.v2_3.14.1 hthgu133pluspmcdf_2.18.0 hgu133plus2cdf_2.18.0
[11] hgu133acdf_2.18.0 pd.hugene.1.0.st.v1_3.14.1 DBI_0.7 RSQLite_2.0 affy_1.54.0
[16] oligo_1.40.2 Biostrings_2.44.2 XVector_0.16.0 IRanges_2.10.5 S4Vectors_0.14.7
[21] oligoClasses_1.38.0 BiocInstaller_1.26.1 GEOquery_2.42.0 Biobase_2.36.2 BiocGenerics_0.22.1
[26] rafalib_1.0.0

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.6.5 splines_3.4.2 lattice_0.20-35 blob_1.1.0 XML_3.98-1.9
 [6] rlang_0.1.2 bit64_0.9-7 RColorBrewer_1.1-2 affyio_1.46.0 matrixStats_0.52.2
[11] GenomeInfoDbData_0.99.0 foreach_1.4.3 zlibbioc_1.22.0 codetools_0.2-15 memoise_1.1.0
[16] knitr_1.17 ff_2.2-13 GenomeInfoDb_1.12.3 AnnotationDbi_1.38.2 preprocessCore_1.38.1
[21] Rcpp_0.12.13 DelayedArray_0.2.7 affxparser_1.48.0 bit_1.1-12 digest_0.6.12
[26] GenomicRanges_1.28.6 grid_3.4.2 tools_3.4.2 bitops_1.0-6 RCurl_1.95-4.8
[31] tibble_1.3.4 pkgconfig_2.0.1 Matrix_1.2-11 httr_1.3.1 iterators_1.0.8
```
Hi. I am getting the same error. Did you solve it?