Hi all,
I just found a discrepancy between running R on a PC and on a Unix/Linux
server. Maybe it's widely known, but I didn't know about it and it
caused me big problems. I mostly use my desktop PC for running
microarray analyses, but occasionally I have projects that require
more memory. Then I run some of the memory-intensive steps on our
Linux server (which has a lot more memory but is REALLY slow), save
the objects, and go back to my PC to finish the analysis. Well, it
turns out that the order of probe set IDs as returned by
featureNames() is slightly different between the two platforms.
I first thought it might be due to a difference between the Windows
binary and the *nix source build of the chipnamecdf package, but I
think it's just a difference in the way the two platforms sort
character data that contain numbers. I've put a full, reproducible
example below (our sys admin hasn't upgraded R on the server yet, but
I doubt that's the problem), but in short, my PC puts 177_at before
1773_at, but the server puts 1773_at before 177_at.
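A minimal sketch of the suspected cause, assuming the difference comes from the collation locale (LC_COLLATE) that sort() uses for character vectors. The two IDs are the ones from the example; the variable names are only illustrative. The locale-dependent result may differ from machine to machine, while the C locale compares strings byte by byte and so should give the same answer everywhere.

```r
# Two probe set IDs that sort differently on the PC and on the server.
ids <- c("177_at", "1773_at")

# Collation locale in effect for this session.
old_collate <- Sys.getlocale("LC_COLLATE")
print(old_collate)

# Locale-dependent sort: the result can differ between platforms.
print(sort(ids))

# Temporarily switch to the C locale (plain byte-wise comparison),
# sort again, then restore the original setting.
Sys.setlocale("LC_COLLATE", "C")
c_sorted <- sort(ids)
Sys.setlocale("LC_COLLATE", old_collate)

# In the C locale "3" (byte 51) comes before "_" (byte 95),
# so 1773_at is placed before 177_at.
print(c_sorted)
```

Setting LC_COLLATE to "C" on both machines before any sorting is one way to try to get the same order on both, but I have not verified that this is what determines the featureNames() order here.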
I guess this really isn't a "bug" that can be fixed, and I know it's
not a good idea to run part of your R code on one computer and part
on another computer, but don't you agree that this is undesirable
behavior? Maybe I'm not computer-literate enough to have known that
this is a well-known issue, so in part I'm posting this as a warning
to others like me - I don't remember seeing anything like this in the
4+ years I've been following the BioC list. I'm also wondering: in
addition to however many of my analyses may have been messed up
slightly (ARRRGGHH!!), could this cause problems in things
like public repositories? I know databases don't depend on order, but
I'd be surprised if it hasn't caused problems somewhere else. In this
case, there are only 117 probe sets out of 22,277 that don't match up,
so it would be hard to notice!
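One way to guard against this kind of silent misalignment is to line results up by probe set ID rather than by position. The sketch below uses small made-up vectors (names_pc, names_linux and vals_linux are hypothetical stand-ins, not the real data) and match() to put values computed in one order into the other order.

```r
# Hypothetical stand-ins for the ID order on each platform.
names_pc    <- c("1007_s_at", "177_at", "1773_at", "200000_s_at")
names_linux <- c("1007_s_at", "1773_at", "177_at", "200000_s_at")

# Made-up values computed on the server, stored in the server's order.
vals_linux <- c(10, 20, 30, 40)
names(vals_linux) <- names_linux

# Positions where the two orders disagree.
mismatch <- which(names_pc != names_linux)

# Look up each PC-order ID in the server-order vector.
idx <- match(names_pc, names_linux)

# Stop if any ID is missing on the other side.
stopifnot(!any(is.na(idx)))

# Values rearranged into the PC's order, aligned by ID.
vals_pc_order <- vals_linux[idx]
print(vals_pc_order)
```

With real objects the same idea would apply to anything indexed by featureNames(), for example reordering the rows of a matrix by its rownames before combining it with results from the other machine.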
Thanks,
Jenny
> library(affy)
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.
> library(ArrayExpress)
>
> rawset = ArrayExpress("E-MEXP-1422")
trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-MEXP-1422/index.html'
Content type 'text/html;charset=ISO-8859-1' length unknown
opened URL
downloaded 7746 bytes
trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-MEXP-1422/E-MEXP-1422.raw.1.zip'
Content type 'application/zip' length 11200346 bytes (10.7 Mb)
opened URL
downloaded 10.7 Mb
Read 1 item
trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-MEXP-1422/E-MEXP-1422.sdrf.txt'
Content type 'text/plain' length 6679 bytes
opened URL
downloaded 6679 bytes
trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/A-AFFY-37/A-AFFY-37.adf.txt'
Content type 'text/plain' length 3590863 bytes (3.4 Mb)
opened URL
downloaded 3.4 Mb
trying URL 'http://www.ebi.ac.uk/microarray-as/ae/files/E-MEXP-1422/E-MEXP-1422.idf.txt'
Content type 'text/plain' length 5378 bytes
opened URL
downloaded 5378 bytes
Read 49 items
The object containing experiment E-MEXP-1422 has been built.
> rawset
AffyBatch object
size of arrays=732x732 features (8499 kb)
cdf=HG-U133A_2 (22277 affyids)
number of samples=6
number of genes=22277
annotation=hgu133a2
notes=E-MEXP-1422
E-MEXP-1422
RNAi
c("cellular_modification_design", "co-expression_design",
"in_vitro_design", "RNAi")
NULL
>
> PSnames.PC <- featureNames(rawset)
>
> all.equal(PSnames.PC, featureNames(rawset))
[1] TRUE
>
> save.image("NameOrderTest.RData")
>
> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hgu133a2cdf_2.5.0 ArrayExpress_1.6.1 affy_1.24.2 Biobase_2.6.1
loaded via a namespace (and not attached):
[1] affyio_1.14.0 limma_3.2.1 preprocessCore_1.8.0
[4] tools_2.10.1 XML_2.6-0
>
> q()
# now move to Linux server:
> library(affy)
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>
>
> load("NameOrderTest.RData")
>
>
>
> all.equal(PSnames.PC, featureNames(rawset))
[1] "117 string mismatches"
>
>
> x <- data.frame(PC=PSnames.PC, Linux=featureNames(rawset), stringsAsFactors=FALSE)
>
> x[ x[,1] != x[,2] , ][ 1:5 , ]
            PC     Linux
17      177_at   1773_at
18     1773_at    177_at
2328 2028_s_at 202800_at
2329 202800_at 202801_at
2330 202801_at 202802_at
>
>
> all.equal(sort(PSnames.PC), featureNames(rawset))
[1] TRUE
>
>
> PSnames.linux <- featureNames(rawset)
>
> save.image("NameOrderTest.RData")
>
> sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hgu133a2cdf_2.4.0 affy_1.22.0 Biobase_2.4.0
loaded via a namespace (and not attached):
[1] affyio_1.8.1 preprocessCore_1.6.0 tools_2.9.0
>
> q()
# now move back to PC:
> library(affy)
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.
> load("NameOrderTest.RData")
>
> all.equal(PSnames.PC, featureNames(rawset))
[1] TRUE
>
> all.equal(PSnames.linux, featureNames(rawset))
[1] "117 string mismatches"
>
> all.equal(sort(PSnames.linux), featureNames(rawset))
[1] TRUE
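The pattern in the transcript (all.equal() reports mismatches, but the comparison succeeds after sort()) means the two vectors hold the same IDs in a different order. A small helper along these lines could make that distinction explicit after load(); compare_ids is a hypothetical name and the input vectors are made up for illustration.

```r
# Classify how two ID vectors relate: different content,
# same content in the same order, or same content reordered.
compare_ids <- function(saved, current) {
  if (!setequal(saved, current)) {
    return("different IDs")
  }
  if (identical(saved, current)) {
    return("same IDs, same order")
  }
  "same IDs, different order"
}

# Made-up example: the first two IDs are swapped.
saved   <- c("177_at", "1773_at", "200000_s_at")
current <- c("1773_at", "177_at", "200000_s_at")

res <- compare_ids(saved, current)
print(res)
```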
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu