transferring AffyBatch object and 64 bit R
Donna Toleno ▴ 90
@donna-toleno-2383
Last seen 9.6 years ago
Hello,

I am having some technical difficulties with my Affymetrix gene expression analysis. I recently received an account on a Linux cluster because I want to analyze large data sets. I installed the extra packages I need locally, compiled them in 64 bit, and tested that the libraries load in an R session. I am using R compiled as a 64-bit application.

I also have an account on one other cluster, but I don't have much disk space left there. In addition, I have my personal MacBook and a Windows computer at work. I would now like to do my analysis on the new cluster. To start out, I want to put my AffyBatch object on the new system. I tried several different ways of transferring it.

Transferring with command-line scp and using non-compressed objects from Linux to Linux gave me the best results, but I will describe the problem I still have when the AffyBatch object loads.

I am still a bit confused about 32-bit vs 64-bit systems. Do objects carry with them information about the operating system?

As a side note, I had to load each library separately, including the dependencies in the proper order. For example:

library(puma, lib.loc = 'path/to_my_local/R_libraries')

will fail if I don't first do

library(ROCR, lib.loc = 'same_path')
library(gtools, lib.loc = 'same_path')

etc.

When I load the data in R on the head node (64-bit login), all the packages load properly and the AffyBatch displays correctly. To do my real work, however, I need to submit a script to the queue. I submit this script to the 64-bit processors. The script copies the R object to the temporary directory where I am supposed to do my work. At that point I use R CMD BATCH to load the AffyBatch object, and it does not display properly: the object loads, but it does not have the cdf information attached to it. When I display the AffyBatch object it looks like this:

AffyBatch object
size of arrays=1164x1164 features
cdf=HG-U133_Plus_2 (??? affyids)
number of samples=55
Error in getCdfInfo(object) :
  Could not obtain CDF environment, problems encountered:
Specified environment does not contain HG-U133_Plus_2
Library - package hgu133plus2cdf not installed
Data for package affy did not contain hgu133plus2cdf
Bioconductor - could not connect
Calls: <anonymous> ... <anonymous> -> cat -> featureNames -> featureNames -> getCdfInfo
In addition: Warning message:
missing cdf environment! in show(AffyBatch)
Execution halted

Any ideas or clarifications about what is going on would be helpful. The computer support people don't know much about Bioconductor or R. I would appreciate any advice, or even questions to ask the computer support people.

Thank you in advance.
cdf affy
@joern-toedling-1244
Last seen 9.6 years ago
Hi Donna,

I don't think this issue is related to whether you are using a 64-bit or a 32-bit system.

When you install R packages locally, which is a good idea, you need to tell R explicitly to look for packages there, too. You can do this when attaching a package with the lib.loc argument, as you apparently do. There are, however, better solutions. One is to add the path at the start of your R session using the function .libPaths (the dot at the beginning is part of the function name; see ?.libPaths). On Linux, an even better option is to define an environment variable called R_LIBS in your shell startup script, which is called ".cshrc" or ".bashrc" depending on which shell you use, with the command

setenv R_LIBS "path/to_your_local/R_libraries"   # for csh and tcsh

or something like

export R_LIBS="path/to_your_local/R_libraries"   # for bash; please check, I only use tcsh

You can also type that on the command line before starting R. From then on, R will look in that directory as well for installed packages, which you can check in your R session by typing

.libPaths()

This is also the reason why you cannot display the AffyBatch: the cdf environment is needed to display it, and since the package "hgu133plus2cdf" is not installed in R's standard package directory but probably elsewhere, R cannot find it. Using the R_LIBS variable (or the .libPaths function) should solve that issue.

Best regards,
Joern
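Since the failure only shows up in the queued job, it may help to set R_LIBS inside the job script itself rather than relying on a shell startup file, which a batch job may not read. The lines below are a minimal sketch under assumptions: the directory $HOME/R_libraries and the variable name LOCAL_R_LIBS are hypothetical placeholders, not anything from the original posts. The check at the end only runs if Rscript is on the PATH.

```shell
#!/bin/sh
# Hypothetical location of the locally installed R packages; replace with your own path.
LOCAL_R_LIBS="${LOCAL_R_LIBS:-$HOME/R_libraries}"

# Prepend the local library to R_LIBS, keeping any value that is already set.
if [ -n "${R_LIBS:-}" ]; then
    R_LIBS="$LOCAL_R_LIBS:$R_LIBS"
else
    R_LIBS="$LOCAL_R_LIBS"
fi
export R_LIBS

echo "R_LIBS=$R_LIBS"

# If R is available, print the library search path and report whether
# the CDF package named in the error message can be found.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e '.libPaths()' \
            -e 'cat("hgu133plus2cdf found:", nzchar(system.file(package = "hgu133plus2cdf")), "\n")'
fi
```

In a job script, these lines would go before the R CMD BATCH call, so that the batch R session inherits the same R_LIBS value as the interactive session on the head node.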