Computing large correlations in R
Guest User (@guest-user-4897)
I have two lists, A and B, each containing 100 data frames, and each data frame is 15000 x 15000. I would like to compute one correlation coefficient per pair of data frames, treating each entire data frame as a single set of values: take the first data frame from each list, compute cor() between them to get a single value, then do the same for the second pair, and so on for all 100 pairs.

I tried the following:

    A                         # list of 100 data frames
    B                         # list of 100 data frames
    C <- A[1]                 # extract only the first element of A
    D <- B[1]                 # extract only the first element of B
    C <- unlist(C)            # unlist C
    D <- unlist(D)            # unlist D
    Correlation <- cor(C, D)  # a single coefficient describing how these two vectors are correlated

But I end up with an error saying R cannot allocate a vector of size 3.9 GB.

Is there a better and faster way to do this that could be applied to the entire list? I work on a server that allows me to compute large values, but this error still shows up, and the unlisting takes ages because of the size of the data frames.

-- output of sessionInfo():

    R version 3.0.1 (2013-05-16)
    Platform: x86_64-redhat-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] tools_3.0.1

-- 
Sent via the guest posting facility at bioconductor.org.
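One possible approach, sketched under assumptions: every data frame holds only numeric columns, A[[i]] and B[[i]] have the same dimensions and column order, and there are no missing values. Note that A[1] returns a one-element list while A[[1]] returns the data frame itself, and that unlist() also builds a names vector for every element unless use.names = FALSE is given. The helper pooled_cor() below is hypothetical (not part of base R or any package). It is meant to return the same Pearson coefficient as cor(unlist(x, use.names = FALSE), unlist(y, use.names = FALSE)) without ever building the two 225-million-element vectors: it walks the columns once to get the grand means, then again to accumulate centered sums of products, so only one column per data frame is processed at a time.

```r
# Hypothetical helper: Pearson correlation over ALL cells of two equally
# sized numeric data frames, computed column by column so that no
# nrow * ncol vector is ever allocated (unlike cor(unlist(C), unlist(D))).
# Assumes numeric columns only, matching column order, and no NAs.
pooled_cor <- function(x, y) {
  stopifnot(is.data.frame(x), is.data.frame(y), identical(dim(x), dim(y)))
  n <- prod(dim(x))  # total number of paired cells

  # Pass 1: grand means of all cells (as.double avoids integer overflow)
  col_sum <- function(col) sum(as.double(col))
  mx <- sum(vapply(x, col_sum, numeric(1))) / n
  my <- sum(vapply(y, col_sum, numeric(1))) / n

  # Pass 2: accumulate centered cross-products and sums of squares
  sxy <- 0
  sxx <- 0
  syy <- 0
  for (j in seq_along(x)) {
    dx <- x[[j]] - mx
    dy <- y[[j]] - my
    sxy <- sxy + sum(dx * dy)
    sxx <- sxx + sum(dx * dx)
    syy <- syy + sum(dy * dy)
  }
  sxy / sqrt(sxx * syy)
}

# Toy demonstration with made-up data: two lists of 3 small data frames
set.seed(1)
make_df <- function() as.data.frame(matrix(rnorm(50), nrow = 10))
A_demo <- list(make_df(), make_df(), make_df())
B_demo <- lapply(A_demo, function(df) df + make_df())  # correlated partners

# One coefficient per pair of data frames
r <- mapply(pooled_cor, A_demo, B_demo)
print(r)
```

On the real lists the call would be r <- mapply(pooled_cor, A, B), returning a numeric vector with one coefficient per pair. The two-pass design (means first, then centered sums) is chosen over the one-pass formula based on sums of x, y, x^2 and xy because subtracting large, nearly equal totals can lose precision when summing hundreds of millions of values.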