Guest User
@guest-user-4897
I have two lists, A and B, each containing 100 data frames, and each
data frame is 15000 x 15000. I would like to compute the correlation
over each entire data frame as follows: take the first data frame from
each list, compute the correlation between them over all of their
cells, and get a single value correlating the two data frames. Then do
the same with the second data frame from each list, and continue for
all 100 pairs.
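Assuming "correlation" here means the Pearson coefficient (the default method of R's cor()), the single value wanted for the k-th pair treats every cell of the two data frames as one paired observation. Writing a^{(k)}_{ij} and b^{(k)}_{ij} for cell (i, j) of the k-th data frame in A and B, with bars for the mean over all cells of that data frame:

```latex
r_k = \frac{\sum_{i,j} \left(a^{(k)}_{ij} - \bar{a}^{(k)}\right)\left(b^{(k)}_{ij} - \bar{b}^{(k)}\right)}
           {\sqrt{\sum_{i,j} \left(a^{(k)}_{ij} - \bar{a}^{(k)}\right)^{2} \; \sum_{i,j} \left(b^{(k)}_{ij} - \bar{b}^{(k)}\right)^{2}}},
\qquad k = 1, \dots, 100
```

Here i and j run over all 15000 rows and 15000 columns, so each r_k is built from 225 million cell pairs; this is the same quantity cor() returns for the two unlisted vectors. Because it only involves sums over cells, it can in principle be accumulated one column at a time.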
I tried the following:
# A: list of 100 data frames
# B: list of 100 data frames
C <- A[1]       # first element of A (a list holding one data frame)
D <- B[1]       # first element of B
C <- unlist(C)  # flatten C into one long vector
D <- unlist(D)  # flatten D into one long vector
Then I computed:
Correlation <- cor(C, D)  # single coefficient for how the two vectors correlate
But I end up with an error saying:
R cannot allocate a vector of size 3.9 GB
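A rough size estimate, assuming the data frames hold double-precision values (8 bytes each), suggests why allocations of this order fail: one flattened 15000 x 15000 data frame needs about 1.7 GiB for the numbers alone, before counting the second vector or any temporary copies. On top of that, unlist() by default attaches a name to every cell, so for the real data it also builds a character vector of 225 million names. The tiny data frame below is a hypothetical stand-in, used only to show that naming behaviour, the difference between A[1] and A[[1]], and the use.names = FALSE switch.

```r
# Size of ONE flattened 15000 x 15000 data frame of doubles
n_cells <- 15000 * 15000      # 225,000,000 cells
bytes   <- n_cells * 8        # 8 bytes per double
gib     <- bytes / 1024^3     # about 1.68 GiB for the values alone
print(gib)

# Hypothetical 3 x 2 stand-in for one element of the list A
A <- list(data.frame(x = c(1, 2, 3), y = c(4, 5, 6)))

# A[1] is still a list of length one; A[[1]] is the data frame itself
print(class(A[1]))    # "list"
print(class(A[[1]]))  # "data.frame"

# By default unlist() names every cell: "x1" "x2" "x3" "y1" "y2" "y3"
print(names(unlist(A[1])))

# use.names = FALSE returns the bare numeric vector, skipping the names
v <- unlist(A[[1]], use.names = FALSE)
print(v)
```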
Is there a better, faster way to do this that can also be applied to
the entire list? I work on a server that can handle large
computations, but I still get this error, and the unlisting takes ages
because of the size of the data frames.
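One possible alternative, sketched here rather than taken from any package: compute the Pearson correlation of each pair directly from sums over the cells, one column at a time, so that no flattened 225-million-element vector (and no names vector) is ever created. The function name whole_frame_cor is hypothetical, and the sketch assumes every column is numeric with no missing values; each pair of data frames still has to be in memory. It uses two passes (grand means first, then centred sums) instead of the one-pass sum-of-squares formula, which can lose precision on large inputs.

```r
# Hypothetical helper: Pearson correlation over ALL cells of two data frames
# with identical dimensions, accumulated column by column instead of
# building flattened copies. Assumes numeric columns and no NA values.
whole_frame_cor <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  n <- nrow(a) * ncol(a)

  # Pass 1: grand mean of every cell in each data frame
  col_sum <- function(col) sum(as.double(col))
  mean_a <- sum(vapply(a, col_sum, numeric(1))) / n
  mean_b <- sum(vapply(b, col_sum, numeric(1))) / n

  # Pass 2: centred cross-products and sums of squares, one column at a time
  sxy <- 0
  sxx <- 0
  syy <- 0
  for (j in seq_len(ncol(a))) {
    x <- a[[j]] - mean_a
    y <- b[[j]] - mean_b
    sxy <- sxy + sum(x * y)
    sxx <- sxx + sum(x * x)
    syy <- syy + sum(y * y)
  }
  sxy / sqrt(sxx * syy)
}

# Toy usage with small hypothetical lists (the real ones hold 100 frames of 15000 x 15000)
set.seed(1)
A <- replicate(3, as.data.frame(matrix(rnorm(20), nrow = 4)), simplify = FALSE)
B <- lapply(A, function(d) as.data.frame(as.matrix(d) + matrix(rnorm(20), nrow = 4)))

# One correlation per pair of data frames
correlations <- mapply(whole_frame_cor, A, B)
print(correlations)
```

For the real lists the call pattern would be the same, mapply(whole_frame_cor, A, B), returning one value per pair. At full size the loop visits 15000 columns per pair, so run time will depend on the server.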
-- output of sessionInfo():
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8    LC_NUMERIC=C              LC_TIME=en_US.UTF-8         LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8   LC_PAPER=C                  LC_NAME=C
[9] LC_ADDRESS=C            LC_TELEPHONE=C            LC_MEASUREMENT=en_US.UTF-8  LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.0.1
--
Sent via the guest posting facility at bioconductor.org.