Computing correlation with two different data structures
@blackram223-13563

The data is part of the tissuesGeneExpression dataset, and it looks like this:

           GSM92242.CEL.gz GSM92243.CEL.gz GSM92244.CEL.gz GSM92245.CEL.gz
1007_s_at        10.373213       11.395477       10.822040       10.077308
1053_at           6.523120        6.280099        6.377340        6.287809
117_at            7.625365        7.829470        7.461025        7.806562
121_at           10.644904       10.669002       10.332522       10.880915
1255_g_at         5.168378        5.207066        5.152687        4.929143

We are given:

s = svd(e)
m = rowMeans(e)
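For reference, `svd()` returns a list whose components are, in order, `d`, `u` and `v`, so positional indexing such as `s[[3]]` picks out `v`, not `u`. The sketch below uses a small simulated matrix as an assumed stand-in for the real `e` (which is probes × samples) to show the shapes involved:

```r
# Simulated stand-in for e (assumption): 200 "probes" (rows) x 10 "samples" (columns)
set.seed(1)
e <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

s <- svd(e)
m <- rowMeans(e)

print(names(s))     # "d" "u" "v", so s[[3]] is s$v, not s$u
print(length(s$d))  # 10 singular values, one per column of e
print(dim(s$u))     # 200 x 10: one row per probe
print(dim(s$v))     # 10 x 10: one row per sample
print(length(m))    # 200: one mean per probe
```

With the real data, `dim(s$u)` would be 22215 × 189 and `dim(s$v)` 189 × 189, which matches `length(s[[3]])` being 35721 = 189².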

Now I need to find the correlation between u and m, where u is part of the SVD and typeof(m) is "double".

$u
                 [,1]          [,2]          [,3]          [,4]          [,5]
    [1,] -0.009117322  4.556496e-03 -1.273478e-02 -2.513035e-03 -1.781687e-02
    [2,] -0.005432838 -2.245454e-03 -1.797830e-03 -2.290583e-03  2.775834e-03
    [3,] -0.006952054 -3.190183e-03  6.364520e-03 -2.584162e-03 -4.524650e-03
    [4,] -0.009712986 -8.782845

OK, so we have:

> s[[3]][,1]
  [1] -0.07324862 -0.07325096 -0.07295323 -0.07331382 -0.07307976 -0.07300219
  [7] -0.07288423 -0.07296209 -0.07322670 -0.07312508 -0.07331067 -0.07287291
 [13] -0.07321938 -0.072962

which seems reasonable. Finally, we have m:

76057    9.193640    8.021232    9.102173    8.360764    9.023536 
  201460_at 201461_s_at   201462_at 201463_s_at 201464_x_at 201465_s_at 201466_s_at 
   9.468738    5.424650    9.649986   10.614343   10.839478    7.494861    7.873803 
201467_s_at 201468_s_at 201469_s_at   201470_at 201471_s_at   201472_at 
   7.367181    8.164352    7.058134   10.627766   11.126659    9.561431 
 [ reached getOption("max.print") -- omitted 21215 entries ]

I did some experimenting:

 201447_at   201448_at   201449_at 201450_s_at 201451_x_at   201452_at 
   8.158574    8.385202    8.229309    7.934801    6.288633    6.007999    6.565995 
201453_x_at 201454_s_at 201455_s_at 201456_s_at 201457_x_at 201458_s_at   201459_at 
  10.736726    8.276057    9.193640    8.021232    9.102173    8.360764    9.023536 
  201460_at 201461_s_at   201462_at 201463_s_at 201464_x_at 201465_s_at 201466_s_at 
   9.468738    5.424650    9.649986   10.614343   10.839478    7.494861    7.873803 
201467_s_at 201468_s_at 201469_s_at   201470_at 201471_s_at   201472_at 
   7.367181    8.164352    7.058134   10.627766   11.126659    9.561431 
 [ reached getOption("max.print") -- omitted 21215 entries ]
> m[1]
1007_s_at 
  10.2631 
> m[1,]
Error in m[1, ] : incorrect number of dimensions
> m[1:5]
1007_s_at   1053_at    117_at    121_at 1255_g_at 
10.263097  6.115715  7.827447 10.934732  5.223437 
> m[1]+5
1007_s_at 
  15.2631 
> corr = cor(s[[3]][,1], m)
Error in cor(s[[3]][, 1], m) : incompatible dimensions

> length(s[[3]])
[1] 35721
> length(s[[3]][1,])
[1] 189
> length(s[[3]][,1])
[1] 189
> length(n)
Error: object 'n' not found
> length(m)
[1] 22215
> 
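The errors in the session above come from two different things: `rowMeans()` returns a plain named numeric vector (it has no `dim`, so `m[1, ]` fails), and `s[[3]]` is the sample-side matrix `v`, whose columns have one entry per sample rather than one per probe. A minimal sketch on simulated data (the `probe1`, `probe2`, ... row names are hypothetical) checks both points:

```r
# Simulated stand-in for e with hypothetical probe-style row names
set.seed(1)
e <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10,
            dimnames = list(paste0("probe", 1:200), NULL))
s <- svd(e)
m <- rowMeans(e)

print(is.null(dim(m)))  # TRUE: m is a named vector, so use m[1], not m[1, ]
print(m["probe1"])      # elements can also be picked by name
print(typeof(m))        # "double"

# s[[3]] is v: one row per sample, so its columns are too short to pair with m
print(nrow(s[[3]]) == ncol(e))  # TRUE
# s$u has one row per probe, the same length as m
print(nrow(s$u) == length(m))   # TRUE
```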

So from the output above, I suspect the problem is that the two variables have different lengths, so maybe I am not taking the right one. Do both of them have to have the same number of elements, and can I work around it?
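On the equal-length question: `cor(x, y)` pairs `x[i]` with `y[i]`, so the two vectors must have the same length, and trimming one to fit would pair unrelated values. The fix is to pick vectors that index the same units (here, probes). A toy sketch with made-up numbers reproduces the error from the session:

```r
# cor() pairs x[i] with y[i], so both vectors must have the same length
x <- c(1, 2, 3)
y <- c(2, 4, 6, 8)

res <- tryCatch(cor(x, y), error = function(err) conditionMessage(err))
print(res)  # "incompatible dimensions"

# With one value per shared unit, the pairing is well defined
z <- c(2, 4, 6)
r_ok <- cor(x, z)
print(r_ok)
```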

So how do I calculate the correlation of these two data sets?
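Since `m` has one value per probe, the matching SVD vector is a column of `u` (for example `s$u[, 1]`), not a column of `s[[3]]`. A sketch on simulated data, assuming each probe has its own baseline level roughly like log-scale expression values, would be:

```r
# Simulated stand-in for e (assumption): each probe (row) gets its own
# baseline level, plus per-sample noise
set.seed(1)
n_probes <- 200
n_samples <- 10
baseline <- rnorm(n_probes, mean = 8, sd = 2)
# Adding a length-n_probes vector recycles it down the columns,
# so row i is shifted by baseline[i]
e <- matrix(rnorm(n_probes * n_samples), nrow = n_probes) + baseline

s <- svd(e)
m <- rowMeans(e)

# First left singular vector: one value per probe, same length as m
r <- cor(s$u[, 1], m)
print(r)
```

The sign of a singular vector is arbitrary, so `r` may come out close to -1 rather than +1. In the real data the first entries of `u[, 1]` shown above are negative while `m` is positive, so a strongly negative correlation would not be surprising.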

@james-w-macdonald-5106

This doesn't have anything to do with any Bioconductor packages. If you just have basic questions about how to do things using R, try the R-help list (r-help@r-project.org), or perhaps just do a Google search.
