Question

Problem with the function combine() when merging two data frames on their common rows

Konstantinos Yeles (Italy)

Dear All,

I would like to ask a very specific question about merging the common rows of multiple data frames that hold statistics in their columns. After some preprocessing, I have two data frames with gene symbols as row names and various statistics (mean, standard deviation, etc.) as columns. To merge the two data frames while keeping only the common symbols and their statistics, I tried:

dim(a2)
[1] 18172     6

head(a2) # the first data frame
      Control.Mean_GSE8993 Control.SD_GSE8993 IR.Mean_GSE8993 IR.SD_GSE8993
CXCL3            11.040152           1.926972       12.853060     1.2865673
CXCL8            11.023752           2.053631       12.215751     1.3443649
CXCL2            12.955409           1.644389       14.533220     1.0109802
AREG             12.570493           1.892597       14.029891     0.6818683
NR4A2             8.366594           1.357853        9.642525     1.3426967
CXCL6            11.827066           1.882204       12.727931     0.9607712
      BY.Mean_GSE8993 BY.SD_GSE8993
CXCL3        8.421211     0.7262789
CXCL8        7.999791     0.7654306
CXCL2       10.828815     0.7256327
AREG        10.439334     0.6265784
NR4A2        7.282875     0.9353340
CXCL6        9.592875     0.5901343

and the second data frame:

dim(d2)
[1] 18173     6

head(d2)
         Control.Mean Control.SD   IR.Mean    IR.SD   BY.Mean    BY.SD
LENG8       10.953919   2.044573 10.850738 2.283272 10.445768 1.946263
FOSB         8.944820   2.113943  9.509101 1.956087  9.099309 2.023522
FOXE1        9.940223   1.966307 10.307348 1.968307  9.783286 1.594923
CACNA1E     11.123550   1.915697 11.471386 1.898161 10.892187 1.528324
CYB561D1    11.285938   2.024681 11.496708 2.184631 10.813287 1.473858
IL6         11.551701   1.631311 12.415631 2.638829 11.385419 1.902916

and then I tried:

m <- combine(a2,d2)
dim(m)
[1] 18178    12

The main issue that concerns me is that, according to a Venn diagram, the two data frames have 18165 gene symbols in common, while the merged data frame has 18178 rows. What is wrong with this approach? Should I use another function or approach?
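A quick way to see which count a result corresponds to is to compare the row names directly. The sketch below uses two tiny made-up data frames (`a_toy` and `d_toy` are hypothetical stand-ins for `a2` and `d2`, with gene symbols as row names) and counts the symbols shared by both with `intersect()` and the symbols present in either with `union()`. A merged result with more rows than the shared count is keeping symbols that appear in only one of the inputs.

```r
# Hypothetical toy stand-ins for a2 and d2 (made-up values): gene symbols are row names
a_toy <- data.frame(Control.Mean = c(11.0, 11.0, 12.9),
                    row.names = c("CXCL3", "CXCL8", "CXCL2"))
d_toy <- data.frame(Control.Mean = c(10.9, 11.0, 8.9),
                    row.names = c("CXCL8", "CXCL3", "FOSB"))

# Symbols present in both data frames: what an inner join should keep
common_symbols <- intersect(rownames(a_toy), rownames(d_toy))
# Symbols present in either data frame: what a union keeps
all_symbols <- union(rownames(a_toy), rownames(d_toy))

length(common_symbols)  # 2 here (CXCL3, CXCL8)
length(all_symbols)     # 4 here (CXCL3, CXCL8, CXCL2, FOSB)
```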

And finally, could this be applied to more than two data frames?

Any ideas or help are appreciated!

Konstantinos

Konstantinos Yeles

Dear Martin,
Thank you for your suggestion! I changed both data frames to include the symbols as a column, and used:
m <- merge(a2, d2, by = "SYMBOL")  # which works fine
However, I have more than two data frames, and merge() only works with two at a time. Is there an alternative function or way to merge, for instance, five data sets simultaneously?
Thank you for your consideration!
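The step of turning the row names into a column and merging on it might look like the following minimal sketch. The values are made up and only one statistic column is kept per table; `SYMBOL` is the column name used in the post. With its default `all = FALSE`, `merge()` keeps only the symbols found in both data frames.

```r
# Hypothetical toy versions of a2 and d2 (made-up values), gene symbols as row names
a2 <- data.frame(Control.Mean_GSE8993 = c(11.04, 11.02, 12.96),
                 row.names = c("CXCL3", "CXCL8", "CXCL2"))
d2 <- data.frame(Control.Mean = c(10.95, 8.94, 11.55),
                 row.names = c("CXCL8", "CXCL3", "IL6"))

# Copy the row names into an explicit SYMBOL column
a2$SYMBOL <- rownames(a2)
d2$SYMBOL <- rownames(d2)

# merge() defaults to an inner join (all = FALSE): only shared symbols are kept
m <- merge(a2, d2, by = "SYMBOL")
m  # two rows, CXCL3 and CXCL8, with the columns of both tables
```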

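Indeed, this kind of problem can also be solved with dplyr's mutating joins, such as dplyr::inner_join(), or by binding a list of data frames with bind_cols(). Note that bind_cols() matches rows by position, not by symbol, so it is only safe when all data frames contain the same symbols in the same order.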
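A minimal sketch of the `inner_join()` route for a list of data frames, assuming the dplyr package is installed and each table already has a `SYMBOL` column (the tables `t1`, `t2`, `t3` and their values are hypothetical). Here base R's `Reduce()` applies the join pairwise across the list; other folding helpers would work as well.

```r
library(dplyr)

# Hypothetical toy tables (made-up values), each with a SYMBOL key column
t1 <- data.frame(SYMBOL = c("CXCL3", "CXCL8", "CXCL2"), Mean_A = c(11.0, 11.0, 12.9))
t2 <- data.frame(SYMBOL = c("CXCL8", "CXCL3", "IL6"),   Mean_B = c(10.9, 11.0, 11.5))
t3 <- data.frame(SYMBOL = c("CXCL3", "CXCL8", "FOSB"),  Mean_C = c(9.1, 9.5, 8.9))

# inner_join() keeps only keys found in both inputs; Reduce() chains it over the list,
# so the result holds only symbols present in every table
joined <- Reduce(function(x, y) inner_join(x, y, by = "SYMBOL"), list(t1, t2, t3))
joined
```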


Martin Morgan

I do not know of a way to merge multiple data sets in one call; merge them sequentially.
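Merging sequentially can be written as a loop over a list of data frames. The sketch below assumes each table already has a `SYMBOL` column (the tables `t1`, `t2`, `t3` and their values are hypothetical). Each pass merges the running result with the next table, so only symbols present in every table survive. `Reduce(function(x, y) merge(x, y, by = "SYMBOL"), tables)` is a more compact equivalent.

```r
# Hypothetical toy tables (made-up values); in practice these would be a2, d2, ...
# after their row names have been copied into a SYMBOL column
t1 <- data.frame(SYMBOL = c("CXCL3", "CXCL8", "CXCL2"), Mean_A = c(11.0, 11.0, 12.9))
t2 <- data.frame(SYMBOL = c("CXCL8", "CXCL3", "IL6"),   Mean_B = c(10.9, 11.0, 11.5))
t3 <- data.frame(SYMBOL = c("CXCL3", "CXCL8", "FOSB"),  Mean_C = c(9.1, 9.5, 8.9))

tables <- list(t1, t2, t3)

# Start from the first table and merge in the remaining ones one at a time
merged_all <- tables[[1]]
for (tbl in tables[-1]) {
  merged_all <- merge(merged_all, tbl, by = "SYMBOL")
}
merged_all  # symbols common to all three tables, with every statistic column
```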

score 2 · Accepted Answer · 2015-11-07

Martin Morgan

combine() returns the union (all rows and all columns) of the data frames. Maybe you're looking for merge()?
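For data frames that keep gene symbols as row names, `merge()` can also match on the row names directly with `by = "row.names"`, which adds a `Row.names` column to the result. The sketch below (hypothetical toy data standing in for `a2` and `d2`) contrasts the default inner join, which keeps only shared symbols, with `all = TRUE`, an outer join that keeps the union of symbols and fills missing statistics with `NA`.

```r
# Hypothetical toy stand-ins for a2 and d2 (made-up values), gene symbols as row names
a_toy <- data.frame(Control.Mean_GSE8993 = c(11.04, 11.02, 12.96),
                    row.names = c("CXCL3", "CXCL8", "CXCL2"))
d_toy <- data.frame(Control.Mean = c(10.95, 8.94, 11.55),
                    row.names = c("CXCL8", "FOSB", "IL6"))

# Inner join on row names: only symbols present in both data frames
inner <- merge(a_toy, d_toy, by = "row.names")
# all = TRUE: outer join keeping every symbol from either data frame (gaps become NA)
outer <- merge(a_toy, d_toy, by = "row.names", all = TRUE)

nrow(inner)  # 1 (CXCL8)
nrow(outer)  # 5
```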
