Aleš Maver
Dear Joao,
you have to supply the merge function with two data.frame objects (you
supplied it with a data.frame and a vector instead). If the two data frames
share column names, merge will match the rows on those columns by itself.
So, a simple solution to your problem would be to use:
merge(data1, data2, all.x=TRUE) # I've added all.x so that NAs are produced when there is no matching value in data2
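As a minimal sketch of the above, using the small table from the question rather than the random data (the object names merged and by_rows are only for illustration): note that merge() sorts its result by the matching columns by default, so the rows may not come back in the order of data1. If the original order and row names matter, one option is to merge on the row names instead and then reorder, which also avoids relying on exact equality of the numeric values.

```r
# Toy data mirroring the example in the question
data1 <- data.frame(height = c(1.1, 2.1, 1.8, 1.6, 1.8),
                    width  = c(2.3, 2.5, 1.9, 2.1, 2.4))
data2 <- data1[c(1, 3, 5), ]              # keeps row names "1", "3", "5"
data2$color <- c("red", "red", "blue")

# Option 1: match on the shared columns (height, width).
# Rows of data1 without a partner in data2 get NA in color.
merged <- merge(data1, data2, all.x = TRUE)

# Option 2: match on row names, then restore the order of data1.
# data2["color"] is a one-column data.frame, not a vector.
by_rows <- merge(data1, data2["color"], by = "row.names", all.x = TRUE)
rownames(by_rows) <- as.character(by_rows$Row.names)
by_rows$Row.names <- NULL
by_rows <- by_rows[rownames(data1), ]
```

Passing data2["color"] rather than data2$color is the important detail in the second option: the former keeps the row names of the sampled rows, while a bare vector has none to match on.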
Hope this is of use,
Ales
2011/10/5 João Daniel Nunes Duarte <jdanielnd@gmail.com>
> Hello,
>
> I am having some problems to use the 'merge' function. I'm not sure if I
> got its working right.
>
> What I want to do is:
>
> 1) Suppose I have a dataframe like:
>
> height width
> 1 1.1 2.3
> 2 2.1 2.5
> 3 1.8 1.9
> 4 1.6 2.1
> 5 1.8 2.4
>
> 2) And I generate a second dataframe sampled from this one, like:
>
> height width
> 1 1.1 2.3
> 3 1.8 1.9
> 5 1.8 2.4
>
> 3) Next, I add a new variable from this dataframe:
>
> height width color
> 1 1.1 2.3 red
> 3 1.8 1.9 red
> 5 1.8 2.4 blue
>
> 4) So, I want to merge those dataframes, so that the new variable, color,
> is bound to the first dataframe. Of course some cases won't have a value
> for it, since I generated this variable in a smaller dataframe. In those
> cases I want the value to be NA. The result dataframe should be:
>
> height width color
> 1 1.1 2.3 red
> 2 2.1 2.5 NA
> 3 1.8 1.9 red
> 4 1.6 2.1 NA
> 5 1.8 2.4 blue
>
> I have written some codes, but they're not working properly. The new
> variable has its values mixed up, and they do not correspond to its
> row.names.
>
> # Generate the first dataframe
> data1 <- data.frame(height=rnorm(20,3,0.2),width=rnorm(20,2,0.5))
> # Sample a smaller dataframe from data1
> data2 <- data1[sample(1:20,15,replace=F),]
> # Generate the new variable
> color <- sample(c("red","blue"),15,replace=T)
> # Bind the new variable to data2
> data2 <- cbind(data2, color)
> # Merge data1 and data2$color by row.names, and force it to have the
> # same values as data1. Next it generates a new dataframe where column 1
> # is the row.name, and then sorts it by the row.name from data1.
> data.frame(merge(data1,data2$color, by=0,
> all.x=T),row.names=1)[row.names(data1),]
>
> I'm not sure what am I doing wrong. Can anyone see where the mistake is?
>
> Thank you!
>
> Cheers,
>
> Joao D.
--
Ales Maver, MD
Institute of Medical Genetics, Department of Obstetrics and Gynaecology
UMC Ljubljana
Šlajmerjeva 3
SI-1000 Ljubljana
Slovenia