Question: Merging two files based on the identifier column (gene symbols)?
7 days ago by
mohammedtoufiq910 wrote:


I have two *.csv files whose column headers differ except for one shared column of gene symbols. The first file contains the gene symbols and expression data (samples); the second contains the gene symbols and phenotypic data/attributes. I would like to merge the two files by matching on the gene symbol column and save all the data in one file for further analysis. How can this be done?

Thank you,


ADD COMMENTlink written 7 days ago by mohammedtoufiq910

Please do not cross-post.

ADD REPLYlink written 6 days ago by ATpoint20

You could read both files in, use match() on the two gene symbol columns, cbind() the aligned tables, and save the result with write.csv(). That assumes the gene symbols are unique. I think there is also a merge() function. I would also highly recommend looking into the Bioconductor SummarizedExperiment class, which is designed to store data of this type. Perhaps others have more sophisticated ways of achieving this, or know of an existing function?
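
A minimal sketch of the match() and cbind() route, in case it helps. The two small data frames stand in for the files you would load with read.csv(); the column names (gene_symbol, sample1, sample2, pathway) and the values are made up for illustration and are not from the original post.

```r
# Toy stand-ins for the two CSV files (hypothetical names and values).
# With real data you would use something like: expr <- read.csv("your_file.csv")
expr <- data.frame(
  gene_symbol = c("TP53", "BRCA1", "EGFR"),
  sample1 = c(5.1, 7.3, 2.8),
  sample2 = c(4.9, 7.0, 3.1),
  stringsAsFactors = FALSE
)
pheno <- data.frame(
  gene_symbol = c("EGFR", "TP53", "MYC"),
  pathway = c("RTK signalling", "Cell cycle", "Transcription"),
  stringsAsFactors = FALSE
)

# match() only returns the first hit, so check that symbols are unique
stopifnot(!anyDuplicated(expr$gene_symbol), !anyDuplicated(pheno$gene_symbol))

# For each row of expr, the row of pheno with the same symbol (NA if absent)
idx <- match(expr$gene_symbol, pheno$gene_symbol)

# Reorder pheno to line up with expr, drop its key column, then bind columns
pheno_cols <- setdiff(names(pheno), "gene_symbol")
combined <- cbind(expr, pheno[idx, pheno_cols, drop = FALSE])
rownames(combined) <- NULL

# Write the combined table to a CSV file (a temporary file here)
out_file <- tempfile(fileext = ".csv")
write.csv(combined, out_file, row.names = FALSE)
```

Genes present in the expression table but missing from the phenotype table (BRCA1 here) get NA in the phenotype columns, and genes only present in the phenotype table (MYC here) are dropped.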

ADD REPLYlink modified 6 days ago • written 6 days ago by shepherl ♦♦ 1.5k

merge() is meant to make these sorts of operations easier; dplyr::left_join() is also very effective.
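
As a rough illustration with base R's merge(), assuming a shared key column called gene_symbol (a hypothetical name, as are the toy values below):

```r
# Toy stand-ins for the two CSV files (hypothetical names and values)
expr <- data.frame(
  gene_symbol = c("TP53", "BRCA1", "EGFR"),
  sample1 = c(5.1, 7.3, 2.8),
  stringsAsFactors = FALSE
)
pheno <- data.frame(
  gene_symbol = c("EGFR", "TP53", "MYC"),
  pathway = c("RTK signalling", "Cell cycle", "Transcription"),
  stringsAsFactors = FALSE
)

# Default merge(): keep only genes found in both tables
both <- merge(expr, pheno, by = "gene_symbol")

# all.x = TRUE: keep every gene in expr, filling NA where pheno has no match
left <- merge(expr, pheno, by = "gene_symbol", all.x = TRUE)

# If dplyr is installed, the second call corresponds to:
#   dplyr::left_join(expr, pheno, by = "gene_symbol")

# Save the merged table (a temporary file here)
merged_file <- tempfile(fileext = ".csv")
write.csv(left, merged_file, row.names = FALSE)
```

Note that merge() sorts the result by the key column by default, whereas dplyr::left_join() keeps the row order of the first table. If the key column has a different name in each file, merge() accepts by.x and by.y instead of by.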

ADD REPLYlink written 6 days ago by Martin Morgan ♦♦ 23k

Thank you so much for the suggestions.

ADD REPLYlink written 6 days ago by mohammedtoufiq910