Question: Merging two files based on the identifier column (gene symbols)?
8 weeks ago, mohammedtoufiq91 wrote:

Hi,

I have two *.csv files that share only one column header. The first contains gene symbols and expression data (samples); the second contains gene symbols and phenotypic data/attributes. The gene symbol column is the same in both files. I would like to merge the two files by matching on the gene symbol column and save all the data in one file for further analysis. How can this be done?

Thank you,

Toufiq


Please do not cross-post. https://www.biostars.org/p/397989/

written 8 weeks ago by ATpoint30

You could read both files in, use match() on the two gene symbol columns, cbind() the results, and save them with write.csv(). This assumes the gene symbols are unique (are they?). I think there is also a merge() function. I would also highly recommend looking into the Bioconductor SummarizedExperiment class, which is designed to store data of this type. Perhaps others have more sophisticated ways of achieving this, or know of an existing function?
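For illustration, the match/cbind route might look roughly like the sketch below. The toy data frames, the column name `gene_symbol`, and the output file name are all made up for the example; in practice the two tables would come from read.csv() on your own files.

```r
# Hypothetical toy tables standing in for the two CSV files
# (in practice: expr <- read.csv("your_expression_file.csv"), etc.)
expr <- data.frame(
  gene_symbol = c("TP53", "BRCA1", "EGFR"),
  sample1 = c(5.1, 7.3, 2.8),
  sample2 = c(4.9, 6.8, 3.1),
  stringsAsFactors = FALSE
)
pheno <- data.frame(
  gene_symbol = c("EGFR", "TP53", "BRCA1"),
  pathway = c("RTK signalling", "p53 signalling", "DNA repair"),
  stringsAsFactors = FALSE
)

# This approach is only safe when each symbol occurs once per table
stopifnot(!anyDuplicated(expr$gene_symbol), !anyDuplicated(pheno$gene_symbol))

# For each row of expr, the row of pheno with the same symbol (NA if absent)
idx <- match(expr$gene_symbol, pheno$gene_symbol)

# Reorder pheno to line up with expr, drop its copy of the key column, then bind
extra_cols <- setdiff(names(pheno), "gene_symbol")
combined <- cbind(expr, pheno[idx, extra_cols, drop = FALSE])
rownames(combined) <- NULL

# Write the combined table (a temporary path is used here as a placeholder)
out_file <- file.path(tempdir(), "combined.csv")
write.csv(combined, out_file, row.names = FALSE)
```

If a symbol in the first table has no partner in the second, match() returns NA for it and the annotation columns in that row end up as NA.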

modified 8 weeks ago • written 8 weeks ago by shepherl ♦♦ 1.7k

merge() is meant to make these sorts of operations easier; dplyr::left_join() is also very effective
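As a minimal sketch of the merge() route, assuming both tables have a key column called `gene_symbol` (the name and the toy data are invented for the example):

```r
# Hypothetical toy tables; real data would be loaded with read.csv()
expr <- data.frame(
  gene_symbol = c("TP53", "BRCA1", "EGFR"),
  sample1 = c(5.1, 7.3, 2.8),
  stringsAsFactors = FALSE
)
pheno <- data.frame(
  gene_symbol = c("EGFR", "TP53", "MYC"),
  pathway = c("RTK signalling", "p53 signalling", "Transcription"),
  stringsAsFactors = FALSE
)

# Default merge() is an inner join: only symbols found in both tables are kept
both <- merge(expr, pheno, by = "gene_symbol")

# all.x = TRUE keeps every row of expr and fills missing annotation with NA
left <- merge(expr, pheno, by = "gene_symbol", all.x = TRUE)
```

Here `both` keeps only EGFR and TP53, while `left` also keeps BRCA1 with an NA pathway. With dplyr installed, `dplyr::left_join(expr, pheno, by = "gene_symbol")` should give the same rows as `left`, though in the original order of `expr` rather than sorted by key.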

written 8 weeks ago by Martin Morgan ♦♦ 24k

Thank you so much for the suggestions.

written 8 weeks ago by mohammedtoufiq91