Merging two files based on the identifier column (gene symbols)?

mohammedtoufiq91 ▴ 10

Hi,

I have two *.csv files whose column headers differ except for one shared column of gene symbols. The first file contains the gene symbols and expression data (samples); the second contains the gene symbols and phenotypic data/attributes. I would like to merge the two files by matching on the gene symbol column and save all the data in one file for further analysis. How can this be done?

Thank you,

Toufiq

merge files R packages gene annotations • 1.5k views

ADD COMMENT • link 4.6 years ago mohammedtoufiq91 ▴ 10

Please do not cross-post. https://www.biostars.org/p/397989/

ADD REPLY • link 4.6 years ago ATpoint ★ 4.0k

You could read both files in, use match() on the two gene symbol columns to line the rows up, cbind() the results, and save them with write.csv(). That assumes the gene symbols are unique. There is also a merge() function in base R. I would also highly recommend looking into the Bioconductor SummarizedExperiment class, which is designed to store data of this type. Perhaps others have more sophisticated ways of achieving this, or know of an existing function?
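A minimal sketch of the match() and cbind() route, using small in-memory data frames in place of the two CSV files. The column names (`gene_symbol`, `sample1`, `sample2`, `chromosome`) and the output file name are assumptions for illustration, not taken from the original files; with real data the two tables would come from read.csv().

```r
# Toy stand-ins for the two CSV files; all column names here are hypothetical.
expr <- data.frame(
  gene_symbol = c("TP53", "BRCA1", "EGFR"),
  sample1 = c(5.1, 7.3, 2.8),
  sample2 = c(4.9, 6.8, 3.1),
  stringsAsFactors = FALSE
)
annot <- data.frame(
  gene_symbol = c("EGFR", "TP53", "BRCA1"),
  chromosome = c("chr7", "chr17", "chr17"),
  stringsAsFactors = FALSE
)

# The match() approach relies on gene symbols being unique in both tables.
stopifnot(!anyDuplicated(expr$gene_symbol), !anyDuplicated(annot$gene_symbol))

# For each row of expr, find the row of annot holding the same symbol.
idx <- match(expr$gene_symbol, annot$gene_symbol)

# Reorder annot to line up with expr, drop its copy of the key column, then bind.
annot_cols <- setdiff(names(annot), "gene_symbol")
combined <- cbind(expr, annot[idx, annot_cols, drop = FALSE])

# Save the combined table; the file name is a placeholder.
write.csv(combined, file.path(tempdir(), "combined.csv"), row.names = FALSE)
```

A symbol present in the first table but absent from the second gets NA in `idx`, so its annotation columns come out as NA rather than the row being dropped.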

ADD REPLY • link 4.6 years ago shepherl 3.8k

merge() is meant to make these sorts of operations easier; dplyr::left_join() is also very effective.
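As a rough sketch of the merge() route, the example below writes two tiny CSV files to a temporary directory, reads them back, and joins them on the shared key. The file names and column names (`gene_symbol`, `sample1`, `sample2`, `chromosome`) are made up for illustration and would need to be replaced with the real ones.

```r
# Create two small CSV files as stand-ins for the real inputs (names are hypothetical).
expr_file  <- file.path(tempdir(), "expression.csv")
annot_file <- file.path(tempdir(), "annotation.csv")
write.csv(data.frame(gene_symbol = c("TP53", "BRCA1", "EGFR", "MYC"),
                     sample1 = c(5.1, 7.3, 2.8, 9.0),
                     sample2 = c(4.9, 6.8, 3.1, 8.7)),
          expr_file, row.names = FALSE)
write.csv(data.frame(gene_symbol = c("EGFR", "TP53", "BRCA1", "KRAS"),
                     chromosome = c("chr7", "chr17", "chr17", "chr12")),
          annot_file, row.names = FALSE)

# Read both files in.
expr  <- read.csv(expr_file, stringsAsFactors = FALSE)
annot <- read.csv(annot_file, stringsAsFactors = FALSE)

# Inner join: keep only the gene symbols present in both files.
inner <- merge(expr, annot, by = "gene_symbol")

# Left join: keep every row of expr; unmatched symbols get NA annotation.
left <- merge(expr, annot, by = "gene_symbol", all.x = TRUE)

# Save the merged table for further analysis.
write.csv(left, file.path(tempdir(), "merged.csv"), row.names = FALSE)
```

With dplyr installed, `dplyr::left_join(expr, annot, by = "gene_symbol")` should give the same rows as the `all.x = TRUE` call, though it keeps the row order of the first table instead of sorting by the key. Unlike the match() approach, merge() does not require unique symbols: a duplicated symbol produces one output row per matching pair.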