Dear All,

I have a "How would you solve" kind of question. I have two tables: 1. a Log2FoldChange table and 2. an Effectors table.

First, the Log2FoldChange table was obtained by running DESeq on 14 different infected samples, each compared to the control. I then took the log2 fold change values from DESeq for each sample and merged the 14 log2FoldChange columns into a single table keyed on gene (each row is a unique gene). The table is roughly 22000 x 14: there are 22414 unique genes for 14 different strains.

Second, there is a presence/absence effector list for all 14 strains, which tells us which effectors are present in each strain (they all have different sets of effectors). This is a 50 x 14 table for the same 14 strains, with each unique effector listed in a row and 0 or 1 indicating its absence or presence in each strain.

What we want to investigate is: is there a correlation between the presence/absence of effectors and gene expression in the host? Essentially, we would like to obtain the correlation between these two separate datasets.
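To make the question concrete, here is a minimal sketch of the most direct reading of "correlation": a Pearson correlation between each effector's 0/1 vector and each gene's log2 fold change vector across the 14 strains (with a binary variable this is the point-biserial correlation). It assumes Python with NumPy, that both tables have the strains as columns in the same order, and it uses small synthetic data; the function and variable names are made up for illustration.

```python
import numpy as np

def effector_gene_correlations(lfc, eff):
    """Pearson r between every effector row and every gene row.

    lfc: genes x strains array of log2 fold changes.
    eff: effectors x strains array of 0/1 presence calls.
    Returns an effectors x genes array; NaN where a row has no variance.
    """
    lfc = np.asarray(lfc, dtype=float)
    eff = np.asarray(eff, dtype=float)
    if lfc.shape[1] != eff.shape[1]:
        raise ValueError("tables must share the same strains (columns), in the same order")

    # Center every row across strains.
    lfc_c = lfc - lfc.mean(axis=1, keepdims=True)
    eff_c = eff - eff.mean(axis=1, keepdims=True)

    # Row norms of the centered data; their outer product is the denominator of r.
    denom = np.outer(np.sqrt((eff_c ** 2).sum(axis=1)),
                     np.sqrt((lfc_c ** 2).sum(axis=1)))

    with np.errstate(divide="ignore", invalid="ignore"):
        r = (eff_c @ lfc_c.T) / denom
    # Zero-variance rows (e.g. an effector present in all strains) have no defined r.
    r[denom == 0] = np.nan
    return r

# Synthetic stand-in data: 3 effectors and 5 genes across 14 strains.
rng = np.random.default_rng(0)
eff_demo = np.array([
    [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1],  # varies between strains
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # varies between strains
    [1] * 14,                                    # present everywhere: no variance
])
lfc_demo = rng.normal(size=(5, 14))
# Plant one gene whose expression tracks effector 0, plus a little noise.
lfc_demo[0] = 2.0 * eff_demo[0] + 0.1 * rng.normal(size=14)

r_demo = effector_gene_correlations(lfc_demo, eff_demo)
print(r_demo.shape)
```

On the real tables this would give a 50 x 22414 matrix, i.e. about 1.1 million effector-gene pairs estimated from only 14 strains, so I assume some form of multiple-testing correction would be essential before interpreting any single pair.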

Any ideas or suggestions on how to approach this problem would be very helpful. My initial idea is to carry out a Canonical Correlation Analysis (CCA), and I am still working on it, but I am open to more ideas and suggestions from the community.
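For reference, this is roughly the CCA approach I have in mind, as a sketch rather than a finished analysis. As far as I understand, with only 14 strains and far more genes than strains, plain CCA on the full tables would be degenerate, so the sketch first reduces each table to a few principal components and then computes the canonical correlations from a QR decomposition and an SVD. It assumes Python with NumPy and random stand-in data; the choice of 3 components and all names are arbitrary and only for illustration.

```python
import numpy as np

def pca_scores(X, k):
    """Scores of the samples (rows of X) on the top k principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q).

    Assumes p and q are smaller than n and both matrices have full column rank.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    # Orthonormal bases for the column spaces of the centered matrices.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Random stand-in data: 200 genes and 10 effectors across 14 strains.
rng = np.random.default_rng(1)
lfc_demo = rng.normal(size=(200, 14))         # genes x strains
eff_demo = rng.integers(0, 2, size=(10, 14))  # effectors x strains, 0/1

# CCA treats strains as observations, so transpose to strains x features.
gene_pcs = pca_scores(lfc_demo.T, k=3)
eff_pcs = pca_scores(eff_demo.T.astype(float), k=3)

cc = canonical_correlations(gene_pcs, eff_pcs)
print(cc)
```

Even with the reduction, I would expect fairly large canonical correlations to show up by chance at this sample size, so I assume something like a permutation test over strain labels would be needed to judge them.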

Thanks in advance for your time and suggestions.