Hello,

I'm new to R and I have a small problem with a data analysis. I was asked to create a correlation plot for 2 genes expressed in melanoma cells. I downloaded data from GEO (14 data sets, 2 similar types of microarrays), built ExpressionSets, normalized them with RMA, extracted the data for my 2 genes and compiled it into one matrix. To every sample I assigned two traits: cell type (normal, primary, metastasis and so on) and the number of the data set it came from. Then I made a simple plot to see what my data look like (without any normalization across data sets, Spearman's correlation coefficient is above 0.5 with a really low p-value).

Now I would like to remove any differences between the data sets. If I understood correctly, I should remove the batch effect with e.g. ComBat. So here are my questions. Should I assume that one batch equals one data set, or can one data set contain several batches (differences in data collection dates and so on)? Is ComBat or SVA the best method for this particular case? And should I run this correction on the whole expression matrices (and if so, how?) and only then extract the data for my 2 genes of interest?
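To make the setup concrete, here is a minimal sketch of the kind of matrix, trait and correlation check I mean. It uses simulated values only: the gene names, sample names, number of data sets and the size of the per-data-set shift are all made-up placeholders, not my real GEO data, and it uses base R only.

```r
# Toy stand-in for the real data: 2 genes x 60 samples from 3 data sets.
# All names and values here are simulated placeholders, not the GEO data.
set.seed(1)
n_per_set <- 20
dataset <- factor(rep(c("set1", "set2", "set3"), each = n_per_set))

# A per-data-set shift mimics a batch effect on the log2 expression scale.
shift <- c(set1 = 0, set2 = 1, set3 = -1)[as.character(dataset)]
geneA <- rnorm(length(dataset), mean = 8) + shift
geneB <- 0.6 * geneA + rnorm(length(dataset), sd = 0.8)

# Genes in rows, samples in columns, as in an expression matrix.
expr <- rbind(geneA = geneA, geneB = geneB)
colnames(expr) <- paste0("sample", seq_len(ncol(expr)))

# Spearman correlation across all samples, ignoring the data set of origin.
res <- cor.test(expr["geneA", ], expr["geneB", ], method = "spearman")
print(res$estimate)

# Scatter plot coloured by data set, to see whether the sets form separate clusters.
plot(expr["geneA", ], expr["geneB", ],
     col = as.integer(dataset), pch = 19,
     xlab = "gene A (log2 expression)", ylab = "gene B (log2 expression)")
legend("topleft", legend = levels(dataset),
       col = seq_along(levels(dataset)), pch = 19)
```

In this toy example the `dataset` factor plays the role of the batch variable I am asking about; whether one level per data set is the right granularity for my real data is exactly what I am unsure of.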

I'm sorry if my post is a little chaotic, but I'm still learning how to use R. I will be really grateful for your advice.

Thank you very much for this great suggestion! I will try to apply it to my data.

Best,

Ewelina