I have a series of 12 seq datasets, each with two replicates, and would like to perform clustering analysis on the rlog-normalized data. However, after the rlog transformation I still have two replicates per dataset, so I need to combine the replicates in some way before clustering. I have two related questions:
1) What is the most appropriate way to combine rlog values across replicates? Simply averaging them?
2) What is the most appropriate way to rescale the data to a roughly 0-1 range, or some other normalized scale, before clustering? Currently the differences in mean rlog value between genes dominate the relative changes in rlog value between datasets. My solution so far has been to divide each row by its maximum, but I would appreciate a second opinion.
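For reference, here is a minimal sketch of both steps in Python/NumPy. It assumes the rlog matrix has genes as rows and 24 columns ordered dataset by dataset (`d1_rep1, d1_rep2, d2_rep1, ...`). That column layout and the function names are hypothetical. Because rlog values are on a log2-like scale, averaging replicates there corresponds to a geometric mean on the count scale. Dividing by the row maximum does not remove a gene's overall level on a log scale. For example, rows `12, 13, 14` and `2, 3, 4` have the same shape but become `0.86, 0.93, 1` and `0.5, 0.75, 1`. Per-gene centering and scaling (z-score), or min-max scaling, removes that offset, so clustering is driven by the pattern across datasets.

```python
import numpy as np


def average_replicates(rlog: np.ndarray, n_reps: int = 2) -> np.ndarray:
    """Average adjacent replicate columns (assumed layout: d1_rep1, d1_rep2, d2_rep1, ...)."""
    n_genes, n_cols = rlog.shape
    if n_cols % n_reps != 0:
        raise ValueError("column count must be a multiple of n_reps")
    # Group each dataset's replicates along a new last axis, then average them.
    return rlog.reshape(n_genes, n_cols // n_reps, n_reps).mean(axis=2)


def row_zscore(x: np.ndarray) -> np.ndarray:
    """Center each gene (row) on its mean and divide by its standard deviation."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # flat genes: avoid division by zero; they become all zeros
    return (x - mu) / sd


def row_minmax(x: np.ndarray) -> np.ndarray:
    """Rescale each gene (row) so its minimum maps to 0 and its maximum to 1."""
    lo = x.min(axis=1, keepdims=True)
    span = x.max(axis=1, keepdims=True) - lo
    span[span == 0] = 1.0  # flat genes map to all zeros
    return (x - lo) / span


# Placeholder random data standing in for a real rlog matrix (100 genes x 24 samples).
gen = np.random.default_rng(0)
rlog = gen.normal(8, 2, size=(100, 24))
avg = average_replicates(rlog)  # one column per dataset: shape (100, 12)
scaled = row_zscore(avg)        # or row_minmax(avg) for a 0-1 range
```

The z-scored (or min-max scaled) matrix can then be passed to whichever clustering method you prefer. Z-scoring is a common choice for expression heatmaps. Min-max scaling gives the 0-1 range asked about, but it is more sensitive to a single extreme dataset per gene.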