Hello -
I have a series of 12 seq datasets, each with two replicates, and would like to perform a clustering analysis of the rlog-normalized data. However, after the rlog transformation I still have two replicates per dataset, so I have to combine the replicates in some way before clustering. I have two related questions:
1) What is the most proper way to combine rlog values for the replicates? Averaging?
2) What is the most proper way to collapse the data to a ~0-1 scale or other normalized scale to cluster it? Currently the magnitude of difference in mean rlog values between genes is dominating the changes in relative rlog value between datasets. My solution thus far has been normalization by the row max, but I would appreciate a second opinion.
Thanks much,
-Stephen
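For readers following along, here is a minimal sketch of the two operations described in the question: averaging replicate columns per dataset, then dividing each row by its maximum. It is plain Python rather than R, it does not call DESeq2, and the toy matrix, the condition labels, and the helper names (`average_replicates`, `scale_by_row_max`) are hypothetical. It assumes every row maximum is positive, which may not hold for rlog values of very low-count genes.

```python
from collections import defaultdict

def average_replicates(matrix, conditions):
    """Average the columns that share a condition label.

    matrix: list of rows (genes), each a list of rlog-like values.
    conditions: one label per column, e.g. ["A", "A", "B", "B"].
    Returns (labels, averaged_matrix), labels in first-seen order.
    """
    groups = defaultdict(list)
    for col, cond in enumerate(conditions):
        groups[cond].append(col)
    labels = list(groups)
    averaged = [
        [sum(row[c] for c in groups[lab]) / len(groups[lab]) for lab in labels]
        for row in matrix
    ]
    return labels, averaged

def scale_by_row_max(matrix):
    """Divide each row by its maximum (assumes the maximum is > 0)."""
    return [[v / max(row) for v in row] for row in matrix]

# Hypothetical toy data: 2 genes x 4 samples (2 conditions, 2 replicates each).
rlog_values = [
    [10.0, 10.5, 11.0, 11.5],  # highly expressed gene
    [2.0, 2.5, 3.0, 3.5],      # lowly expressed gene
]
conditions = ["A", "A", "B", "B"]

labels, averaged = average_replicates(rlog_values, conditions)
scaled = scale_by_row_max(averaged)
print(labels, averaged, scaled)
```

Note that in this toy example both genes change by exactly one unit between conditions, yet row-max scaling gives them different profiles, because the result depends on each gene's baseline level.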
Hello,
I have a follow-up question, since I am also trying to cluster normalized counts per biological state (rather than per biological replicate within each state).
In regards to:
1) Isn't collapseReplicates in DESeq2 only for technical replicates? Within DESeq2, how can I combine biological replicates and obtain normalized counts for downstream clustering analysis? Is there a function for this in DESeq2?
2) What do you mean by "subtract the row mean (but do not scale by variance or the like)"? Is this instead of averaging? Could you provide more detail, please?
Thank you!
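One possible reading of the quoted advice, sketched below in plain Python (not R, and not a DESeq2 function): row-mean centering is applied to the per-state values after any replicate averaging, so it is a separate step rather than a replacement for averaging. The toy matrix and the helper names (`center_rows`, `zscore_rows`) are hypothetical, and the z-score variant is included only as a contrast to show what "scaling by variance" would do.

```python
from statistics import mean, stdev

def center_rows(matrix):
    """Subtract each row's mean; differences stay in the original (log2-like) units."""
    return [[v - mean(row) for v in row] for row in matrix]

def zscore_rows(matrix):
    """Center each row and also divide by its sample standard deviation."""
    return [[(v - mean(row)) / stdev(row) for v in row] for row in matrix]

# Hypothetical per-state values: 2 genes x 3 biological states.
per_state = [
    [10.0, 11.0, 12.0],  # changes by 1 unit per state
    [2.0, 2.5, 3.0],     # changes by 0.5 units per state
]

centered = center_rows(per_state)
zscored = zscore_rows(per_state)
print(centered)
print(zscored)
```

In this toy example, centering removes the difference in baseline level between the two genes but keeps the fact that the first gene changes twice as much as the second. After z-scoring, the two rows become identical, so the size of the change no longer influences the clustering.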