I have Control and Treated data sets that I have run through HTSeq-count, and I have then performed differential gene expression analysis with edgeR, DESeq, and DESeq2.
My pipeline is:
TopHat2 > HTSeq-count > edgeR, DESeq, or DESeq2
Samples:
Control: Control_rep1, Control_rep2, Control_rep3
Treated: Treated_rep1, Treated_rep2, Treated_rep3
I have generated heatmaps in R in the past, but the key along the bottom of the heatmap shows each of the triplicates for the Control and Treated groups as a separate column.
It looks like this: Control_rep1 | Control_rep2 | Control_rep3 | Treated_rep1 | Treated_rep2 | Treated_rep3
Is there a recommended method for combining the Control triplicates and the Treated triplicates into two respective columns, so that only Control and Treated appear along the bottom of the heatmap?
I would like the heatmap to look like this:
Control | Treated
I was thinking of one of the following strategies:
1) Take the log2 fold change values from all three methods.
2) Convert them to Z-scores separately for each method (would this work better?).
3) Visualize the results separately for each of the three methods.
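The first strategy might look roughly like the sketch below, written in Python with plain dicts only to show the arithmetic (in practice this would be done in R). The gene names and fold-change values are made up, and `zscore_per_method` is a hypothetical helper. Each method is standardized on its own, across genes, which removes differences in overall scale between methods; the trade-off is that a z-score says how extreme a gene's fold change is relative to the other genes, not how large the fold change actually is.

```python
from statistics import mean, stdev

def zscore_per_method(lfc_by_method):
    """Standardize log2 fold changes within each method, across genes.

    lfc_by_method: dict method name -> dict gene -> log2 fold change
    (Treated vs Control). Each method is scaled independently, so the
    result puts the methods on a common, unitless scale.
    """
    scaled = {}
    for method, lfc in lfc_by_method.items():
        values = list(lfc.values())
        mu, sd = mean(values), stdev(values)  # sample SD across genes
        scaled[method] = {gene: (v - mu) / sd for gene, v in lfc.items()}
    return scaled

# Made-up values: the second method reports the same pattern on twice the scale.
lfc = {
    "edgeR": {"geneA": 2.0, "geneB": -1.0, "geneC": 0.5},
    "DESeq2": {"geneA": 4.0, "geneB": -2.0, "geneC": 1.0},
}
z = zscore_per_method(lfc)

# Rows = genes, columns = methods: the layout a heatmap function would expect.
heatmap_rows = {g: [z[m][g] for m in lfc] for g in lfc["edgeR"]}
```

In the toy data the two methods differ only by a factor of two, so they end up with identical z-scores (1, -1, and 0 for geneA, geneB, and geneC).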
Alternatively:
1) Take the mean count of each gene per group (Control and Treated).
2) Normalize the data sets.
3) Generate the heatmap.
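The second strategy could be sketched as follows, again in Python purely to show the order of operations. Assumptions: counts are held in plain dicts keyed by the sample names listed above; normalization is simple counts-per-million, used here as a stand-in for DESeq2's median-of-ratios or edgeR's TMM size factors; and the helpers `cpm`, `group_means`, and `center_rows` are hypothetical. Normalization is applied per sample before averaging, because library-size factors are per-sample quantities, so steps 1 and 2 are swapped relative to the list.

```python
import math
from statistics import mean

def cpm(counts):
    """Counts-per-million for each sample.

    counts: dict sample -> dict gene -> raw count (e.g. HTSeq-count output).
    Plain library-size scaling, NOT median-of-ratios or TMM.
    """
    out = {}
    for sample, genes in counts.items():
        total = sum(genes.values())
        out[sample] = {g: c / total * 1e6 for g, c in genes.items()}
    return out

def group_means(norm, groups):
    """Average log2(CPM + 1) over the replicates of each group.

    groups: dict group name -> list of sample names.
    Returns dict gene -> dict group -> mean log2 expression.
    """
    genes = next(iter(norm.values())).keys()
    return {
        g: {grp: mean(math.log2(norm[s][g] + 1) for s in samples)
            for grp, samples in groups.items()}
        for g in genes
    }

def center_rows(matrix):
    """Subtract each gene's row mean so the heatmap shows relative change."""
    out = {}
    for g, row in matrix.items():
        m = mean(row.values())
        out[g] = {grp: v - m for grp, v in row.items()}
    return out

# Toy counts (made up): every library sums to 1,000,000 reads.
control = ["Control_rep1", "Control_rep2", "Control_rep3"]
treated = ["Treated_rep1", "Treated_rep2", "Treated_rep3"]
counts = {s: {"geneA": 3, "geneB": 999_997} for s in control}
counts.update({s: {"geneA": 15, "geneB": 999_985} for s in treated})

# Two columns per gene: Control | Treated.
heat = center_rows(group_means(cpm(counts), {"Control": control, "Treated": treated}))
```

Here geneA ends up near -1 (Control) and +1 (Treated). Rows are centered rather than z-scored on purpose: with only two columns, a row z-score (sample standard deviation) is always about ±0.707 for any gene whose two means differ, so every such gene would get the same color intensity regardless of how large the change is.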
Would this be a better approach for illustrating differential gene expression?
Also, I am trying to remember why it is not advantageous to use the log2 fold change values from DESeq, DESeq2, and edgeR results to generate heatmaps. Aside from the genes not having been normalized, what is the reason? I would appreciate it if anyone could clarify this.
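One property that may be part of the answer: a log2 fold change is a single contrast per gene (Treated relative to Control), so it gives one heatmap column per method rather than separate Control and Treated columns, and it discards the absolute expression level. The toy example below uses made-up group means and a hypothetical helper `naive_log2_fc`; the packages compute their reported values from fitted models (with shrinkage or prior counts in some settings), so their numbers will not exactly match this plain ratio.

```python
import math

def naive_log2_fc(mean_control, mean_treated):
    """Plain log2 ratio of group means (no pseudocount, no shrinkage)."""
    return math.log2(mean_treated / mean_control)

# Hypothetical normalized group means: one lowly and one highly expressed gene.
genes = {
    "low_gene": (10.0, 20.0),          # (Control mean, Treated mean)
    "high_gene": (10_000.0, 20_000.0),
}

# A fold-change "heatmap" has one column: both genes get exactly the same value.
lfc_column = {g: naive_log2_fc(c, t) for g, (c, t) in genes.items()}

# A two-column expression matrix keeps the difference in expression level.
expr_columns = {
    g: {"Control": math.log2(c), "Treated": math.log2(t)}
    for g, (c, t) in genes.items()
}
```

Both genes have a log2 fold change of exactly 1.0, even though one is expressed about a thousand times more highly than the other.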
Many thanks in advance.