Hi Everyone,
I have Control and Treated data sets that I have run through HTSeq-count, and I have then done differential gene expression analysis with edgeR, DESeq, and DESeq2.
My pipeline is:
TopHat2 > HTSeq-count > edgeR, DESeq, or DESeq2
Samples:
Control: Control_rep1, Control_rep2, Control_rep3
Treated: Treated_rep1, Treated_rep2, Treated_rep3
I have generated heatmaps in R in the past, but the key along the bottom shows each of the triplicates for the Control and Treated groups separately.
It looks like this: Control_rep1 | Control_rep2 | Control_rep3 | Treated_rep1 | Treated_rep2 | Treated_rep3
Is there a method I should follow to combine the Control triplicates and the Treated triplicates into two respective columns, so that only Control and Treated appear along the bottom of the heatmap?
I would like the heatmap to look like this:
Control | Treated
I was thinking of the following strategy:
1) I have the log2 fold change values from all three methods.
2) Convert the values to z-scores for each method (would that work better?).
3) Then visualizing the results separately based on the three methods.
OR
1) Take the mean count of each gene per group (Control and Treated).
2) Normalize the data sets.
3) Generate the heatmap.
Would this be a better approach for highlighting differential gene expression?
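The first strategy above (z-scoring each method's log2 fold changes) can be sketched in base R. This is a minimal sketch, assuming the log2 fold changes from each tool have already been extracted and aligned by gene; the gene names and values are made up for illustration only.

```r
# Made-up log2 fold changes for five genes, one column per tool (illustration only)
lfc <- cbind(
  edgeR  = c(2.1, -1.3, 0.4, 3.0, -0.2),
  DESeq  = c(1.9, -1.1, 0.3, 2.7, -0.1),
  DESeq2 = c(2.0, -1.2, 0.5, 2.8, -0.3)
)
rownames(lfc) <- paste0("gene", 1:5)

# scale() centers and scales each column, so each method's values become
# z-scores relative to that method's own mean and standard deviation
z <- scale(lfc)

# Colv = NA keeps the columns in method order; scale = "none" because the
# values are already standardized
heatmap(z, Colv = NA, scale = "none", margins = c(6, 6))
```

One caveat with this design: after centering, a value of zero no longer means "no change", so the colors show how a gene ranks within each method rather than the fold change itself.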
Also, I am trying to remember why it is not advantageous to generate heatmaps from the log2 fold change values reported by DESeq, DESeq2, and edgeR. Could anyone clarify this, aside from the reason that the genes have not been normalized? I would appreciate it.
Many thanks in advance.
I am trying to highlight something simple: the benefits of these tools for generating things like heatmaps. I know that in cummeRbund we can show each replicate or group them. I wanted to show something similar with the count data sets that have been analyzed with edgeR, DESeq, and DESeq2, basically highlighting that we can accomplish the same task with both strategies for analyzing RNA-Seq data sets.
I agree, a scatter plot and/or a bar graph showing the differential expression information would be nice. I am including them as well.
Thank you for the link.
I agree with Mike here. If you can summarize what you want to see into a single value (i.e., the log-fold change, in the case of a DE analysis), then you might as well show that directly. The advantage of using a heatmap is to visualize stuff that can't be easily collapsed into a single value. For example, if you had three or more groups, you could use a heatmap to show how the expression of your genes of interest changes across groups. (This pattern wouldn't be easily summarized into a single value, as you'd need to compute log-fold changes between many pairs of groups.) For similar reasons, a heatmap showing the individual replicates can show whether or not expression is variable within each group, in addition to any DE between groups.
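To illustrate the multi-group case, here is a minimal base-R sketch that z-scores each gene across groups, so the heatmap shows each gene's pattern across groups rather than its absolute level. The matrix of per-group mean log2 expression is simulated, and the group labels T1 to T3 are placeholders. Base R's heatmap() already applies the same row-wise z-scoring by default (scale = "row"); it is done explicitly here to show what the colors represent.

```r
set.seed(1)
# Simulated mean log2 expression for six genes in four groups (placeholder data)
m <- matrix(rnorm(24, mean = 8), nrow = 6,
            dimnames = list(paste0("gene", 1:6),
                            c("Control", "T1", "T2", "T3")))

# Row-wise z-scores: each gene relative to its own mean and SD across groups
zrow <- t(scale(t(m)))

# Keep the group order fixed; values are already scaled
heatmap(zrow, Colv = NA, scale = "none")
```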
I agree with the above about just showing the log2 fold change. I am trying to illustrate something for educational purposes.
I have another sample data set that I could illustrate heatmaps with:
It is Control_rep1 | Control_rep2 | Treatment1_rep1 | Treatment1_rep2 | Treatment2_rep1 | Treatment2_rep2 | Treatment3_rep1 | Treatment3_rep2 | Treatment4_rep1 | Treatment4_rep2 | Treatment5_rep1 | Treatment5_rep2 | Treatment6_rep1 | Treatment6_rep2
If I just want to show a heatmap without replicates, i.e. Control | Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 | Treatment 5 | Treatment 6, what should I do?
What would be the best strategy for merging the replicate data sets for each treatment and the control, and then producing a heatmap of genes with p-values below 0.05, 0.01, or 0.1? (The p-value filtering itself is not the issue.)
1) I have the log2 fold change values from all three methods.
2) Convert the values to z-scores for each method (would that work better?).
3) Then visualizing the results separately based on the three methods.
OR
1) Take the mean count of each gene per group (Control and each treatment).
2) Normalize the data sets.
3) Generate the heatmap.
BTW: I am going to show both sets of heatmaps, one with the individual replicates and one without them (merged replicates), basically illustrating the usefulness of these tools.
Thank you in advance.
To summarize a few columns of a matrix you can do something like:
You just need to tailor your code to whatever package you are using. For DESeq2, mat would be counts(dds, normalized=TRUE) and condition would be dds$condition. Then you can use cbind() to combine these averages into a matrix, which you supply to whatever heatmap function you are using.
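The averaging approach can be wrapped in a small helper. This is a minimal sketch, not code from the original answer: average_by_group() is a hypothetical name, and mat/condition stand in for counts(dds, normalized=TRUE) and dds$condition. The toy counts are made up so the averages can be checked by hand.

```r
# Hypothetical helper: average replicate columns within each group.
# mat: genes x samples matrix (e.g. normalized counts)
# condition: one group label per column of mat
average_by_group <- function(mat, condition) {
  condition <- factor(condition)  # levels sorted alphabetically unless already a factor
  means <- lapply(levels(condition), function(lvl) {
    rowMeans(mat[, condition == lvl, drop = FALSE])
  })
  out <- do.call(cbind, means)    # one column per group, gene names kept as row names
  colnames(out) <- levels(condition)
  out
}

# Made-up counts for two genes in the Control/Treated triplicate design
counts <- matrix(c(10, 12, 14, 40, 44, 48,
                    5,  7,  6,  5,  4,  6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"),
                                 c(paste0("Control_rep", 1:3),
                                   paste0("Treated_rep", 1:3))))
condition <- rep(c("Control", "Treated"), each = 3)

avg <- average_by_group(counts, condition)
print(avg)
```

The resulting matrix can then be log-transformed and passed to a heatmap function, for example heatmap(log2(avg + 1), Colv = NA). For the seven-group design, the same call should work with condition <- rep(c("Control", paste0("Treatment", 1:6)), each = 2), which matches the column order listed earlier.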