Heatmaps of replicate (triplicate) count data analyzed with edgeR, DESeq, or DESeq2
Zain A Alvi (@zain-a-alvi-6439)

Hi Everyone,

I have Control and Treated data sets that I have run through HTSeq-count, and I have then performed differential gene expression analysis with edgeR, DESeq, and DESeq2.

My pipeline is:

TopHat2 > HTSeq-count > edgeR, DESeq, or DESeq2

Samples: Control: Control_rep1, Control_rep2, Control_rep3; Treated: Treated_rep1, Treated_rep2, Treated_rep3

I have generated heatmaps in R before, but the labels along the bottom show each of the triplicates for the Control and Treated groups separately:

It looks like this: Control_rep1 | Control_rep2 | Control_rep3 | Treated_rep1 | Treated_rep2 | Treated_rep3

Is there a recommended method for combining the Control triplicates and the Treated triplicates into two columns, so that the heatmap shows only Control and Treated along the bottom?

I hope to make it look like this in the heatmap:

Control | Treated

I was thinking of the following strategies:

1) Take the log2 fold change values from each of the three methods.

2) Convert them to Z-scores for each method (would this work better?).

3) Visualize the results separately for each of the three methods.

OR

1) Take the mean count of each gene per group (Control and Treated).

2) Normalize the data sets.

3) Generate the heatmap.
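If "converting to Z scores" means the usual row scaling used for expression heatmaps, it is applied to a genes × samples matrix of log-scale, normalized expression values (for example from DESeq2's vst() or rlog()), not to a single fold change per gene. A minimal sketch on made-up data; the matrix `logexpr` is hypothetical:

```r
# Hypothetical log2-scale expression values: 4 genes x 6 samples
set.seed(1)
logexpr <- matrix(rnorm(24, mean = 8), nrow = 4,
                  dimnames = list(paste0("gene", 1:4),
                                  c(paste0("Control_rep", 1:3),
                                    paste0("Treated_rep", 1:3))))

# Row-wise Z-scores: center each gene at 0 and scale it to unit SD across samples.
# scale() works column-wise, so transpose, scale, and transpose back.
z <- t(scale(t(logexpr)))

round(rowMeans(z), 10)  # each gene now has mean 0
apply(z, 1, sd)         # and standard deviation 1
```

Row scaling makes genes with very different expression levels comparable on one color scale, but it also hides absolute expression, so it suits pattern-style heatmaps rather than magnitude comparisons.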

Would this be a better approach for highlighting differential gene expression?

Also, I am trying to remember why it is not advantageous to use log2 fold change values to generate heatmaps from DESeq, DESeq2, and edgeR results, aside from the reason that the genes have not been normalized. I would appreciate it if anyone could clarify this.

@mikelove

I've said this in some other thread, but I don't know if it will be easy to find:

For me the point of the heatmap (although it's not always very enlightening) is to show multiple values per condition (to get a sense of variability within each condition). If you just want to show one value per condition per gene, it's much better to use location rather than color/brightness, because humans are far better at judging distances than at judging differences in color or brightness [1].

So if you go with one pair of values per gene, this makes sense as a scatterplot, with treated on the y-axis and control on the x-axis. Or better yet, rotate this plot by 45 degrees and show the difference between treated and control on the y-axis (for example, a log fold change) and the average of treated and control on the x-axis. Now we have defined an MA plot, which is plotMA() in DESeq2.
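The 45-degree rotation amounts to M = log2(treated) - log2(control) and A = the average of the two log2 values. A base-R sketch with made-up per-gene means; there are no zeros here, whereas real counts would need a pseudocount, or you would simply call DESeq2's plotMA() on the results:

```r
# Hypothetical average normalized counts per gene for each condition
control <- c(geneA = 100, geneB = 50, geneC = 10)
treated <- c(geneA = 400, geneB = 50, geneC = 40)

# Rotating the treated-vs-control scatterplot by 45 degrees:
# M = difference of logs (log2 fold change), A = average of logs
M <- log2(treated) - log2(control)
A <- (log2(treated) + log2(control)) / 2

plot(A, M, xlab = "A: mean log2 count",
     ylab = "M: log2 fold change (treated / control)")
abline(h = 0, col = "grey")  # genes with no change sit on this line
```

Note that geneA and geneC share the same M (a 4-fold change) but differ in A, which is exactly the information a single-column heatmap of fold changes would not show.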

I am trying to highlight something simple: that these tools can generate many kinds of output, such as heatmaps. I know that in cummeRbund we can show each replicate or group them. I wanted to show something similar with the count data sets that have been analyzed with edgeR, DESeq, and DESeq2, basically highlighting that we can accomplish the same task with both strategies for analyzing RNA-Seq data sets.

I agree that a scatter plot and/or a bar graph showing the expression differences would be useful. I am including them as well.

I agree with Mike here. If you can summarize what you want to see into a single value (i.e., the log-fold change, in the case of a DE analysis), then you might as well show that directly. The advantage of using a heatmap is to visualize stuff that can't be easily collapsed into a single value. For example, if you had three or more groups, you could use a heatmap to show how the expression of your genes of interest changes across groups. (This pattern wouldn't be easily summarized into a single value, as you'd need to compute log-fold changes between many pairs of groups.) For similar reasons, a heatmap showing the individual replicates can show whether or not expression is variable within each group, in addition to any DE between groups.
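To make the "many pairs of groups" point concrete: describing one gene across k groups with fold changes takes choose(k, 2) pairwise log2 fold changes. An illustrative sketch with made-up group means:

```r
# Hypothetical mean normalized counts of one gene in three groups
means <- c(A = 100, B = 400, C = 50)

# Every pair of groups needs its own log2 fold change
pairs <- combn(names(means), 2)
lfc <- apply(pairs, 2, function(p) log2(means[[p[2]]] / means[[p[1]]]))
names(lfc) <- paste(pairs[2, ], "vs", pairs[1, ])
lfc

# With a control plus six treatments there would be choose(7, 2) comparisons
choose(7, 2)
```

A heatmap row of group-level values shows the same pattern in one glance, which is where the heatmap earns its place.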

I agree with the above about just showing the log2 fold change. I am trying to illustrate something for educational purposes.

I have another sample data set that I could use to illustrate heatmaps:

The columns are: Control_rep1 | Control_rep2 | Treatment1_rep1 | Treatment1_rep2 | Treatment2_rep1 | Treatment2_rep2 | Treatment3_rep1 | Treatment3_rep2 | Treatment4_rep1 | Treatment4_rep2 | Treatment5_rep1 | Treatment5_rep2 | Treatment6_rep1 | Treatment6_rep2

If I just want to show a heatmap without replicates, with one column each for Control | Treatment1 | Treatment2 | Treatment3 | Treatment4 | Treatment5 | Treatment6, what should I do?

What would be the best strategy for merging the replicate data sets for each treatment and the control, and then showing a heatmap of genes with p-values below 0.05, 0.01, or 0.1 (the filtering itself is not the issue)?

1) Take the log2 fold change values from each of the three methods.

2) Convert them to Z-scores for each method (would this work better?).

3) Visualize the results separately for each of the three methods.

OR

1) Take the mean count of each gene per group (Control and each Treatment).

2) Normalize the data sets.

3) Generate the heatmap.
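For the seven-group layout, averaging replicates can be done by looping over the group levels. A minimal sketch, assuming the columns already hold normalized counts and follow the `<group>_rep<N>` naming listed above; the matrix `mat` is simulated and hypothetical:

```r
# Hypothetical normalized count matrix: 5 genes x 14 samples
set.seed(42)
groups <- c("Control", paste0("Treatment", 1:6))
samples <- paste0(rep(groups, each = 2), "_rep", 1:2)
mat <- matrix(rpois(5 * length(samples), lambda = 100), nrow = 5,
              dimnames = list(paste0("gene", 1:5), samples))

# Group label for each column, recovered by stripping the "_rep<N>" suffix
condition <- factor(sub("_rep[0-9]+$", "", colnames(mat)), levels = groups)

# Average the replicate columns of each group -> one column per group
groupMeans <- sapply(levels(condition), function(g)
  rowMeans(mat[, condition == g, drop = FALSE]))
dim(groupMeans)  # genes x groups
```

Normalizing before averaging (rather than after, as in the list above) keeps library-size differences between replicates from leaking into the group means.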

BTW: I am going to show both versions of the heatmap, one with the replicates shown and one with the replicates merged, basically to illustrate the usefulness of these tools.

To summarize a few columns of a matrix you can do something like:

controlAve <- rowMeans(mat[ ,condition == "control" ])

You just need to tailor your code to whatever package you are using. For DESeq2, mat would be counts(dds, normalized=TRUE) and condition would be dds$condition.

Then you can use cbind() to combine these averages into a matrix which you supply to whatever heatmap function you are using.
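Putting those steps together for the two-group case, as a sketch: the simulated `mat` stands in for counts(dds, normalized=TRUE), and base R's heatmap() is used only as an example heatmap function (heatmap.2 and others accept the same matrix).

```r
# Hypothetical normalized counts: 10 genes x 6 samples
set.seed(7)
mat <- matrix(rnbinom(60, mu = 200, size = 10), nrow = 10,
              dimnames = list(paste0("gene", 1:10),
                              c(paste0("control_rep", 1:3),
                                paste0("treated_rep", 1:3))))
condition <- factor(rep(c("control", "treated"), each = 3))

# One averaged column per condition
controlAve <- rowMeans(mat[, condition == "control"])
treatedAve <- rowMeans(mat[, condition == "treated"])
aveMat <- cbind(control = controlAve, treated = treatedAve)

# log2 with a pseudocount of 1 so zero counts do not become -Inf
heatmap(log2(aveMat + 1), scale = "none")
```

With only two columns, each row carries roughly the same information as one log fold change per gene, which is why the MA plot is suggested as the alternative.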