Question

Heatmaps with replicates or triplicate data sets with counts to EdgeR, DESeq, or DESeq2

Zain A Alvi ▴ 10

@zain-a-alvi-6439

Hi Everyone,

I have Control and Treated data sets that I have run through htseq-count, followed by differential gene expression analysis with edgeR, DESeq, and DESeq2.

My pipeline is:

Tophat2 > HTSeq count > EdgeR, DESeq, or DESeq2

Samples:

Control: Control_rep1, Control_rep2, Control_rep3

Treated: Treated_rep1, Treated_rep2, Treated_rep3

I have generated heatmaps in R in the past, but the results come out with the key on the bottom showing each of the triplicates for the Control and Treated groups.

Is there a method I should follow to combine the Control triplicates and the Treated triplicates into two respective columns, so that the bottom of the heatmap shows just Control and Treated?

I hope to make it look like this in the heatmap:

Control | Treated

I was thinking of the following strategy:

1) I have the log2 fold change values from all three methods.

2) Would converting to Z scores work better for each method?

3) Then visualizing the results separately based on the three methods.

OR

1) Take the mean of each count for each gene per group (Control and Treated).

2) Normalize the data sets

3) Generate the heatmap

Would this be a better approach to highlight or illustrate differential gene expression?

Also, I am trying to remember why it is not advantageous to use log2 fold change values to generate heatmaps from DESeq, DESeq2, and edgeR results, aside from the genes not having been normalized. If anyone could clarify this, I would appreciate it.

Many thanks in advance.

heatmap.2 heatmap edgeR deseq2 deseq • 4.9k views

ADD COMMENT • link updated 8.6 years ago by Michael Love 43k • written 8.6 years ago by Zain A Alvi ▴ 10

score 1 · Answer 1 · 2016-05-04

Michael Love 43k

@mikelove

I've said this in some other thread, but I don't know if it will be easy to find:

For me the point of the heatmap (although it's not always very enlightening) is to show multiple values per condition (to get a sense of variability within condition). If you just want to show one value per condition per gene, it's much better to use location rather than color/brightness, because humans are far superior at judging distances than at judging color/brightness differences [1].

So if you go with one pair of values per gene, this makes sense as a scatterplot, with treated on the y and control on the x. Or better yet, rotate this plot by 45 degrees and show differences between treated and control on the y (for example, a log fold change) and the average of treated and control on the x. Now we have defined an MA plot => plotMA() in DESeq2.

[1] http://priceonomics.com/how-william-cleveland-turned-data-visualization/
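The "rotate by 45 degrees" step can be sketched by hand in base R. This is only an illustration under stated assumptions: the count matrix `mat` and the `condition` factor below are made-up stand-ins for normalized counts and sample labels, the pseudocount of 1 is an arbitrary choice, and this is not how plotMA() in DESeq2 computes its values.

```r
# Hypothetical normalized counts: 4 genes x 6 samples (made-up numbers).
mat <- matrix(
  c(100, 110,  90, 400, 420, 380,
     50,  55,  45,  50,  52,  48,
    800, 780, 820, 200, 210, 190,
     10,  12,   8,  40,  44,  36),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c(paste0("Control_rep", 1:3), paste0("Treated_rep", 1:3)))
)
condition <- factor(rep(c("control", "treated"), each = 3))

# One value per condition per gene, on the log2 scale (pseudocount of 1).
logControl <- log2(rowMeans(mat[, condition == "control"]) + 1)
logTreated <- log2(rowMeans(mat[, condition == "treated"]) + 1)

# Difference on the y axis, average on the x axis.
M <- logTreated - logControl
A <- (logTreated + logControl) / 2

plot(A, M, xlab = "average log2 expression", ylab = "log2(treated / control)")
abline(h = 0, lty = 2)  # genes with no change sit on this line
```

Genes above the dashed line are higher in treated, genes below it are higher in control, and the distance from the line is read directly as the log fold change.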

ADD COMMENT • link 8.6 years ago Michael Love 43k

I am trying to show something simple: that these tools can be used to generate figures such as heatmaps. I know that in cummeRbund we can either show each replicate or group them. I wanted to show something similar with count data that have been analyzed with edgeR, DESeq, and DESeq2, basically highlighting that we can accomplish the same task with both strategies for analyzing RNA-Seq data sets.

I agree, a scatter plot would be nice and/or a bar graph showing the different expression information. I am including them as well.

Thank you for the link.

ADD REPLY • link 8.6 years ago Zain A Alvi ▴ 10

I agree with Mike here. If you can summarize what you want to see into a single value (i.e., the log-fold change, in the case of a DE analysis), then you might as well show that directly. The advantage of using a heatmap is to visualize stuff that can't be easily collapsed into a single value. For example, if you had three or more groups, you could use a heatmap to show how the expression of your genes of interest changes across groups. (This pattern wouldn't be easily summarized into a single value, as you'd need to compute log-fold changes between many pairs of groups.) For similar reasons, a heatmap showing the individual replicates can show whether or not expression is variable within each group, in addition to any DE between groups.
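A heatmap of the individual replicates, with each gene converted to Z scores across samples, could be sketched as follows. The matrix `mat` is made-up data standing in for normalized counts, the log2 transform with a pseudocount of 1 is an assumption, and base R's heatmap() is used here only to avoid extra dependencies (heatmap.2 from gplots would work similarly).

```r
# Hypothetical normalized counts: 4 genes x 6 samples (made-up numbers).
mat <- matrix(
  c(100, 110,  90, 400, 420, 380,
     50,  55,  45,  50,  52,  48,
    800, 780, 820, 200, 210, 190,
     10,  12,   8,  40,  44,  36),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c(paste0("Control_rep", 1:3), paste0("Treated_rep", 1:3)))
)

# Log-transform so that highly expressed genes do not dominate.
logMat <- log2(mat + 1)

# scale() centers and scales columns, so transpose to get per-gene Z scores.
z <- t(scale(t(logMat)))

# Keep the sample order fixed (Colv = NA) and skip heatmap()'s own scaling,
# since the rows are already Z scores.
heatmap(z, Colv = NA, scale = "none")
```

With all six columns shown, a gene whose replicates disagree within a group stands out, which a single averaged column per group would hide.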

ADD REPLY • link 8.6 years ago Aaron Lun ★ 28k

I agree with the comments above about just showing the log2 fold change. I am trying to illustrate something for educational purposes.

I have another sample data set that I could illustrate heatmaps with.

What strategy would be best for merging the replicate data sets for each of treatment and control, and then producing a heatmap of genes with a p-value less than 0.05, 0.01, or 0.1 (this part is not the issue)?

1) I have the log2 fold change values from all three methods.

2) Would converting to Z scores work better for each method?

3) Then visualizing the results separately based on the three methods.

OR

1) Take the mean of each count for each gene per group (Control and Treated).

2) Normalize the data sets

3) Generate the heatmap

BTW: I am going to show both kinds of heatmap, one with the replicates shown and one without (merged replicates), basically illustrating the usefulness of these tools.

Thank you in advance.

ADD REPLY • link 8.6 years ago Zain A Alvi ▴ 10

To summarize a few columns of a matrix you can do something like:

controlAve <- rowMeans(mat[ ,condition == "control" ])

You just need to tailor your code to whatever package you are using. For DESeq2, mat would be counts(dds, normalized=TRUE) and condition would be dds$condition.

Then you can use cbind() to combine these averages into a matrix which you supply to whatever heatmap function you are using.
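Put together, the averaging and cbind() steps might look like the sketch below. The matrix `mat` and factor `condition` are made-up stand-ins (for DESeq2 they would come from counts(dds, normalized=TRUE) and dds$condition as described above), the log2 transform with a pseudocount of 1 is an assumption, and base R's heatmap() is just one possible plotting function.

```r
# Hypothetical normalized counts: 4 genes x 6 samples (made-up numbers).
mat <- matrix(
  c(100, 110,  90, 400, 420, 380,
     50,  55,  45,  50,  52,  48,
    800, 780, 820, 200, 210, 190,
     10,  12,   8,  40,  44,  36),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c(paste0("Control_rep", 1:3), paste0("Treated_rep", 1:3)))
)
condition <- factor(rep(c("control", "treated"), each = 3))

# Average the replicate columns within each group.
controlAve <- rowMeans(mat[, condition == "control"])
treatedAve <- rowMeans(mat[, condition == "treated"])

# Combine the averages into a two-column matrix: Control | Treated.
aveMat <- cbind(Control = controlAve, Treated = treatedAve)

# Plot on the log2 scale without row scaling.
heatmap(log2(aveMat + 1), Colv = NA, scale = "none")
```

Row scaling is turned off here because with only two columns every row's Z scores come out as about +0.71 and -0.71, so a row-scaled heatmap would show only the direction of change and not its size.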

ADD REPLY • link 8.6 years ago Michael Love 43k