Good morning to everyone,
I'm trying to generate heatmap to show the results of DE genes found at the end of my analysis (performed with the workflows RNASeqGeneEdgeRQL and rnaseqgene which rely on edgeR quasi-likelihood and DESEQ2 respectively).
In the first case, I'm using the function coolmap, in the second pheatmap, the results look like this:
The heatmap apparently includes ALL transcripts I'm working on, not just the DE (on which I applied a threshold of logFC > 1.5) comparing a single condition.
Maybe I'm missing something.
Is it possible to generate a heatmap including only the restricted list of genes obtained after performing logFC filering? Would it make sense at all or it wouldn't be possible to make comparisons and clusterize genes like in a heatmap? Thanks in advance.
This is not quite an issue with any Bioconductor package and is more a matter of general coding in R. You simply need to subset your input data matrix to have only those genes that you want to include for the purpose of clustering and heatmap generation. This can be done directly or indirectly in many different ways, for example, via subset(), match(), which(), and others. There are undoubtedly many tutorials and code snippets online about this.
Keep in mind that clustering using the entire unfiltered dataset is generally referred to as 'unsupervised clustering'. On the other hand, when you 'supervise' the clustering by selecting only those genes that are, for example, statistically significantly differentially expressed, it is referred to as 'supervised clustering'.
You simply need to subset your input data matrix to have only those
genes that you want to include for the purpose of clustering and
heatmap generation
I guessed that, but what confused me is that I would have expected the pipeline itself to work on DE transcripts as a subset and not apparently them all...
This can be done directly or indirectly in many different ways, for
example, via subset(), match(), which(), and others. There are
undoubtedly many tutorials and code snippets online about this.
Thanks for the suggestion, I'll check them out! Would you suggest a specific tutorial/function in my case?
The RNASeqGeneEdgeRQL workflow shows you how to make the heatmap only of DE genes.
Note that coolmap simply plots a matrix of expression values. If you give it 50 rows/genes of data, then it will plot 50 genes. If you give it data for 300 genes, it will plot 300 genes. You have to choose the genes yourself -- the function has no way of sensing which of your genes are significantly DE. If the plot shows different genes to what you expected, then you have input the wrong data to it, i.e., you've made a programming error.
I guessed that, even though what I can’t understand is…
If my glmTreat function outputs (while using summary.de), by applying a given threshold, let’s say, 70 up-regulated and 30 down-regulated transcripts (and several thousands of unsigned ones), why does the heatmap function coolmap (which takes in input the object ‘tr’ which contains the result of the function glmTreat) shows more than 100 genes? (I asked to show the first 150... in different heatmaps with 50 transcripts each one).
I am find your questions hard to understand for a number of reasons:
The heatmaps you show display only a handful of genes and of them appear to be DE. So it appears you have already done what you say you want to do. I don't understand why you would claim that the heatmaps show all genes.
The purpose of this forum is to help you with code usage but you don't show any code, so it is impossible for us to tell you what you have done wrong, if indeed you have done anything wrong.
The RnaSeqGeneEdgeRQL workflow already seems to answer your question in detail. I'm not sure why you don't simply follow the workflow.
If you want help with the a function (like coolmap) then the first thing to do would be to read the function help page (help("coolmap")). The help page would answer any questions you have about input to coolmap and what it displays.
The heatmaps you show display only a handful of genes and of them appear to be DE. So it appears you have already done what you say you want to do. I don't understand why you would claim that the heatmaps show all genes.
The RnaSeqGeneEdgeRQL workflow already seems to answer your question
in detail. I'm not sure why you don't simply follow the workflow.
Maybe it's just me, but there is something that I don't understand well, even though I've tried to follow the workflow.
The purpose of this forum is to help you with code usage but you don't
show any code, so it is impossible for us to tell you what you have
done wrong, if indeed you have done anything wrong.
As I said, I'll try to be clearer this time!
I'm copying here the exact code I used, adapted from the workflow for my analysis, comparing the two conditions Marzo.Carica and Marzo.Scarica and using for now lfc = 1 as a test. The last line shows the number of DE genes found.
1*Marzo.Carica -1*Marzo.Scarica
Down 161
NotSig 36302
Up 106
So, my question is... I should expect my heatmap to show ONLY these 267 differential genes and then give me a sort of error, if I try to show more genes, shouldn't it? This is what I can't figure out, because this doesn't seem to be the case.
I've tried to use the following code to show 50 genes per heatmap (otherwise the names wouldn't be readable) and show heatmaps showing genes from 1 to 300. Don't mind that I chose norm.factors to appear, it's just because the name of the samples are too long and impossible to tell apart and only a temporary change.
I wouldn't have expected the last heatmap to be shown completely, but 50 genes are there even though only 267 DE ones are found by decideTestsDGE... not 300...
Thanks in advance, maybe there is something obvious I'm missing (maybe the heatmap shows all genes but ordered by Pvalue?)
You should always use a p-value adjustment method.
--------------------------------------
So, my question is... I should expect my heatmap to show ONLY these
267 differential genes and then give me a sort of error, if I try to
show more genes, shouldn't it? This is what I can't figure out,
because this doesn't seem to be the case.
No, you should not expect that. You are seeing 267 there because the default threshold values for decideTestsDGE() are p.value=0.05 and lfc=0. However, when you run this following line of code, you are accessing the entire vector of p-values, i.e., those above and below p=0.05:
o <- order(tr$table$PValue)
If you want a heatmap of genes with nominal p<0.05, then start by doing something like this:
subset(tr$table, PValue < 0.05)
..and then proceed from there.
Make sure that you take time to view the values of variables in your workspace so that you can better understand what is happening.
Kevin, thanks for replying, but the code given by OP actually does plot only 50 genes. The heatmap code is completely independent of the decideTests call.
OP's claims make no sense to me. Note that he has not claimed to see 267 genes (as you assume in your comment) but rather he claims to see 300 or more. The heatmap picture shown in the question shows about 60 genes, so it doesn't match either the code or the claims.
So apparently the heatmaps won't just show the 267 DE genes found by summaryis.de), but as shown in 6.png (ranging from 251 to 300) there are more. That's what I needed clarifications about ^^' Maybe it just works on the whole gene set and orders them by Pvalue?
It is important to get the initial question correct and write it in such a way such that any answer will fully address the points made [in your question]. When threads continue in this lengthy fashion, information becomes 'diluted' and it makes it more difficult for future users to find what they need. Can you take some time to go over all of your data and code, and read our replies here? Thanks / Grazie mile.
Hello, thanks for your answer!
I guessed that, but what confused me is that I would have expected the pipeline itself to work on DE transcripts as a subset and not apparently them all...
Thanks for the suggestion, I'll check them out! Would you suggest a specific tutorial/function in my case?