The DESeq2 vignette (http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) uses three types of transformed data to plot heatmaps:
ntd <- normTransform(dds)
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
Which of these is the right one to use? There is also another function:
normalized_counts <- counts(dds, normalized=TRUE)
How do I choose between them?
I also ran into a problem: I used DESeq2 to find significant genes, selected the most differentially expressed ones, and plotted a heatmap from the vsd data, but the plot obviously does not show high-contrast colours between the two groups.
Can you help me? Thanks a lot. By the way, DESeq2 automatically discards low-expression genes during the differential analysis, is that right?
Thanks a lot. Pre-filtering seems not to be needed, according to https://support.bioconductor.org/p/65256/
It is no major issue to pre-filter your raw data for genes of low counts. What Michael Love is implying is that DESeq2 has some inherent 'quality control' measures that will nevertheless deal with these (genes of low counts) when performing the differential expression analysis.
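As a rough illustration of the optional pre-filtering step discussed above, the sketch below drops low-count genes from a small made-up count matrix in base R. The matrix cts, its gene and sample names, and the threshold of 10 total reads are assumptions for illustration only; with a real DESeqDataSet the same kind of logical index would be applied to the rows of dds.

```r
# Toy raw count matrix: 4 genes x 4 samples (made-up values, for illustration only)
cts <- matrix(
  c(0,   1,   0,   2,
    50,  60,  400, 380,
    3,   2,   1,   0,
    120, 110, 130, 125),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("normal1", "normal2", "tumor1", "tumor2"))
)

# Keep genes with at least 10 reads in total across all samples
# (the threshold is an assumption; choose one that suits your data)
keep <- rowSums(cts) >= 10
cts_filtered <- cts[keep, , drop = FALSE]

print(rownames(cts_filtered))
```

Pre-filtering like this mainly saves memory and run time; results() still applies its own independent filtering by default when adjusting p-values.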
Yes, you are totally right, but I have run into a strange problem. A gene shows a 32-fold change, but mean(assay(vsd)["NELL1", ][1:13]) is 5.625 and mean(assay(vsd)["NELL1", ][14:26]) is 7.73. Why do the mean expression values in the tumor and normal groups differ so little for such a highly differential gene?
thanks a lot
After results(), have you additionally performed fold-change shrinkage via lfcShrink()? Regarding the vsd object, the variance-stabilised data is measured on a scale that is quite different from the normalised or raw counts; so, a direct comparison of the fold-change from results() to that derived from the values in vsd is not possible.
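To make the scale difference concrete, here is a small arithmetic sketch using the numbers quoted earlier in this thread. It assumes the variance-stabilised values behave approximately like log2 values (this holds best for genes with high counts), so a difference of group means on that scale is read as a log2 fold change rather than as a fold change. The variable names are illustrative only.

```r
# Group means of the variance-stabilised values for NELL1, as quoted in the thread
vsd_mean_group1 <- 5.625   # samples 1:13
vsd_mean_group2 <- 7.73    # samples 14:26

# On an (approximately) log2 scale, a difference corresponds to a ratio
vsd_diff <- vsd_mean_group2 - vsd_mean_group1
implied_fold_change <- 2^vsd_diff

# A 32-fold change reported by results() corresponds to a log2 fold change of 5
reported_log2fc <- log2(32)

cat("Difference on the vsd scale:", vsd_diff, "\n")
cat("Implied fold change:", round(implied_fold_change, 2), "\n")
cat("log2 of a 32-fold change:", reported_log2fc, "\n")
```

Under that assumption, a difference of about 2 units is not small: it already implies roughly a four-fold change. The transformation also compresses differences when counts are low, which is one reason the values do not match the estimate from results().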
I do not know what you mean by shrinkage via lfcShrink(). Can you show me the code? I just did the following:
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)
You said vst and results() cannot be compared directly. But I selected the most differential genes and plotted them in a heatmap to show the difference; if the heatmap cannot show much difference, what is the use of it?
Regarding shrinkage: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#log-fold-change-shrinkage-for-visualization-and-ranking
Regarding the heatmap, we typically scale the data prior to generating the heatmap. Please take a look at my tutorial here (see 4 (a)): https://github.com/kevinblighe/E-MTAB-6141
vst seems to have scaled the data already. Do you mean I should use lfcShrink to select the top genes? This will not change the counts, am I right?
The heatmap is really useful. Can you give me an email address? I would like to send my data to you, is that OK? Thanks a lot.
Hi, I cannot receive your data by email and do your work for you - sorry.
The methodology behind lfcShrink() is contained in the link that I gave earlier. To give an answer, though: it does not alter the expression data - it just alters the fold-change estimates in the results.
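A minimal sketch of what such a call can look like is given below. It uses simulated data from makeExampleDESeqDataSet() in place of the cts and coldata objects from the thread, and it picks type = "normal" only because that option needs no extra package; both choices are assumptions made for illustration, and the vignette linked above describes the other shrinkage types.

```r
library(DESeq2)

# Simulated example data shipped with DESeq2 (stands in for your own cts/coldata)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8, betaSD = 1)
dds <- DESeq(dds)

# Unshrunken results, as in the code posted earlier in the thread
res <- results(dds)

# Pick the coefficient of interest from resultsNames(), then shrink its
# log2 fold changes. type = "normal" needs no extra package; "apeglm" and
# "ashr" are also available but require the apeglm and ashr packages.
coef_name <- resultsNames(dds)[2]
counts_before <- counts(dds, normalized = TRUE)
res_shrunk <- lfcShrink(dds, coef = coef_name, type = "normal")

# The expression data are untouched; only the fold-change estimates differ
stopifnot(identical(counts_before, counts(dds, normalized = TRUE)))
head(cbind(unshrunken = res$log2FoldChange,
           shrunken   = res_shrunk$log2FoldChange))
```

The shrunken estimates are mainly useful for ranking genes and for visualisation; the counts and the transformed values used for the heatmap stay exactly as they were.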
Thanks a lot. But just the expression values change the plot a lot.
Hey Kevin Blighe, thanks for the helpful posts!
I noticed in your link to this heatmap analysis, you used scale() and not vst(). However, in your earlier reply (and in some other posts I've seen), you said you should use vsd for heatmaps and clustering analyses.
I've been wondering which is the correct methodology, or if using either scale() or vst() is fine. I tried both on my data and got a nice heatmap with well-defined clusters using scaled_data, where:
normalized_data <- subset(counts(dds,normalized=T), rownames(counts(dds,normalized=T)) %in% significant_gene_names)
scaled_data <- t(scale(t(normalized_data)))
and I got a less nice looking heatmap using vst_sig, where:
vst <- vst(dds, blind=FALSE)
vst <- assay(vst)
vst <- as.data.frame(vst)
vst_sig <- vst[rownames(vst) %in% significant_gene_names,]
Is it poor practice not to use the vst method? Is it okay to just use scale() as you did in your link? Thank you!
Hey Ian, specifically just for the heatmap and/or the clustering, we can additionally scale and center the variance-stabilised data. The scale() function in R, by default, merely centers your data (mean = 0) and transforms it to Z-scores; it works on columns, which is why the matrix is transposed before and after the call to obtain row-wise (per-gene) Z-scores. This just makes it easier for the human brain to interpret the heatmap colour gradients, whereby 0 is then just the mean expression, whereas, e.g., blue or yellow represent different standard deviations below and above that mean, respectively, with higher absolute number relating to higher intensity.
In your above code, I wouldn't do t(scale(t(normalized_data))). I would instead run scale on vst_sig.
You can still just use vst_sig on its own, with no scaling anywhere, but it will be more difficult to set colours and breaks.
Some do prefer to also use the counts, i.e., normalized_data; however, these are positive integer values (or double values if normalised) that are shifted toward 0 and follow a negative binomial distribution. If you run t(scale(t(vst))), you'll see how wildly different are the distributions.
In the case where you use normalized_data, the colour scheme would usually be white for 0, then a gradual increasing gradient toward dark red, dark purple, dark yellow, etc.
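The row-wise scaling discussed in this reply can be sketched in base R as follows. The small matrix is made up and merely stands in for vst_sig (genes in rows, samples in columns); the sample names, palette, and break points are assumptions chosen for illustration, and the plotting call is left commented out.

```r
# Toy matrix on a log2-like scale standing in for vst_sig
# (3 genes x 6 samples; made-up values for illustration)
set.seed(42)
vst_sig <- rbind(
  geneA = c(rnorm(3, mean = 5,  sd = 0.2), rnorm(3, mean = 8,   sd = 0.2)),
  geneB = c(rnorm(3, mean = 12, sd = 0.2), rnorm(3, mean = 10,  sd = 0.2)),
  geneC = c(rnorm(3, mean = 7,  sd = 0.2), rnorm(3, mean = 7.5, sd = 0.2))
)
colnames(vst_sig) <- c(paste0("normal", 1:3), paste0("tumor", 1:3))

# scale() works on columns, so transpose to scale each gene (row),
# then transpose back to the genes x samples layout
z <- t(scale(t(vst_sig)))

# Every gene now has mean 0 and standard deviation 1 across samples
print(round(rowMeans(z), 10))
print(apply(z, 1, sd))

# Symmetric breaks centred on 0 suit a diverging palette
breaks <- seq(-2.5, 2.5, length.out = 101)
palette <- colorRampPalette(c("blue", "white", "yellow"))(100)
# heatmap(z, scale = "none", col = palette, breaks = breaks)  # uncomment to plot
```

Because each gene is centred on its own mean, genes with very different absolute expression levels end up on the same colour scale, which is what makes the group contrast visible in the heatmap.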
Got it, thank you! I will use vsd and potentially scale from there.