Deseq2 rlog or 'regular' normalization?
4
0
Entering edit mode
@sunilmangalam-8881
Last seen 4.0 years ago
United States

Hi,

 I am facing some difficulties with data transformations of my single cell RNA-Seq data analyzed using DESeq2. This data set is different from typical RNA-Seq experiments.. For example, there is a subset of genes which will be present in one group and totally absent in the other, unlike typical data sets where down regulated genes will still be expressed at lower level. The variation between replicates is also high, and so, we have at least ten replicates for each condition. Even with all these limitations, I am able to get a meaningful result from this analysis.

But the rlog transformation is not optimal for my analysis. I get a warning that more than 10% of the genes have outliers and it suggests doing vst.The vst works for without any warning, but I am still worried if this is optimal or if there is a way to do a better transformation for making heat maps and PCA.

 Also, is there a way to extract the normalized values without any transformation? DESeq used to output this, but this function is not in the DESeq2 vignette.

Thanks for your help!

SunilSukumaran

Research Associate

Monell Chemocal Senses Center

Philadelphia

USA

 

Deseq2 rlog transformation data visualization • 8.1k views
ADD COMMENT
0
Entering edit mode
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 10 months ago
Zentrum für Molekularbiologie, Universi…

If you just want normalized counts, use

counts( dds, normalized=TRUE )

This hasn't changed since DESeq1. 

If you need advice on using VST or rlog, please first tell us what downstream analysis you want to perform with the transformed data.

ADD COMMENT
0
Entering edit mode

Sorry, I got confused. Is rlg(dds) is logarithm transformation? Then if we just want to normalize the data without any transformation, counts( dds, normalized=TRUE ) would be the code? 

Please help me to be clear, I just need either normalization or transformation

ADD REPLY
0
Entering edit mode

rlog and vst give log2-like transformed counts. This is also covered in the vignette in detail.

ADD REPLY
0
Entering edit mode

Thank you, 

Sorry, how I can explain the the second principal component between two data sets?? I mean I have applied plotPCA by data from rld, I want to know the reason behind the PCA2 that make two data sets separate not overlapped.

ADD REPLY
0
Entering edit mode

You may want to read some internet references on PCA (there are loads). Briefly PC2 is the dimension which captures the second most amount of variance in the samples, while being orthogonal (you can think, roughly, "pointing in a totally different direction") to PC1.

ADD REPLY
0
Entering edit mode

Thanks a lot

I was plotting PCA with un transformed data 

dds=DESeqDataSetFromMatrix(countData = merged,colData = mycols, design        =       ~ condition) 

dds.norm <-  estimateSizeFactors(dds)

plotPCA(dds.norm, intgroup = "condition")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘plotPCA’ for signature ‘"DESeqDataSet"’

How can I plot PCA without transformation please?

I also tried 

mat=log2(counts(dds.norm, normalized=TRUE)+epsilon)

plotPCA(mat, intgroup = "condition")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘plotPCA’ for signature ‘"matrix"’

ADD REPLY
0
Entering edit mode

hi jivara,

We have extensive documentation. Please read that beforehand. There is a vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

And a workflow for beginners:

http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

ADD REPLY
0
Entering edit mode
@sunilmangalam-8881
Last seen 4.0 years ago
United States

Thanks a lot for for your reply, Simon.

I want to do a principal component analysis, calculate Pearson's correlation coefficient between pairwise comparison of all samples, and a hierarchical clustering and heat map where I use only a subset of genes that pass a  differential expression cutoff (typically fold change +/-2, padj 0.05.

Reading through some other posts, I noticed some cases where using  DESeq(dds, betaPrior=FALSE) is advised in cases where differences between the two conditions are very large. Do you advice trying this setting?

Thanks

Sunil

 

ADD COMMENT
0
Entering edit mode

the betaPrior advice is probably not relevant here, that was for a particular other case.

the rlog warning is thrown when the majority (90%) of the row sum occurs in a single sample for many rows (more than 10%) with medium to high expression. so this is very sparse data which does not resemble a negative binomial distribution for many genes.

It's not clear to me what's the best transformation, or how to deal with data which looks like, e.g.

0,1,0,0,0 vs 0,1,0,0,10000

You can imagine lots of scenarios: Is the 10000 the real signal? Are the 0's drop-outs (missing values) or low expression? Performance depends on all the assumptions about signal and noise and on the metrics for evaluation, which is an area of active research in single-cell.

The point of the warning is that, if you want to try a DESeq2 transformation you should use the VST and not the rlog.

ADD REPLY
0
Entering edit mode

Thanks Michael,

The behavior you describe is exactly what I see for many genes (14.2% in my dataset to be exact).I think there will be dropouts in many cases , for example when genes have a different intron/exon structure in the cell type we analyze  than what is in the gtf files we use for analysis.

  I tried to derive Pearson's correlation coefficient with my vst values and it was not very meaningful. But the untransformed normalized values gave more meaningful results. But for a heatmap where I included only genes with padj <0.05, the vst values were better...

Sunil

ADD REPLY

Login before adding your answer.

Traffic: 232 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6