Question: Deseq2 rlog or 'regular' normalization?
0
gravatar for sunil.mangalam
3.7 years ago by
United States
sunil.mangalam0 wrote:

Hi,

 I am facing some difficulties with data transformations of my single cell RNA-Seq data analyzed using DESeq2. This data set is different from typical RNA-Seq experiments.. For example, there is a subset of genes which will be present in one group and totally absent in the other, unlike typical data sets where down regulated genes will still be expressed at lower level. The variation between replicates is also high, and so, we have at least ten replicates for each condition. Even with all these limitations, I am able to get a meaningful result from this analysis.

But the rlog transformation is not optimal for my analysis. I get a warning that more than 10% of the genes have outliers and it suggests doing vst.The vst works for without any warning, but I am still worried if this is optimal or if there is a way to do a better transformation for making heat maps and PCA.

 Also, is there a way to extract the normalized values without any transformation? DESeq used to output this, but this function is not in the DESeq2 vignette.

Thanks for your help!

SunilSukumaran

Research Associate

Monell Chemocal Senses Center

Philadelphia

USA

 

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by sunil.mangalam0
Answer: Deseq2 rlog or 'regular' normalization?
0
gravatar for Simon Anders
3.7 years ago by
Simon Anders3.5k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.5k wrote:

If you just want normalized counts, use

counts( dds, normalized=TRUE )

This hasn't changed since DESeq1. 

If you need advice on using VST or rlog, please first tell us what downstream analysis you want to perform with the transformed data.

ADD COMMENTlink written 3.7 years ago by Simon Anders3.5k

Sorry, I got confused. Is rlg(dds) is logarithm transformation? Then if we just want to normalize the data without any transformation, counts( dds, normalized=TRUE ) would be the code? 

Please help me to be clear, I just need either normalization or transformation

ADD REPLYlink written 12 months ago by jivarajivaraj10

rlog and vst give log2-like transformed counts. This is also covered in the vignette in detail.

ADD REPLYlink modified 12 months ago • written 12 months ago by Michael Love23k

Thank you, 

Sorry, how I can explain the the second principal component between two data sets?? I mean I have applied plotPCA by data from rld, I want to know the reason behind the PCA2 that make two data sets separate not overlapped.

ADD REPLYlink written 12 months ago by jivarajivaraj10

You may want to read some internet references on PCA (there are loads). Briefly PC2 is the dimension which captures the second most amount of variance in the samples, while being orthogonal (you can think, roughly, "pointing in a totally different direction") to PC1.

ADD REPLYlink written 12 months ago by Michael Love23k

Thanks a lot

I was plotting PCA with un transformed data 

dds=DESeqDataSetFromMatrix(countData = merged,colData = mycols, design        =       ~ condition) 

dds.norm <-  estimateSizeFactors(dds)

plotPCA(dds.norm, intgroup = "condition")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘plotPCA’ for signature ‘"DESeqDataSet"’

How can I plot PCA without transformation please?

I also tried 

mat=log2(counts(dds.norm, normalized=TRUE)+epsilon)

plotPCA(mat, intgroup = "condition")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘plotPCA’ for signature ‘"matrix"’

ADD REPLYlink modified 12 months ago • written 12 months ago by jivarajivaraj10

hi jivara,

We have extensive documentation. Please read that beforehand. There is a vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

And a workflow for beginners:

http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

ADD REPLYlink modified 12 months ago • written 12 months ago by Michael Love23k
Answer: Deseq2 rlog or 'regular' normalization?
0
gravatar for sunil.mangalam
3.7 years ago by
United States
sunil.mangalam0 wrote:

Thanks a lot for for your reply, Simon.

I want to do a principal component analysis, calculate Pearson's correlation coefficient between pairwise comparison of all samples, and a hierarchical clustering and heat map where I use only a subset of genes that pass a  differential expression cutoff (typically fold change +/-2, padj 0.05.

Reading through some other posts, I noticed some cases where using  DESeq(dds, betaPrior=FALSE) is advised in cases where differences between the two conditions are very large. Do you advice trying this setting?

Thanks

Sunil

 

ADD COMMENTlink written 3.7 years ago by sunil.mangalam0

the betaPrior advice is probably not relevant here, that was for a particular other case.

the rlog warning is thrown when the majority (90%) of the row sum occurs in a single sample for many rows (more than 10%) with medium to high expression. so this is very sparse data which does not resemble a negative binomial distribution for many genes.

It's not clear to me what's the best transformation, or how to deal with data which looks like, e.g.

0,1,0,0,0 vs 0,1,0,0,10000

You can imagine lots of scenarios: Is the 10000 the real signal? Are the 0's drop-outs (missing values) or low expression? Performance depends on all the assumptions about signal and noise and on the metrics for evaluation, which is an area of active research in single-cell.

The point of the warning is that, if you want to try a DESeq2 transformation you should use the VST and not the rlog.

ADD REPLYlink written 3.7 years ago by Michael Love23k

Thanks Michael,

The behavior you describe is exactly what I see for many genes (14.2% in my dataset to be exact).I think there will be dropouts in many cases , for example when genes have a different intron/exon structure in the cell type we analyze  than what is in the gtf files we use for analysis.

  I tried to derive Pearson's correlation coefficient with my vst values and it was not very meaningful. But the untransformed normalized values gave more meaningful results. But for a heatmap where I included only genes with padj <0.05, the vst values were better...

Sunil

ADD REPLYlink written 3.7 years ago by sunil.mangalam0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 134 users visited in the last hour