Question: [DESEQ2] How to access the normalized data of a DESeqDataSet
4.6 years ago by
john60
Germany
john60 wrote:

Hello guys,

Calling DESeq() on a DESeqDataSet automatically estimates the size factors (normalization). How can I access this data? It must be stored somewhere. I would like to access the normalized counts.

> dds <- DESeq(dds)

estimating size factors
estimating dispersions
gene-wise dispersion estimates

....

I know there is the counts() function but why use this if the calculation is already done?

## S4 method for signature 'DESeqDataSet'
     counts(object, normalized = FALSE)

 

Does anybody have any hints?

cheers,

John

 

modified 4.6 years ago by Steve Lianoglou12k • written 4.6 years ago by john60
Answer: [DESEQ2] How to access the normalized data of a DESeqDataSet
4.6 years ago by
Michael Love25k
United States
Michael Love25k wrote:

If you have a fresh dds, you can just do:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)

This just divides each column of counts(dds) by the corresponding entry of sizeFactors(dds).
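As a minimal base-R sketch of that division (the toy matrix and size factors below are invented values, not DESeq2 output), each column can be divided by its size factor with sweep():

```r
# Toy raw counts: 3 genes (rows) x 2 samples (columns); values are invented.
raw <- matrix(c(10, 20, 30,
                40, 80, 120),
              nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Hypothetical size factors, one per sample (the role played by sizeFactors(dds)).
sf <- c(s1 = 0.5, s2 = 2)

# Divide every column j by sf[j]; this mirrors what counts(dds, normalized=TRUE) returns.
norm <- sweep(raw, 2, sf, "/")
print(norm)
```

With a real object, raw and sf would come from counts(dds) and sizeFactors(dds) after estimateSizeFactors(dds) has been run.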

You can pull up the help for all functions with:

help(package="DESeq2", help_type="html")

And there is a section of the vignette, "Access to all calculated values":

vignette("DESeq2")
modified 4.6 years ago • written 4.6 years ago by Michael Love25k

Hi Michael,

I am new to R. I am trying to run a DESeq differential expression analysis on my normalized RNA-seq counts. Could you point me to any scripts that are easy to understand? Thanks in advance.

written 18 months ago by k.kathirvel930

Please follow the post above and read the vignette. After installing DESeq2, you can just type into your R session:

vignette("DESeq2")

You can also follow this workflow:

http://www.bioconductor.org/help/workflows/rnaseqGene/

modified 18 months ago • written 18 months ago by Michael Love25k

Hi Michael, hope things are well with you! When outputting normalized counts from a dds object like this:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)

... will these counts be normalized to gene length, taking into consideration that counts were imported using tximport and the tx2gene parameter (which passes gene length to the dds object)?

I don't think so, but thought it would be better to ask. Hope this makes sense.

modified 8 weeks ago • written 8 weeks ago by rodrigo.duarte8820

They are scaled in such a way that any biases across samples related to isoform switching are removed.

But they do not have the typical "normalization for gene length" applied, in that longer genes will have larger values in the matrix you obtain. E.g. if one gene has length L and another has length 2L, then at the same expression level you would expect the second gene's normalized counts to be about twice as large.
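A small numeric sketch of this point, using invented numbers and the simplifying assumption that expected read counts scale with gene length: dividing by a per-sample size factor rescales both genes equally, so the 2:1 ratio between them remains.

```r
# Invented example: two genes at the same expression level, lengths L and 2L (in kb).
gene_length_kb <- c(short = 1, long = 2)
reads_per_kb <- 50   # assumed identical coverage for both genes

# Simplifying assumption: expected raw counts are proportional to gene length.
raw <- gene_length_kb * reads_per_kb

# A hypothetical size factor for this sample.
size_factor <- 1.25
norm <- raw / size_factor

# The size factor cancels in the ratio, so the longer gene keeps twice the value.
ratio <- norm[["long"]] / norm[["short"]]
print(ratio)
```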

written 8 weeks ago by Michael Love25k

Thanks, Michael! I just wanted to confirm that because I was working with two dds objects (one created based on a tximport object, and the other one imported using a count matrix). When I transform the dds objects using the vst function, it prints a message (for the dds object created using tximport) saying:

using 'avgTxLength' from assays(dds), correcting for library size

... which I thought meant that it was correcting for transcript length too. Thanks for clarifying!

modified 5 weeks ago • written 5 weeks ago by rodrigo.duarte8820
Answer: [DESEQ2] How to access the normalized data of a DESeqDataSet
4.6 years ago by
Denali
Steve Lianoglou12k wrote:

While Michael has answered your question "in spirit", allow me to provide an answer to your direct question:

  I know there is the counts() function but why use this if the calculation is already done?

Because the "normalized" data isn't actually stored anywhere. The only things that are stored are the size factors, which can be used to normalize the raw count data when required.

written 4.6 years ago by Steve Lianoglou12k

Yes, Steve's right. I missed this part of the question.

The point of the software and other count-based methods is to model the raw counts, so that estimation steps take into account the variance profile of counts. If you look over the methods, you'll see that almost all steps use K_ij which is the raw count for gene i and sample j. The normalized counts K_ij/s_j are only used to give each gene a single mean value for the dispersion trend regression (equation 5).
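To make the notation concrete, here is a base-R sketch with invented numbers: K holds the raw counts K_ij, s holds the size factors s_j, and the row means of K_ij/s_j give the single mean value per gene mentioned above. It only illustrates the arithmetic and is not DESeq2's internal code.

```r
# Invented raw counts K_ij: 3 genes (i) x 2 samples (j).
K <- matrix(c(10, 0, 6,
              30, 4, 18),
            nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("j1", "j2")))

# Hypothetical size factors s_j, one per sample.
s <- c(j1 = 1, j2 = 2)

# Normalized counts K_ij / s_j (each column divided by its size factor).
K_norm <- t(t(K) / s)

# One mean of normalized counts per gene.
gene_mean <- rowMeans(K_norm)
print(gene_mean)
```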

written 4.6 years ago by Michael Love25k

Steve, Michael,

Thanks, that really helps me out and answers my question. It takes quite some time to fully understand the DESeq2 package.

I have just downloaded the source code of the package. I found almost all functions like plotCounts() (which is in the "R" folder) but I still cannot find the counts() method. Where did you hide it? (-:

written 4.6 years ago by john60

You can find function definitions in the source code files in the R directory with:

grep "counts <-" *.R

The counts method is defined in R/methods.R

written 4.6 years ago by Michael Love25k

Ah yes, I found this:

counts.DESeqDataSet <- function(object, normalized=FALSE) {

...}

Which is the same thing, I guess.

 

written 4.6 years ago by john60

I agree with you.

written 11 months ago by kam.amine19920