Question

[DESEQ2] How to access the normalized data of a DESeqDataSet

5

john ▴ 130

@john-7466

Last seen 10.6 years ago

Germany

Hello guys,

Calling DESeq() on a DESeqDataSet estimates the size factors (normalization) automatically. How can I access this data? It must be stored somewhere. I would like to access the normalized counts.

> dds <- DESeq(dds)

estimating size factors
estimating dispersions
gene-wise dispersion estimates

....

I know there is the counts() function but why use this if the calculation is already done?

## S4 method for signature 'DESeqDataSet'
counts(object, normalized = FALSE)

Does anybody have any hints?

cheers,

John

deseq2 • 79k views

ADD COMMENT • link updated 10.9 years ago by Steve Lianoglou ★ 13k • written 10.9 years ago by john ▴ 130

score 5 · Answer 1 · 2015-03-25

5

Michael Love 43k

@mikelove

Last seen 9 hours ago

United States

If you have a fresh dds, you can just do:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)

This is just dividing each column of counts(dds) by the corresponding element of sizeFactors(dds).
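As a rough illustration of that division in base R (no DESeq2 required), here is a minimal sketch. The matrix `raw` and the vector `sf` are made-up stand-ins for counts(dds) and sizeFactors(dds), not values from any real dataset:

```r
# Toy stand-ins (hypothetical values) for counts(dds) and sizeFactors(dds)
raw <- matrix(c(10, 20, 30,
                40, 80, 120),
              nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("sample1", "sample2")))
sf <- c(sample1 = 0.5, sample2 = 2)

# Divide each column j of the raw counts by its size factor sf[j]
norm <- sweep(raw, 2, sf, "/")
print(norm)
```

With a real dds that carries size factors (rather than gene-specific normalization factors, which can be present with tximport input), all.equal(counts(dds, normalized=TRUE), sweep(counts(dds), 2, sizeFactors(dds), "/")) should return TRUE.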

You can pull up the help for all functions with:

help(package="DESeq2", help_type="html")

And there is a section of the vignette, "Access to all calculated values":

vignette("DESeq2")

ADD COMMENT • link 10.9 years ago Michael Love 43k

0

Hi Michael,

I am new to R. I am trying to run a DESeq differential expression analysis on my normalized RNA-seq counts. Could you point me to any scripts that are easy to understand? Thanks in advance.

ADD REPLY • link 7.8 years ago k.kathirvel93 • 0

0

Please follow the post above, by reading the vignette. After installing DESeq2, you can just type into your R session:

vignette("DESeq2")

You can also follow this workflow:

http://www.bioconductor.org/help/workflows/rnaseqGene/

ADD REPLY • link 7.8 years ago Michael Love 43k

0

Hi Michael, hope things are well with you! When outputting normalized counts from a dds object like this:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)

... will these counts be normalized to gene length, taking into consideration that counts were imported using tximport and the tx2gene parameter (which passes gene length to the dds object)?

I don't think so, but thought it would be better to ask. I hope this makes sense.

ADD REPLY • link 6.4 years ago rodrigo.duarte88 ▴ 40

2

They are scaled in such a way that any biases across samples related to isoform switching are removed.

But they do not have the typical "normalization for gene length" applied, in that longer genes will have larger values in the matrix you obtain. E.g. if one gene has length L and another has length 2L, then at equal expression you would expect the second gene to have normalized counts about twice as large.
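A small base R sketch of that point, using invented numbers: two equally expressed genes where the second is twice as long, so its raw counts are twice as large. The counts and the size factors `sf` are hypothetical.

```r
# Hypothetical raw counts for two equally expressed genes,
# where geneB (length 2L) is twice as long as geneA (length L)
raw <- matrix(c(100, 200,
                300, 600),
              nrow = 2,
              dimnames = list(c("geneA_L", "geneB_2L"),
                              c("sample1", "sample2")))
sf <- c(sample1 = 0.5, sample2 = 1.5)  # made-up size factors

# Size-factor normalization: one scaling value per sample (column)
norm <- sweep(raw, 2, sf, "/")

# The per-sample scaling cancels in a within-sample ratio, so the
# length-driven factor of 2 between the two genes is still there
ratio <- norm["geneB_2L", ] / norm["geneA_L", ]
print(ratio)
```

Because a size factor rescales a whole sample, it cannot change how genes compare to each other within that sample.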

ADD REPLY • link 6.4 years ago Michael Love 43k

0

Thanks, Michael! I just wanted to confirm that because I was working with two dds objects (one created based on a tximport object, and the other one imported using a count matrix). When I transform the dds objects using the vst function, it prints a message (for the dds object created using tximport) saying:

using 'avgTxLength' from assays(dds), correcting for library size

... which I thought meant that it was correcting for transcript length too. Thanks for clarifying!

ADD REPLY • link 6.4 years ago rodrigo.duarte88 ▴ 40

score 2 · Answer 2 · 2015-03-25

2

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 11 weeks ago

United States

While Michael has answered your question "in spirit", allow me to provide an answer to your direct question:

I know there is the counts() function but why use this if the calculation is already done?

Because the "normalized" data isn't actually stored anywhere. The only things stored are the size factors, which can be used to normalize the raw count data when required.

ADD COMMENT • link 10.9 years ago Steve Lianoglou ★ 13k

1

Yes, Steve's right. I missed this part of the question.

The point of the software and other count-based methods is to model the raw counts, so that estimation steps take into account the variance profile of counts. If you look over the methods, you'll see that almost all steps use K_ij which is the raw count for gene i and sample j. The normalized counts K_ij/s_j are only used to give each gene a single mean value for the dispersion trend regression (equation 5).
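To make the notation concrete, here is a base R sketch of the per-gene mean of K_ij/s_j on toy data. The values of `K` and `s` are invented, and this only illustrates the quantity described above, not the package's internal code:

```r
# K[i, j]: hypothetical raw count for gene i in sample j
K <- matrix(c(10, 20,
              40, 80),
            nrow = 2,
            dimnames = list(c("gene1", "gene2"),
                            c("sample1", "sample2")))
s <- c(sample1 = 0.5, sample2 = 2)  # made-up size factors s_j

# Normalized counts K_ij / s_j, then one mean value per gene
norm <- sweep(K, 2, s, "/")
gene_mean <- rowMeans(norm)
print(gene_mean)
```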

ADD REPLY • link 10.9 years ago Michael Love 43k

0

Steve, Michael,

Thanks, that really helps me out and answers my question. It takes quite some time to fully understand the DESeq2 package.

I have just downloaded the source code of the package. I found almost all functions like plotCounts() (which is in the "R" folder) but I still cannot find the counts() method. Where did you hide it? (-:

ADD REPLY • link 10.9 years ago john ▴ 130

0

You can find function definitions in the source code files in the R directory with:

grep "counts <-" *.R

The counts method is defined in R/methods.R.

ADD REPLY • link 10.9 years ago Michael Love 43k

0

Ah yes, I found this:

counts.DESeqDataSet <- function(object, normalized=FALSE) {

...}

Which is the same, I guess.

ADD REPLY • link 10.9 years ago john ▴ 130

0

I agree with you.

ADD REPLY • link 7.2 years ago kam.amine1992 • 0