Clarification about DESeq2 counts() function when dealing with covariates
2
0
Entering edit mode
rb123 • 0
@cedd98d2
Last seen 10 days ago
United States

I have a conceptual question about the counts() function in DESeq2.

counts() uses a DESeq2 object as an input (usually called as dds). While creating this object, we can add covariates to the design formula. Let's say the formula is ~ genotype + age.

Does counts() generate normalized counts values corrected for covariates, or NOT considering the covariates?

Code example:


dds <- DESeqDataSetFromMatrix(countData = counts_mat, colData = samples, design = ~ genotype + age)
dds <- DESeq(dds, parallel = T)

counts(dds)

DESeq2 • 131 views
1
Entering edit mode
@james-w-macdonald-5106
Last seen 5 hours ago
United States

Let's check!

## Make a fake-o
> dds <- makeExampleDESeqDataSet()
## get the colData to use with a second fake-o
> cd <- colData(dds)
## Add fake age action
> cd\$age <- runif(12)*50
> cd
DataFrame with 12 rows and 2 columns
condition       age
<factor> <numeric>
sample1          A  29.46607
sample2          A  14.12943
sample3          A  43.81298
sample4          A  37.69926
sample5          A   4.87461
...            ...       ...
sample8          B   48.8576
sample9          B   19.1212
sample10         B   12.9558
sample11         B   43.8092
sample12         B   38.2382
## fake-o2
> dds2 <- DESeqDataSetFromMatrix(countData = assay(dds), colData = cd, design = ~condition + age)
the design formula contains one or more numeric variables that have mean or
standard deviation larger than 5 (an arbitrary threshold to trigger this message).
Including numeric variables with large mean can induce collinearity with the intercept.
Users should center and scale numeric variables in the design to improve GLM convergence.
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> dds2 <- DESeq(dds2)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

> all.equal(counts(dds), counts(dds2))
 TRUE

> all.equal(counts(dds, normalize = TRUE), counts(dds2, normalize = TRUE))
 TRUE


Survey says?

0
Entering edit mode

Alternatively you can just check the code.

> selectMethod(counts, c(object="DESeqDataSet" ))
Method Definition:

function (object, ...)
{
.local <- function (object, normalized = FALSE, replaced = FALSE)
{
if (!.hasSlot(object, "rowRanges"))
object <- updateObject(object)
if (replaced) {
if ("replaceCounts" %in% assayNames(object)) {
cnts <- assays(object)[["replaceCounts"]]
}
else {
warning("there are no assays named 'replaceCounts', using original.\ncalling DESeq() will replace outliers if they are detected and store this assay.")
cnts <- assays(object)[["counts"]]
}
}
else {
cnts <- assays(object)[["counts"]]
}
if (!normalized) {
return(cnts)
}
else {
if (!is.null(normalizationFactors(object))) {
return(cnts/normalizationFactors(object))
}
else if (is.null(sizeFactors(object)) || any(is.na(sizeFactors(object)))) {
stop("first calculate size factors, add normalizationFactors, or set normalized=FALSE")
}
else {
return(t(t(cnts)/sizeFactors(object)))
}
}
}
.local(object, ...)
}
<bytecode: 0x000001cd934b37e8>
<environment: namespace:DESeq2>

Signatures:
object
target  "DESeqDataSet"


But this gets into some inside baseball. For example, there is a function assays that you would need to know is an accessor function for a SummarizedExperiment object that returns a list, one item of which should be called 'counts' so cnts <- assays(object)[["counts"]] is simply getting the counts data from the DESeqDataSet without doing anything else to the data. So hypothetically you could look at the code and infer what is going on, and maybe learn something about how things work, which is useful.

0
Entering edit mode
swbarnes2 ★ 1.1k
@swbarnes2-14086
Last seen 1 hour ago
San Diego

You should also be able to look up the algorithm DESeq uses, and apply it yourself to your data in R or Excel. Then you can see that it in no way uses the design.