Question

Extracting custom post-normalization counts from DESeq2

0

Entering edit mode

nickp60 • 0

@nickp60-8205

Last seen 8.8 years ago

United States

Hi,

I am experimenting with different normalization techniques to account for batch effects on an RNAseq dataset. I am using DESeq2 for DE analysis, and RUVseq for normalization. For plotting purposes, I want to be able to extract the post-normalization counts (pseudocounts?) from the DESeqDataSet.

#countsTable is a gene x count matix

samples<-colnames(countsTable)
condition<-c(rep("ctrl",2),rep("A",2),rep("B",2),rep("C",2),rep("D",2), rep("E",2))
pData0 = data.frame(cbind(samples, condition), row.names = samples)#

pData<-pData(set4)  ## from RUV analysis

#   x conditions        W_1
#ctrl_1  ctrl  ctrl -0.2864384
#ctrl_2 ctrl   ctrl  0.2952550
#A_1   A       A -0.1725057
#A_2   A       A  0.1767493
#B_1   B       B -0.3269523
#B_2   B       B  0.3252628
#C_1   C       C -0.4057045
#C_2   C       C  0.3960620
#D_1   D       D -0.2868378
#D_2   D       D  0.2795934
#E_1   E       E -0.1910980
#E_2   E       E  0.1966142

ddsfm <- DESeqDataSetFromMatrix(countData = filtered, colData=pData0, design=~conditions)
ddsfm1 <- DESeqDataSetFromMatrix(countData = filtered, colData=pData, design= ~W_1+conditions)

dds<-DESeq(ddsfm1)
dds0<-DESeq(ddsfm)

head(counts(dds, normalized=T))
head(counts(dds0, normalized=T))

When I do those last calls, the normalized counts look the same for both, even though the log2FoldChanges reflect the normalization. It seems like the counts(dds, normalized=TRUE) call only extracts the counts based on the internal normalization that DESeq2 uses. Is there a way to extract (or compute) post-normalization counts from scaling factors provided in the "design=" section from the creation of the DESeqDataSet?

Thanks in advance!

deseq2 r ruvseq normalization rnaseq • 8.8k views

ADD COMMENT • link 8.8 years ago nickp60 • 0

0

Entering edit mode

nickp60 • 0

@nickp60-8205

Last seen 8.8 years ago

United States

At risk of painting a giant target on my head for posting a pokey for-loop solution, I have a workaround that fits my needs for now. This little snippet utilizes a modified version of the plotCounts function.

make a duplicate function of plotCounts called plotCounts2
replace :

if (is.null(sizeFactors(dds)) & is.null(normalizationFactors(dds))) {
        dds <- estimateSizeFactors(dds)
    }

with

if (is.null(normalizationFactors(dds)) & !is.null(dds$W_1) ) { dds$sizeFactor<-dds$W_1 } else if (is.null(normalizationFactors(dds)) & is.null(dds$W_1) & is.null(sizeFactors(dds))){ dds <- estimateSizeFactors(dds)

This should cause the function to look to the external normalization (the "W_1" column in the colData) and use that instead of the size factor
then use the (comically slow) loop below to extract plot counts for each row

From code above:

ddsfm <- DESeqDataSetFromMatrix(countData = filtered, colData=pData0, design=~conditions)
ddsfm1 <- DESeqDataSetFromMatrix(countData = filtered, colData=pData, design= ~W_1+conditions)

dds<-DESeq(ddsfm1)
dds0<-DESeq(ddsfm)

extracted<-data.frame()    # recipient dataframe
for (x in row.names(dds)){    
  y<-(plotCounts2(dds, gene=x,  intgroup="conditions", returnData=T))
  z<-data.frame(t(data.frame(bbb=y$count, row.names=row.names(y))))
  row.names(z)[grep("^bbb$", row.names(z))]<-paste(x)
  extracted<-rbind(extracted,z)
  extracted
}

Would this be considered a valid work-around, or will this create problems downstream? Would love to hear everyone's thoughts!

ADD COMMENT • link 8.8 years ago nickp60 • 0

1

Entering edit mode

hi Nick,

unless I'm missing something, the two issues I see with this implementation are that you no longer are correcting for library size, and you are assuming the coefficient for W_1 is 1 for all genes (whereas when you provide W_1 to DESeq(), it recognizes that some genes are more or less affected by batch than others - the coefficient for W_1 is fit for each gene).

some code you could use, before I've pushed out the devel version would be:

dds <- estimateSizeFactors(dds)
norm.counts <- counts(dds, normalized=TRUE)
log.norm.counts <- log2(norm.counts + 1)

Then use removeBatchEffect() from the limma package with x=log.norm.counts and batch=W_1.

This is very close to the batch effect correction which occurs implicitly when you add W_1 to the formula in DESeq2 (although here on the log normalized counts scale, while with DESeq2 it occurs on the normalized counts scale).

ADD REPLY • link 8.8 years ago Michael Love 41k

0

Entering edit mode

Hi Michael; thank you, that is helpful. could you explain a bit more about the removeBatchEffect() function call? When I run the function with batch=pData$W_1, the columns of the resulting dataframe are identical. Do I need to adjust the default design argument? And if so, how?

ADD REPLY • link 8.8 years ago nickp60 • 0

0

Entering edit mode

I wouldn't expect them to be identical from the steps I suggested, but it's not really possible for me to figure out what could have happened at this point.

You might just have to wait then for me to implement my planned code in the devel branch.

ADD REPLY • link 8.8 years ago Michael Love 41k

0

Entering edit mode

Fair enough- thanks again for all the help!

ADD REPLY • link 8.8 years ago nickp60 • 0

score 2 · Accepted Answer · 2015-07-15

2

Entering edit mode

Michael Love 41k

@mikelove

Last seen 4 hours ago

United States

The way it is included, it's a covariate not a normalization per se, but I see your point. It's the same request/question as this one:

Extracting normalized DESeq2 counts for a multi-factor design?

I have it on my list to provide such normalized+regressed counts for visualization. I'm working on it during this devel cycle.

ADD COMMENT • link 8.8 years ago Michael Love 41k

1

Entering edit mode

Hi Michael,

I would like to follow up on this issue. Is there any progress with this feature. Is it possible to provide a small demo how to visualize the normalized+regressed counts?

Thanks in advance.

Guy

ADD REPLY • link 6.0 years ago guyho ▴ 20

0

Entering edit mode

I haven't worked this in, because I spent all my time on apeglm and shrinkage estimators. Here's some code you can try though:

condition <- dds$condition
w1 <- dds$W_1
# 'idx' is the gene you want to plot
ncts <- counts(dds, normalized=TRUE)[idx,] 
plot(lm(ncts ~ w1)$residual + mean(ncts) ~ condition,
  ylab="normalized, RUV-corrected counts")

ADD REPLY • link 6.0 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

Thank you very much for the quick response. I learn a lot from your posts.

Guy

ADD REPLY • link 5.9 years ago guyho ▴ 20

0

Entering edit mode

Hi Michael, I think that

lm(ncts ~ w1)$residual + mean(ncts)

ignores the differences that are due to experimental conditions

If I correctly understand, would't the following plot the ruv corrected values?

plot(lm(ncts ~ W1+conditions)$fitted.values ~ condition,
ylab="normalized, RUV-corrected counts")

```

Thanks again,

Guy

ADD REPLY • link 5.9 years ago guyho ▴ 20

0

Entering edit mode

No my code is correct, because it keeps the differences due to condition in the data.

~~Yours removes it which means you will see nothing in the plot.~~

ADD REPLY • link 5.9 years ago Michael Love 41k

0

Entering edit mode

Sorry I just saw the “fitted”. Yours is not correct for different reasons then: it keeps in the technical batch effect and it removes the “biological variation”.

ADD REPLY • link 5.9 years ago Michael Love 41k

0

Entering edit mode

I understand it now, thanks again for the help.

ADD REPLY • link 5.9 years ago guyho ▴ 20

0

Entering edit mode

Hi @Michael,

Thanks for your reply; I look forward to the release of this feature.

(previous content removed due to calculation error)

ADD REPLY • link 8.8 years ago nickp60 • 0