Question: Extracting custom post-normalization counts from DESeq2
0
4.3 years ago by
nickp600
United States
nickp600 wrote:

Hi,

I am experimenting with different normalization techniques to account for batch effects on an RNAseq dataset.  I am using DESeq2 for DE analysis, and RUVseq for normalization.  For plotting purposes, I want to be able to extract the post-normalization counts (pseudocounts?) from the DESeqDataSet.

#countsTable is a gene x count matix

samples<-colnames(countsTable)
condition<-c(rep("ctrl",2),rep("A",2),rep("B",2),rep("C",2),rep("D",2), rep("E",2))
pData0 = data.frame(cbind(samples, condition), row.names = samples)#

pData<-pData(set4)  ## from RUV analysis

#   x conditions        W_1
#ctrl_1  ctrl  ctrl -0.2864384
#ctrl_2 ctrl   ctrl  0.2952550
#A_1   A       A -0.1725057
#A_2   A       A  0.1767493
#B_1   B       B -0.3269523
#B_2   B       B  0.3252628
#C_1   C       C -0.4057045
#C_2   C       C  0.3960620
#D_1   D       D -0.2868378
#D_2   D       D  0.2795934
#E_1   E       E -0.1910980
#E_2   E       E  0.1966142

ddsfm <- DESeqDataSetFromMatrix(countData = filtered, colData=pData0, design=~conditions)
ddsfm1 <- DESeqDataSetFromMatrix(countData = filtered, colData=pData, design= ~W_1+conditions)

dds<-DESeq(ddsfm1)
dds0<-DESeq(ddsfm)

head(counts(dds0, normalized=T))

When I do those last calls, the normalized counts look the same for both, even though the log2FoldChanges reflect the normalization.  It seems like the counts(dds, normalized=TRUE) call only extracts the counts based on the internal normalization that DESeq2 uses.  Is there a way to extract (or compute) post-normalization counts from scaling factors provided in the "design=" section from the creation of the DESeqDataSet?

modified 4.3 years ago • written 4.3 years ago by nickp600
Answer: Extracting custom post-normalization counts from DESeq2
2
4.3 years ago by
Michael Love25k
United States
Michael Love25k wrote:

The way it is included, it's a covariate not a normalization per se, but I see your point. It's the same request/question as this one:

Extracting normalized DESeq2 counts for a multi-factor design?

I have it on my list to provide such normalized+regressed counts for visualization. I'm working on it during this devel cycle.

1

Hi Michael,

I would like to follow up on this issue. Is there any progress with this feature. Is it possible to provide a small demo how to visualize the normalized+regressed counts?

Guy

I haven't worked this in, because I spent all my time on apeglm and shrinkage estimators. Here's some code you can try though:

condition <- dds$condition w1 <- dds$W_1
# 'idx' is the gene you want to plot
ncts <- counts(dds, normalized=TRUE)[idx,]
plot(lm(ncts ~ w1)$residual + mean(ncts) ~ condition, ​ ylab="normalized, RUV-corrected counts") ADD REPLYlink written 17 months ago by Michael Love25k Hi Michael, Thank you very much for the quick response. I learn a lot from your posts. Guy ADD REPLYlink written 17 months ago by guyho10 Hi Michael, I think that lm(ncts ~ w1)$residual + mean(ncts)


ignores the differences that are due to experimental conditions

If I correctly understand, would't the following plot the ruv corrected values?

plot(lm(ncts ~ W1+conditions)$fitted.values ~ condition, ylab="normalized, RUV-corrected counts")   Thanks again, Guy ADD REPLYlink modified 17 months ago • written 17 months ago by guyho10 No my code is correct, because it keeps the differences due to condition in the data. Yours removes it which means you will see nothing in the plot. ADD REPLYlink modified 17 months ago • written 17 months ago by Michael Love25k Sorry I just saw the “fitted”. Yours is not correct for different reasons then: it keeps in the technical batch effect and it removes the “biological variation”. ADD REPLYlink modified 17 months ago • written 17 months ago by Michael Love25k I understand it now, thanks again for the help. ADD REPLYlink written 17 months ago by guyho10 Hi @Michael, Thanks for your reply; I look forward to the release of this feature. (previous content removed due to calculation error) ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by nickp600 Answer: Extracting custom post-normalization counts from DESeq2 0 4.3 years ago by nickp600 United States nickp600 wrote: At risk of painting a giant target on my head for posting a pokey for-loop solution, I have a workaround that fits my needs for now. This little snippet utilizes a modified version of the plotCounts function. • make a duplicate function of plotCounts called plotCounts2 • replace : if (is.null(sizeFactors(dds)) & is.null(normalizationFactors(dds))) { dds <- estimateSizeFactors(dds) } • with  if (is.null(normalizationFactors(dds)) & !is.null(dds$W_1) ) {     dds$sizeFactor<-dds$W_1   }     else if (is.null(normalizationFactors(dds)) & is.null(dds$W_1) & is.null(sizeFactors(dds))){ dds <- estimateSizeFactors(dds) • This should cause the function to look to the external normalization (the "W_1" column in the colData) and use that instead of the size factor • then use the (comically slow) loop below to extract plot counts for each row From code above: ddsfm <- DESeqDataSetFromMatrix(countData = filtered, colData=pData0, design=~conditions) ddsfm1 <- DESeqDataSetFromMatrix(countData = filtered, colData=pData, design= ~W_1+conditions) dds<-DESeq(ddsfm1) dds0<-DESeq(ddsfm) extracted<-data.frame() # recipient dataframe for (x in row.names(dds)){ y<-(plotCounts2(dds, gene=x, intgroup="conditions", returnData=T)) z<-data.frame(t(data.frame(bbb=y$count, row.names=row.names(y))))
row.names(z)[grep("^bbb$", row.names(z))]<-paste(x) extracted<-rbind(extracted,z) extracted }  Would this be considered a valid work-around, or will this create problems downstream? Would love to hear everyone's thoughts! ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by nickp600 1 hi Nick, unless I'm missing something, the two issues I see with this implementation are that you no longer are correcting for library size, and you are assuming the coefficient for W_1 is 1 for all genes (whereas when you provide W_1 to DESeq(), it recognizes that some genes are more or less affected by batch than others - the coefficient for W_1 is fit for each gene). some code you could use, before I've pushed out the devel version would be: dds <- estimateSizeFactors(dds) norm.counts <- counts(dds, normalized=TRUE) log.norm.counts <- log2(norm.counts + 1)` Then use removeBatchEffect() from the limma package with x=log.norm.counts and batch=W_1. This is very close to the batch effect correction which occurs implicitly when you add W_1 to the formula in DESeq2 (although here on the log normalized counts scale, while with DESeq2 it occurs on the normalized counts scale). ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Michael Love25k Hi Michael; thank you, that is helpful. could you explain a bit more about the removeBatchEffect() function call? When I run the function with batch=pData$W_1, the columns of the resulting dataframe are identical.  Do I need to adjust the default design argument?  And if so, how?

I wouldn't expect them to be identical from the steps I suggested, but it's not really possible for me to figure out what could have happened at this point.

You might just have to wait then for me to implement my planned code in the devel branch.