Search
Question: Extracting custom post-normalization counts from DESeq2
0
gravatar for nickp60
22 months ago by
nickp600
United States
nickp600 wrote:

Hi,

I am experimenting with different normalization techniques to account for batch effects on an RNAseq dataset.  I am using DESeq2 for DE analysis, and RUVseq for normalization.  For plotting purposes, I want to be able to extract the post-normalization counts (pseudocounts?) from the DESeqDataSet. 

 

 

#countsTable is a gene x count matix

samples<-colnames(countsTable)
condition<-c(rep("ctrl",2),rep("A",2),rep("B",2),rep("C",2),rep("D",2), rep("E",2))
pData0 = data.frame(cbind(samples, condition), row.names = samples)#

pData<-pData(set4)  ## from RUV analysis

#   x conditions        W_1
#ctrl_1  ctrl  ctrl -0.2864384
#ctrl_2 ctrl   ctrl  0.2952550
#A_1   A       A -0.1725057
#A_2   A       A  0.1767493
#B_1   B       B -0.3269523
#B_2   B       B  0.3252628
#C_1   C       C -0.4057045
#C_2   C       C  0.3960620
#D_1   D       D -0.2868378
#D_2   D       D  0.2795934
#E_1   E       E -0.1910980
#E_2   E       E  0.1966142

ddsfm <- DESeqDataSetFromMatrix(countData = filtered, colData=pData0, design=~conditions)
ddsfm1 <- DESeqDataSetFromMatrix(countData = filtered, colData=pData, design= ~W_1+conditions)

dds<-DESeq(ddsfm1)
dds0<-DESeq(ddsfm)

head(counts(dds, normalized=T))
head(counts(dds0, normalized=T))

When I do those last calls, the normalized counts look the same for both, even though the log2FoldChanges reflect the normalization.  It seems like the counts(dds, normalized=TRUE) call only extracts the counts based on the internal normalization that DESeq2 uses.  Is there a way to extract (or compute) post-normalization counts from scaling factors provided in the "design=" section from the creation of the DESeqDataSet?

 

Thanks in advance!

ADD COMMENTlink modified 22 months ago • written 22 months ago by nickp600
2
gravatar for Michael Love
22 months ago by
Michael Love12k
United States
Michael Love12k wrote:

The way it is included, it's a covariate not a normalization per se, but I see your point. It's the same request/question as this one:

Extracting normalized DESeq2 counts for a multi-factor design?

I have it on my list to provide such normalized+regressed counts for visualization. I'm working on it during this devel cycle.

ADD COMMENTlink written 22 months ago by Michael Love12k

Hi @Michael, 

Thanks for your reply;  I look forward to the release of this feature.  

(previous content removed due to calculation error)

ADD REPLYlink modified 22 months ago • written 22 months ago by nickp600
0
gravatar for nickp60
22 months ago by
nickp600
United States
nickp600 wrote:

At risk of painting a giant target on my head for posting a pokey for-loop solution, I have a workaround that fits my needs for now.  This little snippet utilizes a modified version of the plotCounts function.  

  • make a duplicate function of plotCounts called plotCounts2
  • replace :
if (is.null(sizeFactors(dds)) & is.null(normalizationFactors(dds))) {
        dds <- estimateSizeFactors(dds)
    }
  • with

  if (is.null(normalizationFactors(dds)) & !is.null(dds$W_1) ) {
    dds$sizeFactor<-dds$W_1
  }  
  else if (is.null(normalizationFactors(dds)) & is.null(dds$W_1) & is.null(sizeFactors(dds))){
    dds <- estimateSizeFactors(dds)

  • This should cause the function to look to the external normalization (the "W_1" column in the colData) and use that instead of the size factor 
  • then use the (comically slow) loop below to extract plot counts for each row

 

From code above:

 

ddsfm <- DESeqDataSetFromMatrix(countData = filtered, colData=pData0, design=~conditions)
ddsfm1 <- DESeqDataSetFromMatrix(countData = filtered, colData=pData, design= ~W_1+conditions)

dds<-DESeq(ddsfm1)
dds0<-DESeq(ddsfm)

extracted<-data.frame()    # recipient dataframe
for (x in row.names(dds)){    
  y<-(plotCounts2(dds, gene=x,  intgroup="conditions", returnData=T))
  z<-data.frame(t(data.frame(bbb=y$count, row.names=row.names(y))))
  row.names(z)[grep("^bbb$", row.names(z))]<-paste(x)
  extracted<-rbind(extracted,z)
  extracted
}

 

Would this be considered a valid work-around, or will this create problems downstream?  Would love to hear everyone's thoughts!

 

ADD COMMENTlink modified 22 months ago • written 22 months ago by nickp600
1

hi Nick,

unless I'm missing something, the two issues I see with this implementation are that you no longer are correcting for library size, and you are assuming the coefficient for W_1 is 1 for all genes (whereas when you provide W_1 to DESeq(), it recognizes that some genes are more or less affected by batch than others - the coefficient for W_1 is fit for each gene).

some code you could use, before I've pushed out the devel version would be:

dds <- estimateSizeFactors(dds)
norm.counts <- counts(dds, normalized=TRUE)
log.norm.counts <- log2(norm.counts + 1)

Then use removeBatchEffect() from the limma package with x=log.norm.counts and batch=W_1.

This is very close to the batch effect correction which occurs implicitly when you add W_1 to the formula in DESeq2 (although here on the log normalized counts scale, while with DESeq2 it occurs on the normalized counts scale).

ADD REPLYlink modified 22 months ago • written 22 months ago by Michael Love12k

Hi Michael;  thank you, that is helpful.  could you explain a bit more about the removeBatchEffect() function call?  When I run the function with batch=pData$W_1, the columns of the resulting dataframe are identical.  Do I need to adjust the default design argument?  And if so, how?

ADD REPLYlink written 22 months ago by nickp600

I wouldn't expect them to be identical from the steps I suggested, but it's not really possible for me to figure out what could have happened at this point.

You might just have to wait then for me to implement my planned code in the devel branch.

ADD REPLYlink written 22 months ago by Michael Love12k

Fair enough-  thanks again for all the help!

ADD REPLYlink written 22 months ago by nickp600
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 165 users visited in the last hour