Question: Convert Normalized RNA-Seq counts to TPM after Batch Correction
0
18 months ago by
rrcutler50
rrcutler50 wrote:

Hello all,

I have performed batch correction using SVA using normalized counts generated from DESeq2. I then used a function in Limma in order to adjust log normalized counts in so that I can output these batch corrected counts to do analysis that does not involve differential expression. Now I have batch corrected counts that are normalized among my samples. However, I am interested in comparing counts between genes and thus need to convert this to TPM.

Code:

# getting normalized counts
dat  <- counts(dds, normalized = TRUE)

# filtering out normalized counts less than 1

idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]

# since we only have an adjustment variable, batch, we use this as our variable of interest
mod  <- model.matrix(~ batch, sTable)
# setting the null model to the intercept
mod0 <- model.matrix(~ 1, sTable)
# running sva
svseq <- svaseq(dat, mod, mod0)

# get the sva covariates
sva_covar <- svseq\$sv

counts_deseq_sva <- removeBatchEffect(log(counts(dds, normalized = TRUE)+1), covariates = sva_covar)

I have 3 questions:

1. Does it make sense to be using removeBatchEffects() as I have? I happen to know the batch covariates, so it may be easier to use the "batch" parameter in the removeBatchEffects()... however, that does not fix my problem of outputting normalized counts.
2. Since DESeq2 has already normalized for read depth between samples, if I just divide each read count by it's respective gene length, would this be a similar measure to RPKM?
3. Is there a better way of getting batch corrected TPM counts?

Many thanks,

-R

limma sva deseq2 • 909 views
modified 18 months ago • written 18 months ago by rrcutler50
Answer: Convert Normalized RNA-Seq counts to TPM after Batch Correction
3
18 months ago by
Michael Love25k
United States
Michael Love25k wrote:

To answer #3, if you want batch corrected TPM, I think I would just stay on the abundance scale the whole time (estimating mean shifts and removing them on the log2 scale, and working with log2 TPM for plotting or whatever else you need). Note that I would not force the abundances to sum to 1 million at the end, because this is not a great normalization for practical purposes (see e.g. Bullard 2010, or Robinson 2010, or others).

As far I use and am familiar with SVA, you shouldn't provide the batch to mod. The whole point is to get SVA to estimate numerical variables that capture the shifts seen across batches, so either you have the batches and you just remove mean shifts associated with that categorical variable, or you provide SVA with a biological variable and it finds low rank structure that is orthogonal to the biological variable.

Hi Michael,

Thanks for you reply, I'll try your first suggestion. Could you expand on why TPM is not a great normalization?

I am providing the batch to mod as I just have a single group of the same condition where I wish to remove the batch in between them. Is there a better way of using SVA for this purpose?

Thanks,

-R

I’ve provided links to literate covering why, for comparing across samples, it’s suboptimal to require each sample to have the same sum.

I’ll leave recommended SVA usage with respect to known batches to the package authors.

Hi Michael,

I think you forgot the links you were mentioning.

-R

1

Oh I meant “links” like pointers, google the citations “Bullard 2010” and “Robinson 2010”