I have performed batch correction using SVA using normalized counts generated from DESeq2. I then used a function in Limma in order to adjust log normalized counts in so that I can output these batch corrected counts to do analysis that does not involve differential expression. Now I have batch corrected counts that are normalized among my samples. However, I am interested in comparing counts between genes and thus need to convert this to TPM.
# getting normalized counts dat <- counts(dds, normalized = TRUE) # filtering out normalized counts less than 1 idx <- rowMeans(dat) > 1 dat <- dat[idx, ] # since we only have an adjustment variable, batch, we use this as our variable of interest mod <- model.matrix(~ batch, sTable) # setting the null model to the intercept mod0 <- model.matrix(~ 1, sTable) # running sva svseq <- svaseq(dat, mod, mod0) # get the sva covariates sva_covar <- svseq$sv # Adjust normalized counts counts_deseq_sva <- removeBatchEffect(log(counts(dds, normalized = TRUE)+1), covariates = sva_covar)
I have 3 questions:
- Does it make sense to be using removeBatchEffects() as I have? I happen to know the batch covariates, so it may be easier to use the "batch" parameter in the removeBatchEffects()... however, that does not fix my problem of outputting normalized counts.
- Since DESeq2 has already normalized for read depth between samples, if I just divide each read count by it's respective gene length, would this be a similar measure to RPKM?
- Is there a better way of getting batch corrected TPM counts?