While I am trying to understand or ask myself if I should use
lfcShrink with my data as well as trying to find answers to my questions through this forum, I still cannot figure it out what we should do if we have a variable considered as "Batch" and we want to use
In the vignette of DESeq2 appears that the shrinkage estimator
apeglm cannot handle contrasts but the design can be rearranged and nbinomWaldTest can be used.
(taken from the vignette) Although apeglm cannot be used with contrast, we note that many designs can be easily rearranged such that what was a contrast becomes its own coefficient. In this case, the dispersion does not have to be estimated again, as the designs are equivalent, up to the meaning of the coefficients. Instead, one need only run nbinomWaldTest to re-estimate MLE coefficients - these are necessary for apeglm - and then run lfcShrink specifying the coefficient of interest in resultsNames(dds). We give some examples below of producing equivalent designs for use with coef. We show how the coefficients change with model.matrix, but the user would, for example, either change the levels of dds$condition or replace the design using design(dds)<-, then run nbinomWaldTest followed by lfcShrink.
However, what if apart from the condition, you also have a batch effect in your samples that you want to include in the design?
You cannot just relevel your design. I want to model the batch effect but the condition of interest is still the same (and I still want to have it at the end of the formula).
In order to give you an example, I have added an extra column to the data from the
library(pasilla) pasCts <- system.file("extdata","pasilla_gene_counts.tsv", package="pasilla", mustWork=TRUE) pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv", package="pasilla", mustWork=TRUE) cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id")) coldata <- read.csv(pasAnno, row.names=1) coldata <- coldata[,c("condition","type")] coldata$condition <- factor(coldata$condition) coldata$type <- factor(coldata$type) coldata$batch <- factor(c("a", "b", "c", "a", "b", "b", "c")) rownames(coldata) <- sub("fb", "", rownames(coldata)) cts <- cts[, rownames(coldata)] coldata condition type batch treated1 treated single-read a treated2 treated paired-end b treated3 treated paired-end c untreated1 untreated single-read a untreated2 untreated single-read b untreated3 untreated paired-end b untreated4 untreated paired-end c library("DESeq2") dds <- DESeqDataSetFromMatrix(countData = cts,colData = coldata, design = ~ batch + condition) dds keep <- rowSums(counts(dds)) >= 10 dds_cleaned <- dds[keep,] dds_cleaned <- DESeq(dds_cleaned) resultsNames(dds_cleaned)  "Intercept" "batch_b_vs_a"  "batch_c_vs_a" "condition_untreated_vs_treated results_dds <- results(dds_cleaned) results_withShrinkage <- lfcShrink(dds_cleaned, coef ="condition_untreated_vs_treated", type="apeglm", res=results_dds)
As you can see above,
resultsNames has 4 results, 2 from the batch effect and the last one from the condition of interest.
If I run
lfcShrink as I wrote above (taking the condition as my coeficient), will the batch effect be taken into account?
If not, how can I take care of it? Or maybe using
lfcShrink when you have batch effects is not possible?
The only post that I saw that takes into account batch effects + condition is this one, but the type of shrinkage is "normal" and after reading the literature between the different methods, I think that
apeglm is better for my data.
Any help will be really appreciated.
Thanks very much in advance!