Question

lfcShrink() and batch effects in DESeq2

0

Entering edit mode

Eva ▴ 10

@ae923a5a

Last seen 4 weeks ago

Spain

Hi everyone,

While I am trying to understand or ask myself if I should use lfcShrink with my data as well as trying to find answers to my questions through this forum, I still cannot figure it out what we should do if we have a variable considered as "Batch" and we want to use lfcshrink.

In the vignette of DESeq2 appears that the shrinkage estimator apeglm cannot handle contrasts but the design can be rearranged and nbinomWaldTest can be used.

(taken from the vignette) Although apeglm cannot be used with contrast, we note that many designs can be easily rearranged such that what was a contrast becomes its own coefficient. In this case, the dispersion does not have to be estimated again, as the designs are equivalent, up to the meaning of the coefficients. Instead, one need only run nbinomWaldTest to re-estimate MLE coefficients - these are necessary for apeglm - and then run lfcShrink specifying the coefficient of interest in resultsNames(dds). We give some examples below of producing equivalent designs for use with coef. We show how the coefficients change with model.matrix, but the user would, for example, either change the levels of dds$condition or replace the design using design(dds)<-, then run nbinomWaldTest followed by lfcShrink.

However, what if apart from the condition, you also have a batch effect in your samples that you want to include in the design?

You cannot just relevel your design. I want to model the batch effect but the condition of interest is still the same (and I still want to have it at the end of the formula).

In order to give you an example, I have added an extra column to the data from the pasilla package.

    library(pasilla)
    pasCts <- system.file("extdata","pasilla_gene_counts.tsv",
                          package="pasilla", mustWork=TRUE)
    pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv",
                           package="pasilla", mustWork=TRUE)
    cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
    coldata <- read.csv(pasAnno, row.names=1)
    coldata <- coldata[,c("condition","type")]
    coldata$condition <- factor(coldata$condition)
    coldata$type <- factor(coldata$type)
    coldata$batch <- factor(c("a", "b", "c", "a", "b", "b", "c"))
    rownames(coldata) <- sub("fb", "", rownames(coldata))
    cts <- cts[, rownames(coldata)]
    coldata
               condition        type batch
    treated1     treated single-read     a
    treated2     treated  paired-end     b
    treated3     treated  paired-end     c
    untreated1 untreated single-read     a
    untreated2 untreated single-read     b
    untreated3 untreated  paired-end     b
    untreated4 untreated  paired-end     c
    library("DESeq2")
    dds <- DESeqDataSetFromMatrix(countData = cts,colData = coldata,
                                  design = ~ batch + condition)
    dds
    keep <- rowSums(counts(dds)) >= 10
    dds_cleaned <- dds[keep,]
    dds_cleaned <- DESeq(dds_cleaned)
    resultsNames(dds_cleaned)
    [1] "Intercept"                      "batch_b_vs_a"
    [3] "batch_c_vs_a"                   "condition_untreated_vs_treated


    results_dds <- results(dds_cleaned)
    results_withShrinkage <- lfcShrink(dds_cleaned, coef ="condition_untreated_vs_treated",
                                       type="apeglm", res=results_dds)

As you can see above, resultsNames has 4 results, 2 from the batch effect and the last one from the condition of interest.

If I run lfcShrink as I wrote above (taking the condition as my coeficient), will the batch effect be taken into account? If not, how can I take care of it? Or maybe using lfcShrink when you have batch effects is not possible?

The only post that I saw that takes into account batch effects + condition is this one, but the type of shrinkage is "normal" and after reading the literature between the different methods, I think that apeglm is better for my data.

Any help will be really appreciated.

Thanks very much in advance!

BatchEffect lfcshrink RNA-seq DESeq2 • 855 views

ADD COMMENT • link 8 months ago Eva ▴ 10

score 2 · Accepted Answer · 2023-08-08

2

Entering edit mode

ATpoint ★ 4.0k

@atpoint-13662

Last seen 22 hours ago

Germany

The covariates are taken into account during the DESeq() step. That means the "batch correction magic" has already happened once you reach the lfcShrink step. Hence, will the batch effect be taken into account, yes, it will or rather it already has at this point.

ADD COMMENT • link 8 months ago ATpoint ★ 4.0k

0

Entering edit mode

Hi, thanks for your reply! Yes, I understand that the batch correction has been "done" before lfcShrink. My question was more related to the coeficient that I have to select for lfcshrink, as it is splitted, I am worried that if I select "condition_untreated_vs_treated", the log2FC obtained will not take into account the batch that I added in the model because it is using just the differences between the condition.

ADD REPLY • link 8 months ago Eva ▴ 10

1

Entering edit mode

As I've said above, batch has been taken into account and what you have to do now is to select the coefficients you want to test.