Question

Can we use rlog or vst data values in limma for differential expression analysis?

1

Entering edit mode

Sabiha ▴ 20

@7f93ecd8

Last seen 3 months ago

United States

Hi,

I have a question, I am working with mRNA-Seq dataset (18 samples) corresponding to different 2 batches and 3 cell types . Batch_1 has 2 cell types (A and B types), however, Batch_2 has 3rd cell type (C type). I imported the dataset via DESeq2 in R., but for some reason (as mentioned below regarding the “the model matrix is not full rank", I cannot perform differential expression and to measure the effect of the Cell_Type, controlling for batch differences.

I would like to export either normalised_counts (OR) vst or rlog, and import it to limma for differential expression using limma-trend. Can we use rlog or vst data values in limma for differential expression analysis?

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ Batch+Cell_Type)
dds

“the model matrix is not full rank, so the model cannot be fit as specified.” There are two main reasons for this problem: either one or more columns in the model matrix are linear combinations of other columns, or there are levels of factors or combinations of levels of multiple factors which are missing samples. We address these two problems below and discuss possible solutions:

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ Cell_Type)
dds

dds <- DESeq(dds)
res <- results(dds)
res

normalized_counts <- counts(dds, normalized=TRUE)

vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)

Thank you,

Sabiha

DESeq2 limma Normalization R • 2.4k views

ADD COMMENT • link updated 14 months ago by Gordon Smyth 51k • written 20 months ago by Sabiha ▴ 20

Gordon Smyth · Answer 1 · 2022-12-02

1

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Yes, you can analyse vst or rlog data values in limma, but your design matrix will still be singular. limma will solve the singularity problem automatically by dropping columns from the design matrix, but the automatic solution might not be ideal. I would advise you to solve the fundamental issues with your experimental design rather than simply changing the DE software tool.

ADD COMMENT • link 20 months ago Gordon Smyth 51k

0

Entering edit mode

Hi Gordon Smyth thank you for the prompt response.

I received this data from the collaborators. Is this something to re-run a couple of samples of cell types (A and B with C)? Have you seen this scenario in other experimental setup, like how one handles this type of data?

From PCA, it looks indeed there's a batch affects, clearly segregation of 2 batches. Additionally, running batch effect library like combat also doesn't fix the issues before importing in limma?

ADD REPLY • link 20 months ago Sabiha ▴ 20

1

Entering edit mode

This is a very well known type of problem. If you showed your data.frame of Batch and Cell_Type values (i.e., your coldata) then someone would be able give you advice.

Running comBat does not address the issue and doesn't help.

ADD REPLY • link 20 months ago Gordon Smyth 51k

0

Entering edit mode

Gordon Smyth thank you, this is noted.

Here I am displaying my sample_metadata data.frame.

dput(sample_metadata)
structure(list(Samples = c("S1", "S2", "S3", "S4", "S5", "S6", 
                           "S7", "S8", "S9", "S10", "S11", "S12", "S13", "S14", "S15", "S16", 
                           "S17", "S18"), Subject = c("Subject 1", "Subject 1", "Subject 2", 
                                                      "Subject 2", "Subject 3", "Subject 3", "Subject 4", "Subject 4", 
                                                      "Subject 5", "Subject 5", "Subject 6", "Subject 6", "Subject 1", 
                                                      "Subject 2", "Subject 3", "Subject 4", "Subject 5", "Subject 6"
                           ), Batches = c("Batch_1", "Batch_1", "Batch_1", "Batch_1", "Batch_1", 
                                          "Batch_1", "Batch_1", "Batch_1", "Batch_1", "Batch_1", "Batch_1", 
                                          "Batch_1", "Batch_2", "Batch_2", "Batch_2", "Batch_2", "Batch_2", 
                                          "Batch_2"), Cell_Type = c("A", "B", "A", "B", "A", "B", "A", 
                                                                    "B", "A", "B", "A", "B", "C", "C", "C", "C", "C", "C")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                 -18L))
#>    Samples   Subject Batches Cell_Type
#> 1       S1 Subject 1 Batch_1         A
#> 2       S2 Subject 1 Batch_1         B
#> 3       S3 Subject 2 Batch_1         A
#> 4       S4 Subject 2 Batch_1         B
#> 5       S5 Subject 3 Batch_1         A
#> 6       S6 Subject 3 Batch_1         B
#> 7       S7 Subject 4 Batch_1         A
#> 8       S8 Subject 4 Batch_1         B
#> 9       S9 Subject 5 Batch_1         A
#> 10     S10 Subject 5 Batch_1         B
#> 11     S11 Subject 6 Batch_1         A
#> 12     S12 Subject 6 Batch_1         B
#> 13     S13 Subject 1 Batch_2         C
#> 14     S14 Subject 2 Batch_2         C
#> 15     S15 Subject 3 Batch_2         C
#> 16     S16 Subject 4 Batch_2         C
#> 17     S17 Subject 5 Batch_2         C
#> 18     S18 Subject 6 Batch_2         C

^{Created on 2022-12-02 with [reprex v2.0.2](https://reprex.tidyverse.org)}

ADD REPLY • link updated 20 months ago by Gordon Smyth 51k • written 20 months ago by Sabiha ▴ 20

1

Entering edit mode

Batch 2 is completely confounded with Cell_Type C, so it is completely impossible to correct for batches. Indeed it is impossible even to judge whether there is a batch effect, because the effect of Batch 2 can't be separated from the effect of Cell_Type C.

Another issue is that your experiment is paired by subject, but your analysis so far seems to be ignoring the pairing.

Given the data you have, it is only possible to compare A vs B, which would be done using a paired comparison. You cannot do any analysis with C.

ADD REPLY • link 20 months ago Gordon Smyth 51k

0

Entering edit mode

Gordon Smyth thank you. Let me connect the genomics core lab.

In the meantime, I found something related to this Group-specific condition effects, individuals nested within groups. I am not sure, if this will fix the existing issue.

In the above link it says, We have two groups of samples X and Y, each with three distinct individuals (labeled here 1-6). For each individual, we have conditions A and B (for example, this could be control and treated). Assuming my dataset, groups refers to batch, individual as subjects, and conditions as cell types. Here only confusion I have is, it says, three distinct individuals, however, I have six individuals.

I tried doing something like;

d$ind.n <- factor(rep(rep(1:2,each=2),2))
Error in `$<-.data.frame`(`*tmp*`, ind.n, value = c(1L, 1L, 2L, 2L, 1L,  :
                                                      replacement has 8 rows, data has 18

d$ind.n <- factor(rep(rep(1:3,each=2),2))
Error in `$<-.data.frame`(`*tmp*`, ind.n, value = c(1L, 1L, 2L, 2L, 3L,  :
                                                      replacement has 12 rows, data has 18

d$ind.n <- factor(rep(rep(1:6,each=2),2))
Error in `$<-.data.frame`(`*tmp*`, ind.n, value = c(1L, 1L, 2L, 2L, 3L,  :
                                                      replacement has 24 rows, data has 18

ADD REPLY • link 20 months ago Sabiha ▴ 20

0

Entering edit mode

Gordon Smyth Perhaps shall I need to create a new post for this question?

ADD REPLY • link 20 months ago Sabiha ▴ 20

0

Entering edit mode

No, the link you give does not discuss your experimental design and does not address your confounding issue.

I have already given you a complete answer. The limma User's Guide tells you how to analysis paired comparisons. I have already told you that there is no software or statistical solution to the confounding problem in your data.

ADD REPLY • link 20 months ago • updated 14 months ago Gordon Smyth 51k