Design formula with batch effects DESeq2 "Not full rank matrix" error.
Entering edit mode
Flavio • 0
Last seen 11 months ago

Hi all,

I have a problem in figuring out how to include the batch effects in the design formula of DESeq. In particular I have to test the effect of one condition (Low protein diet vs Control diet) on gene expression over 48 mouse liver samples (24 for each condition). The problem is that I sequenced the samples in two separate batches (always keeping Treated vs Control in each batch) and different sequencer. So I have 2 big sources of batch effect:

  1. Different biological replicate (different mothers)
  2. Different sequencing run (this is the biggest batch effect)

I wanted to run DESeq with the following design formula: ~Replicate+Sequencing_run+condition but I got the error of "Not full rank model matrix". Here is the chunk of code that I run:

sample_names <- colnames(cts)
condition <- factor(rep(c(rep("Control",12), rep("Treated", 12)), 2), levels = c("Control", "Treated"))
Replicate <- factor(c(rep(1:3, 4), rep(4:6, 4), rep(7:9, 4), rep(10:12, 4)), levels=c(1,2,3,4,5,6,7,8,9,10,11,12))
Sequencing_run <- factor(c(rep("A", 24), rep("B",24)), levels=c("A","B"))                                           

coldata <- data.frame(condition, Replicate, sex, Sequencing_run, row.names = sample_names)
            condition Replicate Sequencing_run
CTRL1_Rep1    Control         1              A
CTRL1_Rep2    Control         2              A
CTRL1_Rep3    Control         3              A
CTRL2_Rep1    Control         1              A
CTRL2_Rep2    Control         2              A
CTRL2_Rep3    Control         3              A
CTRL3_Rep1    Control         1              A
CTRL3_Rep2    Control         2              A
CTRL3_Rep3    Control         3              A
CTRL4_Rep1    Control         1              A
CTRL4_Rep3    Control         3              A
tRFGG1_Rep1   Treated         4              A
tRFGG1_Rep2   Treated         5              A
tRFGG1_Rep3   Treated         6              A
tRFGG2_Rep1   Treated         4              A
tRFGG2_Rep2   Treated         5              A
tRFGG2_Rep3   Treated         6              A
tRFGG3_Rep1   Treated         4              A
tRFGG3_Rep2   Treated         5              A
tRFGG3_Rep3   Treated         6              A
tRFGG4_Rep1   Treated         4              A
tRFGG4_Rep2   Treated         5              A
tRFGG4_Rep3   Treated         6              A
CTRL1_Rep4    Control         7              B
CTRL1_Rep5    Control         8              B
CTRL1_Rep6    Control         9              B
CTRL2_Rep4    Control         7              B
CTRL2_Rep5    Control         8              B
CTRL2_Rep6    Control         9              B
CTRL3_Rep4    Control         7              B
CTRL3_Rep5    Control         8              B
CTRL3_Rep6    Control         9              B
CTRL4_Rep4    Control         7              B
CTRL4_Rep5    Control         8              B
CTRL4_Rep6    Control         9              B
tRFGG1_Rep4   Treated        10              B
tRFGG1_Rep5   Treated        11              B
tRFGG1_Rep6   Treated        12              B
tRFGG2_Rep4   Treated        10              B
tRFGG2_Rep5   Treated        11              B
tRFGG2_Rep6   Treated        12              B
tRFGG3_Rep4   Treated        10              B
tRFGG3_Rep5   Treated        11              B
tRFGG3_Rep6   Treated        12              B
tRFGG4_Rep4   Treated        10              B
tRFGG4_Rep5   Treated        11              B
tRFGG4_Rep6   Treated        12              B

##RUN DESeq##
dds <-  DESeqDataSetFromMatrix(countData = cts,
                               colData = coldata,
                               design = ~Replicate+Sequencing_run+condition) 

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

I think this is the linear dependency problem between the variables that want to include in the design formula, discussed in the DESeq2 vignette.

  1. Is there a way to solve this problem and tell DESeq to take into account the batch effects when fitting the model?
  2. Regardless whether the problem can be solved, would you analyze all the samples together or would you suggest me to analyze the samples sequenced in the 2 runs as independent experiments and at the end try to correlate the results observed?

Thank you in advance for your help!

Best regards,


DESeq2 BatchEffect RNASeq • 516 views
Entering edit mode

Drop the replicate information in the design, as that is automatically inferred from the columns "condition" and "Sequencing_run". Samples that have the same level in these columns are treated as replicates.

Entering edit mode
swbarnes2 ★ 1.3k
Last seen 5 hours ago
San Diego

The date you ran a library on the instrument does not cause technical artifacts. Library prep date (and maybe RNA extraction date) cause technical artifacts. You should see this in your PCA.


Login before adding your answer.

Traffic: 340 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6