Hi!
I'm trying to account for batch effects when comparing gene expression across my samples. My example metadata is below. All samples except sample21 have replicates; sample21 is a single transcriptome whose gene expression I'd like to examine, alongside simple analyses like PCA.
(rowname) Group Tissue ConditionA PhenotypeA Combined-Condition-Tissue Batch
sample1 A brain Condition1 Phenotype1 Condition1-brain A
sample2 C liver Condition1 Phenotype1 Condition1-liver A
sample3 A brain Condition1 Phenotype1 Condition1-brain A
sample4 C liver Condition1 Phenotype1 Condition1-liver A
sample5 A brain Condition1 Phenotype1 Condition1-brain A
sample6 B brain Condition2 Phenotype1 Condition2-brain A
sample7 D liver Condition2 Phenotype1 Condition2-liver A
sample8 B brain Condition2 Phenotype1 Condition2-brain A
sample9 D liver Condition2 Phenotype1 Condition2-liver A
sample10 B brain Condition2 Phenotype1 Condition2-brain A
sample11 E brain Condition1 Phenotype2 Condition1-brain A
sample12 F liver Condition1 Phenotype2 Condition1-liver A
sample13 E brain Condition1 Phenotype2 Condition1-brain A
sample14 F liver Condition1 Phenotype2 Condition1-liver A
sample15 E brain Condition1 Phenotype2 Condition1-brain A
sample16 G brain Condition2 Phenotype2 Condition2-brain A
sample17 H liver Condition2 Phenotype2 Condition2-liver A
sample18 G brain Condition2 Phenotype2 Condition2-brain A
sample19 H liver Condition2 Phenotype2 Condition2-liver A
sample20 G brain Condition2 Phenotype2 Condition2-brain A
sample21 Z wing Condition1 Phenotype2 Condition1-wing B
I began preliminary analysis using just the samples from Batch A (sample21 had not been added yet).
I set up my designs like:
dds.group <- DESeqDataSetFromTximport(txi.rsem, colData = meta, design = ~ Group)
dds.combined <- DESeqDataSetFromTximport(txi.rsem, colData = meta, design = ~ Combined-Condition-Tissue)
Both worked fine.
But then I got sample21 back from a second sequencing run and wanted to compare it to my first set of samples (they are all from the same population, just sequenced at different times). To account for the batch effect, I altered my code to:
dds.combined.batch <- DESeqDataSetFromTximport(txi.rsem, colData = meta, design = ~ Batch + Combined-Condition-Tissue)
This is the error I got:
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
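For what it's worth, the rank problem seems to be reproducible with base R alone, without DESeq2 or the count data. The sketch below rebuilds only the Batch and combined condition/tissue columns from the table above and checks the rank of the resulting model matrix. The column name Combined_Condition_Tissue (underscores instead of hyphens, so that it is a valid name inside a formula) and the object names are my own assumptions for illustration, not the names in my real script.

```r
# Rebuild the two relevant metadata columns from the table above (base R only).
tissue    <- c(rep(c("brain", "liver", "brain", "liver", "brain"), 4), "wing")
condition <- c(rep(c(rep("Condition1", 5), rep("Condition2", 5)), 2), "Condition1")

meta_sketch <- data.frame(
  row.names = paste0("sample", 1:21),
  Batch = factor(c(rep("A", 20), "B")),
  # Assumed name: underscores so the column can be used directly in a formula.
  Combined_Condition_Tissue = factor(paste(condition, tissue, sep = "_"))
)

# Model matrix for the design that triggered the error.
mm <- model.matrix(~ Batch + Combined_Condition_Tissue, data = meta_sketch)

n_coef <- ncol(mm)      # coefficients the design asks for: 6
mm_rank <- qr(mm)$rank  # coefficients that can actually be estimated: 5

# The Batch B indicator and the Condition1_wing indicator are the same column,
# because sample21 is the only Batch B sample and the only wing sample.
same_column <- identical(
  unname(mm[, "BatchB"]),
  unname(mm[, "Combined_Condition_TissueCondition1_wing"])
)

print(c(n_coef = n_coef, rank = mm_rank))
print(same_column)  # TRUE
```

If this sketch reflects my real table, the Batch term and the Condition1-wing level are perfectly confounded: with a single sample in Batch B, nothing in the data separates a batch difference from a wing difference.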
In the meta table, I replaced the info with "TEST" to simulate replicates, and even just one worked fine with the group parameter. But...there are no replicates for sample 21. How would you recommend incorporating it into my analysis while accounting for batch effects?
Thank you!!! :)