Hi,
I don't have much experience, but I was asked to analyze a one-off RNA-seq experiment. The experimental design is as follows:

    > coldata
         Condition      Genotype Batch
    DMU1 Untreated Double mutant     2
    DMU2 Untreated Double mutant     2
    DMT1   Treated Double mutant     2
    DMT2   Treated Double mutant     2
    WTU1 Untreated      Wildtype     1
    WTU2 Untreated      Wildtype     1
    M1U1 Untreated       mutant1     1
    M1U2 Untreated       mutant1     1
    M2U1 Untreated       mutant2     1
    M2U2 Untreated       mutant2     1
    M1T1   Treated       mutant1     1
    M1T2   Treated       mutant1     1
    WTT1   Treated      Wildtype     1
    WTT2   Treated      Wildtype     1
    M2T1   Treated       mutant2     1
    M2T2   Treated       mutant2     1

Basically, there are four genotypes (WT, mutant 1, mutant 2, and the double mutant, i.e. mutant 1 + mutant 2) and two conditions (Treated, Untreated). The double mutant samples (batch 2) were collected at the same time as the batch 1 samples. However, the two batches were sent to different companies at different times and sequenced on different platforms (BGI-seq vs. Illumina). The read layouts also differ: batch 1 was sequenced as 50 bp single-end reads, whereas batch 2 was sequenced as 150 bp paired-end reads. Because of these differences, I expected a batch effect.
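A quick way to see how batch and genotype overlap is to cross-tabulate them (in R, something like `table(coldata$Genotype, coldata$Batch)`). The sketch below does the same count in plain Python, purely for illustration; the sample list is transcribed from the coldata printout above, and `crosstab` is a hypothetical helper name.

```python
from collections import Counter

# Sample table transcribed from the coldata printout: (sample, condition, genotype, batch)
COLDATA = [
    ("DMU1", "Untreated", "Double mutant", 2), ("DMU2", "Untreated", "Double mutant", 2),
    ("DMT1", "Treated", "Double mutant", 2), ("DMT2", "Treated", "Double mutant", 2),
    ("WTU1", "Untreated", "Wildtype", 1), ("WTU2", "Untreated", "Wildtype", 1),
    ("M1U1", "Untreated", "mutant1", 1), ("M1U2", "Untreated", "mutant1", 1),
    ("M2U1", "Untreated", "mutant2", 1), ("M2U2", "Untreated", "mutant2", 1),
    ("M1T1", "Treated", "mutant1", 1), ("M1T2", "Treated", "mutant1", 1),
    ("WTT1", "Treated", "Wildtype", 1), ("WTT2", "Treated", "Wildtype", 1),
    ("M2T1", "Treated", "mutant2", 1), ("M2T2", "Treated", "mutant2", 1),
]

def crosstab(rows):
    """Count samples for each (genotype, batch) pair (hypothetical helper)."""
    return Counter((genotype, batch) for _, _, genotype, batch in rows)

if __name__ == "__main__":
    counts = crosstab(COLDATA)
    for genotype in ["Wildtype", "mutant1", "mutant2", "Double mutant"]:
        print(f"{genotype:14s} batch1={counts[(genotype, 1)]} batch2={counts[(genotype, 2)]}")
```

Every double mutant sample is in batch 2 and no other genotype is, so in this layout the batch label and the double-mutant genotype carry exactly the same information.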
From the literature, and from analyzing the batch 1 samples alone with DESeq2, many genes are induced after treatment in WT. These genes are less strongly induced in the two single mutants (the mutant1/mutant2-affected genes). The genes affected in mutant 1 and mutant 2 largely overlap, but not completely. The main questions we want to answer are: (1) Are the genes induced in WT after treatment still induced in the double mutant after treatment? (2) Are the genes that are still induced in the single mutants also induced in the double mutant? I think both questions require comparing samples from different batches.

However, from reading other posts (Correcting for Batch Effects Prior to Differential Gene Expression Analysis with limma), the section on this in the DESeq2 vignette, and my own attempt in DESeq2 (adding batch to the design to account for the batch effect produced a "Model matrix not full rank" error), I'm not sure I can separate the batch effect from the genotype/condition effects in the double mutant samples, since no WT controls were included in batch 2. Is there any way I can salvage something from the double mutant data to answer the questions above? Thanks in advance for any help!
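To illustrate where the "not full rank" error comes from, here is a minimal sketch in plain Python (not R, and not what DESeq2 runs internally). It hand-builds treatment-coded model matrices meant to mimic `~ Batch + Genotype + Condition + Genotype:Condition` and the same formula without `Batch`, assuming Wildtype and Untreated are the reference levels, then computes each matrix's rank exactly. `design_row` and `rank` are hypothetical helper names.

```python
from fractions import Fraction

# (condition, genotype, batch) for the 16 samples, in coldata order.
SAMPLES = [
    ("Untreated", "Double mutant", 2), ("Untreated", "Double mutant", 2),
    ("Treated", "Double mutant", 2), ("Treated", "Double mutant", 2),
    ("Untreated", "Wildtype", 1), ("Untreated", "Wildtype", 1),
    ("Untreated", "mutant1", 1), ("Untreated", "mutant1", 1),
    ("Untreated", "mutant2", 1), ("Untreated", "mutant2", 1),
    ("Treated", "mutant1", 1), ("Treated", "mutant1", 1),
    ("Treated", "Wildtype", 1), ("Treated", "Wildtype", 1),
    ("Treated", "mutant2", 1), ("Treated", "mutant2", 1),
]
NON_REF_GENOTYPES = ["mutant1", "mutant2", "Double mutant"]  # Wildtype = reference (assumed)

def design_row(condition, genotype, batch, include_batch):
    """One treatment-coded row: intercept, [batch2], genotypes, Treated, genotype:Treated."""
    treated = 1 if condition == "Treated" else 0
    geno = [1 if genotype == g else 0 for g in NON_REF_GENOTYPES]
    row = [1]  # intercept
    if include_batch:
        row.append(1 if batch == 2 else 0)
    row += geno
    row.append(treated)
    row += [g * treated for g in geno]  # interaction columns
    return row

def rank(matrix):
    """Matrix rank via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r

if __name__ == "__main__":
    for include_batch in (True, False):
        X = [design_row(*s, include_batch) for s in SAMPLES]
        print(f"include_batch={include_batch}: {len(X[0])} columns, rank {rank(X)}")
```

With `Batch` included, the batch 2 column is identical to the Double mutant column, so the 9-column matrix has rank 8. Dropping `Batch` gives a full-rank 8-column matrix, but then the Double mutant coefficient absorbs any batch difference. Only contrasts made entirely within batch 2 (e.g. Treated vs. Untreated in the double mutant) cancel a purely additive batch shift, and even that rests on the assumption that the batch effect is the same for treated and untreated samples.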
I will try the comparisons you suggested. Thanks for your prompt reply and guidance!