First off, I'm a wet-lab scientist learning to analyse my own data. I've designed my experiment as follows:
```
> coldata
        sample condition litter
1 KO1_sort.bam        KO      A
2 KO2_sort.bam        KO      B
3 KO3_sort.bam        KO      A
4 KO4_sort.bam        KO      B
5 WT1_sort.bam        WT      A
6 WT2_sort.bam        WT      B
7 WT3_sort.bam        WT      A
8 WT4_sort.bam        WT      B
```
Before normalization, WT3 and KO1 show higher variability in the RLE plot, and they also cluster together along the first principal component of my PCA plot.
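An RLE (relative log expression) plot shows, for each sample, the distribution of log counts after subtracting each gene's median across all samples. Well-normalized samples have boxes centred on zero with similar spread. A minimal base-R sketch of how the RLE values and a PCA on log counts can be computed is below. It is an illustration under assumptions: the Poisson count matrix is simulated (hypothetical), `rle_matrix` is a hypothetical helper, and with real data you would pass the raw counts (e.g. `counts(dds)`) before normalization and the RUVg-normalized counts afterwards. EDASeq's `plotRLE()` and `plotPCA()` produce equivalent plots directly.

```r
# Relative log expression: per-gene log2 counts minus that gene's median
# across samples. Boxes centred on 0 with a small spread indicate good normalization.
rle_matrix <- function(counts, pseudocount = 1) {
  logc <- log2(counts + pseudocount)
  sweep(logc, 1, apply(logc, 1, median), "-")
}

# HYPOTHETICAL data: simulated counts standing in for counts(dds)
set.seed(1)
samples <- c(paste0("KO", 1:4), paste0("WT", 1:4))
cnt <- matrix(rpois(500 * 8, lambda = 50), nrow = 500,
              dimnames = list(paste0("gene", 1:500), samples))

# RLE boxplot, one box per sample
rle_vals <- rle_matrix(cnt)
boxplot(rle_vals, outline = FALSE, las = 2, ylab = "RLE")
abline(h = 0, lty = 2)

# PCA on log counts: prcomp() expects samples as rows, genes as columns
pca <- prcomp(t(log2(cnt + 1)))
plot(pca$x[, 1:2], pch = 19, col = rep(c("red", "blue"), each = 4))
text(pca$x[, 1:2], labels = samples, pos = 3)
```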
RUVg with k = 2 reduces the variation seen in the RLE plot, after which WT and KO samples cluster separately on the PCA plot.
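The RUVg step can be sketched as follows, adapted from the RUVSeq section of the Bioconductor rnaseqGene workflow but using this post's cut-off (p-value > 0.5) and first-pass design (`~ litter + condition`). This is a sketch under assumptions, not necessarily the exact pipeline used here: it runs on simulated data from `makeExampleDESeqDataSet()` instead of the real counts, the low-count filter (more than 5 counts in at least 2 samples) and upper-quartile normalization are the workflow's choices rather than anything stated in the post, and it needs the DESeq2 and RUVSeq Bioconductor packages.

```r
library(DESeq2)
library(RUVSeq)   # also attaches EDASeq (newSeqExpressionSet, betweenLaneNormalization)

# HYPOTHETICAL data: simulated counts relabelled with the sample layout from the post
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)
dds$condition <- factor(rep(c("KO", "WT"), each = 4), levels = c("WT", "KO"))
dds$litter    <- factor(rep(c("A", "B"), times = 4))
design(dds) <- ~ litter + condition

# 1. First-pass DESeq2 fit to find genes unlikely to be differentially expressed
dds <- DESeq(dds)
res <- results(dds)                       # condition: KO vs WT
not_sig <- rownames(res)[which(res$pvalue > 0.5)]

# 2. Build a SeqExpressionSet, drop low-count genes, upper-quartile normalize
set <- newSeqExpressionSet(counts(dds))
keep <- rowSums(counts(set) > 5) >= 2
set <- set[keep, ]
set <- betweenLaneNormalization(set, which = "upper")

# 3. Empirical control genes = non-DE genes that survived the filter
empirical <- rownames(set)[rownames(set) %in% not_sig]
set <- RUVg(set, empirical, k = 2)

# 4. Copy the estimated factors of unwanted variation into colData and refit
ddsruv <- dds
ddsruv$W1 <- set$W_1
ddsruv$W2 <- set$W_2
design(ddsruv) <- ~ W1 + W2 + litter + condition   # Option 1; swap for Option 2 as needed
ddsruv <- DESeq(ddsruv)
```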
Empirical control genes for RUVg were taken as the genes with p-value > 0.5 from a first-pass DESeq2 fit with design = ~ litter + condition. My question: after including the unwanted variation modelled by RUVg (W1 and W2), should I still account for the `litter` factor in my DESeq2 design, or not?

Option 1:
```r
design(ddsruv) <- ~ W1 + W2 + litter + condition
```

Option 2:

```r
design(ddsruv) <- ~ W1 + W2 + condition
```
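One way to inform the choice (a heuristic, not a definitive rule) is to check (a) that `litter` is balanced across `condition`, so that including it cannot confound the condition effect; (b) how many residual degrees of freedom each design leaves with 8 samples; and (c) how much of W1 and W2 is already explained by `litter`. If litter explains most of a W, that factor has largely absorbed the litter effect. The sketch below uses base R only; the W values are random placeholders that you would replace with the RUVg estimates (`set$W_1`, `set$W_2` in RUVSeq), and `resid_df` is a hypothetical helper. With this layout, Option 1 leaves 3 residual degrees of freedom and Option 2 leaves 4.

```r
# Sample table from the post
coldata <- data.frame(
  sample    = c(paste0("KO", 1:4, "_sort.bam"), paste0("WT", 1:4, "_sort.bam")),
  condition = factor(rep(c("KO", "WT"), each = 4), levels = c("WT", "KO")),
  litter    = factor(rep(c("A", "B"), times = 4))
)
table(coldata$condition, coldata$litter)  # 2 samples in every cell: balanced

# PLACEHOLDER factors: replace with the RUVg estimates (set$W_1, set$W_2).
# Random values are used only so this block runs on its own.
set.seed(42)
coldata$W1 <- rnorm(8)
coldata$W2 <- rnorm(8)

# Residual degrees of freedom (samples minus model coefficients) for a design
resid_df <- function(design, data) {
  mm <- model.matrix(design, data)
  stopifnot(qr(mm)$rank == ncol(mm))      # the design must be full rank
  nrow(mm) - ncol(mm)
}
df_opt1 <- resid_df(~ W1 + W2 + litter + condition, coldata)
df_opt2 <- resid_df(~ W1 + W2 + condition, coldata)
c(option1 = df_opt1, option2 = df_opt2)

# Share of each W's variance explained by litter (R^2 of W ~ litter);
# values near 1 suggest that W already captures the litter effect.
r2_litter <- sapply(c("W1", "W2"), function(w)
  summary(lm(coldata[[w]] ~ coldata$litter))$r.squared)
r2_litter
```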