Let's say I have an RNAseq experiment of 12 samples. There are 3 subjects, 2 cell types (B and T cell) and 2 diseases (SLE and HC). I combine cell type and disease into a single combined variable called group, such that it is a factor with 4 distinct levels. Let's say that there are 15k genes detected and each sample has a library depth of around 20M uniquely mapped reads.
I then go through the rigmarole of DESeq2:
coldata = files[, c("sample", "group")]
row.names(coldata) = coldata$sample
# Establish DESeq DGE object
dds = DESeqDataSetFromMatrix(countData=as.matrix(cts[, files$sample]),
colData=coldata,
design = as.formula(~group))
# Run DESeq fitting and get results
dds <- DESeq(dds)
Now let's say I just want to look at the difference between HC and disease within each cell type:
resB = results(dds, contrast = c("group", "B_SLE", "B_HC"),
pAdjustMethod = "fdr", independentFiltering = F)
resT = results(dds, contrast = c("group", "T_SLE", "T_HC"),
pAdjustMethod = "fdr", independentFiltering = F)
This typically how we do it. However, extracting the results in this manner means that DESeq2 is not just fitting to our variable of interest (SLE v HC), but also another axis (B v T). We can perform a similar analysis as such:
for (cellType in unique(files$cellType) {
f = files[files$cellType==cellType, ]
coldata = f[, c("sample", "disease")]
row.names(coldata) = coldata$sample
# Establish DESeq DGE object
dds = DESeqDataSetFromMatrix(countData=as.matrix(cts[, f$sample]),
colData=coldata,
design = as.formula(~disease))
# Run DESeq fitting and get results
dds <- DESeq(dds)
res = results(dds, contrast = c("group", "SLE", "HC"),
pAdjustMethod = "fdr", independentFiltering = F)
}
However, this gives us a different result with different numbers of DEG and different types of DEG. There likely is not a correct answer to this question, only considerations for each method, but how do you know when one is correct and not the other?
With PCA, what % of the variance is caused by the difference between B cells and T cells?