I'm using DESeq2 to analyze 7 patient-matched sample pairs (1 tumor and 1 normal from each patient), and I want to identify DEGs with padj<0.05 and LFC>2 in Tumor compared to Normal. I have annotated my coldata file to include a column for "Source" (tumor or normal) and a column for "Patient" (patient ID, listed as A-G). The DESeq2 vignette says to analyze paired samples with a multi-factor design that includes the pairing information in the design formula. As such, I have `design = ~ Patient + Source`. However, when reviewing `resultsNames(dds)`, I get the following:
"Intercept"
"Patient_G_vs_A"
"Patient_F_vs_A"
"Patient_E_vs_A"
"Patient_D_vs_A"
"Patient_C_vs_A"
"Patient_B_vs_A"
"Source_Tumor_vs_Normal"
It seems that the program is treating Patient A as the reference level for the "Patient" factor (I know it picks that alphabetically), but by including Patient in the design formula, I simply want to account for differences between EACH patient to increase the statistical power. Even when I set the factor levels explicitly using `dds$Source <- factor(dds$Source, levels=c("Normal", "Tumor"))`, I get the same output from `resultsNames(dds)`.
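To make the coefficient names above easier to read, here is a minimal base R sketch of how a `~ Patient + Source` formula expands into model matrix columns. The `coldata` data frame is a hypothetical stand-in built to mirror the layout described in this post, not the actual pheno.txt. Under R's default treatment coding, each factor drops its first level as the reference, which is why six Patient columns and one Source column appear.

```r
# Hypothetical coldata mirroring the layout described in the post
coldata <- data.frame(
  Source  = factor(rep(c("Normal", "Tumor"), each = 7),
                   levels = c("Normal", "Tumor")),
  Patient = factor(rep(LETTERS[1:7], times = 2))
)

# Expand the design formula into its model matrix (default treatment coding)
mm <- model.matrix(~ Patient + Source, data = coldata)

# One intercept, six Patient columns (B to G, each relative to A),
# and a single Source column (Tumor relative to Normal)
print(colnames(mm))
```

The Patient columns only absorb per-patient baseline differences; the last column is the within-patient Tumor vs Normal effect. Releveling `Source` changes which Source level is the reference but leaves the Patient columns untouched.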
Is this the normal, expected output that I can simply ignore (extracting results using `res <- results(dds, alpha=0.05, lfcThreshold=log2(2))`), or am I using the multi-factor design incorrectly? Any advice is very much appreciated.
Detailed code:
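library(DESeq2)
# Gene-level count matrix: genes in rows, samples in columns
countdata <- as.matrix(read.csv("gene_count_matrix.csv", row.names = "gene_id"))
# Sample annotation with Source and Patient columns; sample names as row names
coldata <- read.table("pheno.txt", header = TRUE, row.names = 1)
# Paired design: Patient is the blocking factor, Source is the variable of interest
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ Patient + Source)
dds <- DESeq(dds)
# lfcThreshold = log2(2) equals 1, i.e. a test for more than a twofold change
res <- results(dds, alpha = 0.05, lfcThreshold = log2(2))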
coldata summary:

Source | Patient
----------|----------
Normal | A
Normal | B
Normal | C
Normal | D
Normal | E
Normal | F
Normal | G
Tumor | A
Tumor | B
Tumor | C
Tumor | D
Tumor | E
Tumor | F
Tumor | G
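
For reference, the Tumor vs Normal comparison can also be requested by name rather than relying on the default behaviour of `results()`, which reports the last variable in the design formula. The sketch below is only an illustration: it uses `makeExampleDESeqDataSet()` to simulate counts and attaches hypothetical `Patient` and `Source` columns laid out like the coldata in this post, so the numbers it produces carry no biological meaning.

```r
library(DESeq2)

# Simulated counts (1000 genes x 14 samples); a stand-in for the real count matrix
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 14)

# Hypothetical sample annotation laid out like the coldata described in the post
dds$Patient <- factor(rep(LETTERS[1:7], times = 2))
dds$Source  <- factor(rep(c("Normal", "Tumor"), each = 7),
                      levels = c("Normal", "Tumor"))

# Paired design: Patient is the blocking factor, Source comes last
design(dds) <- ~ Patient + Source
dds <- DESeq(dds)

# Name the coefficient explicitly; lfcThreshold = log2(2) equals 1
res_by_name <- results(dds, name = "Source_Tumor_vs_Normal",
                       alpha = 0.05, lfcThreshold = log2(2))

# The same comparison written as a contrast: factor, numerator, denominator
res_by_contrast <- results(dds, contrast = c("Source", "Tumor", "Normal"),
                           alpha = 0.05, lfcThreshold = log2(2))

summary(res_by_name)
```

Either form makes it explicit that the Patient coefficients are only used to adjust for per-patient baselines and are not the comparison being reported.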