Question

DESeq2 Paired Analysis--Tumor vs Normal

0

Entering edit mode

rln0005 • 0

@14b68c07

Last seen 2.7 years ago

United States

I'm using DESeq2 to analyze 7 patient-matched samples (1 tumor and 1 normal from each patient) and I want to identify DEGs with padj<0.05 and LFC>2 in Tumor compared to Normal. I have annotated my coldata file to include a column for "Source" (tumor or normal) and "Patient ID" (listed as A-G). The DESeq2 vignette says to use a multi-factor design which includes the sample info in the design formula to analyze paired samples. As such, I have: design ~ Patient + Source. However, when reviewing resultsNames(dds), I get the following:

"Intercept" "Patient_G_vs_A" "Patient_F_vs_A" "Patient_E_vs_A" "Patient_D_vs_A" "Patient_C_vs_A" "Patient_B_vs_A" "Source_Tumor_vs_Normal"

It seems that the program is including Patient A as a reference level for the "Patient" factor (I know it determines that via ABC order)...but by including Patient in the design formula, I simply want to account for differences between EACH patient to increase the statistical power. Even when I define the statistical factors using dds$Source <- factor(dds$Source, levels=c("Normal", "Tumor")), it generates the same output for resultsNames(dds).

Is this the normal expected output and I can just ignore it (by extracting results using res <- results(dds, alpha=0.05, lfcThreshold=log2(2))) or am I using the multi-factor design incorrectly? Any advice is very much appreciated.

Detailed code:
countdata <- as.matrix(read.csv("gene_count_matrix.csv", row.names="gene_id"))
coldata <-(read.table("pheno.txt", header=TRUE, row.names=1))
dds <- DESeqDataSetFromMatrix(countData = countdata, colData=coldata, design = ~Patient + Source)
dds <- DESeq(dds)
res <- results(dds, alpha=0.05, lfcThreshold=log2(2))

coldata summary: Source | Patient ----------|---------- Normal | A Normal | B Normal | C Normal | D Normal | E Normal | F Normal | G Tumor | A Tumor | B Tumor | C Tumor | D Tumor | E Tumor | F Tumor | G

DESeq2 • 1.1k views

ADD COMMENT • link updated 2.8 years ago by James W. MacDonald 65k • written 2.8 years ago by rln0005 • 0

score 0 · Answer 1 · 2021-07-12

That's what you should expect. Algebraically it's the same thing as doing a paired test, although technically it's doing something slightly different (heuristically, it's converting all the non-patient A data to 'patient A equivalent values' and then computing the average difference between the tumor and normal samples). But you get the exact same values regardless:

## using lm, simply because it's easier to compute by hand
> x <- rnorm(10)
> pt <- gl(5,2)
> tn <- factor(rep(1:2, 5))
> summary(lm(x~pt+tn))

Call:
lm(formula = x ~ pt + tn)

Residuals:
       1        2        3        4        5        6        7        8 
-0.53579  0.53579 -0.24753  0.24753  0.02725 -0.02725  0.60604 -0.60604 
       9       10 
 0.15004 -0.15004 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4966     0.4708   1.055    0.351
pt2          -0.8480     0.6078  -1.395    0.235
pt3           0.9188     0.6078   1.512    0.205
pt4          -0.5202     0.6078  -0.856    0.440
pt5          -0.9759     0.6078  -1.606    0.184
tn2          -0.4000     0.3844  -1.041    0.357

Residual standard error: 0.6078 on 4 degrees of freedom
Multiple R-squared:  0.7774,    Adjusted R-squared:  0.4991 
F-statistic: 2.793 on 5 and 4 DF,  p-value: 0.1707

> mean(x[seq(2,10,2)] - x[seq(1,9,2)])
[1] -0.3999993

## or using t.test directly

> t.test(x~tn, paired = TRUE)

    Paired t-test

data:  x by tn
t = 1.0405, df = 4, p-value = 0.3568
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6673121  1.4673107
sample estimates:
mean of the differences 
              0.3999993