DESeq2 Paired Analysis--Tumor vs Normal
1
0
Entering edit mode
rln0005 • 0
@14b68c07
Last seen 2.7 years ago
United States

I'm using DESeq2 to analyze 7 patient-matched samples (1 tumor and 1 normal from each patient) and I want to identify DEGs with padj<0.05 and LFC>2 in Tumor compared to Normal. I have annotated my coldata file to include a column for "Source" (tumor or normal) and "Patient ID" (listed as A-G). The DESeq2 vignette says to use a multi-factor design which includes the sample info in the design formula to analyze paired samples. As such, I have: design ~ Patient + Source. However, when reviewing resultsNames(dds), I get the following:

"Intercept" "Patient_G_vs_A" "Patient_F_vs_A" "Patient_E_vs_A" "Patient_D_vs_A" "Patient_C_vs_A" "Patient_B_vs_A" "Source_Tumor_vs_Normal"

It seems that the program is including Patient A as a reference level for the "Patient" factor (I know it determines that via ABC order)...but by including Patient in the design formula, I simply want to account for differences between EACH patient to increase the statistical power. Even when I define the statistical factors using dds$Source <- factor(dds$Source, levels=c("Normal", "Tumor")), it generates the same output for resultsNames(dds).

Is this the normal expected output and I can just ignore it (by extracting results using res <- results(dds, alpha=0.05, lfcThreshold=log2(2))) or am I using the multi-factor design incorrectly? Any advice is very much appreciated.

Detailed code:
countdata <- as.matrix(read.csv("gene_count_matrix.csv", row.names="gene_id"))
coldata <-(read.table("pheno.txt", header=TRUE, row.names=1))
dds <- DESeqDataSetFromMatrix(countData = countdata, colData=coldata, design = ~Patient + Source)
dds <- DESeq(dds)
res <- results(dds, alpha=0.05, lfcThreshold=log2(2))

coldata summary: Source | Patient ----------|---------- Normal | A Normal | B Normal | C Normal | D Normal | E Normal | F Normal | G Tumor | A Tumor | B Tumor | C Tumor | D Tumor | E Tumor | F Tumor | G

DESeq2 • 1.1k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

That's what you should expect. Algebraically it's the same thing as doing a paired test, although technically it's doing something slightly different (heuristically, it's converting all the non-patient A data to 'patient A equivalent values' and then computing the average difference between the tumor and normal samples). But you get the exact same values regardless:

## using lm, simply because it's easier to compute by hand
> x <- rnorm(10)
> pt <- gl(5,2)
> tn <- factor(rep(1:2, 5))
> summary(lm(x~pt+tn))

Call:
lm(formula = x ~ pt + tn)

Residuals:
       1        2        3        4        5        6        7        8 
-0.53579  0.53579 -0.24753  0.24753  0.02725 -0.02725  0.60604 -0.60604 
       9       10 
 0.15004 -0.15004 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4966     0.4708   1.055    0.351
pt2          -0.8480     0.6078  -1.395    0.235
pt3           0.9188     0.6078   1.512    0.205
pt4          -0.5202     0.6078  -0.856    0.440
pt5          -0.9759     0.6078  -1.606    0.184
tn2          -0.4000     0.3844  -1.041    0.357

Residual standard error: 0.6078 on 4 degrees of freedom
Multiple R-squared:  0.7774,    Adjusted R-squared:  0.4991 
F-statistic: 2.793 on 5 and 4 DF,  p-value: 0.1707

> mean(x[seq(2,10,2)] - x[seq(1,9,2)])
[1] -0.3999993

## or using t.test directly

> t.test(x~tn, paired = TRUE)

    Paired t-test

data:  x by tn
t = 1.0405, df = 4, p-value = 0.3568
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6673121  1.4673107
sample estimates:
mean of the differences 
              0.3999993
ADD COMMENT

Login before adding your answer.

Traffic: 697 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6