I am using DESeq2 to compare gene expression by condition across multiple brain regions. "condition" has 2 levels. "region" has 7 levels. I have n=3 treatment and n=3 control in each region for a total of 42 samples
My goal is to compare gene expression by condition within each region to address the questions: What is the effect of condition in region A?
What is the effect of condition in region B?
What is the effect of condition in region C?
What is the effect of condition in region D?
What is the effect of condition in region E?
What is the effect of condition in region F?
What is the effect of condition in region G?
These questions are intended to be standalone with the larger goal of comparing the results for each region qualitatively after DE analyses. I can accomplish this using 7 separate pairwise Wald tests, but my power is greatly reduced by including only n=6 in each model.
Where I am unclear is whether it is valid (or possible) to use a LRT including all 42 samples in the models, then extract results specifically for each region individually
The wording in the manual states that "condition_B_vs_A" will return the results describing the effect of condition in the first level of region (so, the effect of condition in region A). This returns many more DEGs than a pairwise Wald test within region A alone.
Is it valid to re-level each of these regions to be the "first" level of region, thereby using all 42 samples to build the models, but extracting results for each region individually as the main effect? Or am I trying to do something that is not at all valid?
Please let me know if I can clarify my question or what I am trying to accomplish.
Thank you!
# Load the counts.tab file
df <- read.table("combined_expression.tab")
# Load the key.tab file
dkey <- read.table("Key.tab", header=T, sep = "\t")
dkey$CONDITION <- as.factor(dkey$CONDITION)
dkey$REGION <- as.factor(dkey$REGION)
# Filter out low expression genes
columns <- ncol(df)
df <- df[rowSums(df)>=(columns*1),]
# Create matrix from dataframe
dm <- as.matrix(df)
# Run DESeq
dds <- DESeqDataSetFromMatrix(dm, dkey, design = ~ REGION + CONDITION)
dds$CONDITION <- relevel(dds$CONDITION, "cont")
dds_LRT <- DESeq(dds, test="LRT", full = ~ REGION + CONDITION, reduced = ~ REGION)
resultsNames(dds_LRT)
res_LRT<-results(dds_LRT, name="CONDITION_B_vs_A", alpha = FDR)
summary(res_LRT)
sessionInfo( )
> dkey
SAMPLE CONDITION REGION
1 A-S1 cont A
2 A-S2 cont A
3 A-S3 cont A
4 B-S1 cont B
5 B-S2 cont B
6 B-S3 cont B
7 C-S1 cont C
8 C-S2 cont C
9 C-S3 cont C
10 D-S1 cont D
11 D-S2 cont D
12 D-S3 cont D
13 E-S1 cont E
14 E-S2 cont E
15 E-S3 cont E
16 F-S1 cont F
17 F-S2 cont F
18 F-S3 cont F
19 G-S1 cont G
20 G-S2 cont G
21 G-S3 cont G
22 A-P1 treat A
23 A-P2 treat A
24 A-P3 treat A
25 B-P1 treat B
26 B-P2 treat B
27 B-P3 treat B
28 C-P1 treat C
29 C-P2 treat C
30 C-P3 treat C
31 D-P1 treat D
32 D-P2 treat D
33 D-P3 treat D
34 E-P1 treat E
35 E-P2 treat E
36 E-P3 treat E
37 F-P1 treat F
38 F-P2 treat F
39 F-P3 treat F
40 G-P1 treat G
41 G-P2 treat G
42 G-P3 treat G
> resultsNames(dds_LRT)
[1] "Intercept" "REGION_B_vs_A" "REGION_C_vs_A" "REGION_D_vs_A"
[5] "REGION_E_vs_A" "REGION_F_vs_A" "REGION_G_vs_A" "CONDITION_treat_vs_cont"
> summary(res_LRT)
out of 13908 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 1127, 8.1%
LFC < 0 (down) : 1008, 7.2%
outliers [1] : 2, 0.014%
low counts [2] : 270, 1.9%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results