Question

DESeq2: Comparing A Knockout to a Strain

0

Entering edit mode

Viktor Thiel • 0

@viktor-thiel-9079

Last seen 9 months ago

Sweden

Dear all,

we performed RNAseq for a set of mouse samples. The samples come from different mouse strains, and from different knockout lines. Here is an excerpt of the sample metadata:

sample  condition      strain
1       WT             B6
2       WT             B6
3       WT             B6
4       Gene1KO        B6
5       Gene1KO        B6
6       Gene1KO        B6
7       Gene1KO        FOO
8       Gene1KO        FOO
9       Gene1KO        FOO
10      WT             BAR
11      WT             BAR
12      WT             BAR
13      Gene2KO        B6
14      Gene2KO        B6
15      Gene2KO        B6
16      WT             FOO
17      WT             FOO
18      WT             FOO

For the B6 strain, there are other knockouts than Gene1KO and Gene2KO which I omitted.

My design is:

design(dds) <- 0 + strain + condition

In general, the comparisons are straightforward to do because we want to compare the KOs to the WTs of the same strain. They are mostly in the B6 strain.

res1 <- results(dds, contrast = c("condition", "Gene1KO", "WT")
res2 <- results(dds, contrast = c("condition", "Gene2KO", "WT")

In a few particular cases, though, we are interested in comparing WT individuals of strain BAR to the overall effects of knockout Gene1KO, essentially treating strain as a batch variable. Phenotypically, we expect them to be similar overall with potentially pronounced effects in a few specific pathways. The Gene1KO samples come from several strains, but not from strain BAR. Incorporating the interaction condition:strain is not possible because we lack Gene1KO:BAR.

I believe this does what we want:

res3 <- results(dds, contrast = list("conditionGene1KO", "strainBAR")

Is this a valid approach to take? It runs fine, but results in ~22500 significant hits out of ~22800 genes after lfcShrink, which makes me a bit suspicious ...

Thanks in advance!

DESeq2 • 1.0k views

ADD COMMENT • link 22 months ago Viktor Thiel • 0

score 0 · Answer 1 · 2024-03-08

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

That last contrast doesn't make sense, comparing across coefficients from different design terms. I'd recommend consulting with a local statistician on your statistical analysis plan.

ADD COMMENT • link 22 months ago Michael Love 43k

0

Entering edit mode

Thank you for confirming this is a bad idea.

I understand of course that your time is limited. I will leave my follow-up question here anyway, and if you or someone else responds I will have learned something, and if not I will come back once I grasp my misunderstanding.

My reasoning was that, for one gene i and sample j, I essentially have a model log2(cts) = b1 strainABC + b2 strainBAR + b3 * condGene1KO + ...

To go from baseline to Gene1KO: log2(cts) = condGene1KO

To go from baseline to strain BAR: log2(cts) = strainBAR

Difference: log2FC = strainBAR - condGene1KO

Obviously I am missing something here.

ADD REPLY • link 22 months ago Viktor Thiel • 0