Deseq2 design formula and analysis with internal controls
3
0
Entering edit mode
@kubracelikbas4-21187
Last seen 1 day ago
Turkey

Hello all,

I'm having difficulty in creating a design formula for my rna-seq experiment. I have two main groups each group has its own internal control. How can I first normalize them according to their internal control and then compare the two groups with each other?

My coldata:

How can I design my coldata matrix and do the comparison with deseq2?

Thank you.

DESeq2 • 404 views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

You are asking about an interaction term, which is (drug-continued - control1) - (drug-withdrawed - control2), which assumes that there are actual differences between the two control samples. If the two controls are essentially the same thing, they cancel out and then it's just drug-continued - drug-withdrawed.

There's probably a more automatic way of doing this, but I don't totally get how DESeq2 does the contrasts, so I do it the dumb direct way. As a fake example

> dds <- makeExampleDESeqDataSet()
> design(dds)
~condition
> colData(dds)
DataFrame with 12 rows and 1 column
condition
<factor>
sample1          A
sample2          A
sample3          A
sample4          A
sample5          A
...            ...
sample8          B
sample9          B
sample10         B
sample11         B
sample12         B

## change this to have four levels

> colData(dds) <- DataFrame(condition = factor(rep(LETTERS[1:4], each = 3)))
> colData(dds)
DataFrame with 12 rows and 1 column
condition
<factor>
1           A
2           A
3           A
4           B
5           B
...       ...
8           C
9           C
10          D
11          D
12          D

## and change to a cell means model, which isn't totally necessary, but maybe easier to understand

> design(dds) <- ~ 0 + condition
> design(dds)
~0 + condition
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

## the signs in the interaction term I described above, after
## distributing into the parentheses are 1, -1, -1, 1
## so we use that as the contrast

> results(dds, c(1, -1, -1, 1))
log2 fold change (MLE): +1,-1,-1,+1
Wald test p-value: +1,-1,-1,+1
DataFrame with 1000 rows and 6 columns
baseMean log2FoldChange
<numeric>      <numeric>
gene1     88.17007       0.895912
gene2     11.37401       0.267528
gene3     10.31525      -3.993132
gene4      2.00921       0.069651
gene5     18.40443      -0.322246
...            ...            ...
gene996    13.0700       1.448528
gene997    14.3798       0.200951
gene998    79.2857      -0.550139
gene999    16.5750      -0.014954
gene1000   14.8213      -2.309181
lfcSE       stat
<numeric>  <numeric>
gene1     0.592058  1.5132152
gene2     1.463057  0.1828555
gene3     1.442082 -2.7690057
gene4     2.609663  0.0266897
gene5     1.069997 -0.3011655
...            ...        ...
gene996   0.992003  1.4602058
gene997   1.149728  0.1747813
gene998   0.633383 -0.8685729
gene999   1.124772 -0.0132952
gene1000  1.049954 -2.1993176
<numeric> <numeric>
gene1    0.13022500  0.852226
gene2    0.85491143  0.998859
gene3    0.00562277  0.763831
gene4    0.97870726  0.998859
gene5    0.76328833  0.998859
...             ...       ...
gene996   0.1442335  0.852768
gene997   0.8612515  0.998859
gene998   0.3850808  0.979411
gene999   0.9893923  0.998859
gene1000  0.0278553  0.763831

0
Entering edit mode

Thanks a lot.

1
Entering edit mode
swbarnes2 ★ 1.4k
@swbarnes2-14086
Last seen 1 day ago
San Diego

How can I first normalize them according to their internal control

That's generally not how it's done. You compare the two groups you want, and see what genes are different. If you want to know differences between drug_continued and drug_stopped, compare them to each other. Don't add the complication of variability and uncertainty from a third set of samples.

0
Entering edit mode

But I want to see if the difference between drug_continued and drug_stopped are specific to them and not observed between control and drug_continued.

0
Entering edit mode

It sounds like the experiment should have had a Day0 with no treatment, Day 5 after 5 days of treatment, and then a Day 10 after 10 days of treatment, and Day 10 when treatment stopped on Day 8. Something like that.

But I guess you want an interaction, where you want to see if the gene change between stopped and its control is different from the gene change between continued and its control.

1
Entering edit mode
BioinfGuru ▴ 40
@yagalbi-11519
Last seen 1 hour ago
Ireland

A few questions:

• What is the nature of the 2 controls in relation to drug-continued/withdrawed?
• Why is every row duplicated? (replicates?)
• What is the difference between "first" and "second"?

Assuming first/second refer to batches (i.e. there is a technical or biological difference unrelated to treatment):

• Then you have 2 factors (make sure they are actually stored as factors): "group" with 2 levels, and "treatment" with 4 levels. The reference levels in their current order will be "first" and "control1", and will form the intercept. So the design looks like it should simply be "~ group + treatment".
• Important: If "group" can affect "treatment" then it needs to be included as an interaction term, so then your design would be "~ group + treatment + group:treatment". For example: if instead of "group", it was "sex". Males may respond differently to treatment than females. So the design would be "~ group + treatment + sex:treatment". I hope that's clear.

If first/second are not batches (i.e just an arbitrary grouping made by you to keep control2 with drug-withdrawed):

• It might be better to concatenate the names as "first_control1" etc. Then you have one factor (treatment) with 4 levels and the design would just be "~treatment"

This blog: A guide to designs and contrasts in DESeq2 has helped me a lot with my contrasts. Take a look at section 4 - Multiple factors multiple levels. When you get to extracting results is when you will need to take care with "deseq2::results(dds, contrast = ???)". The author also builds a contraster() function that is more intuitive, and helped me realise what I am actually comparing when using "contrast =". There is also an explanation of use of "~ 0" (as mentioned by James above) with another worked example if you choose to go that way.

If you are more mathematically minded ( I am not ) ... this might help: What is a multifactorial design with interactions and negative control experiments.

How can I first normalize them

Don't. Deseq2 will do it for you once you figure out the correct format for your metadata, design and contrasts.

0
Entering edit mode

• So one is the extended version of the other (later passage of the same control cell line), they theoretically should be the same with the first but they do not according to results. -Yes, they are duplicates. -First and second refers to groups: first group and second group.

I'll check the links thanks again.

0
Entering edit mode

As there is a technical difference between group 1 and 2 (run time) so it is a batch.

In that case I believe this is what you are looking for: 2 controls and 2 conditions

It will be something like the following:

Assuming you have a summarized experiment object:

# re-level (check my syntax and order, you have to make sure the control in each group is used as the base-line reference)
se$treatment <- relevel(se$treatment, c("c1", "dc", "c2", "dw"))

# create dds (from summarized experiment object)
dds <- DESeqDataSet(df, design = ~group + group:treatment)

# run deseq
dse <- DESeq(dds)

# get names that can be passed inside results()
resultsNames(dse) # this is where relevel() can be checked.

# get results:
results.group1 <- results(dds, name = "group1.treatmentdc") # compares baseline control (c1) to dc
results.group2 <- results(dds, name = "group2.treatmentdw") # compares baseline control (c2) to dw
results.group1-group2 <- results(dds, contrast = list("group2.treatmentdw","group1.treatmentdc")) # gets (dc-c1)-(dw-c2)


It may not be exactly right, but you are not too far off now. IF it were me I would be using the contraster() function of the blog post above to make sure I was comparing exactly as I thought.

0
Entering edit mode

Thank you, this was what I want.