Question

How to design a DESeq2 formula that controls for the effect of putative confounders?

colaneri (@colaneri-7770)

I have 22 samples: half are controls and half are treated. This is recorded in a column of the sampleTable named "treatment".

The samples come from human tissues, and I also have information about age (a column named "age"), sex (a column named "sex") and cell composition (a column named "cell_mix").

I want to find genes whose expression changes with treatment while controlling for these putative confounders. Is this formula correct?

dds <- DESeqDataSet(se, design = ~ treatment + sex + age + cell_mix)

Or should it be something like this?

dds <- DESeqDataSet(se, design = ~ treatment + treatment:sex + treatment:age + treatment:cell_mix)
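To see what each candidate formula actually estimates, it can help to inspect the model matrix that base R's `model.matrix()` builds from it. The sketch below uses a hypothetical `coldata` data frame whose simulated values only mimic the question's `treatment`, `sex`, `age` and `cell_mix` columns. In the additive formula each covariate gets a single coefficient shared by both groups, which is what "controlling for" a confounder means here. In the second formula the covariates appear only inside treatment-specific terms, so each group gets its own sex, age and cell_mix coefficients, and the `treatment` coefficient then compares the groups at the reference sex with age and cell_mix both equal to 0.

```r
# Hypothetical sample table mirroring the question (values are simulated).
set.seed(1)
coldata <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = 11),
                     levels = c("control", "treated")),
  sex       = factor(rep(c("female", "male"), times = 11)),
  age       = round(runif(22, min = 20, max = 80)),
  cell_mix  = runif(22, min = 0.2, max = 0.8)
)

# Additive formula: one shared coefficient per covariate.
mm_additive <- model.matrix(~ treatment + sex + age + cell_mix, data = coldata)
colnames(mm_additive)

# Covariates only inside treatment-specific terms: separate sex, age and
# cell_mix coefficients for the control group and for the treated group.
mm_interaction <- model.matrix(~ treatment + treatment:sex + treatment:age +
                                 treatment:cell_mix, data = coldata)
colnames(mm_interaction)
```

The additive matrix has 5 columns (intercept, treatment, sex, age, cell_mix); the second has 8, because every covariate is split into a control-specific and a treated-specific column.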

Finally, sex contains nominal values (male or female).

age contains numbers (actual age).

cell_mix contains fractions indicating the proportion of cells that are the cell type I care about.

Can I use these values as they are?

I would appreciate any help with this.

ALe

Tags: DESeq2, dds, confounders

Written 9.9 years ago by colaneri; last updated 9.9 years ago by Michael Love.

Answer (2015-12-21)

Michael Love (@mikelove)

You could use the design formula without interactions, with treatment as the last term. Read the FAQ in the vignette about best practices with continuous variables. I'd also recommend making a PCA plot (see the vignette) so you have an idea of how the samples spread out, at least in the first two principal components.
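A minimal sketch of this advice, assuming DESeq2 is installed. It uses `makeExampleDESeqDataSet()` to simulate counts for 22 samples, and the covariate values are made up to stand in for the real sampleTable; with real data you would build `dds` from `se` with `DESeqDataSet(se, design = ...)` instead. Centering and scaling the numeric covariates (into the hypothetical columns `age_z` and `cell_mix_z`) is our own precaution, not something stated in this thread. Because `treatment` is the last term, calling `results()` with no arguments reports treated vs control, adjusted for the other terms. `varianceStabilizingTransformation()` is used instead of the faster `vst()` only because the toy dataset is small.

```r
library(DESeq2)

# Toy data only: simulated counts; covariate values are hypothetical stand-ins.
set.seed(7)
dds <- makeExampleDESeqDataSet(n = 1000, m = 22)
dds$treatment <- factor(rep(c("control", "treated"), each = 11),
                        levels = c("control", "treated"))
dds$sex      <- factor(rep(c("female", "male"), times = 11))
dds$age      <- round(runif(22, min = 20, max = 80))
dds$cell_mix <- runif(22, min = 0.2, max = 0.8)

# Centre and scale the numeric covariates (an assumed precaution).
dds$age_z      <- as.numeric(scale(dds$age))
dds$cell_mix_z <- as.numeric(scale(dds$cell_mix))

# Covariates first, variable of interest last, no interactions.
design(dds) <- ~ sex + age_z + cell_mix_z + treatment
dds <- DESeq(dds, quiet = TRUE)
resultsNames(dds)

# No arguments: the last coefficient, treated vs control adjusted for the rest.
res <- results(dds)
summary(res)

# PCA of variance-stabilized counts, grouped by treatment and sex.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
plotPCA(vsd, intgroup = c("treatment", "sex"))
pca_df <- plotPCA(vsd, intgroup = c("treatment", "sex"), returnData = TRUE)
```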

Dear Michael, thanks for your answer. Could you clarify: do you mean

dds <- DESeqDataSet(se, design = ~ sex + age + cell_mix + treatment)

instead of

dds <- DESeqDataSet(se, design = ~ treatment + treatment:sex + treatment:age + treatment:cell_mix)

Reply by colaneri (9.8 years ago)

I am suggesting the first design: ~ sex + age + cell_mix + treatment

But if I were you, I would cut() age into meaningful bins. See the FAQ about modeling dependence on continuous variables.
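A minimal sketch of binning age with base R's `cut()`. The ages and the cut points (under 40, 40 to 59, 60 and over) are hypothetical; meaningful bins depend on the study, and each bin should contain enough samples. The resulting factor (here `age_group`) would then replace `age` in the design, e.g. `~ sex + age_group + cell_mix + treatment`.

```r
# Hypothetical ages for 22 samples (not the real data).
age <- c(23, 28, 31, 35, 38, 42, 44, 47, 51, 53, 55, 57,
         59, 61, 63, 66, 68, 70, 72, 75, 78, 81)

# right = FALSE closes intervals on the left: [0, 40), [40, 60), [60, Inf).
age_group <- cut(age,
                 breaks = c(0, 40, 60, Inf),
                 labels = c("under40", "40to59", "60plus"),
                 right  = FALSE)

# Check how many samples fall in each bin before using it in a design.
table(age_group)
```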

Reply by Michael Love (9.8 years ago)

Let's say I'm running the formula ~ genotype + treatment + genotype:treatment

because I'm mostly interested in identifying genes that respond differently across genotypes for a particular treatment.

1) Do I really need that formula? Can I just use ~ genotype:treatment?

2) Let's say that I run ~ genotype + treatment + genotype:treatment.

Does the res object (res <- results(dds)) contain the results for the interaction?

3) After running resultsNames(dds) I get: "Intercept", "treatment", "genotype", "interaction"

I assume that I can run

results(dds, name = "one of the above") to get p-values for each of the comparisons. Let's say that I do:

results(dds, name = "genotype")

What does this table contain? The comparison of genes considering only the genotypes and ignoring the treatment? If that is the case, is that result useful at all?
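For question 1, a quick base R check (a sketch with a hypothetical balanced 2 x 2 layout, three replicates per combination) shows why `~ genotype:treatment` on its own is not a substitute. R keeps the intercept and adds an indicator column for every genotype/treatment combination, so the columns are linearly dependent and the matrix is not full rank. The full formula (which R also accepts written as `~ genotype * treatment`) gives four estimable columns. DESeq2 checks for this and stops with an error when the model matrix is not full rank.

```r
# Hypothetical balanced 2 x 2 layout with three replicates per combination.
coldata <- data.frame(
  genotype  = factor(rep(c("A", "B"), each = 6)),
  treatment = factor(rep(rep(c("control", "treated"), each = 3), times = 2),
                     levels = c("control", "treated"))
)

# Full model: intercept, two main effects and one interaction column.
mm_full <- model.matrix(~ genotype + treatment + genotype:treatment, data = coldata)

# Interaction-only formula: the intercept plus one indicator per combination,
# so the columns are linearly dependent.
mm_int_only <- model.matrix(~ genotype:treatment, data = coldata)

# Number of columns versus rank for each design.
rbind(full     = c(columns = ncol(mm_full),     rank = qr(mm_full)$rank),
      int_only = c(columns = ncol(mm_int_only), rank = qr(mm_int_only)$rank))
```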

Reply by colaneri (9.7 years ago)

If, after reading the section on interactions in the vignette, it's still not clear which terms do what in the model and what the design should be for a particular experiment, I would recommend meeting or partnering with a local statistician or someone with a quantitative background who can guide you through this part of the analysis. You have a fairly complex design, and it would be worthwhile to talk it through with a statistician.
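As a starting point alongside the vignette's interactions section, here is a minimal sketch of how the coefficients asked about in questions 2 and 3 can be extracted. It assumes DESeq2 is installed; the counts come from `makeExampleDESeqDataSet()`, and the `genotype` and `treatment` columns are hypothetical. With R's default reference-level coding, `results(dds)` with no arguments returns the last coefficient, which here is the interaction term. The genotype main effect is the B vs A difference within the reference treatment (control), not a comparison that ignores treatment, and the genotype effect within treated samples is the sum of the main effect and the interaction.

```r
library(DESeq2)

# Toy data only: simulated counts; genotype and treatment are hypothetical.
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$genotype  <- factor(rep(c("A", "B"), each = 6))
dds$treatment <- factor(rep(rep(c("control", "treated"), each = 3), times = 2),
                        levels = c("control", "treated"))
design(dds) <- ~ genotype + treatment + genotype:treatment
dds <- DESeq(dds, quiet = TRUE)

# Coefficient names; the interaction term is the last one.
resultsNames(dds)

# No arguments: the last coefficient (the interaction term), i.e. whether the
# B vs A difference changes between control and treated.
res_interaction <- results(dds)

# Genotype main effect: B vs A within the reference treatment level (control).
res_geno_control <- results(dds, name = "genotype_B_vs_A")

# B vs A within treated samples: main effect plus interaction term.
res_geno_treated <- results(dds, contrast = list(c("genotype_B_vs_A",
                                                   "genotypeB.treatmenttreated")))
```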

Reply by Michael Love (9.7 years ago)