How to design a formula in DESeq2 that controls for the effect of putative confounders?
colaneri ▴ 30
@colaneri-7770
Last seen 5.7 years ago
United States

I have 22 samples: half are controls and half are treated. This is recorded in a column of the sampleTable named "treatment".

These samples are from human tissues, and I have information about age (in a column named "age"), sex (in a column named "sex"), and cell composition (in a column named "cell_mix").

Now I want to see whether genes change their expression in response to the treatment while controlling for the other putative confounders. I want to know if this formula is correct:

dds <- DESeqDataSet(se, design = ~ treatment + sex + age + cell_mix)

or maybe it should be something like

dds <- DESeqDataSet(se, design = ~ treatment + treatment:sex + treatment:age + treatment:cell_mix)

Finally, sex contains nominal values (male or female),

age contains numbers (actual age),

and cell_mix contains fractions indicating the proportion of cells that are the type I care about.

Can I use these values as they are?

I would appreciate any help with this.

Ale

Deseq2 dds confounders • 5.1k views
@mikelove
Last seen 23 hours ago
United States
You could use the design formula without interactions and with treatment at the end. Read the FAQ in the vignette about best practices with continuous variables. I'd also recommend making a PCA plot so you have an idea how the samples spread out in the first 2 components at least. (see vignette)
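As a rough illustration of the additive design suggested above, the sketch below builds the design matrix with base R only. The sample table is simulated and hypothetical (the real one comes from the poster's colData), and the DESeq2 calls are left as comments because they need a real `se` object.

```r
# Hypothetical sample table mirroring the columns described in the question.
# All values are simulated for illustration only.
set.seed(1)
sampleTable <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = 11),
                     levels = c("control", "treated")),
  sex       = factor(rep(c("female", "male"), length.out = 22)),
  age       = round(runif(22, 25, 75)),
  cell_mix  = runif(22, 0.2, 0.9)
)

# Additive design: putative confounders first, variable of interest last.
design <- ~ sex + age + cell_mix + treatment

# Inspect the columns the model would estimate (one per coefficient).
mm <- model.matrix(design, data = sampleTable)
print(colnames(mm))

# With DESeq2 loaded and a SummarizedExperiment `se` carrying these columns:
# dds <- DESeqDataSet(se, design = design)
# dds <- DESeq(dds)
# res <- results(dds)  # default: last variable in the design (treatment)
# vsd <- vst(dds)
# plotPCA(vsd, intgroup = c("treatment", "sex"))
```

Putting treatment last means the default `results(dds)` call reports the treatment effect, adjusted for the other covariates in the formula.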

Dear Michael, thanks for your answer. I would appreciate a clarification: do you mean

dds <- DESeqDataSet(se, design = ~ sex + age + cell_mix + treatment)

instead of

dds <- DESeqDataSet(se, design = ~ treatment + treatment:sex + treatment:age + treatment:cell_mix)


I am suggesting the first design: ~ sex + age + cell_mix + treatment

But if I were you, I would cut() age into meaningful bins. See the FAQ about modeling dependence on continuous variables.
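A minimal sketch of the binning idea with base R's `cut()`. The ages, break points, and labels here are hypothetical; meaningful bins depend on the biology and on having enough samples in each bin.

```r
# Hypothetical ages; replace with the "age" column of the sample table.
age <- c(23, 31, 38, 45, 52, 59, 64, 71)

# Assumed break points (illustrative only): [0,40), [40,60), [60,Inf).
age_bin <- cut(age,
               breaks = c(0, 40, 60, Inf),
               labels = c("under40", "40to60", "over60"),
               right  = FALSE)

# Check how many samples fall in each bin before using it in a design.
print(table(age_bin))

# The binned factor would then replace the numeric covariate, e.g.
# design = ~ sex + age_bin + cell_mix + treatment
```

Binning lets each age group have its own effect on expression rather than assuming a single log fold change per year of age.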


Let's say I'm running the formula: ~ genotype + treatment + genotype:treatment

because I'm mostly interested in identifying genes that respond differently across genotypes for a particular treatment.

1) Do I really need to run that formula? Can I just run ~ genotype:treatment?

2) Let's say that I run ~ genotype + treatment + genotype:treatment

Does the res object (res <- results(dds)) contain the results of the interaction?

3) After running resultsNames(dds) I have: "Intercept", "treatment", "genotype", "interaction"

I assume that I can run:

results(dds, name="one of the above") to find the p-values for each of the comparisons. Let's say that I do:

results(dds, name="genotype")

What does this table contain? The comparison of genes considering only the genotypes and ignoring the treatment? If that is the case, is that result useful at all?


If after reading the section on interactions in the vignette, it's not clear which terms do what in the model, and what the design should be for a particular experiment, I would recommend you meet or partner with a local statistician or someone with a quantitative background who can guide you in this part of the analysis. You have a fairly complex design and it would be worthwhile to talk this through with a statistician.
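For readers working through the interaction questions above, the following base R sketch shows what the terms of such a design look like in a standard model matrix. The two-level factors and their labels are hypothetical, and this is not a substitute for the vignette or a statistician's advice.

```r
# Hypothetical 2 x 2 layout with three replicates per group.
coldata <- expand.grid(
  genotype  = factor(c("A", "B")),
  treatment = factor(c("ctrl", "trt")),
  replicate = 1:3
)

# Main effects plus interaction.
full <- model.matrix(~ genotype + treatment + genotype:treatment, coldata)
print(colnames(full))
# genotypeB              : B vs A within the reference treatment (ctrl),
#                          not a comparison that ignores treatment
# treatmenttrt           : trt vs ctrl within the reference genotype (A)
# genotypeB:treatmenttrt : how the treatment effect differs in B versus A

# Interaction term alone: R expands it to one indicator per group, which
# together with the intercept is redundant (not full rank).
only <- model.matrix(~ genotype:treatment, coldata)
print(c(columns = ncol(only), rank = qr(only)$rank))

# In DESeq2 the coefficient names differ from the ones printed here; check
# resultsNames(dds) and pass the interaction coefficient explicitly via
# results(dds, name = ...) rather than relying on the default.
```

Under this coding, the "genotype" coefficient is the genotype difference in the reference treatment level only, which is why the main effects need to stay in the formula for the interaction term to be interpretable as a difference of treatment effects.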
