I have 22 samples: half are controls and half are treated. This is recorded in a column of the sampleTable named "treatment".
These samples are from human tissues, and I have information about age (in a column named "age"), sex (in a column named "sex"), and cell composition (in a column named "cell_mix").
Now I want to see whether gene expression changes with the treatment, while controlling for the other putative confounders. I want to know if this formula is correct:
dds <- DESeqDataSet(se, design = ~ treatment + sex + age + cell_mix)
or maybe it should be something like
dds <- DESeqDataSet(se, design = ~ treatment + treatment:sex + treatment:age + treatment:cell_mix)
Finally, sex contains nominal values (male or female),
age contains numbers (actual age),
and cell_mix contains fractions indicating the proportion of cells that are the ones I care about.
Can I use these values as they are?
I would appreciate help with this.
ALe
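To make the question about variable types concrete, here is a minimal base R sketch of how such a sample table might be encoded before it is handed to DESeq2. All values below are made up for illustration; only the column names (treatment, sex, age, cell_mix) come from the post. The assumption is that categorical columns are stored as factors with the reference level first, and that age and cell_mix stay numeric.

```r
# Hypothetical sample table with the four columns described above.
# Every value here is invented purely for illustration.
sampleTable <- data.frame(
  treatment = rep(c("control", "treated"), each = 11),
  sex       = rep(c("male", "female"), length.out = 22),
  age       = seq(30, 72, by = 2),
  cell_mix  = seq(0.20, 0.83, length.out = 22)
)

# Categorical variables become factors; listing "control" first makes it
# the reference level, so effects are expressed as treated vs control.
sampleTable$treatment <- factor(sampleTable$treatment,
                                levels = c("control", "treated"))
sampleTable$sex <- factor(sampleTable$sex)

# Numeric covariates (age, cell_mix) are left as numbers.
col_classes <- sapply(sampleTable, class)
print(col_classes)
```

With this encoding, a factor contributes one coefficient per non-reference level, while each numeric column contributes a single slope coefficient.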
Dear Michael, thanks for your answer. I would appreciate a clarification: do you mean
dds <- DESeqDataSet(se, design = ~ sex + age + cell_mix + treatment)
instead of
dds <- DESeqDataSet(se, design=~ treatment + treatment:sex + treatment:age + treatment:cell_mix);
I am suggesting the first design: ~ sex + age + cell_mix + treatment
But if I were you, I would cut() age into meaningful bins. See the FAQ about modeling dependence on continuous variables.
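As a rough sketch of the binning suggestion, base R's cut() can turn a numeric age vector into a factor. The ages, break points, and labels below are assumptions chosen only for illustration; meaningful bins depend on the biology and on how the samples are distributed.

```r
# Hypothetical ages; the break points are illustrative, not a recommendation.
age <- c(34, 41, 47, 52, 58, 63, 66, 71, 75, 79)

# cut() uses right-closed intervals by default: (30,50], (50,65], (65,80].
age_bin <- cut(age,
               breaks = c(30, 50, 65, 80),
               labels = c("30-50", "51-65", "66-80"))

# Count how many samples fall in each bin.
bin_counts <- table(age_bin)
print(bin_counts)
```

The binned factor would then replace age in the design, for example ~ sex + age_bin + cell_mix + treatment. Each bin should contain enough samples from both treatment groups for its coefficient to be estimable.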
Let's say I'm running the formula: ~genotype + treatment + genotype:treatment
because I'm mostly interested in identifying genes that respond differently across genotypes for a particular treatment.
1) Do I really need to run that formula? Can I just run ~genotype:treatment?
2) Let's say that I run ~genotype + treatment + genotype:treatment
Does the res object (res <- results(dds)) contain the results of the interaction?
3) After running resultsNames(dds) I have: "Intercept", "treatment", "genotype", "interaction"
I assume that I can run:
results(dds, name="one of the above") to find the p-values for each of the comparisons. Let's say that I do:
results(dds, name="genotype")
What does this table contain? The comparison of genes considering only the genotypes and ignoring the treatment? If that is the case, is that result useful at all?
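One way to explore these questions without running DESeq2 is to inspect the design matrices in base R. The sketch below assumes a hypothetical 2 x 2 layout (genotypes A and B, treatments ctrl and trt, three replicates each) and uses an ordinary lm() fit on made-up, noise-free values as a stand-in for the per-gene GLM. It is not DESeq2's fitting procedure; it only illustrates how the coefficients of such a design are usually interpreted.

```r
# Hypothetical 2 x 2 design: rows 1-6 are genotype A, rows 7-12 are genotype B;
# within each genotype the first three rows are ctrl, the last three are trt.
d <- data.frame(
  genotype  = factor(rep(c("A", "B"), each = 6)),
  treatment = factor(rep(rep(c("ctrl", "trt"), each = 3), times = 2))
)

# Full model: intercept, two main effects, one interaction (4 columns, full rank).
X_full <- model.matrix(~ genotype + treatment + genotype:treatment, data = d)
print(colnames(X_full))

# Interaction-only model: R builds one indicator per genotype/treatment cell
# plus an intercept, so the matrix has more columns than its rank.
X_int <- model.matrix(~ genotype:treatment, data = d)
rank_int <- qr(X_int)$rank
print(c(columns = ncol(X_int), rank = rank_int))

# Made-up cell means on a log-like scale:
# A/ctrl = 5, A/trt = 6, B/ctrl = 7, B/trt = 10.
d$y <- c(5, 5, 5, 6, 6, 6, 7, 7, 7, 10, 10, 10)
fit <- lm(y ~ genotype + treatment + genotype:treatment, data = d)
est <- round(coef(fit), 6)
print(est)
# genotypeB              = B/ctrl - A/ctrl (genotype effect in the reference treatment)
# treatmenttrt           = A/trt  - A/ctrl (treatment effect in the reference genotype)
# genotypeB:treatmenttrt = (B/trt - B/ctrl) - (A/trt - A/ctrl)
```

Under these assumptions, the genotype main effect does not ignore the treatment: it is the genotype difference within the reference treatment level. The interaction coefficient measures how much the treatment effect differs between genotypes. Because the interaction-only design matrix is rank deficient, the main effects are normally kept in the formula. As far as I understand, results(dds) called without arguments reports the last coefficient listed by resultsNames(dds), which for this formula would be the interaction term.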
If after reading the section on interactions in the vignette, it's not clear which terms do what in the model, and what the design should be for a particular experiment, I would recommend you meet or partner with a local statistician or someone with a quantitative background who can guide you in this part of the analysis. You have a fairly complex design and it would be worthwhile to talk this through with a statistician.