Hi all,
I am using DESeq2 and edgeR to perform DE analysis on paired samples
on a dog cancer project.
Sorry if the question is redundant but I can't find one very similar
to my case.
I have been designing models with 2 factors: condition (control /
tumor) and patient ID (to match the paired samples). I used the model
'~sample_id + condition' until now but I would like to add a third
factor, the breed.
Is it then correct to use '~sample_id + breed + condition' if my
goal is to analyse the DE between control and tumor samples, taking
into account the individual variability (with the sample ID factor)
and the breed variability (with the breed factor)?
Here is an example of a sample table I could have:
          Patient ID   Condition   Breed
Sample1   1            Control     Breed1
Sample2   2            Control     Breed2
Sample3   3            Control     Breed1
Sample4   4            Control     Breed2
Sample5   1            Tumor       Breed1
Sample6   2            Tumor       Breed2
Sample7   3            Tumor       Breed1
Sample8   4            Tumor       Breed2
Hi Mathieu
On 19/08/14 10:01, Mathieu Bahin wrote:
> I have been designing models with 2 factors: condition (control /
> tumor) and patient ID (to match the paired samples). I used the
> model '~sample_id + condition' until now but I would like to add a
> third factor, the breed.
> Is that then correct to use '~sample_id + breed + condition' if my
> goal is to analyse the DE between control and tumor samples taking
> into account the individual variabilities (with the sample ID
> factor) and the breed variability (with the breed factor).
No. This would make breed another blocking factor, besides patient_id.
But it does not offer any new information, because all samples from
the same patient are from the same breed, so the patient_id factor
already captures all variation associated with breed.
Therefore, there is no need to account for breed if you just want to
see the overall effect of cancer.
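To see this on the example table, here is a minimal base-R sketch. The
data frame name `coldata` and its column names are my own choice for
illustration, not something DESeq2 or edgeR requires. It builds the
model matrix for the three-factor formula and compares its rank with
its number of columns: the breed column turns out to be just the sum
of the indicator columns of the two Breed2 patients.

```r
# Sample table from the original post (object and column names are illustrative).
coldata <- data.frame(
  patient_id = factor(c(1, 2, 3, 4, 1, 2, 3, 4)),
  condition  = factor(rep(c("Control", "Tumor"), each = 4)),
  breed      = factor(rep(c("Breed1", "Breed2"), times = 4))
)

# Design with breed added as a second blocking factor.
mm_additive <- model.matrix(~ patient_id + breed + condition, coldata)

n_cols_additive <- ncol(mm_additive)      # coefficients the formula asks for
rank_additive   <- qr(mm_additive)$rank   # coefficients that can be estimated

# The breed column equals the sum of the columns for patients 2 and 4,
# i.e. it carries no information beyond patient_id.
breed_is_redundant <- all(
  mm_additive[, "breedBreed2"] ==
    mm_additive[, "patient_id2"] + mm_additive[, "patient_id4"]
)

cat("columns:", n_cols_additive, " rank:", rank_additive,
    " breed redundant:", breed_is_redundant, "\n")
```

The rank is one less than the number of columns, so the breed
coefficient cannot be estimated separately from the patient
coefficients.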
If, however, you want to know for which genes the expression change
due to cancer _depends_ on breed, you are looking for an _interaction_
between breed and condition and should hence use:

  ~ patient_id + condition + breed:condition

(BTW, I renamed your factor from "sample_id" to "patient_id": After
all, you have two samples from each patient.)
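As a rough sketch of what the interaction adds, the base-R snippet
below writes the extra column out by hand for the example table
(`coldata`, `breed2_tumor` and the other object names are made up for
illustration). Because breed has no main-effect term in this design,
it is worth looking at how your software expands the formula and
checking that the resulting matrix is full rank; the hand-built
version here is only one way to encode the interaction.

```r
# Sample table from the original post (object and column names are illustrative).
coldata <- data.frame(
  patient_id = factor(c(1, 2, 3, 4, 1, 2, 3, 4)),
  condition  = factor(rep(c("Control", "Tumor"), each = 4)),
  breed      = factor(rep(c("Breed1", "Breed2"), times = 4))
)

# Paired design without breed: intercept, three patient columns, tumor effect.
mm_paired <- model.matrix(~ patient_id + condition, coldata)

# Interaction column written out by hand:
# 1 for tumor samples from Breed2 dogs, 0 for everything else.
breed2_tumor <- as.numeric(coldata$breed == "Breed2" &
                           coldata$condition == "Tumor")
mm_interaction <- cbind(mm_paired, breed2_tumor = breed2_tumor)

n_cols_interaction <- ncol(mm_interaction)
rank_interaction   <- qr(mm_interaction)$rank

cat("columns:", n_cols_interaction, " rank:", rank_interaction, "\n")
```

With this encoding, the `conditionTumor` coefficient is the tumor
effect in Breed1 dogs, and the `breed2_tumor` coefficient is how much
that effect differs in Breed2 dogs. A matrix like this can be passed
to tools that accept a user-supplied design matrix (for example the
`design` argument of edgeR's `glmFit`).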
> Another question: If I use the pairwise information, I don't have
> replicates because I only have two samples (one control, one tumor)
> for each patient. Is it better to use it (and then have no
> replicates) or not (and then have replicates for 'control' and
> 'tumor' samples)?
Of course, you still have replicates. You have several dogs. This is
the whole point of the paired design. If you omitted the "patient_id"
factor, you would drastically lose inferential power.
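A small base-R illustration of both points, using invented numbers
(the expression values below are made up for one hypothetical gene,
and a t-test stands in for the negative binomial models that DESeq2
and edgeR actually fit). The paired design still leaves residual
degrees of freedom, and ignoring the pairing hides a consistent tumor
effect behind the large dog-to-dog differences.

```r
# Paired design for 4 dogs, one control and one tumor sample each.
coldata <- data.frame(
  patient_id = factor(c(1, 2, 3, 4, 1, 2, 3, 4)),
  condition  = factor(rep(c("Control", "Tumor"), each = 4))
)
mm_paired <- model.matrix(~ patient_id + condition, coldata)

# Residual degrees of freedom: samples minus estimable coefficients.
residual_df <- nrow(mm_paired) - qr(mm_paired)$rank

# Made-up log-expression values for one gene: dogs differ a lot from
# each other, but the tumor sample is about 1 unit higher in every dog.
control <- c(2.0, 6.0, 10.0, 14.0)
tumor   <- c(2.9, 7.1, 11.0, 15.2)

# Using the pairing versus ignoring it (Welch two-sample test).
p_paired   <- t.test(tumor, control, paired = TRUE)$p.value
p_unpaired <- t.test(tumor, control, paired = FALSE)$p.value

cat("residual df:", residual_df,
    " paired p:", signif(p_paired, 2),
    " unpaired p:", signif(p_unpaired, 2), "\n")
```

The residual degrees of freedom are positive, so the dogs do serve as
replicates, and the paired test picks up the shift that the unpaired
test misses.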
Simon