I would like to use limma (voom or trend) for my RNA-seq analysis, and I am wondering how to model my data correctly. I have two kinds of cells (hi and lo) from two donors, so I want to use a paired design. However, I also have hi and lo samples from a pool of additional donors. How can I include the pool correctly in my design?
Here is my targets file:
Sample Donor Cell
D1_lo D1 lo
D1_hi D1 hi
D2_lo D2 lo
D2_hi D2 hi
Pool_lo Pool lo
Pool_hi Pool hi
Agreed, the paired model ~Cell + Donor is correct here.
There's no heteroskedasticity issue here though because any variation between Donors is being removed by the paired model. The analysis of variation is purely within Donor, so the degree of variability in baseline expression from one Donor to another doesn't enter into the analysis.
So it's just an ordinary paired analysis without any special considerations (assuming that the pool doesn't include D1 and D2).
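To make the point about between-donor variation concrete, here is a minimal numerical sketch. It is plain numpy rather than limma, and the expression values and baseline shifts are made up for illustration. It builds the design matrix implied by ~Cell + Donor for the six samples above, treating Pool as a third blocking level, and checks that the hi-vs-lo coefficient equals the mean within-donor difference and does not change when each donor's baseline is shifted.

```python
import numpy as np

# Targets from the question: three blocks (D1, D2, Pool), each with lo and hi.
donor = np.array(["D1", "D1", "D2", "D2", "Pool", "Pool"])
cell = np.array(["lo", "hi", "lo", "hi", "lo", "hi"])

# Design for ~Cell + Donor with columns:
#   intercept (D1, lo), Cell hi vs lo, Donor D2 vs D1, Donor Pool vs D1.
# Note: "lo" is used as the reference level here for readability. R would
# pick "hi" by default (alphabetical order) unless the factor is releveled.
design = np.column_stack([
    np.ones(6),
    (cell == "hi").astype(float),
    (donor == "D2").astype(float),
    (donor == "Pool").astype(float),
])

def cell_effect(y):
    """Least-squares estimate of the hi-vs-lo coefficient for one gene."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Made-up log-expression values for a single gene (illustration only).
y = np.array([5.0, 6.2, 7.1, 8.0, 6.0, 7.4])

# Arbitrary per-donor baseline shifts, identical for both samples of a donor.
shift = np.array([10.0, 10.0, -3.0, -3.0, 0.5, 0.5])

base = cell_effect(y)
shifted = cell_effect(y + shift)

# Mean of the within-block differences (hi minus lo).
paired_diff = np.mean(y[1::2] - y[0::2])

print(base, shifted, paired_diff)
print("residual df:", len(y) - np.linalg.matrix_rank(design))
```

All three printed estimates agree, because the donor columns absorb any baseline shift. Note also that six samples and four coefficients leave only two residual degrees of freedom per gene.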
The pool is made of 3 other donors, so this will work for me! Thanks both.
I'm not sure I follow the logic that there's no heteroskedasticity issue. I would expect the pooled samples to have smaller residuals on average, although perhaps multiple pools would be required to show this effect.
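A rough simulation of what I have in mind, under strong simplifying assumptions that are mine and not from this thread: each donor's hi-vs-lo response varies around a common value, a pool behaves like the average of its donors on the log scale, and technical noise is ignored. The numbers (sigma, delta, the number of simulations) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sim = 200_000   # number of simulated experiments (arbitrary)
sigma = 0.5       # hypothetical SD of the donor-specific hi-vs-lo response
delta = 1.0       # hypothetical true hi-vs-lo log fold change
pool_size = 3     # the pool in this thread is made of 3 donors

# Within-donor difference (hi minus lo) for a single donor.
single = delta + sigma * rng.standard_normal(n_sim)

# Assumption: the pool's difference is the average of its donors' differences.
pooled = delta + sigma * rng.standard_normal((n_sim, pool_size)).mean(axis=1)

# Under these assumptions the ratio should be close to 1 / pool_size.
var_ratio = pooled.var() / single.var()
print(f"pooled / single variance ratio: {var_ratio:.3f}")
```

Under these assumptions the pooled difference varies about a third as much as a single donor's, which is the effect I would expect to see in the residuals. With only one pool and two residual degrees of freedom, though, the data here cannot really confirm or refute it.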