I'm hoping to get your advice on how to test for DE genes in samples of cells isolated from a tissue, accounting for the "background" expression in the whole tissue that each sample was isolated from. Essentially, I'd like help designing a model matrix.
The particular laboratory method we are using is Ribo-tagging or TRAP-seq, in which ribosome-associated mRNA from specific cells is isolated from a whole tissue (which contains free mRNA and ribo-associated mRNA from other cells), but I think this discussion may also apply to any kind of cell isolation, including cell sorting or laser micro-dissection.
Ribo-tagging and TRAP-seq have been used to identify transcripts that come from particular cell types, to better characterize cell populations or discover new cell populations. Because the isolation of the cell-type-specific mRNA is not 100% efficient, the field often uses the ratio of cell-type-specific expression over whole tissue expression to more accurately identify genes that are truly specific to the cell type of interest. I've linked a couple papers if you're interested:
However, we are taking things a bit further than identifying cell-specific genes, and would like to identify cell-specific genes that are DE between treatment and control.
In our experiments we have several animals in a treatment group and several in a control group, and each animal yields two samples/RNAseq libraries: the isolated cell-type-specific gene expression levels ("cell expr") and the whole tissue gene expression levels ("tissue expr"). On top of that, we have several time points for each group (different animals in each time point). We want to ask how treatment affects gene expression within each time point, which genes change over time within each group, and which genes change differently over time between treatment and control, always accounting for background expression in the tissue to reduce noise.
1. Analysis option 1 is to take the ratio of cell expr/tissue expr and perform DE gene analysis on the ratio values as usual with limma. But there are some squirrelly things that happen to ratios with low-abundance genes, and I'm not certain that limma will enjoy working with ratios of counts when it's used to working with counts. (Maybe this is wrong; I'm of the RNAseq generation, but I know limma was developed for microarrays.)
2. Analysis option 2 is to keep the cell expr data and the tissue expr data separate, perform DE analysis on them individually, and compare the resulting lists of DE genes afterward, noting which genes appear on the cell-specific list and which appear on both the cell-specific and tissue-specific lists.
3. Analysis option 3 is to set up a model matrix that includes the paired cell and tissue samples from each animal and looks at the difference between treatment and control within time points and across time points. I can't wrap my head around what the model matrix would look like. For a within-time-point analysis, I could treat the tissue expression of each sample as a baseline. But how to test for genes that are differentially regulated between time 0 and time 1 and between treatment and control, while accounting for background tissue expression, eludes me.
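To make the question concrete, here is a toy sketch of the quantities I think I'm after, written in plain Python just to show the arithmetic (it is not limma code and not a proposed analysis). Everything in it is made up for illustration: the animal IDs, the counts for a single gene, the two time points, and the pseudocount of 0.5. It also ignores library-size normalization. The idea is that each animal's tissue library acts as its own baseline, so the per-animal value is log2(cell) - log2(tissue), and the comparisons of interest are differences of those values between groups and between time points.

```python
import math
from statistics import mean

# Hypothetical counts for ONE gene. Each animal contributes a paired
# "cell" library and "tissue" library; all values are invented.
# (animal, group, time, cell_count, tissue_count)
samples = [
    ("a1", "control",   0, 40, 20),
    ("a2", "control",   0, 44, 22),
    ("a3", "treatment", 0, 80, 20),
    ("a4", "treatment", 0, 96, 24),
    ("a5", "control",   1, 40, 20),
    ("a6", "control",   1, 44, 22),
    ("a7", "treatment", 1, 42, 21),
    ("a8", "treatment", 1, 38, 19),
]

PRIOR = 0.5  # assumed pseudocount, to keep low-count ratios from blowing up


def log_ratio(cell, tissue, prior=PRIOR):
    """Per-animal enrichment: log2 cell expression minus log2 tissue expression."""
    return math.log2(cell + prior) - math.log2(tissue + prior)


def group_mean_log_ratio(group, time):
    """Average enrichment over the animals in one group at one time point."""
    values = [
        log_ratio(cell, tissue)
        for (_, g, t, cell, tissue) in samples
        if g == group and t == time
    ]
    return mean(values)


def treatment_effect(time):
    """Treatment minus control enrichment within a single time point."""
    return group_mean_log_ratio("treatment", time) - group_mean_log_ratio("control", time)


def time_by_treatment_effect(t0, t1):
    """How the treatment effect on enrichment changes from time t0 to time t1."""
    return treatment_effect(t1) - treatment_effect(t0)


if __name__ == "__main__":
    print("treatment effect at time 0:", round(treatment_effect(0), 3))
    print("treatment effect at time 1:", round(treatment_effect(1), 3))
    print("change in treatment effect:", round(time_by_treatment_effect(0, 1), 3))
```

What I can't work out is how to encode these same comparisons as columns and contrasts in a model matrix, with the cell and tissue libraries from each animal kept as separate paired samples rather than collapsed into a ratio up front.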
Thanks so much for any thoughts/designs you have!