Hi everyone,
I have RNA-seq data from cells sorted out of mixed cultures of two cell types, as well as from pure cultures containing only one of the two types. Read counts for transcripts expressed specifically in only one of the two cell types show that the sorted samples of each type always contain a small amount of contamination from the other type. The number of contaminating reads varies between replicates, and Western blots confirm the contamination.
How do I normalize the read counts of all other genes for this variable contamination? Do DESeq2, edgeR, or any other tools have a way to deal with this? If so, could you please point me to the correct way of doing it?
Thank you very much!
Hi Michael,
Thank you for your answer. However, I'm not sure what "i" refers to in your example.
To make my questions more specific, below is an example of the raw read counts for two cell-specific genes in two biological replicates of each sample (I actually have three replicates and a time course, but want to keep the example simple):
gene alpha: expressed only in cell type A
gene beta: expressed only in cell type B
In Bmix samples, the alpha counts are contamination from cells A. In Amix samples, the beta counts are contamination from cells B.
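The marker-gene logic above can be turned into a rough per-sample contamination estimate. The sketch below is one simple approach, not something proposed in this thread: it compares a marker's size-factor-normalized count in a sorted sample with the marker's mean level in the pure samples of the contaminating type. It assumes the contaminating cells express the marker at the same level as the pure reference, and that size factors (e.g. from DESeq2's estimateSizeFactors) are already available. The function name and all numbers are hypothetical.

```python
from statistics import mean


def contamination_fraction(marker_count, size_factor, ref_counts, ref_size_factors):
    """Rough share of a sorted sample's RNA coming from contaminating cells.

    marker_count: raw count, in the sorted sample, of a gene expressed only in the
        contaminating cell type (e.g. alpha in a Bmix sample).
    size_factor: that sample's library size factor.
    ref_counts, ref_size_factors: raw counts and size factors of the same marker
        in pure samples of the contaminating type (e.g. alpha in pure A samples).
    Assumption: contaminating cells express the marker at the same level as the
    pure reference, so the ratio approximates the contaminating share.
    """
    # Mean size-factor-normalized expression of the marker in the pure reference.
    ref_level = mean(c / s for c, s in zip(ref_counts, ref_size_factors))
    # Normalized marker level in the sorted sample, relative to that reference.
    return (marker_count / size_factor) / ref_level


# Hypothetical toy numbers, not taken from the post.
pure_a_alpha = [1000, 1250]
pure_a_size_factors = [1.0, 1.25]
for rep, count, sf in [("rep1", 50, 1.0), ("rep2", 90, 1.5)]:
    frac = contamination_fraction(count, sf, pure_a_alpha, pure_a_size_factors)
    print(f"Bmix {rep}: estimated A contamination ~{frac:.1%}")
```

Averaging over several markers per cell type would make such an estimate less noisy; whether and how it should then enter the model is a design decision in its own right.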
Since I have two batches (batch1: rep1; batch2: rep2 + rep3), I have included batch in the design: ~batch + condition
The conditions in this design are A, Amix, B, and Bmix. I'd like to obtain differentially expressed genes for Amix vs. A, Bmix vs. B, and Amix vs. Bmix.
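With ~batch + condition and A as the reference level, each of the three comparisons is a linear combination of the condition coefficients. The plain-Python sketch below builds treatment-coded model matrix rows by hand (the column names are illustrative, not DESeq2's internal names) and derives the contrast weights for each comparison. In DESeq2 itself, such a comparison is normally requested with results(dds, contrast = c("condition", "Amix", "A")), where dds is a hypothetical name for the fitted DESeqDataSet.

```python
# Treatment (reference-level) coding, the default R uses for unordered factors.
BATCH_LEVELS = ["batch1", "batch2"]            # first level is the reference
CONDITION_LEVELS = ["A", "Amix", "B", "Bmix"]  # "A" is the reference

COLUMNS = (
    ["Intercept"]
    + [f"batch_{b}" for b in BATCH_LEVELS[1:]]
    + [f"condition_{c}" for c in CONDITION_LEVELS[1:]]
)


def model_row(batch, condition):
    """One row of the model matrix for ~batch + condition."""
    return (
        [1]
        + [int(batch == b) for b in BATCH_LEVELS[1:]]
        + [int(condition == c) for c in CONDITION_LEVELS[1:]]
    )


def condition_contrast(numerator, denominator):
    """Coefficient weights giving the log2 fold change numerator vs. denominator."""
    weights = [0] * len(COLUMNS)
    for level, sign in ((numerator, 1), (denominator, -1)):
        if level != CONDITION_LEVELS[0]:  # the reference level has no column
            weights[COLUMNS.index(f"condition_{level}")] += sign
    return weights


for num, den in [("Amix", "A"), ("Bmix", "B"), ("Amix", "Bmix")]:
    print(f"{num} vs. {den}:", dict(zip(COLUMNS, condition_contrast(num, den))))
```

Because batch enters the model additively, each contrast equals the difference between two model rows from the same batch, so the batch term cancels out of every comparison.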
1) Should I add two more variables ('alpha' and 'beta') to the colData and to the design? If so, should all mixed samples be "yes" for both alpha and beta, whereas pure samples are "yes" only for their corresponding cell-specific gene? Would the correct design formula then be ~batch + alpha + beta + condition?
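One way to sanity-check the coding proposed in question 1 is to look at the rank of the resulting model matrix. Under that yes/no coding, alpha is "no" only for B and beta is "no" only for A, so both columns are fully determined by condition. The plain-Python sketch below (exact Gauss-Jordan elimination with fractions; sample layout taken from the post, three replicates split over the two batches) suggests that adding them raises the column count from 5 to 7 while the rank stays at 5, i.e. the matrix is not full rank. DESeq2 checks for this situation and stops with an error saying the model matrix is not full rank.

```python
from fractions import Fraction

CONDITIONS = ["A", "Amix", "B", "Bmix"]  # "A" is the reference level
BATCH_OF_REP = {"rep1": "batch1", "rep2": "batch2", "rep3": "batch2"}
ALPHA = {"A": 1, "Amix": 1, "B": 0, "Bmix": 1}  # 1 = "yes", coding from question 1
BETA = {"A": 0, "Amix": 1, "B": 1, "Bmix": 1}


def design_rows(with_alpha_beta):
    """Treatment-coded model matrix, one row per (replicate, condition) sample."""
    rows = []
    for batch in BATCH_OF_REP.values():
        for cond in CONDITIONS:
            row = [1, int(batch == "batch2")]  # intercept, batch
            if with_alpha_beta:
                row += [ALPHA[cond], BETA[cond]]
            row += [int(cond == c) for c in CONDITIONS[1:]]
            rows.append(row)
    return rows


def matrix_rank(rows):
    """Rank via exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column is a combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


for label, flag in [("~batch + condition", False),
                    ("~batch + alpha + beta + condition", True)]:
    rows = design_rows(flag)
    print(f"{label}: {len(rows[0])} columns, rank {matrix_rank(rows)}")
```

More generally, any sample-level variable that takes one fixed value per condition is absorbed by the condition term and cannot be estimated alongside it.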
2) If I decide not to look at Amix vs. Bmix, is it better to run two separate analyses, Amix vs. A and Bmix vs. B, using two separate count tables?
Thanks a lot in advance for your reply!
For selecting the design for use with DESeq2, I'd recommend working with a statistician. There are many choices here, and they reflect assumptions about your experiment. Unfortunately, I don't have time right now to help with these decisions on the support site.
Thank you Michael for your advice.