I have a set of mouse RNA-seq data for various points of a differentiation time series, with replicates. A known fraction of the cells in each sample are Mouse Embryonic Fibroblasts (MEFs), and that fraction varies quite a bit from sample to sample. I also have expression data for a pure MEF sample grown under similar conditions.

I'd like to do a fold change analysis between time points, subtracting the MEF expression contamination in such a way that the resulting increased variance per gene is factored into the fold change analysis.

It seems it may be possible to do this within DESeq2 or using svaseq, but I can't figure out how. Can anyone recommend a strategy for doing this? I'm really not clear on how to approach it.

Thanks!

*(note this is a crosspost from Biostars, as I wasn't able to get a solution there)*

By "known fraction" I mean I can give a measured numeric value for each sample based on the number of cells.

So you could put it in the model as a numeric covariate (you don't need to do anything special, just put it in the design). This, however, assumes the relationship with expression is log-linear (that is, linear in log expression). You probably want a relationship that is linear in expression itself, though. If you expect a certain relationship, you can try transforming the MEF variable before putting it in the design.
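To make the point about transforming the covariate concrete, here is a toy sketch in Python with numpy. It is not DESeq2 and does not use a negative binomial model; it only fits ordinary least squares on log2 expression for one simulated, noise-free gene. Everything in it is an assumption for illustration: the MEF fractions, the expression values, the names `mef_fraction`, `time_late` and `fit_time_effect`, and above all the mixing model, which supposes the gene is not expressed in MEFs at all, so that observed expression is `(1 - f)` times the expression in the cells of interest. Under that assumption, log expression is linear in `log2(1 - f)` rather than in `f` itself.

```python
import numpy as np

# Hypothetical toy data: 6 samples, two time points, made-up MEF fractions.
mef_fraction = np.array([0.10, 0.30, 0.50, 0.20, 0.40, 0.60])
time_late = np.array([0, 0, 0, 1, 1, 1])  # 0 = early, 1 = late

# Assumed mixing model for a gene that is NOT expressed in MEFs:
#   observed = (1 - f) * expression_in_cells_of_interest
true_log2_fold_change = 1.0
target = 100.0 * 2.0 ** (true_log2_fold_change * time_late)
observed = (1.0 - mef_fraction) * target


def fit_time_effect(covariate):
    """Least-squares fit of log2(observed) ~ intercept + time + covariate.

    Returns the estimated time coefficient (a log2 fold change).
    """
    design = np.column_stack([np.ones(len(covariate)), time_late, covariate])
    coef, *_ = np.linalg.lstsq(design, np.log2(observed), rcond=None)
    return coef[1]


# Raw fraction as the covariate: assumes log expression is linear in f.
lfc_raw = fit_time_effect(mef_fraction)

# Transformed covariate: matches the assumed mixing model exactly.
lfc_transformed = fit_time_effect(np.log2(1.0 - mef_fraction))

print("raw covariate:        ", lfc_raw)
print("transformed covariate:", lfc_transformed)
```

With the transformed covariate the fit should recover the simulated log2 fold change up to floating-point error, while the raw fraction gives only an approximation. For genes that are also expressed in MEFs the relationship is different again, so the appropriate transformation depends on what you expect for your data. In DESeq2 itself, the equivalent step would be adding the (possibly transformed) numeric column to the design formula alongside the time point factor.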

Thanks! I'll give it a try and post this reply on Biostars.

Hello gpalidwor, I am very interested in how you went with this - I am looking to do a very similar thing to 'subtract' RBC contamination from an expression experiment. Can you tell me whether the approach worked? Did you apply a transformation to the MEF variable?
