Question: differential expression analysis of cell subtypes mixture
Assa Yeroslaviz (Munich, Germany) wrote, 15 months ago:

I have a data set comparing two cell lines with different compositions, from a developmental assay in Drosophila. By itself this wouldn't be difficult to analyze; the problem lies in the composition of each cell line.

The first cell line has four different subtypes (A, B, C, D); the second has only subtype C. We expect subtle changes in some of the genes, but we're not really sure how many or what kind.

I was wondering if this can be analyzed in a straightforward way. What can I say about gene differences between these two populations?

Is it possible to compare them in the standard way, simply one against the other?

I am not sure it would tell me anything meaningful. If a gene is DE in the first population, what does that mean?

If anyone knows of a source or reference for this kind of experiment, I would appreciate a hint.

Thanks

Assa

Answer: differential expression analysis of cell subtypes mixture
Aaron Lun (Cambridge, United Kingdom) wrote, 15 months ago:

It seems like you're comparing between the first (mixed) cell line and second cell line. In that case, a standard DE analysis would operate on the population averages. If a gene is detected as DE, it tells you that the average expression of that gene in one population is different to the average in the other population.

(Note that I use the word "average" loosely here, because the population-level expression not only depends on the composition of subtypes but also on the amount of RNA in each subtype. If cells of a certain subtype have more RNA, then they will contribute more to the population profile, even if they are relatively uncommon.)
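To make the point about RNA content concrete, here is a minimal Python sketch. The function name, the two subtypes, and all numbers are made up for illustration and are not taken from the experiment. It assumes the bulk profile is simply the RNA-weighted mixture of the subtype profiles.

```python
def bulk_profile(cell_fractions, rna_per_cell, subtype_profiles):
    """Return (bulk expression per gene, share of total RNA per subtype).

    cell_fractions   -- fraction of cells in each subtype (sums to 1)
    rna_per_cell     -- relative RNA content of one cell of each subtype
    subtype_profiles -- per subtype, each gene's fraction of that subtype's RNA
    """
    # RNA contributed by a subtype = how many cells it has x RNA per cell.
    rna = [f * r for f, r in zip(cell_fractions, rna_per_cell)]
    total = sum(rna)
    rna_share = [amount / total for amount in rna]

    # The bulk profile is the RNA-weighted (not cell-weighted) average.
    n_genes = len(subtype_profiles[0])
    bulk = [
        sum(share * profile[g] for share, profile in zip(rna_share, subtype_profiles))
        for g in range(n_genes)
    ]
    return bulk, rna_share


# Hypothetical numbers: subtype B is only 25% of cells but has 3x the RNA.
cell_fractions = [0.75, 0.25]
rna_per_cell = [1.0, 3.0]
subtype_profiles = [
    [0.25, 0.75],  # subtype A: gene 1, gene 2
    [0.75, 0.25],  # subtype B: gene 1, gene 2
]

bulk, rna_share = bulk_profile(cell_fractions, rna_per_cell, subtype_profiles)
cell_weighted_gene1 = sum(f * p[0] for f, p in zip(cell_fractions, subtype_profiles))

print("RNA share per subtype:", rna_share)
print("bulk gene 1:", bulk[0], "vs cell-weighted average:", cell_weighted_gene1)
```

In this toy case subtype B makes up a quarter of the cells but half of the RNA, so gene 1 comes out at 0.5 in the bulk profile rather than the 0.375 that a cell-count average would give.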

If you knew the expression profile for each subtype, then perhaps you could deconvolve the contributions of each subtype to the population profile. I believe this is frequently done for immune cell mixtures, though I can't remember specific papers or packages. Googling suggests DeconRNASeq, for example.

More generally, this is why most bulk RNA-seq studies that I've worked with are done on relatively homogeneous cell populations, to avoid problems with interpretation like the ones you've encountered here. If you can't do that (e.g., you don't have easily targeted markers), then the other obvious alternative would be to do single-cell transcriptomics and pull out the populations in silico.

Just to tack on to Aaron's answer, we have a very straightforward deconvolution method in DESeq2, called unmix(). It was built because we needed it for a few local projects, and we've now tested it on a number of large bulk RNA-seq datasets. It does the simple thing, fitting non-negative combinations on the raw expression scale, while comparing the observed expression vector with the non-negative linear combination in a variance-stabilized space. I prefer this VST approach to other approaches, which filter out the low and highly expressed genes. In my opinion, those genes are where a lot of the signal resides.
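The general idea (non-negative weights on the raw scale, fit judged in a transformed space) can be sketched in plain Python. This is not DESeq2's implementation of unmix(): the function names, the shifted log used as a stand-in for a variance-stabilizing transformation, the simple coordinate search, and the marker matrix are all assumptions made for illustration.

```python
import math


def mix(pure, weights):
    """Non-negative linear combination of pure profiles on the raw scale.

    pure is a list of genes; each gene is a list of per-subtype values.
    """
    return [sum(p * w for p, w in zip(row, weights)) for row in pure]


def loss(x, pure, weights, shift):
    """Squared distance between observed and fitted, in shifted-log space."""
    fitted = mix(pure, weights)
    return sum(
        (math.log(o + shift) - math.log(f + shift)) ** 2
        for o, f in zip(x, fitted)
    )


def unmix_sketch(x, pure, shift=1.0, tol=1e-7):
    """Estimate mixture proportions by coordinate search with weights >= 0."""
    k = len(pure[0])
    weights = [1.0 / k] * k
    best = loss(x, pure, weights, shift)
    step = 0.25
    while step > tol:
        improved = False
        for i in range(k):
            # Try moving one weight up or down, never below zero.
            for candidate in (weights[i] + step, max(weights[i] - step, 0.0)):
                trial = weights[:i] + [candidate] + weights[i + 1:]
                value = loss(x, pure, trial, shift)
                if value < best:
                    weights, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # refine the search once no single move helps
    total = sum(weights)
    # Rescale to proportions (a choice made in this sketch).
    return [w / total for w in weights] if total > 0 else weights


# Hypothetical pure profiles: 6 genes (rows) x 3 subtypes (columns).
pure = [
    [100.0, 5.0, 5.0],
    [80.0, 10.0, 2.0],
    [5.0, 120.0, 5.0],
    [3.0, 90.0, 10.0],
    [5.0, 5.0, 150.0],
    [10.0, 2.0, 60.0],
]
true_weights = [0.5, 0.3, 0.2]
x = mix(pure, true_weights)  # noise-free mixed sample

estimate = unmix_sketch(x, pure)
print([round(w, 3) for w in estimate])
```

The estimated proportions describe RNA contributions, not cell counts, and with real, noisy data the fit will only be as good as the pure profiles are distinct from one another.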

Thanks Michael for this suggestion. This is one function I haven't seen before. Looking at the ?unmix information, I was wondering if I understand it correctly.

Using this would mean that x holds my samples with the mixed population and pure holds the samples with only one subtype. Is this correct?

Yes.

Thanks for the fast response. This is what I thought as well. I still doubt, though, that this is what they are looking for. Maybe a little more background would help. The experiment is about dendrites in the Drosophila brain. We are interested in a neural population that is responsible for motion. It can be divided into the aforementioned four subtypes, which are responsible for different kinds of motion. Although their morphology is similar, the four subtypes differ in the crucial parts responsible for their functionality.

The problem is that the four subtypes A-D are not separable. The best one can achieve is a mixture with a higher proportion of A-B cells and a low(er) proportion of C-D cells. But this still wouldn't tell me how much RNA from each subtype is in the mixed samples.

And yes, you're right, the next step in the plan is to do a single-cell RNA-Seq experiment, but as this takes more time and effort, it was considered to first try and see if one can get some preliminary results doing a standard RNA-Seq experiment.