I ran some 10X single-cell RNA-seq experiments and analyzed the data with Bioconductor packages and workflows (SingleCellExperiment, SingleR, scran, scater, etc.). I found some interesting results by differential expression (DE) and GSEA. However, one reviewer has a concern.
Quick breakdown: it's a cell therapy study. I am comparing the scRNA-seq profiles of the cells sampled before they go into the patients. The phenotype/condition used to split the groups is based on how the patients respond. In one of the phenotypes, the cells are physically bigger at the production stage. The reviewer's concern is that this difference in cell size biases the RNA recovered: you get a lot more RNA from bigger cells (something anyone who has done RNA extraction knows), and the genes that make up a larger share of the RNA in the big cells would 'dilute' the other genes. Their suggestion is that the DE genes I am seeing aren't real but are an artifact of the bigger cells associated with one phenotype. Under that argument, genes in the smaller cells only appear more highly expressed because the genes driving cell size are less abundant there at the per-cell level.
What would be a good way to show whether or not this is the case? Is there a small set of genes whose ratio I could look at, such that a constant ratio between groups would show that my library-size normalization and per-cell scaling are controlling for this variable? My first thought was: if this were the case, wouldn't I see essentially every gene come up as DE, not just 80? But how would I show or explain that to a non-bioinformatics person?
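To make my "every gene" intuition concrete, here is a toy sketch of the dilution argument. It is in Python with NumPy purely for illustration, not my actual Bioconductor pipeline. Everything in it is a made-up assumption: 1000 genes, noise-free expected counts, 100 hypothetical "size-related" genes that are 5x higher in the big cell, and plain library-size scaling (which may not be what my real normalization does).

```python
import numpy as np

def library_size_normalize(counts):
    """Divide each cell (column) by its library size relative to the mean library size."""
    lib = counts.sum(axis=0)
    size_factors = lib / lib.mean()
    return counts / size_factors

n_genes = 1000
n_inflated = 100  # hypothetical number of "size-related" genes

# Hypothetical expected counts: every gene at 10 in the small cell.
small = np.full(n_genes, 10.0)

# The big cell is identical except that the first 100 genes are 5x higher.
big = small.copy()
big[:n_inflated] *= 5.0

counts = np.column_stack([small, big])   # genes x cells
norm = library_size_normalize(counts)

# log2 fold change, small cell versus big cell, after normalization
log2fc = np.log2(norm[:, 0] / norm[:, 1])

inflated = log2fc[:n_inflated]    # genes that truly differ
unchanged = log2fc[n_inflated:]   # genes with identical raw counts

print("log2FC of truly changed genes:", inflated[0])
print("log2FC of unchanged genes:    ", unchanged[0])
print("distinct shifts among unchanged genes:", np.unique(np.round(unchanged, 6)).size)
```

In this toy setup, all 900 untouched genes pick up the same apparent shift toward the small cell (log2(1.4), about 0.49), even though their raw counts are identical. If I am reading the reviewer's argument right, a pure dilution artifact should look like that: a uniform offset across most genes rather than a short list of 80. Real data are noisy and my real normalization may behave differently, so this only illustrates the argument and is not evidence about my dataset.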
Thanks in advance for any help or tips.