I have a few questions regarding the normalization performed by the DESeq2 package:
- I understand that the normalization works under the assumption that most genes are not differentially expressed (DE) and only a small subset are. This assumption is realistic in most experiments, but I would appreciate your input on whether it holds when analyzing samples from different human tissues. Does it make sense to assume that most genes have the same expression levels across different human tissues, or should I use a different normalization for this kind of data?
- In some RNA-Seq experiments, some libraries are sequenced much more deeply than others, giving a wider range of library sizes than we usually expect. The size factors of the deeply sequenced samples are then much higher than 1, and those of the smaller samples much lower than 1. In such cases, is it better to subsample the larger libraries to avoid this wide range of size factors?
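For reference, the size factors discussed above come from DESeq2's median-of-ratios method: each gene's geometric mean across samples serves as a pseudo-reference, and a sample's size factor is the median of its per-gene ratios to that reference. Below is a minimal NumPy sketch of that idea, written only for illustration (it is not DESeq2's own code; the function name and toy counts are hypothetical). It shows why a shallow library gets a factor below 1 and a deep one a factor above 1, and why a single strongly changed gene does not move the estimate, which is exactly the "most genes are not DE" assumption.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Hypothetical helper. counts: genes x samples array of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a geometric mean of zero; exclude them.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    # Log geometric mean of each gene across samples (the pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1)
    # Per-sample median of the log ratios to the reference, back-transformed.
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))

# Toy data: sample B is sequenced 4x deeper than sample A ...
genes_a = np.array([10, 20, 30, 40, 50])
genes_b = 4 * genes_a
# ... except for one strongly "DE" gene.
genes_b[-1] = 50_000
counts = np.column_stack([genes_a, genes_b])

size_factors = median_of_ratios_size_factors(counts)
print(size_factors)
```

Here the factors come out as about 0.5 and 2.0, despite the outlier gene. If most genes differed between samples (the tissue scenario in the first question), the median ratio could no longer be read as a pure depth effect.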
I raise this because when a sample receives a very small size factor, it seems to me that we are artificially increasing the counts of all genes in that sample without any biological evidence to support it. Lowering the counts of the bigger libraries seems less problematic to me than artificially inflating the smaller ones. What do you think?
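To make the two options in the second question concrete, here is a small hypothetical NumPy sketch contrasting them: dividing each sample by its size factor (as for DESeq2's normalized counts) versus binomially thinning the deep library, i.e. keeping each read independently with a fixed probability. The helper names, toy counts and size factors are illustrative assumptions; only the ratio of the two size factors (10) matters here. Note that DESeq2's model itself is fit on the raw counts, with the size factor scaling the expected mean, so the scaled values are mainly what one sees when looking at normalized counts.

```python
import numpy as np

def normalized_counts(counts, size_factors):
    """Hypothetical helper: divide each sample (column) by its size factor."""
    return np.asarray(counts, dtype=float) / np.asarray(size_factors, dtype=float)

def downsample(counts, fraction, seed=0):
    """Hypothetical helper: binomial thinning, keeping each read with probability `fraction`."""
    rng = np.random.default_rng(seed)
    return rng.binomial(np.asarray(counts, dtype=np.int64), fraction)

# Toy data: 4 genes, a shallow library and one sequenced 10x deeper.
shallow = np.array([5, 10, 0, 20])
deep = 10 * shallow
counts = np.column_stack([shallow, deep])

# Option 1: scale. A size factor below 1 inflates the shallow sample's values,
# but its relative profile (including the zero) is unchanged.
scaled = normalized_counts(counts, [0.5, 5.0])
print(scaled)

# Option 2: thin the deep library to ~10% of its reads, discarding data.
thinned = downsample(deep, 0.1)
print(thinned)
```

Both columns of `scaled` come out as [10, 20, 0, 40]. Scaling keeps every read but inflates the shallow sample's numbers; thinning keeps the numbers small but throws away reads from the deep sample, so its counts become noisier and vary with the random seed.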
Thank you very much,