Hi, I have a matrix of read counts (RSEM_readCounts) for approximately 2 million genes across more than 100 samples (more than 30 treatments, each with 3 replicates).
- What is the correct way to identify differentially expressed genes?
- I was thinking of using limma, since I found a few references suggesting limma for RSEM output. How should RSEM_readCounts be normalized for use with limma? Is there a recommended pipeline for using limma in this case?
- Normalization issue: I have also tried logCPM and logCPMPrior3 on the RSEMreadCounts, and about 20% of the values form a high peak below zero (plus a second peak above zero). Applying logCPM or logCPMPrior3 to the rounded RSEMreadCounts values still gives the high peak below zero. I had already filtered out low counts beforehand. Any suggestions for a better normalization?
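To see where values below zero come from, here is a minimal Python sketch of low-count filtering followed by log2 counts-per-million with a prior count. It is not the edgeR/limma code. It assumes a simplified edgeR-style definition in which the prior is scaled by relative library size, each count gets that prior added, and each library size gets twice the prior added. The function names `filter_low_counts` and `log_cpm`, and the thresholds `min_cpm=1` and `min_samples=3` (chosen to match 3 replicates per treatment), are hypothetical choices made for illustration.

```python
import numpy as np


def filter_low_counts(counts, min_cpm=1.0, min_samples=3):
    """Keep genes (rows) whose CPM is >= min_cpm in at least min_samples samples.

    Hypothetical helper; thresholds are illustrative assumptions.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)          # one library size per sample (column)
    cpm = counts / lib_sizes * 1e6          # counts per million, per sample
    keep = (cpm >= min_cpm).sum(axis=1) >= min_samples
    return counts[keep]


def log_cpm(counts, prior_count=2.0):
    """log2 CPM with a library-size-scaled prior (simplified edgeR-style sketch)."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)
    # Scale the prior by relative library size so each sample gets a comparable offset.
    prior = prior_count * lib_sizes / lib_sizes.mean()
    # Add the prior to every count and 2 * prior to every library size, then log2.
    return np.log2((counts + prior) / (lib_sizes + 2 * prior) * 1e6)


# Toy matrix: 3 genes x 2 samples, each library has 20 million reads.
counts = np.array([
    [0, 0],
    [10, 10],
    [19_999_990, 19_999_990],
])
print(log_cpm(counts)[0])                      # zero-count gene: about -3.32 in both samples
print(filter_low_counts(counts, min_samples=2).shape)
```

With a 20-million-read library and a prior of 2, a gene with zero reads gets log2(2 / 20e6 * 1e6), about -3.3. In general, any feature with fewer than roughly one read per million (after adding the prior) lands below zero, so a large peak below zero can simply reflect many low-count features that survived filtering rather than a failure of the normalization itself.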