Hi all,
I have a basic question regarding the normalization of RNAseq data - I understand why we have to normalize the raw counts, but I do not fully understand the biological details and I am confused about the differences between methods - so sorry if the answer is obvious.
Basically, I have ~58,000 transcripts, and I just want to normalize the raw counts and transform them so that I can make comparisons (I have 2 time points and 60 samples per time point). I would like to do it in R.
My question is: is there a way to just normalize and transform my data (I mean something like (log)CPM) without filtering first? If yes, do you have any suggestions for which method/package (and function) I could use?
(I would like to filter and apply a variance-stabilizing transformation afterwards.)
Thank you for any advice!
Whenever you're feeling lost amidst all the options, following an example workflow like this one works wonders.
Briefly, you can and should filter lowly expressed genes before normalisation. This is especially important if you decide to use TMM normalisation (as implemented in edgeR's calcNormFactors), as the method is quite sensitive to gene filtering (CMIIW).
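
To make the order of steps concrete, here is a minimal sketch of a filter, then TMM, then log-CPM pipeline in edgeR. The count matrix is simulated and the object names (counts, timepoint, logcpm) are made up for illustration, so swap in your own data; the prior.count value is just the function's usual default, not a recommendation.

```r
library(edgeR)

# Simulated stand-in for real data (assumption): 1000 genes,
# 2 time points x 60 samples. The first 200 genes are barely expressed.
set.seed(1)
n_genes <- 1000
n_per_group <- 60
gene_mu <- c(rep(0.2, 200), rep(50, 800))
counts <- matrix(
  rnbinom(n_genes * 2 * n_per_group, mu = gene_mu, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("s", seq_len(2 * n_per_group)))
)
timepoint <- factor(rep(c("t1", "t2"), each = n_per_group))

# 1. Wrap the raw counts in a DGEList
y <- DGEList(counts = counts, group = timepoint)

# 2. Drop lowly expressed genes, then recompute library sizes
keep <- filterByExpr(y, group = timepoint)
y <- y[keep, , keep.lib.sizes = FALSE]

# 3. Compute TMM normalisation factors on the filtered data
y <- calcNormFactors(y, method = "TMM")

# 4. Log2 CPM using the TMM-adjusted library sizes
logcpm <- cpm(y, log = TRUE, prior.count = 2)
dim(logcpm)
```

If you really want log-CPM values without any filtering, cpm() also runs on an unfiltered DGEList; you would simply skip step 2, with the caveat about TMM above.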