I've been looking into normalization more and more, and I was wondering about a few things that some of you might know the answer to or want to discuss.
So there exists within-sample normalization (TPM or others), i.e. relative abundances, and between-sample normalization (TMM or others). But is it ever necessary to do both, i.e. is it ever necessary to normalize relative abundances across a cohort?
I don't think it would be, but another scenario which seems to be quite common is filtering out isoforms that have no expression in 90% (or some other threshold) of the samples when working with a large cohort. But if you do this while working with TPM, then the per-sample sum of TPM over the remaining isoforms will no longer be the same for every subject. Would it make sense to then use TMM after such a filtration process? I think it would.
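To make the TPM point concrete, here is a minimal sketch in plain Python with made-up numbers. The function names, the toy matrix, and the exact reading of the 90% rule (drop an isoform if it is zero in at least 90% of samples) are my own assumptions for illustration, not taken from any package.

```python
def drop_mostly_unexpressed(tpm_rows, max_zero_fraction=0.9):
    """Drop isoforms (rows) whose TPM is zero in at least
    max_zero_fraction of the samples (columns)."""
    kept = []
    for row in tpm_rows:
        zero_fraction = sum(1 for v in row if v == 0) / len(row)
        if zero_fraction < max_zero_fraction:
            kept.append(row)
    return kept


def column_sums(rows):
    """Per-sample TPM totals: sum over isoforms for each column."""
    return [sum(col) for col in zip(*rows)]


# Hypothetical toy data: 3 isoforms (rows) x 10 samples (columns).
# Every column sums to 1e6, as TPM should.
tpm_rows = [
    [500000.0] + [600000.0] * 9,  # expressed everywhere
    [300000.0] + [400000.0] * 9,  # expressed everywhere
    [200000.0] + [0.0] * 9,       # expressed only in the first sample
]

before = column_sums(tpm_rows)
after = column_sums(drop_mostly_unexpressed(tpm_rows))

print(before[:2])  # per-sample totals before filtering
print(after[:2])   # per-sample totals after filtering
```

In this toy case the third isoform is dropped, so the first sample's total falls to 800000 while the other samples still sum to 1000000. That is the mismatch I mean: the filtered values are no longer on a common per-sample scale.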
Do you think this kind of isoform filtering is flawed in some manner?
My guess is that it is used because people are worried about the sensitivity of RNA-seq, and biologically most think that for a specific tissue type a good percentage of genes are not expressed. So I think it makes some sense.
It seems like all between-sample normalizations require raw counts as input, and leave it there. I read Harold Pimentel's blog post about it (https://haroldpimentel.wordpress.com/2014/12/08/in-rna-seq-2-2-between-sample-normalization/, very informative), but I haven't seen a follow-up about this problem, if it is a problem.
I'm new to this stuff, so I was wondering what others' thoughts are on the issue.