Or "Everything You Always Wanted to Know About RNA-Seq (But Were Afraid to Ask) Part 2"
In RNA-Seq, it is common practice to compare the abundance of transcripts within the same sample after some form of intrasample normalization (e.g., TPM) that takes into account both transcript length and sequencing depth (although only the former is strictly necessary as long as no other samples are considered). But how reliable are these quantifications, really? In particular, I wonder:
- How influential is GC bias? Is it important to correct for it?
- How much do biases arising from the PCR amplification step, due to gene-specific differences in the amplification efficiency of the polymerase, matter? Can they be accounted for in some way?
- Are there other biases that undermine intrasample comparisons?
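To make the point about intrasample normalization concrete, here is a minimal Python sketch of TPM. The counts and lengths are invented, and real quantifiers (e.g., Salmon or RSEM) use effective rather than raw transcript lengths, so treat this only as an illustration. It shows why depth normalization is irrelevant for comparisons within one sample: the per-million scaling factor is shared by every transcript and cancels in any ratio. It does not address GC or PCR bias in any way.

```python
"""Minimal TPM sketch with made-up counts and lengths (not real data)."""


def tpm(counts, lengths_kb):
    """Length-normalize each count, then rescale so the sample sums to one million."""
    # Reads per kilobase: corrects for transcript length.
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    # Per-million factor: corrects for sequencing depth and is shared by all transcripts.
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]


# Three hypothetical transcripts: raw read counts and lengths in kilobases.
counts = [500, 1000, 8500]
lengths_kb = [1.0, 4.0, 2.0]

sample = tpm(counts, lengths_kb)
# Idealized: the same library sequenced three times deeper (every count scaled by 3).
deeper = tpm([c * 3 for c in counts], lengths_kb)

# Within one sample, the ratio of transcript 0 to transcript 1 equals the ratio of their
# length-normalized counts; the depth factor cancels, so deeper sequencing leaves it unchanged.
ratio = sample[0] / sample[1]
ratio_deeper = deeper[0] / deeper[1]
print(f"ratio: {ratio:.2f}, ratio after 3x deeper sequencing: {ratio_deeper:.2f}")
```

Because only the length correction survives in a within-sample ratio, any bias that distorts counts in a gene-specific way (GC content, amplification efficiency) passes straight through into the comparison.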
Many thanks to everyone who would like to share their experience and opinions!
[ crossposted on Biostars: https://www.biostars.org/p/9563307/ ]
I have been doing this for about five years (much less experience than you), but, just like you, I have only ever done differential expression analyses of the same gene across different conditions. Nevertheless, it can occasionally be interesting to know which member of a family of functionally or structurally comparable genes is more highly expressed, and whether their ratios change as a result of a treatment or a pathological condition. For example, we are very interested in ion channels, and I think it is legitimate to ask which TRP channels are expressed in the healthy model and whether the expression ratio between, e.g., TRPV4 and TRPV1 (both fairly similar non-selective cation channels) is reversed in the diseased counterpart. However, I agree that comparing transcript abundance between TRPM8 and, e.g., interleukin 12 would make much less sense. Maybe. What do you think?
Besides this, I'm sorry for going off topic... should I delete the post?
What you describe isn't a comparison of transcripts within a subject but rather a comparison of transcript differentials between subjects. This is a kind of pairing: you would, e.g., compute the difference between TRPV4 and TRPV1 in each diseased subject and the same difference in each healthy subject, and then test for a difference in those differences. By first computing the within-subject differences, you inherently adjust for technical biases that affect each gene in the same way in both groups, so I am not sure you need to worry about the biases you list.
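The paired approach can be sketched in plain Python. Everything here is illustrative: the TPM values are made up, `log2_ratio`, `welch_t`, `compare` and `with_bias` are hypothetical helper names, and Welch's t statistic is just one possible choice of test (no specific test is prescribed above). The last step multiplies every TRPV4 value by 2 in both groups to mimic a gene-specific bias and checks that the comparison does not change.

```python
"""Within-subject log ratios compared between groups (all values hypothetical)."""
import math
import statistics


def log2_ratio(tpm_a, tpm_b):
    """Within-subject log2 ratio of two genes (assumes both TPMs are > 0)."""
    return math.log2(tpm_a / tpm_b)


def welch_t(x, y):
    """Welch's t statistic for the difference in means of two groups."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def compare(healthy, diseased):
    """Step 1: per-subject log ratios. Step 2: difference in differences and its t statistic."""
    d_h = [log2_ratio(a, b) for a, b in healthy]
    d_d = [log2_ratio(a, b) for a, b in diseased]
    return statistics.mean(d_h) - statistics.mean(d_d), welch_t(d_h, d_d)


def with_bias(pairs, factor=2.0):
    """Simulate a gene-specific bias: the first gene is over-measured by `factor`."""
    return [(a * factor, b) for a, b in pairs]


# Hypothetical (TRPV4, TRPV1) TPM pairs, one tuple per subject.
healthy = [(40.0, 10.0), (35.0, 12.0), (50.0, 11.0), (45.0, 9.0)]
diseased = [(12.0, 30.0), (10.0, 28.0), (15.0, 35.0), (11.0, 25.0)]

delta, t_stat = compare(healthy, diseased)
# The same bias in every subject of both groups shifts all log ratios by log2(2) = 1,
# which cancels in the between-group difference.
delta_biased, t_biased = compare(with_bias(healthy), with_bias(diseased))

print(f"difference in log2(TRPV4/TRPV1): {delta:.3f}, Welch t = {t_stat:.2f}")
print(f"with shared 2x TRPV4 bias:       {delta_biased:.3f}, Welch t = {t_biased:.2f}")
```

For a p-value, `scipy.stats.ttest_ind(d_h, d_d, equal_var=False)` runs the same Welch test. The cancellation is exact only for a multiplicative bias that is identical in both groups and only without a pseudocount; adding one to handle zero TPMs makes it approximate.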
I don't see a problem with TRPM8 vs IL-12 either, really.
You don't need to delete the post. Just note that in the future you should probably keep this sort of discussion on Biostars instead.