3.0 years ago by
My understanding of the RPKM measure is that it was intended to make the data amenable to analysis with conventional modeling methods. I don't know that this is actually true, and people like Lior Pachter, who were early proponents of the measure, seem to have decided that TPM is a more reasonable measure than RPKM or FPKM, so the data you have in hand may not be considered particularly useful these days.
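For what it's worth, within a single sample TPM is just RPKM rescaled so that the values sum to one million, so you can convert what you have. A minimal sketch in Python, assuming the RPKM values for one sample are in a plain list; the function name `rpkm_to_tpm` and the numbers are made up for illustration:

```python
def rpkm_to_tpm(rpkm):
    """Rescale one sample's RPKM values so that they sum to one million (TPM)."""
    total = sum(rpkm)
    if total <= 0:
        raise ValueError("RPKM values must have a positive sum")
    return [value / total * 1e6 for value in rpkm]


# Made-up RPKM values for four genes in a single sample.
sample_rpkm = [10.0, 20.0, 30.0, 40.0]
sample_tpm = rpkm_to_tpm(sample_rpkm)
print(sample_tpm)
```

This has to be done per sample, and it only changes the scaling. It does not recover the counts that tools like edgeR or DESeq2 expect.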
So if that is all you have, then I think the conventional thing to do is just use something like limma (usually on log2-transformed values), and pretend that RPKM are reasonable inputs. But do note that tools like cufflinks spend a lot of time trying to tease out differences in isoform expression. In other words, instead of giving you some measure of the expression of a gene, cufflinks is trying to say how much of each possible isoform of that gene is being expressed. That is sort of old tech these days as well, as quantification tools like salmon or kallisto (which skip full alignment) will do an arguably better job at much faster speeds, and will give you TPM along with estimated counts. It is the estimated counts, not the TPM, that you would then import (for example with tximport) and use with edgeR or DESeq2.
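To make the counts part concrete: salmon's `quant.sf` reports an estimated read count (`NumReads`) next to `TPM` for each transcript, and the counts are what get summed to the gene level and handed to count-based tools. A rough sketch in Python of that summing step, assuming a tiny made-up `quant.sf` and a hypothetical transcript-to-gene mapping (`TX2GENE`); in practice you would let tximport do this in R rather than roll your own:

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for a salmon quant.sf file (tab-separated); values are made up.
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1300.0\t120.5\t200.4\n"
    "tx2\t900\t700.0\t80.2\t50.3\n"
    "tx3\t2000\t1800.0\t10.0\t12.6\n"
)

# Hypothetical transcript-to-gene mapping, for illustration only.
TX2GENE = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}


def gene_level_counts(quant_text, tx2gene):
    """Sum estimated transcript counts per gene, then round to integers."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(quant_text), delimiter="\t"):
        totals[tx2gene[row["Name"]]] += float(row["NumReads"])
    return {gene: round(total) for gene, total in totals.items()}


counts = gene_level_counts(QUANT_SF, TX2GENE)
print(counts)
```

Rounding happens after summing, so fractional transcript-level estimates are not thrown away one transcript at a time.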
Or if you really just want to summarize at the gene level, ignoring isoform-level differences, you could use something like subread and featureCounts to get counts. But all of this assumes you have access to the raw reads or the BAM files, rather than just the RPKM values.