Find DEG from RNA-seq: What if I have only RPKM data?
Yue Zhao • 0

Hi all,

I've recently been doing RNA-seq analysis and have run into a problem. The data I have is a matrix of RPKM values, not read counts, so is there any way to find DEGs? As mentioned in the DESeq2 documentation, methods like DESeq2 can only take a matrix of read counts. I tried edgeR, but it seems edgeR is also not meant for RPKM, right? The original RNA-seq data has been deleted by the person who gave me the RPKM data, so I'm wondering whether there is some way to analyze the RPKM matrix and get the DEGs between groups within my data. (The species is cotton; the RPKM matrix is 37000 x 50 and can be split into 6 groups, each with a different number of samples.)

Looking forward to your reply and many thanks for reading this email from a stranger :)





deg rna-seq rpkm • 4.8k views
b.nota ▴ 340

Hi Yue,


It is not advised to use RPKM data for statistical analysis in DESeq2 or edgeR. I don't know what you mean by original data (fastq or bam?), but I would highly recommend not to delete raw data before you publish your study.

I don't know how the person calculated RPKM values, but you might want to ask this person to reverse the calculation.

Usually RPKM is calculated by:

RPKM = number of mapped reads / (length of transcript / 1000) / (total reads / 10^6)

Correct me if I am wrong.

So if you know the total number of reads in each sample (library) and the length of each transcript, you can calculate the number of mapped reads back.
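As a rough illustration of that back-calculation, here is a minimal Python sketch. It assumes the RPKM values really were computed from plain read counts with the formula above; the function names and the example numbers are made up for illustration, and the recovered counts are only as good as the rounding in the RPKM file.

```python
def counts_to_rpkm(reads, transcript_length_bp, total_reads):
    """RPKM = reads / (length / 1000) / (total reads / 10^6)."""
    return reads / (transcript_length_bp / 1_000) / (total_reads / 1_000_000)


def rpkm_to_counts(rpkm, transcript_length_bp, total_reads):
    """Invert the formula above to get an (unrounded) read count back."""
    return rpkm * (transcript_length_bp / 1_000) * (total_reads / 1_000_000)


# Hypothetical values: 500 reads on a 2 kb transcript, 20 million reads in the library.
reads = 500
length_bp = 2_000
library_size = 20_000_000

rpkm = counts_to_rpkm(reads, length_bp, library_size)
# Round because the inverse is a float and count-based tools expect integers.
recovered = round(rpkm_to_counts(rpkm, length_bp, library_size))
print(rpkm, recovered)
```

You would need the per-sample library sizes and per-transcript lengths that were actually used, otherwise the numbers you get back are not the original counts.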

Hope this helps!





If the RPKM values were calculated by cufflinks, then they can NOT be back-translated to integer counts. While RPKM is not the ideal normalization, it's not horrible (except for very low expression genes, but you should filter these out anyway). If that's all you have, then I would suggest using standard limma, not the voom normalization, to find DEGs. You could also try going back to the center that did your sequencing to see if they have a copy of the original .fastq files.
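To make the filtering step concrete, here is a small Python sketch of one way to prepare an RPKM matrix before a linear-model analysis. The cutoffs (RPKM of at least 1 in at least 3 samples) and the 0.1 offset added before taking log2 are assumptions picked for illustration, not recommended values, and the helper name is hypothetical.

```python
import math


def filter_and_log2(rpkm_by_gene, min_rpkm=1.0, min_samples=3, offset=0.1):
    """Drop low-expression genes, then return log2(RPKM + offset) per gene.

    A gene is kept if at least `min_samples` samples have RPKM >= `min_rpkm`.
    All three thresholds are illustrative assumptions.
    """
    kept = {}
    for gene, values in rpkm_by_gene.items():
        n_expressed = sum(1 for v in values if v >= min_rpkm)
        if n_expressed >= min_samples:
            kept[gene] = [math.log2(v + offset) for v in values]
    return kept


# Hypothetical RPKM values for two genes across four samples.
example = {
    "geneA": [12.5, 30.0, 8.1, 15.9],
    "geneB": [0.0, 0.2, 0.0, 1.4],
}
log_expr = filter_and_log2(example)
print(list(log_expr))  # only geneA passes the filter
```

The filtered, log-transformed matrix is the kind of input you would then hand to limma in R (lmFit followed by eBayes) with a design matrix describing your groups.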

Good luck!



Thanks, Ben and Jenny! I asked the person who gave me the data, and he finally found the read count data somewhere. If he hadn't, I think I would have used voom and limma instead, because the RPKM was calculated by cufflinks. I hadn't realized before that edgeR doesn't support RPKM, so my GO analysis results were a total mess. I really appreciate your kind responses!

Best wishes,


