Question: Can FPKM or TPM values be used as input to RUVSeq?
6 months ago
I am trying to normalize my RNA-seq expression data using spike-ins with the RUVSeq package. However, I noticed that RUVSeq expects count-based expression as input. My expression matrix contains RPKM or TPM values from the STAR-RSEM pipeline. Can RUVSeq also work with RPKM or TPM values, or must I compute count values instead?



4 months ago
davide risso
Weill Cornell Medicine
Hi Wei,

RUVSeq is designed to work with gene-level counts. If you are working with TPM or FPKM values, you may want to use the ruv package on CRAN or the RUVnormalize package in Bioconductor, depending on whether you have a supervised or an unsupervised problem.

Although these packages were designed for microarray data, you can try using the log of FPKM / TPM as input, which essentially assumes a linear model for these data.
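
As a rough illustration of the log transformation mentioned above, here is a minimal Python sketch (standard library only). The function name `log2_transform`, the toy values, and the pseudocount of 1 are all assumptions made for this example: the pseudocount is only there so that zero TPM values do not produce minus infinity, and nothing above prescribes a particular value. The transformed matrix would then still have to be passed to ruv or RUVnormalize in R.

```python
import math

def log2_transform(tpm_rows, pseudocount=1.0):
    """Return log2(value + pseudocount) for every entry of a genes x samples table.

    The pseudocount is an assumption of this sketch; it keeps zeros finite.
    """
    return [[math.log2(v + pseudocount) for v in row] for row in tpm_rows]

# Toy TPM table: 3 genes (rows) x 2 samples (columns); the values are made up.
tpm = [
    [0.0, 3.0],
    [7.0, 15.0],
    [1023.0, 0.0],
]

log_tpm = log2_transform(tpm)
for row in log_tpm:
    print(row)
```

Whether a pseudocount of 1 is sensible depends on the scale of your data, so it is worth checking how the low-expression genes behave after the transformation.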

An alternative would be to take the expected counts from RSEM (perhaps rounding them to the nearest integer), treat them as actual counts, and use the regular RUVSeq pipeline.
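
The rounding step could look something like the following Python sketch (standard library only). It assumes the usual tab-delimited layout of an RSEM `.genes.results` file with `gene_id` and `expected_count` columns; the helper name `rounded_expected_counts` and the three example rows are hypothetical. Halves are rounded up here, which is one possible choice among several. The resulting integer counts would then go into the regular RUVSeq pipeline in R.

```python
import csv
import io
import math

def rounded_expected_counts(handle):
    """Map gene_id -> expected_count rounded to the nearest integer (halves round up)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["gene_id"]: int(math.floor(float(row["expected_count"]) + 0.5))
        for row in reader
    }

# Made-up rows in the assumed RSEM genes.results column layout.
example = io.StringIO(
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "geneA\ttxA1,txA2\t1500.00\t1350.00\t10.49\t5.10\t4.80\n"
    "geneB\ttxB1\t900.00\t750.00\t0.00\t0.00\t0.00\n"
    "spike1\tspike1\t500.00\t350.00\t250.50\t300.20\t280.70\n"
)

counts = rounded_expected_counts(example)
print(counts)
```

With a real file you would pass an open file handle instead of the `io.StringIO` object, and repeat this per sample to build the genes x samples count matrix.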

Note that both approaches are non-standard uses of the packages, so it is especially important to check the assumptions of the model (i.e., linear models for ruv / RUVnormalize and a log-linear model for RUVSeq).

