This post is in response to a number of emails and posts asking about reading data into edgeR and producing RPKM.
Reading a data file containing both counts and annotation
Suppose we start with a tab-delimited file, counts.txt, that contains read counts together with gene IDs in the first column and a gene-length annotation column named Length. To read this into edgeR:
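    library(edgeR)
    # Column 1 (gene IDs) becomes the row names
    Data <- read.delim("counts.txt", sep="\t", row.names=1)
    # Treat the Length column as gene annotation rather than counts
    y <- DGEList(Data, annotation.columns="Length")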
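As a minimal sketch of the expected file layout, the following writes a small made-up counts file and reads it back in the same way. The gene IDs, sample names (SampleA, SampleB, SampleC), lengths and counts are all invented for illustration. The edgeR step is skipped if the package is not installed, and `annotation.columns` is the full name of the argument that marks annotation columns in a data.frame.

```r
# Hypothetical toy data: gene IDs, one annotation column (Length), three samples.
toy <- data.frame(
  GeneID  = c("gene1", "gene2", "gene3", "gene4"),
  Length  = c(1000, 2500, 500, 4000),
  SampleA = c(10, 200, 0, 35),
  SampleB = c(12, 180, 3, 40),
  SampleC = c(8, 220, 1, 28)
)

# Write a tab-delimited file with a header row, as read.delim() expects.
toy_file <- tempfile(fileext = ".txt")
write.table(toy, toy_file, sep = "\t", quote = FALSE, row.names = FALSE)

# The first column becomes the row names; Length and the counts stay as columns.
Data <- read.delim(toy_file, sep = "\t", row.names = 1)
print(Data)

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Length goes to y$genes; the sample columns go to y$counts.
  y <- edgeR::DGEList(Data, annotation.columns = "Length")
  print(dim(y$counts))   # 4 genes by 3 samples
  print(y$genes)
}
```

Keeping the gene lengths inside the DGEList means later calls such as rpkm() can find them without a separate argument.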
To normalize the library sizes and compute a matrix of RPKM values:
    y <- normLibSizes(y)
    RPKM <- rpkm(y)
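For intuition, RPKM is reads per kilobase of gene length per million reads in the library. The base-R sketch below computes it by hand on made-up numbers. It uses raw column sums as library sizes, whereas rpkm() on a normalized DGEList uses the effective library sizes (library size times normalization factor) by default, so the two will generally differ somewhat. All names and values here are illustrative.

```r
# Hypothetical counts: 3 genes (rows) by 2 samples (columns).
counts <- matrix(c(100, 200, 700,
                   300, 300, 400),
                 nrow = 3,
                 dimnames = list(c("gene1", "gene2", "gene3"),
                                 c("SampleA", "SampleB")))
gene_length <- c(gene1 = 1000, gene2 = 2000, gene3 = 500)  # in bases

# Library size = total reads per sample (no normalization factors applied here).
lib_size <- colSums(counts)

# Counts per million: divide each column by its library size in millions.
cpm_manual <- t(t(counts) / (lib_size / 1e6))

# RPKM: divide each row by its gene length in kilobases.
rpkm_manual <- cpm_manual / (gene_length / 1000)
print(rpkm_manual)
```

In this toy example gene1 and gene2 end up with the same RPKM in SampleA even though gene2 has twice the count, because gene2 is twice as long.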
To make a PCA plot of log-RPKM values:

    logRPKM <- rpkm(y, log=TRUE)
    plotMDS(logRPKM, gene.selection="common")
To make an MDS plot from the log-RPKM values, using plotMDS's default pairwise gene selection:

    logRPKM <- rpkm(y, log=TRUE)
    plotMDS(logRPKM)
Creating a DGEList from featureCounts
If the count matrix is created using Rsubread::featureCounts, then the output can be converted to a DGEList directly, without any need for intermediate data files:
    library(Rsubread)
    fc <- featureCounts( ... )
    y <- featureCounts2DGEList(fc)
The resulting DGEList object automatically includes gene annotation columns, such as chromosome and gene length.