Question

count tables created with easyRNASeq and the rpkm() function from edgeR

Sylvain Foisy ▴ 70

@sylvain-foisy-5539

Last seen 4.6 years ago

Canada

Hi,

I created a count table file using easyRNASeq and I am using edgeR downstream for the analysis. I want to create an RPKM-transformed file from it, but the rpkm() function requires a gene.length vector for the RPKM calculations and I do not know how to provide it from...

Thanks in advance

S

easyrnaseq edger rpkm • 2.6k views

updated 9.2 years ago by Nicolas Delhomme ▴ 320 • written 9.2 years ago by Sylvain Foisy ▴ 70

score 0 · Answer 1 · 2015-02-16

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 5 hours ago

United States

If you want RPKM, why don't you just compute them using easyRNASeq? It seems like extra work to use easyRNASeq, just to put the data in edgeR, and then compute RPKMs, which edgeR doesn't use anyway.

There are examples in the vignette for easyRNASeq, so you should peruse that carefully.

9.2 years ago • James W. MacDonald 65k

Hi James,

The RPKM values are not for use by edgeR: I simply wanted to use edgeR's normalization and rpkm() function to get a table for a custom analysis we want to perform. In our project, regular diff ex is not that informative since we compare very different cell types and we end up with thousands of genes being diff ex...

Thanks for the inputs

S

9.2 years ago • Sylvain Foisy ▴ 70

score 0 · Answer 2 · 2015-02-17

Nicolas Delhomme ▴ 320

@nicolas-delhomme-6252

Last seen 5.4 years ago

Sweden

Hej Sylvain!

There is a growing body of literature describing why RPKM/FPKM is not the best choice for differential expression analysis (see e.g. Dillies, M.A., Brief. Bioinf., 2012 and Soneson & Delorenzi, 2013). There's also a video from Lior Pachter, one of the FPKM authors, that explains why. For that reason I state in the easyRNASeq vignette that RPKM should only be used for visualisation, if at all. In the upcoming release of easyRNASeq, I will disable RPKM altogether.

The best approach is to get the unmodified, non-normalized count table from easyRNASeq and use that for edgeR.

If you really want to compute your FPKM, you can get the gene lengths from your annotation (your gff3, or retrieve them from biomaRt) and provide them as a named vector to the function. If you do not have an easy way to extract the gene lengths, you can run easyRNASeq with outputFormat="RNAseq". This returns an object of class RNAseq, and you can access your annotation through the genomicAnnotation slot (using the genomicAnnotation accessor) or, if you computed geneModels, through the geneModel slot. Note that some of these functions may not be exported by the easyRNASeq package; in that case you need to prefix the function name with easyRNASeq::: (three colons, which is how R reaches non-exported functions) to access them.
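As a rough illustration of what the gene-length vector feeds into, the arithmetic can be sketched in plain Python. The thread itself is about R, and nothing here is edgeR's or easyRNASeq's code. The assumptions: gene length is taken as the union of a gene's exon intervals (one common convention among several), the library size is the plain column sum with no normalization factors, and every name below (`merged_exon_length`, `rpkm`, the toy genes and counts) is hypothetical.

```python
def merged_exon_length(exons):
    """Length in bp of the union of 1-based, inclusive (start, end) exon intervals."""
    total = 0
    current_start, current_end = None, None
    for start, end in sorted(exons):
        if current_end is None or start > current_end + 1:
            # Disjoint from the running interval: close it and start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def rpkm(counts, gene_lengths):
    """RPKM for one sample: reads per kilobase of gene per million mapped reads.

    counts and gene_lengths are dicts keyed by gene ID; lengths are looked up
    by name, mirroring the idea of a named gene-length vector.
    """
    library_size = sum(counts.values())  # assumption: raw column sum
    return {
        gene: count / (gene_lengths[gene] / 1e3) / (library_size / 1e6)
        for gene, count in counts.items()
    }


# Toy annotation (hypothetical): exon intervals per gene.
exons_by_gene = {
    "geneA": [(1, 1000), (501, 1500), (2001, 2500)],  # first two exons overlap
    "geneB": [(1, 500)],
}
gene_lengths = {g: merged_exon_length(e) for g, e in exons_by_gene.items()}

# Toy counts for a single sample (hypothetical).
counts = {"geneA": 600, "geneB": 400}
values = rpkm(counts, gene_lengths)
```

Keying both the counts and the lengths by gene ID is the point of the named vector: the lengths are matched to the rows of the count table by name rather than by position, so a differently ordered annotation cannot silently misalign them.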

HTH,

Nico

9.2 years ago • Nicolas Delhomme ▴ 320

Hi Nicolas,

Thanks for the inputs; as always, very instructive ;-) The RPKM values are for a custom method that we are trying to develop for very different cell types, that are showing literally thousands of diff ex genes using easyRNASeq for count generation and edgeR analysis... Or could we use normalized counts from edgeR for each gene?

Merci pour les infos!

S

9.2 years ago • Sylvain Foisy ▴ 70

Hej Sylvain!

I have successfully used the vst and voom transformation approaches (as implemented in the DESeq2 and limma packages, respectively) before doing non-DE types of analysis on rather large datasets (90-ish samples). Have a look at e.g. the chapter on variance stabilising transformation in the DESeq2 vignette. The issue with FPKM is that it corrects for biases within samples but not across samples, so transforming all your data into FPKM might not make your samples more comparable with one another. The vst and voom approaches at least correct for the library size effect more accurately, and they also transform the data (to a certain extent; not so well for lowly expressed genes) so that it becomes homoscedastic (the variance is independent of the mean), which is a requirement for most common statistical approaches, e.g. linear models.
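To make the library-size point concrete, here is a minimal plain-Python sketch of the log2 counts-per-million step on which voom builds, log2((count + 0.5) / (library size + 1) × 10^6). It is only an illustration under stated assumptions: it uses raw column sums as library sizes, it omits the normalization factors and precision weights that limma's voom also computes, and it is not a variance stabilising transformation in the DESeq2 sense. The function name and the toy counts are hypothetical.

```python
import math


def log2_cpm(counts):
    """log2 counts per million for one sample, with voom-style offsets."""
    library_size = sum(counts.values())  # assumption: raw column sum
    return {
        gene: math.log2((count + 0.5) / (library_size + 1.0) * 1e6)
        for gene, count in counts.items()
    }


# Two toy samples with the same composition; sample2 is sequenced 10x deeper.
sample1 = {"geneA": 100, "geneB": 900}
sample2 = {"geneA": 1000, "geneB": 9000}

lcpm1 = log2_cpm(sample1)
lcpm2 = log2_cpm(sample2)

# Raw counts differ tenfold, but the log-CPM values nearly coincide.
diff_geneA = abs(lcpm1["geneA"] - lcpm2["geneA"])
```

The small offsets keep the logarithm defined for zero counts, which is also where the transformed values are least reliable, in line with the caveat about lowly expressed genes.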

HTH,

Nico

9.2 years ago • Nicolas Delhomme ▴ 320