Question: count tables created with easyRNASeq and the rpkm() function from edgeR
Sylvain Foisy (Canada) wrote, 4.8 years ago:

Hi,

I created a count table file using easyRNASeq and I am using edgeR downstream for the analysis. I want to create an RPKM-transformed file from it, but when I use the rpkm() function I have to provide the gene.length vector for the RPKM calculation, and I do not know how to provide it from... 

Thanks in advance

S

Tags: edger, easyrnaseq, rpkm
Answer: James W. MacDonald (United States) wrote, 4.8 years ago:

If you want RPKM, why don't you just compute them using easyRNASeq? It seems like extra work to use easyRNASeq just to put the data into edgeR and then compute RPKMs, which edgeR doesn't use anyway.

There are examples in the vignette for easyRNASeq, so you should peruse that carefully.

Hi James,

The RPKM values are not for use by edgeR: I simply wanted to use edgeR's normalization and rpkm() function to get a table for a custom analysis we want to perform. In our project, regular diff ex is not that informative since we compare very different cell types and we end up with thousands of genes being diff ex...
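As far as we understand edgeR's rpkm() (check its help page), RPKM is the count per million reads of the effective library size (library size times normalization factor), divided by the gene length in kilobases. That is why rpkm() needs a gene.length value for every row of the count table. The Python sketch below only spells out that assumed formula; in practice you would call rpkm() in R. The function name `rpkm`, the dict-as-named-vector layout and the example numbers are hypothetical.

```python
"""Sketch of an RPKM calculation in the style of edgeR's rpkm() (assumed formula)."""
from typing import Dict, List


def rpkm(
    counts: Dict[str, List[int]],
    gene_length: Dict[str, int],
    lib_size: List[float],
    norm_factors: List[float],
) -> Dict[str, List[float]]:
    """Return RPKM per gene and sample.

    counts       -- gene id -> raw counts, one value per sample
    gene_length  -- gene id -> length in bp (the "named vector")
    lib_size     -- per-sample library sizes (total counts)
    norm_factors -- per-sample normalization factors, e.g. TMM; 1.0 for none
    """
    # Effective library size = library size x normalization factor.
    eff_lib = [size * factor for size, factor in zip(lib_size, norm_factors)]
    out: Dict[str, List[float]] = {}
    for gene, row in counts.items():
        # A KeyError here means the length vector does not cover this gene.
        kb = gene_length[gene] / 1000  # gene length in kilobases
        # Counts per million effective reads, then per kilobase of gene.
        out[gene] = [c * 1e6 / lib / kb for c, lib in zip(row, eff_lib)]
    return out


if __name__ == "__main__":
    counts = {"geneA": [500, 1000], "geneB": [100, 300]}
    lengths = {"geneA": 2000, "geneB": 500}
    print(rpkm(counts, lengths, lib_size=[1e6, 2e6], norm_factors=[1.0, 1.0]))
    # {'geneA': [250.0, 250.0], 'geneB': [200.0, 300.0]}
```

With all normalization factors set to 1.0 this is plain RPKM on raw library sizes; passing TMM factors should roughly correspond to rpkm(..., normalized.lib.sizes=TRUE), if we read the edgeR documentation correctly.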

Thanks for the input

S

Answer: Nicolas Delhomme (Sweden) wrote, 4.8 years ago:

Hi Sylvain!

There is a growing body of literature describing why RPKM/FPKM is not the best choice for differential expression analysis (see e.g. Dillies, M.A., Brief. Bioinf., 2012 and Soneson & Delorenzi, 2013). There is also a video by Lior Pachter, one of the FPKM authors, that explains why. For that reason, I state in the easyRNASeq vignette that RPKM should only be used for visualisation, if at all. In the upcoming release of easyRNASeq, I will disable RPKM altogether.

The best approach is to take the raw, unnormalized count table from easyRNASeq and use that in edgeR.

If you really want to compute FPKM, you can get the gene lengths from your annotation (your GFF3 file, or retrieve them from biomaRt) and provide them as a named vector to the function. If you do not have an easy means of extracting the gene lengths, you can run easyRNASeq with outputFormat="RNAseq". This returns an object of class RNAseq, and you can access your annotation through the genomicAnnotation slot (using the genomicAnnotation accessor) or, if you computed geneModels, through the geneModel slot. Note that some of these functions may not be exported by the easyRNASeq package, in which case you need to prefix the function name with easyRNASeq:: to access them.
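A minimal sketch of the GFF3 route, in Python and under stated assumptions: take each `gene` feature's span (end - start + 1, since GFF3 coordinates are 1-based and inclusive), key it by its ID attribute, then reorder it to match the count-table rows so that every gene has a length. The helper names are hypothetical, and the gene span is a simplification that includes introns; many workflows use the summed length of a gene's non-overlapping exons instead. The biomaRt and RNAseq-object routes are not shown.

```python
"""Hypothetical helpers: gene lengths (bp) from GFF3 'gene' features."""
from typing import Dict, Iterable, List


def gene_lengths_from_gff3(lines: Iterable[str]) -> Dict[str, int]:
    """Map each gene's ID attribute to its span in bp (introns included)."""
    lengths: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep only well-formed 'gene' features
        start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        if "ID" in attrs:
            lengths[attrs["ID"]] = end - start + 1
    return lengths


def align_to_counts(lengths: Dict[str, int], gene_ids: Iterable[str]) -> Dict[str, int]:
    """Return lengths ordered like the count-table rows; fail on missing IDs."""
    ids: List[str] = list(gene_ids)
    missing = [g for g in ids if g not in lengths]
    if missing:
        raise KeyError(f"no length found for: {missing}")
    return {g: lengths[g] for g in ids}


if __name__ == "__main__":
    gff3 = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t1000\t2999\t.\t+\t.\tID=geneA;Name=A",
        "chr1\tsrc\texon\t1000\t1200\t.\t+\t.\tParent=txA",
        "chr1\tsrc\tgene\t5000\t5499\t.\t-\t.\tID=geneB",
    ]
    lengths = gene_lengths_from_gff3(gff3)
    print(align_to_counts(lengths, ["geneB", "geneA"]))  # {'geneB': 500, 'geneA': 2000}
```

Failing loudly on missing IDs is deliberate: a length vector that silently drops or misorders genes would produce wrong values without any error.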

HTH,

Nico

Hi Nicolas,

Thanks for the input; as always, very instructive ;-) The RPKM values are for a custom method that we are trying to develop for very different cell types, which show literally thousands of diff ex genes when we use easyRNASeq for count generation and edgeR for the analysis... Or could we use normalized counts from edgeR for each gene instead?

Thanks for the info!

S

Hi Sylvain!

I have successfully used the vst or voom transformations (as implemented in the DESeq2 and limma packages) before doing non-DE analyses on rather large datasets (about 90 samples). Have a look at e.g. the chapter on the variance stabilising transformation in the DESeq2 vignette. The issue with FPKM is that it corrects for biases within samples but not across samples, so transforming all your data into FPKM might not make your samples more comparable with one another. The vst and voom approaches at least correct for the library size effect more accurately, and they also transform the data (to a certain extent; not so well for lowly expressed genes) so that it becomes homoscedastic (the variance is independent of the mean), which is a requirement for most common statistical approaches, e.g. linear models.
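For orientation, voom (in limma) starts from log2 counts per million with small offsets; as far as we know, the values it uses are log2((count + 0.5) / (library size + 1) × 10^6). The Python sketch below reproduces only that first step, under that assumption. It does not apply TMM normalization factors, does not estimate the mean-variance trend or the precision weights that voom then computes, and is not DESeq2's vst. The function name `log_cpm` and the example counts are hypothetical.

```python
"""Sketch of the log2-CPM step that voom starts from (assumed offsets)."""
import math
from typing import List


def log_cpm(counts: List[List[int]]) -> List[List[float]]:
    """counts[g][s] is the raw count of gene g in sample s.

    Returns log2((count + 0.5) / (library size + 1) * 1e6), where the
    library size is the column total. No TMM factors, no precision weights.
    """
    n_samples = len(counts[0])
    lib_size = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [math.log2((row[s] + 0.5) / (lib_size[s] + 1) * 1e6) for s in range(n_samples)]
        for row in counts
    ]


if __name__ == "__main__":
    # Two genes x two samples; sample 2 is sequenced about twice as deeply.
    counts = [[99_999, 0], [900_000, 1_999_999]]
    for gene, values in zip(["geneA", "geneB"], log_cpm(counts)):
        print(gene, [round(v, 2) for v in values])
```

The 0.5 offset keeps zero counts finite (geneA in sample 2 maps to log2(0.25) = -2 rather than minus infinity), and dividing by each sample's library size is the cross-sample scaling that RPKM alone does not make more accurate.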

HTH,

Nico
