Bootstrap confidence intervals on RNA-seq expression from Kallisto
@sean-davis-490
United States

The recently published Kallisto package can perform bootstrapping to get rough confidence intervals on transcript quantification. Has anyone looked into using these estimates, or estimates like them, in the setting of RNA-seq differential expression? I'm more curious than in need of a specific solution, though an implementation is always welcome.

rnaseq deseq2 edger voom limma • 3.1k views

For what it's worth, I wrote some Bioc-friendly input parsers for the quantification files (to a simple matrix or to a SummarizedExperiment) and for the bootstraps (using the great rhdf5 package). I'm happy to accept feedback on these. It would be good for the community to use a consistent set of tools, so there's only one collection of bugs.


One useful feature would be to convert the confidence intervals into weights on the logCPM values, so that one could use them in limma. (Unfortunately, I don't know enough of the mathematics to know how to do that conversion.)


Exactly what I had in mind.  


Reading further, a precision weight is simply the inverse of the estimated variance. So I guess you would just compute the variance of the bootstrap estimates of normalized logCPM for each feature and then take the inverse as the weight.
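To make that concrete, here is a minimal sketch in plain Python (not an existing Kallisto or limma function). It assumes the bootstrap replicates for one sample have already been read into a list of per-feature estimated counts; the function names, the toy numbers, and the variance floor `min_var` are all hypothetical. The 0.5 and 1.0 offsets in the log-CPM transform mirror the ones voom uses.

```python
import math
from statistics import variance


def log2_cpm(counts, prior=0.5):
    """Convert one replicate's estimated counts to log2 counts-per-million.

    The small prior avoids log(0); the offsets follow the voom convention.
    """
    lib_size = sum(counts)
    return [math.log2((c + prior) / (lib_size + 1.0) * 1e6) for c in counts]


def bootstrap_precision_weights(boot_counts, min_var=1e-6):
    """Return one precision weight per feature for a single sample.

    boot_counts: list of bootstrap replicates, each a list of per-feature
    estimated counts. The weight is 1 / variance of log2-CPM across the
    replicates; min_var (an assumption) guards against division by zero.
    """
    log_reps = [log2_cpm(rep) for rep in boot_counts]
    n_features = len(log_reps[0])
    weights = []
    for j in range(n_features):
        v = variance([rep[j] for rep in log_reps])  # sample variance
        weights.append(1.0 / max(v, min_var))
    return weights


# Hypothetical data: 4 bootstrap replicates x 3 transcripts.
boot = [
    [100.0, 5.0, 1000.0],
    [110.0, 2.0, 990.0],
    [95.0, 9.0, 1005.0],
    [102.0, 4.0, 1001.0],
]
weights = bootstrap_precision_weights(boot)
print(weights)
```

The low-count transcript, whose bootstrap estimates jump around the most on the log scale, ends up with the smallest weight. Note that these weights reflect only the quantification uncertainty within a sample, not variation between replicate samples.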


Kallisto looks promising. I'd check out some of the comparisons between it and Salmon: same principle, different implementations.

http://sjcockell.me/2015/05/18/alignment-free-transcriptome-quantification/

The authors of both tools got involved in the discussion there.

@paul-harrison-5740
Australia/Melbourne/Monash University B…

Since RNA-seq is count based, the noise in individual gene-level counts can be assumed to be Poisson (variance equal to the mean). This is an estimate of "technical variation" but does not include "biological variation". It is the gene-level equivalent of what the Kallisto bootstrap estimates, and it is considerably simpler because there is no ambiguity about which transcript a read is assigned to.
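As a rough illustration of the Poisson assumption, the sketch below checks numerically that a Poisson count has variance equal to its mean, and then uses a delta-method approximation to put that technical variance on the log2 scale, where limma-style weights live. The function names are hypothetical, and the approximation Var(log2 X) ≈ 1 / (mean × ln(2)²) is only reasonable for counts that are not too small.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of observing k, computed on the log scale for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def poisson_moments(mean):
    """Numerically sum the pmf to recover (mean, variance) of a Poisson count."""
    k_max = int(mean + 20 * math.sqrt(mean) + 20)  # far enough into the tail
    ks = range(k_max + 1)
    m = sum(k * poisson_pmf(k, mean) for k in ks)
    v = sum((k - m) ** 2 * poisson_pmf(k, mean) for k in ks)
    return m, v


def approx_log2_variance(mean):
    """Delta method: Var(log2 X) ~ Var(X) / (mean * ln 2)^2 = 1 / (mean * ln(2)^2)."""
    return 1.0 / (mean * math.log(2) ** 2)


for mu in (10.0, 100.0, 1000.0):
    m, v = poisson_moments(mu)
    print(f"mean={mu:g}  variance={v:.4f}  approx var(log2 count)={approx_log2_variance(mu):.5f}")
```

On the raw scale the variance grows with the mean, but on the log scale it shrinks as 1/mean, which is why low-count genes carry the least precise log-expression values.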

In limma, voom provides precision weights. As Ryan Thompson pointed out, these are simply the inverse of the variances. Unlike the Poisson estimate, the voom weights also include the biological variation component. (So to use Kallisto confidence intervals in limma, one might also need to estimate the amount of biological variance first.)
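One simple way to picture this, assuming the technical and biological components are independent and both expressed on the log2 scale, is to add the two variances before inverting. The sketch below is only an illustration of that idea, not how voom computes its weights; the function name and all numbers are hypothetical.

```python
def combined_weights(technical_var, biological_var):
    """Per-feature weights from 1 / (technical + biological variance).

    Assumes the two components are independent and on the same (log2) scale,
    so that their variances add.
    """
    if len(technical_var) != len(biological_var):
        raise ValueError("variance lists must have the same length")
    return [1.0 / (t + b) for t, b in zip(technical_var, biological_var)]


# Hypothetical per-feature variances of log2 expression.
tech = [0.50, 0.05, 0.005]  # e.g. from bootstraps or the Poisson approximation
bio = [0.04, 0.04, 0.04]    # assumed biological variance, estimated elsewhere

w_tech_only = [1.0 / t for t in tech]
w_combined = combined_weights(tech, bio)
print(w_tech_only)
print(w_combined)
```

With technical variance alone, the most precise feature gets 100 times the weight of the least precise one. Once the biological component is added, the spread narrows considerably, because highly expressed features are limited by biological rather than technical noise.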

In terms of confidence intervals in the final result of a differential expression analysis: limma's topTable function can report confidence intervals on the log fold change (via confint=TRUE), but note that these are not adjusted for multiple testing. limma and edgeR also provide the TREAT method (treat and glmTreat, respectively) for finding genes whose fold change exceeds a specified threshold, and these do provide FDR control.
