Question: Bootstrap confidence intervals on RNA-seq expression from Kallisto
Sean Davis (21k, United States) wrote, 2.6 years ago:

The recently published Kallisto package can perform bootstrapping to get rough confidence intervals on transcript quantification. Has anyone looked into using these estimates, or ones like them, in the setting of RNA-seq differential expression? I'm more curious than in need of a specific solution, though an implementation is always welcome.

ADD COMMENTlink modified 4 months ago by Paul Harrison60 • written 2.6 years ago by Sean Davis21k

For what it's worth, I wrote some Bioc-friendly input parsers for the quantification files (to a simple matrix, or to a SummarizedExperiment) and for the bootstrap output (using the great rhdf5 package). I'm happy to accept feedback on these. It would be good for the community to use a consistent set of tools, so that there's only one collection of bugs.

ADD REPLYlink written 2.6 years ago by Martin Morgan ♦♦ 21k

One useful feature would be to convert the confidence intervals into weights on the logCPM values, so that one could use them in limma. (Unfortunately, I don't know enough of the mathematics to know how to do that conversion.)

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Ryan C. Thompson6.2k

Exactly what I had in mind.  

ADD REPLYlink written 2.6 years ago by Sean Davis21k

Reading further, a precision weight is simply the inverse of the estimated variance. So I guess you would just compute the variance of the bootstrap estimates of normalized logCPM for each feature and then take the inverse as the weight.
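The calculation described above can be sketched roughly as follows. This is a minimal illustration, not an existing limma or Kallisto function: the names `log_cpm` and `bootstrap_precision_weights`, the 0.5 offset, the variance floor, and the toy numbers are all assumptions made for the example, and it uses only the Python standard library.

```python
import math
from statistics import variance


def log_cpm(counts, prior=0.5):
    """Convert one replicate's estimated counts to log2 counts-per-million.

    The small offset (an assumption of this sketch) avoids log2(0).
    """
    total = sum(counts)
    return [math.log2((c + prior) / (total + 1.0) * 1e6) for c in counts]


def bootstrap_precision_weights(boot_counts, floor=1e-8):
    """Return one weight per feature: 1 / variance of log-CPM across bootstraps.

    boot_counts: list of bootstrap replicates for ONE sample, each a list of
    estimated counts (one entry per feature, same order in every replicate).
    floor: hypothetical lower bound on the variance, to avoid dividing by zero.
    """
    boot_logcpm = [log_cpm(rep) for rep in boot_counts]
    n_features = len(boot_counts[0])
    weights = []
    for j in range(n_features):
        v = variance([rep[j] for rep in boot_logcpm])  # sample variance
        weights.append(1.0 / max(v, floor))
    return weights


# Toy data (made up): 4 bootstrap replicates, 3 transcripts.
boot = [
    [100.0, 10.0, 890.0],
    [98.0, 14.0, 888.0],
    [103.0, 7.0, 890.0],
    [99.0, 12.0, 889.0],
]
weights = bootstrap_precision_weights(boot)
```

In the toy data the low-count transcript varies most between bootstraps on the log scale, so it receives the smallest weight. Weights computed this way reflect only the quantification uncertainty within a sample.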

ADD REPLYlink written 16 months ago by Ryan C. Thompson6.2k

Kallisto looks promising. I'd check out some of the comparisons between it and Salmon: same principle, different implementations.

Both authors got involved.

ADD REPLYlink written 2.6 years ago by andrew.j.skelton73290
4 months ago by
Paul Harrison60 wrote:

Since RNA-seq is count based, in a gene-level analysis the noise in the individual counts for each gene can be assumed to be Poisson (variance equal to the mean). This is an estimate of "technical variation", but it does not include "biological variation". It is the gene-level equivalent of the Kallisto bootstrapping method, and is considerably simpler because there is no ambiguity about which transcript a read is assigned to.
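To illustrate the Poisson point, here is a small simulation sketch. It assumes that the bootstrap amounts to resampling reads with replacement and that every read is unambiguously either from the gene of interest or not. The function name, read counts, and seed are hypothetical choices for the example.

```python
import random
from statistics import mean, variance


def bootstrap_gene_counts(n_reads, gene_fraction, n_boot, seed=0):
    """Resample reads with replacement and count those from one gene.

    Each read is represented only by whether it belongs to the gene
    (True) or not (False), so there is no assignment ambiguity.
    """
    rng = random.Random(seed)
    n_gene = round(n_reads * gene_fraction)
    reads = [True] * n_gene + [False] * (n_reads - n_gene)
    counts = []
    for _ in range(n_boot):
        resample = rng.choices(reads, k=n_reads)  # sampling with replacement
        counts.append(sum(resample))              # True counts as 1
    return counts


# Hypothetical library: 5000 reads, 2% of them from the gene (100 reads).
counts = bootstrap_gene_counts(n_reads=5000, gene_fraction=0.02, n_boot=400)
ratio = variance(counts) / mean(counts)  # close to 1 for Poisson-like noise
```

Under these assumptions the bootstrap counts are binomial, so the variance-to-mean ratio is expected to be about 1 minus the gene's fraction of reads, which is close to the Poisson value of 1 when the gene is a small part of the library.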

In limma, voom provides precision weights. As Ryan Thompson pointed out, these are simply the inverse of the variances. The voom weights also include the biological variation component. (If Kallisto confidence intervals are used instead, one might also need to estimate the amount of biological variance before using limma.)
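One way to picture why the biological component matters is the sketch below. It rests on an assumption not stated in the thread: that technical and biological noise are independent, so their variances add on the log scale. The function name and all numbers are hypothetical, and this is not how voom itself computes its weights.

```python
def combined_weight(technical_var, biological_var):
    """Precision weight assuming independent, additive noise sources.

    technical_var: e.g. variance of bootstrap log-CPM estimates for a feature.
    biological_var: a separately estimated between-replicate variance.
    """
    if technical_var < 0 or biological_var < 0:
        raise ValueError("variances must be non-negative")
    total = technical_var + biological_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return 1.0 / total


# Made-up variances for a well-quantified and a poorly quantified transcript.
bio = 0.09
w_good_tech_only = 1.0 / 0.01               # 100.0
w_poor_tech_only = 1.0 / 0.04               # 25.0
w_good = combined_weight(0.01, bio)         # 1 / 0.10
w_poor = combined_weight(0.04, bio)         # 1 / 0.13
```

With technical variance alone the two transcripts differ in weight by a factor of 4. Once the same biological variance is added to both, the ratio shrinks to 1.3, so ignoring biological variation would overstate how much more trustworthy the well-quantified transcript is.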

In terms of confidence intervals in the final result of a differential expression analysis: limma's topTable function can provide confidence intervals on log fold change, but note that these are not adjusted for multiple testing. limma and edgeR also provide the TREAT method for finding genes with fold change exceeding some specified amount, and these do provide FDR control.

ADD COMMENTlink written 4 months ago by Paul Harrison60