Question

Bootstrap confidence intervals on RNA-seq expression from Kallisto

5


Sean Davis 21k


The recently published Kallisto package can perform bootstrapping to get rough confidence intervals on transcript quantification. Has anyone looked into using these estimates, or estimates like them, in the setting of RNA-seq differential expression? I'm really more curious than in need of a specific solution, though an implementation is always welcome.

rnaseq deseq2 edger voom limma • 4.7k views

updated 6.6 years ago by Paul Harrison ▴ 100 • written 8.9 years ago by Sean Davis 21k

4


For what it's worth, I wrote some Bioc-friendly input parsers for the quantification files (to a simple matrix, or to a SummarizedExperiment) and for the bootstrap (using the great rhdf5 package). I'm happy to accept feedback on these. It would be good as a community to use a consistent set of tools, so there's only one collection of bugs.

8.9 years ago by Martin Morgan 25k

1


One useful feature would be to convert the confidence intervals into weights on the logCPM values, so that one could use them in limma. (Unfortunately, I don't know enough of the mathematics to know how to do that conversion.)

8.9 years ago by Ryan C. Thompson ★ 7.9k

0


Exactly what I had in mind.

8.9 years ago by Sean Davis 21k

0


Reading further, a precision weight is simply the inverse of the estimated variance. So I guess you would just compute the variance of the bootstrap estimates of normalized logCPM for each feature and then take the inverse as the weight.
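A minimal Python sketch of that idea, for illustration only. It assumes the bootstrap replicates for one sample are already loaded as lists of estimated counts per feature. The function names, the 0.5 count offset, the per-replicate library size, and the `min_var` floor are all choices made here, not something Kallisto or limma prescribes.

```python
import math
import statistics


def log_cpm(count, lib_size, prior=0.5):
    """log2 counts-per-million; the small offsets avoid log(0) (an assumption)."""
    return math.log2((count + prior) / (lib_size + 1.0) * 1e6)


def bootstrap_precision_weights(boot_counts, min_var=1e-4):
    """Return one weight per feature: 1 / variance of its bootstrap logCPM values.

    boot_counts: list of bootstrap replicates for ONE sample, each a list of
    estimated counts per feature. min_var is a hypothetical floor that keeps
    near-constant features from getting unbounded weights.
    """
    # Convert every replicate to logCPM using that replicate's own library size.
    log_values = []
    for rep in boot_counts:
        lib_size = sum(rep)
        log_values.append([log_cpm(c, lib_size) for c in rep])

    weights = []
    for j in range(len(boot_counts[0])):
        var_j = statistics.variance([row[j] for row in log_values])
        weights.append(1.0 / max(var_j, min_var))
    return weights


# Toy data (made up): 4 bootstrap replicates, 3 features.
boot = [
    [100.0, 10.0, 890.0],
    [104.0, 6.0, 890.0],
    [98.0, 13.0, 889.0],
    [101.0, 9.0, 890.0],
]
weights = bootstrap_precision_weights(boot)
print(weights)
```

Note that weights built this way reflect only the quantification (technical) uncertainty within a sample.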

ADD REPLY • link 7.7 years ago Ryan C. Thompson ★ 7.9k

0


Kallisto looks promising. I'd check out some of the comparisons between it and Salmon: same principle, different implementations.

http://sjcockell.me/2015/05/18/alignment-free-transcriptome-quantification/

Both authors got involved.

8.9 years ago by andrew.j.skelton73 ▴ 370

score 0 · Answer 1 · 2017-09-22

Since RNA-seq is count-based, in a gene-level analysis the noise in each gene's count can be assumed to be Poisson (variance equal to the mean). This is an estimate of "technical variation", but does not include "biological variation". It is the gene-level equivalent of the Kallisto bootstrapping method, and it is considerably simpler because there is no ambiguity about which transcript a read is assigned to.
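To illustrate the Poisson point with a toy simulation (made-up counts, not Kallisto's actual algorithm): if every read maps unambiguously to one gene, bootstrapping the reads and recounting gives a per-gene variance close to the observed count, as long as the gene is a small fraction of the library. The third "gene" below is just a bucket for the rest of the library.

```python
import random
import statistics


def bootstrap_gene_counts(reads, n_genes, n_boot, rng):
    """Resample reads with replacement and recount per gene, n_boot times."""
    n_reads = len(reads)
    replicates = []
    for _ in range(n_boot):
        counts = [0] * n_genes
        for gene in rng.choices(reads, k=n_reads):
            counts[gene] += 1
        replicates.append(counts)
    return replicates


rng = random.Random(1)

# Hypothetical observed counts; each read is labelled by the gene it maps to.
observed = [30, 120, 1850]
reads = [g for g, c in enumerate(observed) for _ in range(c)]

reps = bootstrap_gene_counts(reads, len(observed), n_boot=400, rng=rng)

# Bootstrap variance of each gene's count across replicates.
boot_var = [statistics.variance([rep[g] for rep in reps])
            for g in range(len(observed))]

for g in (0, 1):
    print(f"gene {g}: count={observed[g]}, bootstrap variance={boot_var[g]:.1f}")
```

For the two low-abundance genes the bootstrap variance should land near the count itself, which is the Poisson assumption. The bucket gene does not follow this because it makes up most of the library.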

In limma, voom provides precision weights. As Ryan Thompson pointed out, these are simply the inverse of the variances. These voom weights will also include the biological variation component. (And using Kallisto confidence intervals, one might need to also estimate the amount of biological variance before using limma.)
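As a rough sketch of how the two components might combine, assume (this model is not stated above) that a count $y$ with mean $\mu$ is negative binomial with dispersion $\phi$, as in edgeR. Using natural logs and a first-order (delta method) approximation:

$$
\operatorname{Var}(y) = \mu + \phi\,\mu^{2},
\qquad
\operatorname{Var}(\log y) \approx \frac{\operatorname{Var}(y)}{\mu^{2}} = \frac{1}{\mu} + \phi,
\qquad
w = \frac{1}{1/\mu + \phi}.
$$

Here $1/\mu$ is the Poisson (technical) part and $\phi$ is the biological part. A weight built from a technical variance alone corresponds to setting $\phi = 0$, which would overweight high-count features.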

In terms of confidence intervals in the final result of a differential expression analysis: limma's topTable function can provide confidence intervals on log fold change, but note that these are not adjusted for multiple testing. limma and edgeR also provide the TREAT method for finding genes with fold change exceeding some specified amount, and these do provide FDR control.
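For reference, the general form of these two quantities, written in notation chosen here (not taken from the limma documentation): for gene $g$ with estimated log fold change $\hat\beta_g$, standard error $\operatorname{se}(\hat\beta_g)$ and $d_g$ degrees of freedom, an unadjusted $100(1-\alpha)\%$ interval is

$$
\hat\beta_g \;\pm\; t_{1-\alpha/2,\;d_g}\,\operatorname{se}(\hat\beta_g),
$$

while TREAT tests against a fold-change threshold $\tau$ rather than against zero:

$$
H_0 : |\beta_g| \le \tau
\qquad\text{versus}\qquad
H_1 : |\beta_g| > \tau .
$$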