The recently published Kallisto package can perform bootstrapping to get rough confidence intervals on transcript quantification. Has anyone looked into using these estimates, or similar ones, for RNA-seq differential expression analysis? I'm more curious than in need of a specific solution, though an implementation is always welcome.
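For anyone unfamiliar with how bootstrap replicates become a "rough confidence interval", one common approach is the percentile interval: take the 2.5% and 97.5% quantiles of each transcript's bootstrap estimates. This is a minimal sketch, not necessarily what Kallisto or any downstream tool does. The input layout (one row per bootstrap round, one column per transcript) and the `percentile_ci` helper are hypothetical, and the toy data are simulated Poisson counts rather than real Kallisto output.

```python
import numpy as np

def percentile_ci(boot, level=0.95):
    """Percentile bootstrap interval for each transcript.

    boot: array of shape (n_bootstraps, n_transcripts) holding the estimated
    counts from each bootstrap round (hypothetical input layout).
    """
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(boot, alpha, axis=0)        # e.g. 2.5th percentile
    upper = np.quantile(boot, 1.0 - alpha, axis=0)  # e.g. 97.5th percentile
    return lower, upper

# Simulated stand-in for 100 bootstrap rounds over 3 transcripts (not real kallisto output).
rng = np.random.default_rng(0)
boot = rng.poisson(lam=[50.0, 200.0, 5.0], size=(100, 3)).astype(float)
lo, hi = percentile_ci(boot)
for t, (l, h) in enumerate(zip(lo, hi)):
    print(f"transcript {t}: [{l:.1f}, {h:.1f}]")
```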
For what it's worth, I wrote some Bioc-friendly input parsers for the quantification files (to a simple matrix or to a SummarizedExperiment) and for the bootstrap results (using the great rhdf5 package). I'm happy to accept feedback on these; it would be good for the community to use a consistent set of tools, so that there's only one collection of bugs.
One useful feature would be to convert the confidence intervals into weights on the logCPM values, so that one could use them in limma. (Unfortunately, I don't know enough of the mathematics to know how to do that conversion.)
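One way to do that conversion, assuming the interval is a symmetric, normal-based interval on the logCPM scale (estimate ± z × SE), is to back out the standard error from the interval width and take its inverse square as the weight. The `ci_to_weight` helper below is hypothetical, and the symmetry/normality assumption may not hold for bootstrap percentile intervals, which can be skewed.

```python
from statistics import NormalDist

def ci_to_weight(lower, upper, level=0.95):
    """Turn a confidence interval on the logCPM scale into a precision weight.

    Assumes the interval is symmetric and normal-based, i.e.
    estimate +/- z * SE, so SE = width / (2 * z) and the weight is 1 / SE**2.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    se = (upper - lower) / (2.0 * z)
    return 1.0 / se**2

# A 95% interval with half-width 1.96 implies SE of about 1, so a weight of about 1.
print(ci_to_weight(3.0 - 1.959964, 3.0 + 1.959964))
```

Narrower intervals give larger weights, which is the behaviour limma expects from precision weights.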
Reading further, a precision weight is simply the inverse of the estimated variance. So I guess you would just compute the variance of the bootstrap estimates of normalized logCPM for each feature and then take the inverse as the weight.
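The procedure described above can be sketched as follows for a single sample: convert each bootstrap round to logCPM, take the per-feature variance across rounds, and invert it. This is a sketch under stated assumptions: the input layout is hypothetical, the logCPM transform borrows voom's offsets, log2((y + 0.5) / (N + 1) × 10^6), and the `min_var` floor is an arbitrary safeguard against division by zero, not part of any published method.

```python
import numpy as np

def bootstrap_precision_weights(boot_counts, min_var=1e-8):
    """Precision weights for one sample from its bootstrap count matrix.

    boot_counts: (n_bootstraps, n_features) estimated counts, one row per
    bootstrap round (hypothetical layout). Each round is converted to logCPM
    using voom-style offsets, log2((y + 0.5) / (N + 1) * 1e6), where N is
    that round's library size.
    """
    lib_size = boot_counts.sum(axis=1, keepdims=True)
    log_cpm = np.log2((boot_counts + 0.5) / (lib_size + 1.0) * 1e6)
    var = log_cpm.var(axis=0, ddof=1)      # bootstrap variance per feature
    return 1.0 / np.maximum(var, min_var)  # weight = inverse variance (floored)

# Simulated bootstrap rounds: one high-count and one low-count feature.
rng = np.random.default_rng(0)
boot = rng.poisson(lam=[1000.0, 10.0], size=(200, 2)).astype(float)
w = bootstrap_precision_weights(boot)
print(w)  # the high-count feature receives the larger weight
```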
Because RNA-seq data are counts, in a gene-level analysis the noise in an individual gene's count can be assumed to be Poisson (variance equal to the mean). This is an estimate of "technical variation", but it does not include "biological variation". It is the gene-level equivalent of the Kallisto bootstrapping method, and considerably simpler, because at the gene level there is no ambiguity about which transcript a read is assigned to.
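To connect the Poisson assumption to weights on the log scale: Poisson counts have variance equal to the mean, and by the delta method Var(log2 Y) ≈ 1 / (μ (ln 2)²), so the technical variance of a log count shrinks as the expected count grows. This simulation is only an illustration of that approximation; the mean of 100 is an arbitrary choice.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
mu = 100.0
counts = rng.poisson(mu, size=100_000)
print(counts.mean(), counts.var())  # both close to mu for Poisson noise

# Delta method: Var(log2 Y) ~ Var(Y) / (mu * ln 2)**2 = 1 / (mu * (ln 2)**2)
approx = 1.0 / (mu * math.log(2) ** 2)
empirical = np.log2(counts).var()
print(approx, empirical)
```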
In limma, voom provides precision weights. As Ryan Thompson pointed out, these are simply the inverse of the variances. The voom weights also include the biological variation component. (With Kallisto confidence intervals, one might need to estimate the amount of biological variance as well before using limma.)
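The split between technical and biological variance can be made concrete with a negative binomial model, Var(Y) = μ + φμ², for which the delta method gives Var(ln Y) ≈ 1/μ + φ: a Poisson (technical) term plus a dispersion (biological) term. A minimal sketch, assuming that model: `combined_weight` is a hypothetical helper that adds a separately estimated biological variance to a bootstrap-based technical variance before inverting. It is not how voom or any Kallisto-aware tool is implemented, and the values of μ and φ are arbitrary.

```python
import math
import numpy as np

def nb_log2_variance(mu, phi):
    """Approximate Var(log2 Y) for Y ~ NegBin(mean=mu, dispersion=phi).

    Var(Y) = mu + phi * mu**2, so by the delta method
    Var(ln Y) ~ 1/mu + phi: a technical (Poisson) part plus a biological part.
    """
    return (1.0 / mu + phi) / math.log(2) ** 2

def combined_weight(tech_var, bio_var):
    """Hypothetical helper: weight from a technical variance (e.g. the
    bootstrap variance of logCPM) plus a separately estimated biological
    variance, both on the log2 scale."""
    return 1.0 / (tech_var + bio_var)

# Check the approximation with a gamma-Poisson (negative binomial) mixture.
rng = np.random.default_rng(2)
mu, phi = 200.0, 0.05
lam = rng.gamma(shape=1.0 / phi, scale=mu * phi, size=200_000)  # mean mu, variance phi * mu**2
y = rng.poisson(lam)
print(np.log2(y).var(), nb_log2_variance(mu, phi))
```

Even at a modest dispersion the biological term dominates the technical term here (0.05 versus 0.005), which is why bootstrap variances alone would likely overstate precision.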
In terms of confidence intervals in the final result of a differential expression analysis: limma's topTable function can provide confidence intervals on log fold change, but note that these are not adjusted for multiple testing. limma and edgeR also provide the TREAT method for finding genes with fold change exceeding some specified amount, and these do provide FDR control.
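To see why unadjusted intervals matter, consider a simulation in which every gene is truly null: about 5% of per-gene 95% intervals will still exclude zero, which is roughly 500 genes out of 10,000. This sketch uses normal intervals with a known standard error rather than limma's moderated t, and the Bonferroni widening at the end is only a simple illustration of multiplicity adjustment; it is not what TREAT does.

```python
import numpy as np
from statistics import NormalDist

# Simulate 10,000 genes that are all truly null (true log fold change = 0).
rng = np.random.default_rng(3)
n_genes, se = 10_000, 0.5                   # known, common standard error (simplifying assumption)
logfc_hat = rng.normal(0.0, se, size=n_genes)

z = NormalDist().inv_cdf(0.975)             # per-gene 95% interval, no multiplicity adjustment
lower, upper = logfc_hat - z * se, logfc_hat + z * se
miss = np.mean((lower > 0) | (upper < 0))   # fraction of intervals excluding the true value 0
print(f"{miss:.3f} of intervals exclude the true logFC")

# Bonferroni-widened intervals bound the chance of *any* miss across all genes by 5%.
z_bonf = NormalDist().inv_cdf(1 - 0.025 / n_genes)
miss_bonf = np.mean(np.abs(logfc_hat) > z_bonf * se)
print(miss_bonf)
```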
Converting the bootstrap confidence intervals into precision weights for limma is exactly what I had in mind.
Kallisto looks promising. I'd check out some of the comparisons between it and Salmon: same principle, different implementations.
http://sjcockell.me/2015/05/18/alignment-free-transcriptome-quantification/
Both authors got involved in the discussion.