Hi, this is a question for Gordon Smyth and anyone else who would like to answer. Thank you.
A few weeks ago I posted a question on BioconductorHelp about a problematic voom mean-variance trend plot from isoform-level data. Gordon Smyth responded as follows: "Yes, problems with the voom mean-variance trend is totally to be expected with isoform-level data and it is a concern. It will cause voom to be much less powerful than it usually would be. The problem is caused by variance inflation due to overlap (and hence ambiguity) between isoforms. This is why voom has only ever been recommended for gene-level data."
My understanding is that analyzing all isoforms together causes both reduced power and increased false discoveries: for isoforms whose expression can be reliably quantified there is a loss of power, while for isoforms with strong variance inflation there are more false discoveries. So a very simple approach would be to quantify uncertainty by bootstrapping in Salmon, assign the isoforms to a small number of groups based on their technical variance across bootstrap replicates, and then run voom separately within each group.
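To make the grouping step concrete, here is a rough sketch in Python rather than R, with synthetic data standing in for Salmon's bootstrap output (real data would come from running Salmon with --numBootstraps and importing the inferential replicates, e.g. via tximport). The matrix shapes, the number of groups, and the use of the coefficient of variation rather than raw variance are all my own assumptions, not an established recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for Salmon's bootstrap output: rows are isoforms,
# columns are bootstrap replicates of the estimated counts.
n_isoforms, n_boot = 1000, 100
true_counts = rng.gamma(shape=2.0, scale=50.0, size=n_isoforms)
# Inflate technical noise for ~30% of isoforms to mimic the ambiguity
# caused by overlapping isoforms.
noise_scale = np.where(rng.random(n_isoforms) < 0.3, 1.0, 0.1)
boots = true_counts[:, None] * (
    1.0 + noise_scale[:, None] * rng.standard_normal((n_isoforms, n_boot))
)
boots = np.clip(boots, 0.0, None)

# Per-isoform technical coefficient of variation across replicates.
mean = boots.mean(axis=1)
cv = boots.std(axis=1, ddof=1) / np.maximum(mean, 1e-8)

# Assign isoforms to a small number of groups by CV quantile;
# voom would then be run separately within each group.
n_groups = 3
edges = np.quantile(cv, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
group = np.digitize(cv, edges)  # 0 = most reliable, 2 = most inflated

for g in range(n_groups):
    print(f"group {g}: {(group == g).sum()} isoforms, "
          f"median CV = {np.median(cv[group == g]):.3f}")
```

The quantile split is only one choice; one could equally pick cut points from the shape of the CV distribution itself.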
I think a similar issue arises for us at the gene level. Our data were generated with deep sequencing (library sizes of around 100 million reads). At the gene level, the voom trend plot shows two strong clusters of genes: those with low expression (high and declining SD) and those with moderate to high expression (lower and nearly constant SD). Here one might also consider running voom separately for these two groups.
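The gene-level split I have in mind is just a partition on average log2-CPM, the x-axis of the voom trend plot. A minimal sketch (again Python with synthetic counts; the two-cluster simulation and the fixed cutoff of 6 are my own assumptions, since in practice the cutoff would be read off the trend plot):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic counts with two expression clusters, loosely mimicking the
# two clusters in our gene-level voom trend plot: 1000 low-expressed
# and 1000 moderately/highly expressed genes across 6 samples.
n_samples = 6
low = rng.negative_binomial(2, 0.5, size=(1000, n_samples))    # mean ~2
high = rng.negative_binomial(5, 0.01, size=(1000, n_samples))  # mean ~500
counts = np.vstack([low, high])

# Average log2-CPM per gene (the x-axis of the voom trend plot).
lib_sizes = counts.sum(axis=0)
log_cpm = np.log2((counts + 0.5) / (lib_sizes + 1.0) * 1e6)
avg_log_cpm = log_cpm.mean(axis=1)

# Cutoff between the two clusters; in practice this would be read off
# the voom trend plot rather than fixed in advance.
cutoff = 6.0
low_expr = np.where(avg_log_cpm < cutoff)[0]
high_expr = np.where(avg_log_cpm >= cutoff)[0]

# Each index set would then be passed to voom separately.
print(f"{len(low_expr)} low-expression genes, "
      f"{len(high_expr)} moderate/high-expression genes")
```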
Is this grouping approach too simplistic? Would you mind sharing your opinion on it? Thank you, Ina