Hi, this is a question for Gordon Smyth and anyone else who would like to answer. Thank you.
A few weeks ago I posted a question on BioconductorHelp about a problematic voom mean-variance trend plot from isoform-level data. Gordon Smyth responded as follows: "Yes, problems with the voom mean-variance trend is totally to be expected with isoform-level data and it is a concern. It will cause voom to be much less powerful than it usually would be. The problem is caused by variance inflation due to overlap (and hence ambiguity) between isoforms. This is why voom has only ever been recommended for gene-level data."
My understanding is that analyzing all isoforms together causes both reduced power and increased false discoveries: for isoforms whose expression can be reliably quantified there is a loss of power, while for isoforms with strong variance inflation there are more false discoveries. So a very simple approach would be to quantify uncertainty by bootstrapping in Salmon, assign the isoforms to a small number of groups based on their technical variance across bootstrap replicates, and then run voom separately within each group.
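To make the grouping step concrete, here is a rough sketch in Python rather than R, with synthetic data standing in for Salmon's bootstrap output (real data would come from running Salmon with --numBootstraps and importing the inferential replicates, e.g. via tximport). The matrix shapes, the number of groups, and the use of the coefficient of variation rather than raw variance are all my own assumptions, not an established recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for Salmon's bootstrap output: rows are isoforms,
# columns are bootstrap replicates of the estimated counts.
n_isoforms, n_boot = 1000, 100
true_counts = rng.gamma(shape=2.0, scale=50.0, size=n_isoforms)
# Inflate technical noise for ~30% of isoforms to mimic the ambiguity
# caused by overlapping isoforms.
noise_scale = np.where(rng.random(n_isoforms) < 0.3, 1.0, 0.1)
boots = true_counts[:, None] * (
    1.0 + noise_scale[:, None] * rng.standard_normal((n_isoforms, n_boot))
)
boots = np.clip(boots, 0.0, None)

# Per-isoform technical coefficient of variation across replicates.
mean = boots.mean(axis=1)
cv = boots.std(axis=1, ddof=1) / np.maximum(mean, 1e-8)

# Assign isoforms to a small number of groups by CV quantile;
# voom would then be run separately within each group.
n_groups = 3
edges = np.quantile(cv, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
group = np.digitize(cv, edges)  # 0 = most reliable, 2 = most inflated

for g in range(n_groups):
    print(f"group {g}: {(group == g).sum()} isoforms, "
          f"median CV = {np.median(cv[group == g]):.3f}")
```

The quantile split is only one choice; one could equally pick cut points from the shape of the CV distribution itself.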
I think a similar issue arises for us at the gene level. Our data were generated with deep sequencing (library sizes of around 100 million reads). At the gene level, the voom trend plot shows two strong clusters of genes: those with low expression (high and declining SD) and those with moderate to high expression (lower and nearly constant SD). Here one might also consider running voom separately for these two groups.
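The gene-level split I have in mind is just a partition on average log2-CPM, the x-axis of the voom trend plot. A minimal sketch (again Python with synthetic counts; the two-cluster simulation and the fixed cutoff of 6 are my own assumptions, since in practice the cutoff would be read off the trend plot):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic counts with two expression clusters, loosely mimicking the
# two clusters in our gene-level voom trend plot: 1000 low-expressed
# and 1000 moderately/highly expressed genes across 6 samples.
n_samples = 6
low = rng.negative_binomial(2, 0.5, size=(1000, n_samples))    # mean ~2
high = rng.negative_binomial(5, 0.01, size=(1000, n_samples))  # mean ~500
counts = np.vstack([low, high])

# Average log2-CPM per gene (the x-axis of the voom trend plot).
lib_sizes = counts.sum(axis=0)
log_cpm = np.log2((counts + 0.5) / (lib_sizes + 1.0) * 1e6)
avg_log_cpm = log_cpm.mean(axis=1)

# Cutoff between the two clusters; in practice this would be read off
# the voom trend plot rather than fixed in advance.
cutoff = 6.0
low_expr = np.where(avg_log_cpm < cutoff)[0]
high_expr = np.where(avg_log_cpm >= cutoff)[0]

# Each index set would then be passed to voom separately.
print(f"{len(low_expr)} low-expression genes, "
      f"{len(high_expr)} moderate/high-expression genes")
```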
Is this grouping approach too simplistic? Would you mind sharing your opinion on it? Thank you, Ina