Question

tximport recommendation for limma-trend downstream analysis?

Jenny Drnevich

United States

Hi there,

I was looking through the vignette for tximport, and it has recommendations for how to import data for downstream analysis in edgeR, DESeq2 and limma-voom, but it does not mention the lesser-used limma-trend. The edgeR method stores the length corrections in y$offset, but voom() does not use y$offset, so tximport recommends importing either "scaledTPM" or "lengthScaledTPM". The limmaUsersGuide() suggests doing logCPM <- cpm(y, log = TRUE, prior.count = 3). I thought that since cpm() is an edgeR function it would use y$offset, but looking at the code of cpm.DGEList, it doesn't use y$offset either. So am I correct in assuming that I should use the tximport method for limma-voom, but then use cpm() instead of voom()?

Thanks,

Jenny
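For reference, the route proposed in the question (import with lengthScaledTPM, then cpm() and limma-trend instead of voom()) might be sketched as below. This is a minimal sketch, not code from the tximport vignette: it assumes edgeR and limma are installed, the commented-out tximport() call uses hypothetical `files` and `tx2gene` objects, and a simulated count matrix with an illustrative two-group design stands in for `txi$counts`. Low-count filtering is omitted for brevity.

```r
library(edgeR)  # DGEList(), calcNormFactors(), cpm()
library(limma)  # lmFit(), eBayes(), topTable()

# With real Salmon output (files and tx2gene are hypothetical placeholders):
# txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene,
#                           countsFromAbundance = "lengthScaledTPM")
# counts <- txi$counts

# Simulated stand-in for txi$counts: 500 genes x 6 samples
set.seed(1)
counts <- matrix(rnbinom(500 * 6, mu = 100, size = 10), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))   # illustrative design
design <- model.matrix(~ group)

y <- DGEList(counts)
y <- calcNormFactors(y)                        # TMM normalization factors
logCPM <- cpm(y, log = TRUE, prior.count = 3)  # as suggested in limmaUsersGuide()

fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend = TRUE)               # trend = TRUE is what makes this limma-trend
top <- topTable(fit, coef = 2)                 # top 10 genes for group B vs A
```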

Tags: tximport, limma, limma-trend, edgeR

score 1 · Answer 1 · 2017-09-29

You can force cpm to use the offset matrix by passing exp(offset) as the library size. For cpm, I don't think this should make any difference relative to using lengthScaledTPM without offsets, so the only advantage is being able to use the same tximport run for edgeR, DESeq2, and limma.

For voom, I've written a custom version that uses an offset matrix in place of the normalized library sizes: https://github.com/DarwinAwardWinner/CD4-csaw/blob/master/scripts/utilities.R#L254-L390 (Although now that I think about it, it should be possible to modify voom to accept a matrix-like lib.size argument just like cpm, instead of having a separate function for it.) I don't know that it's optimal or handles every edge case, but it has been working for me.

score 1 · Answer 2 · 2017-09-29

Aaron Lun

The city by the bay

One way of computing log-CPMs with offsets is to do something like this:

cpm(y$counts, lib.size=exp(y$offset), log=TRUE, prior.count=3)

... assuming your offsets are on a scale that is interpretable as the log-library size. This is what edgeR assumes the offsets to be (check out ?scaleOffset).
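One way to get offsets on that log-library-size scale from tximport output is the recipe in the edgeR section of the tximport vignette, adapted below as a sketch (compare it with the current vignette before relying on it). It assumes edgeR is installed; simulated matrices stand in for `txi$counts` and `txi$length` from an import with the default `countsFromAbundance = "no"`, and the last line is the cpm() call from this answer.

```r
library(edgeR)  # calcNormFactors(), DGEList(), scaleOffset(), cpm()

# Simulated stand-ins for txi$counts and txi$length: 200 genes x 4 samples
set.seed(2)
ngenes <- 200; nsamples <- 4
cts  <- matrix(rnbinom(ngenes * nsamples, mu = 200, size = 5), nrow = ngenes)
lens <- matrix(runif(ngenes * nsamples, min = 500, max = 3000), nrow = ngenes)

# Per-observation length factors, scaled so each gene's geometric mean is 1
normMat <- lens / exp(rowMeans(log(lens)))
normCts <- cts / normMat

# Effective library sizes from the length-corrected counts (TMM)
eff.lib <- calcNormFactors(normCts) * colSums(normCts)

# Combine library sizes with length factors and move to the log scale
normMat <- sweep(normMat, 2, eff.lib, "*")
normMat <- log(normMat)

y <- DGEList(cts)
y <- scaleOffset(y, normMat)  # stores offsets consistent with log-library sizes

# Log-CPMs that use the offset matrix as per-observation library sizes
logCPM <- cpm(y$counts, lib.size = exp(y$offset), log = TRUE, prior.count = 3)
```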

Also, I assume that the length corrections occur between samples, rather than between genes. edgeR will mostly ignore systematic differences in the sizes of the offsets between genes.
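A minimal base-R illustration of the second point, assuming log-CPMs are compared between samples within each gene: scaling one gene's library sizes by the same factor in every sample shifts all of that gene's log-CPMs by a constant, which a linear model with an intercept absorbs. `log_cpm()` is a hypothetical helper without a prior count, not edgeR's cpm() (whose scaled prior count makes the invariance only approximate, hence "mostly"); the data are simulated.

```r
# Hypothetical simplified log-CPM with a gene-by-sample matrix of library
# sizes and no prior count (not edgeR's cpm())
log_cpm <- function(counts, lib) log2(counts / lib * 1e6)

set.seed(3)
counts <- matrix(rpois(3 * 4, lambda = 50) + 1, nrow = 3)        # 3 genes x 4 samples, all > 0
lib <- matrix(rep(c(2e7, 3e7, 2.5e7, 4e7), each = 3), nrow = 3)  # one library size per column

# Add log(5) to gene 1's log-offsets in every sample: a gene-wise constant
lib2 <- lib
lib2[1, ] <- lib2[1, ] * 5

a <- log_cpm(counts, lib)
b <- log_cpm(counts, lib2)

a[1, ] - b[1, ]   # the same constant (log2(5), up to rounding) in every sample
a[1, ] - a[1, 1]  # between-sample differences for gene 1 ...
b[1, ] - b[1, 1]  # ... are unchanged; genes 2 and 3 are untouched
```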

Thanks Aaron, yes the correction is just across samples (per gene).

(reply by Michael Love)

Thanks, everyone! I like the idea of directly giving the counts and exp(y$offset) as the lib.size in cpm() rather than using lengthScaledTPM, because my next question was going to be whether prior.count = 3 was too large for lengthScaledTPM values, which sum to 1 million, as opposed to normalized library sizes, which are ~20-50 million.

Best,

Jenny

(reply by Jenny Drnevich)

Check out the reference for tximport. Counts from abundance are on the count scale and add up to the original library size, not 1e6. They can be thought of as counts in which changes in average transcript length across samples have been divided out.

(reply by Michael Love)
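The point that counts from abundance keep the original library sizes, rather than summing to 1e6, can be checked with a small base-R sketch. `length_scaled_tpm()` is a hypothetical helper written from the description in this thread, not tximport's own code: it multiplies TPM by each gene's average transcript length across samples, then rescales every sample so its column sum equals its original library size. See the tximport reference for the exact implementation; all inputs here are simulated.

```r
# Hypothetical sketch of countsFromAbundance = "lengthScaledTPM"
length_scaled_tpm <- function(counts, tpm, len) {
  new <- tpm * rowMeans(len)                          # TPM x each gene's mean length across samples
  sweep(new, 2, colSums(counts) / colSums(new), "*")  # rescale each sample to its original library size
}

# Simulated estimated counts and effective lengths: 5 genes x 3 samples
set.seed(4)
counts <- matrix(rpois(5 * 3, lambda = 1000), nrow = 5)
len <- matrix(runif(5 * 3, min = 800, max = 2500), nrow = 5)

# TPM: length-normalized rates scaled to sum to 1e6 within each sample
rate <- counts / len
tpm <- sweep(rate, 2, colSums(rate), "/") * 1e6

lsTPM <- length_scaled_tpm(counts, tpm, len)
colSums(tpm)    # 1e6 in every sample
colSums(lsTPM)  # matches colSums(counts), the original library sizes
```

Because each column keeps its original library size in this sketch, a prior.count chosen with ordinary read counts in mind stays on a comparable scale.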

Good to know! I probably would have figured that out once I had data in hand. I’ve got 4-6 transcriptome assemblies + Salmon counts coming in soon, so I’ll get lots of practice with tximport. Thanks for a great package!!

Jenny

(reply by Jenny Drnevich)

score 0 · Answer 3 · 2017-09-29

Michael Love

United States

Note that in the tximport vignette, instead of counts + offset as input to limma-voom, the hand-off is to use countsFromAbundance="lengthScaledTPM" to get around the offset.

Sorry, to be more explicit: countsFromAbundance avoids the need for an offset to correct for average transcript length bias altogether. See the tximport reference for more details.

(reply by Michael Love)