tximport recommendation for limma-trend downstream analysis?
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
United States

Hi there,

I was looking through the vignette for tximport, and it has recommendations for how to import data for downstream analysis in edgeR, DESeq2 and limma-voom, but it does not mention the lesser-used limma-trend. The edgeR method stores the length corrections in y$offset, but the voom() function does not use the y$offset so tximport recommends importing either "scaledTPM" or "lengthScaledTPM". The limmaUsersGuide() suggests doing logCPM <- cpm(y, log = TRUE, prior.count = 3). I thought that since cpm() is an edgeR function it would use the y$offset, but looking at the code of cpm.DGEList, it doesn't use y$offset either. So am I correct in assuming that I should use the tximport method for limma-voom, but then should use cpm() instead of voom()?

Thanks,

Jenny

tximport limma limma-trend edgeR • 2.2k views
@ryan-c-thompson-5618
Scripps Research, La Jolla, CA

You can force cpm to use the offset matrix by passing exp(offset) as the library size. For cpm, I don't think this should make any difference relative to using lengthScaledTPM without offsets, so the only advantage is being able to use the same tximport run for edgeR, DESeq2, and limma.

For voom, I've written a custom version that uses an offset matrix in place of the normalized library sizes: https://github.com/DarwinAwardWinner/CD4-csaw/blob/master/scripts/utilities.R#L254-L390 (Although now that I think about it, it should be possible to modify voom to accept a matrix-like lib.size argument just like cpm, instead of having a separate function for it.) I don't know that it's optimal or handles every edge case, but it has been working for me.


Looks like everyone answered all at once. Nice to see that the Americans are bright-eyed and bushy-tailed!

Aaron Lun ★ 28k
@alun
The city by the bay

One way of computing log-CPMs with offsets is to do something like this:

cpm(y$counts, lib.size=exp(y$offset), log=TRUE, prior.count=3)

... assuming your offsets are on a scale that is interpretable as the log-library size. This is what edgeR assumes the offsets to be; see ?scaleOffset.

Also, I assume that the length corrections occur between samples, rather than between genes. edgeR will mostly ignore systematic differences in the sizes of the offsets between genes.
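
To make the offset route concrete, here is a minimal sketch of how a per-gene, per-sample offset matrix might be built and then handed to cpm() as suggested above. The helper name logcpm_with_offset and its arguments are made up for illustration; the offset construction follows the general pattern the tximport vignette has shown for edgeR (average transcript lengths centred per gene, plus log effective library sizes), so check the current vignette before relying on it. Newer edgeR releases may also offer a dedicated offset argument to cpm(), which is not used here.

```r
# Hypothetical helper: log-CPM using a gene-by-sample offset matrix.
# counts     : matrix of estimated counts (genes x samples), e.g. txi$counts
# avg_length : matrix of average transcript lengths, e.g. txi$length
logcpm_with_offset <- function(counts, avg_length, prior.count = 3) {
  # Centre the lengths so each gene's geometric mean across samples is 1;
  # only between-sample differences within a gene are kept.
  norm_mat <- avg_length / exp(rowMeans(log(avg_length)))

  # Effective library sizes computed from the length-corrected counts.
  norm_counts <- counts / norm_mat
  eff_lib <- edgeR::calcNormFactors(norm_counts) * colSums(norm_counts)

  # Offset on the log-library-size scale: log length factor + log library size.
  offset <- t(t(log(norm_mat)) + log(eff_lib))

  # Pass exp(offset) as a matrix of "library sizes", as in the call above.
  edgeR::cpm(counts, lib.size = exp(offset), log = TRUE,
             prior.count = prior.count)
}
```

The returned matrix can then go to lmFit() and eBayes(trend = TRUE) as in any limma-trend analysis.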


Thanks Aaron, yes the correction is just across samples (per gene).


Thanks, everyone! I like the idea of directly giving the counts and exp(y$offset) as the lib.size in cpm() rather than using lengthScaledTPM, because my next question was going to be whether prior.count = 3 was too large for lengthScaledTPM values, which sum to 1 million, as opposed to normalized library sizes, which are ~20-50 million.

Best,

Jenny


Check out the reference for tximport. Counts from abundance are on the count scale, and add up to the original library size, not 1e6. They can be thought of as counts but where changes in average transcript length across samples have been divided out.
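
As a rough illustration of the point about scale, the base-R sketch below mimics the lengthScaledTPM rescaling on made-up numbers: abundances are multiplied by each gene's average length across samples, then each column is rescaled to the column sum of the original estimated counts. The toy matrices and the helper name length_scaled_tpm are invented for the example, and this is a simplified reading of the idea, not tximport's actual code.

```r
# Toy inputs (made up): 3 genes x 2 samples
counts <- matrix(c(100, 300, 600,
                   400, 1000, 600), nrow = 3,
                 dimnames = list(paste0("g", 1:3), c("s1", "s2")))
eff_len <- matrix(c(1000, 2000, 1500,
                    1200, 1800, 1500), nrow = 3,
                  dimnames = dimnames(counts))

# TPM-like abundance derived from counts and effective lengths
rate <- counts / eff_len
tpm <- t(t(rate) / colSums(rate)) * 1e6

# Hypothetical helper: scale abundance by average gene length,
# then rescale each sample to its original total count.
length_scaled_tpm <- function(abundance, avg_length, counts) {
  scaled <- abundance * rowMeans(avg_length)
  t(t(scaled) * (colSums(counts) / colSums(scaled)))
}

new_counts <- length_scaled_tpm(tpm, eff_len, counts)

colSums(tpm)         # each column sums to 1e6
colSums(new_counts)  # matches colSums(counts): 1000 and 2000
```

So a prior.count chosen for ordinary library sizes applies to these values in the same way as to the raw estimated counts.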

Good to know! I probably would have figured that out once I had data in hand. I’ve got 4-6 transcriptome assemblies + Salmon counts coming in soon, so I’ll get lots of practice with tximport. Thanks for a great package!!

Jenny

@mikelove
United States

Note that in the tximport vignette, instead of passing counts plus an offset to limma-voom, the handoff is to use countsFromAbundance="lengthScaledTPM", which gets around the offset.


Sorry, to be more explicit, countsFromAbundance avoids the need to offset for the average transcript length bias altogether. See the tximport reference for more details.
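
For limma-trend, that handoff might look like the sketch below: import with countsFromAbundance="lengthScaledTPM", treat txi$counts as ordinary counts, then use cpm() and eBayes(trend = TRUE) as in the limma User's Guide. The function name limma_trend_from_txi is made up, and files, tx2gene and design in the usage comment are placeholders for your own objects. Filtering of low-expressed genes is left out for brevity.

```r
# Hypothetical helper: limma-trend on length-scaled counts from tximport.
# txi    : list with a $counts matrix, imported with
#          countsFromAbundance = "lengthScaledTPM"
# design : design matrix, e.g. from model.matrix()
limma_trend_from_txi <- function(txi, design, prior.count = 3) {
  # Length bias is already divided out of txi$counts, so no offset is needed.
  y <- edgeR::DGEList(txi$counts)
  y <- edgeR::calcNormFactors(y)

  # log-CPM with a prior count, as suggested in the limma User's Guide
  logCPM <- edgeR::cpm(y, log = TRUE, prior.count = prior.count)

  # limma-trend: empirical Bayes with a mean-variance trend
  fit <- limma::lmFit(logCPM, design)
  limma::eBayes(fit, trend = TRUE)
}

# Usage (placeholders, not run):
# txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene,
#                           countsFromAbundance = "lengthScaledTPM")
# fit <- limma_trend_from_txi(txi, design)
# limma::topTable(fit, coef = ncol(design))
```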
