Question

Get the countsFromAbundance as limma input

0

Entering edit mode

mico • 0

@mico-15362

Last seen 4 months ago

United States

I am analyzing the TCGA data and doing the differential expression analysis for about 6,000 samples and 20,000 coding genes.

I was suggested to use DESeq2 for DE analysis, but the DESeq() takes an extremely long time with 6,000 samples, therefore I would like to use the limma-voom instead. In the tximport vignette, the scaled counts generated from abundance ("scaledTPM" or "lengthScaledTPM") are recommended to be used in limma-voom. However, I only have the tximport object generated from Salmon and its countsFromAbundance = "no". Thus in this case txi$abundance and txi$counts are the gene-level summarized TPM table and count table from the original Salmon .sf files.

My question is, can I use the gene-level TPM or count table to get counts from abundance? I have no access to the fastq files, and TPM can't be converted into CPM. Or any suggestions for other tools? Thanks!

tximport limma deseq2 differential expression • 2.3k views

ADD COMMENT • link updated 5.9 years ago by Gordon Smyth 51k • written 5.9 years ago by mico • 0

0

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 2 minutes ago

WEHI, Melbourne, Australia

For the TCGA data, I would consider using the limma-trend analysis pipeline. This is faster again than limma-voom, although it uses more memory. See Section 15 of the limma User's Guide.

ADD COMMENT • link 5.9 years ago Gordon Smyth 51k

0

Entering edit mode

Hi Smyth, thanks for your suggestion. Actually I tried both limma-trend and limma-voom, they didn't cost too much time. I found that if using limma-trend, the p-values in toptable are much higher and logFC are much smaller than those in limma-voom's.

The following is my code:

# limma-voom:
v0 <- voom(dge, design0, plot = T)
fit0 <- lmFit(v0, design0)
fit0 <- eBayes(fit0)
tt0 <- topTable(fit0, number=Inf, coef=2)

# limma-trend:
logCPM <- cpm(dge, log=T, prior.count=3)
fit1 <- lmFit(logCPM, design0)
fit1 <- eBayes(fit1, trend = T)
tt1 <- topTable(fit1, number=Inf, coef=2)

According to the limma user guide, limma-trend will usually work well if the ratio of the largest library size to the smallest is not more than about 3-fold.

And I found:

countsSum <- colSums(counts) # library size
max(countsSum) / min(countsSum)
[1] 657.6432

So maybe my dataset is unsuited for limma-trend...?

ADD REPLY • link 5.9 years ago mico • 0

0

Entering edit mode

It would seem that the library sizes are indeed too different for limma-trend.

TCGA RNA-seq samples usually have very consistent library sizes.

ADD REPLY • link 5.9 years ago Gordon Smyth 51k

0

Entering edit mode

I just drew the logCPM density plot, then I saw some weird lines in both raw & filtered plots (I have highlighted them with blue boxes): https://drive.google.com/open?id=1qIvW-fiO3MC1QA_0-5putL307Brx7N6h

I filtered the data using the filterByExpr() function. Do these lines mean some samples had a very large proportion of lowly-expressed genes? then maybe I need to exclude these samples first?

ADD REPLY • link 5.9 years ago mico • 0

0

Entering edit mode

I don't understand what you are plotting or why you think it is relevant to your analysis. You can't possibly get small log-cpm values like your plot seems to show from a voom analysis.

Anyway, Mike Love and I have answered your original question, so I'm closing the thread from my point of view.

ADD REPLY • link 5.9 years ago Gordon Smyth 51k

score 2 · Accepted Answer · 2018-08-30

2

Entering edit mode

Michael Love 42k

@mikelove

Last seen 8 hours ago

United States

I also use limma-voom for many analyses, including large sample size datasets.

You can generate gene-level counts-from-abundance from gene-level TPMs. I have an unexported function that can help you. I should probably export it, but for now you have to call it like this (the triple colon is for unexported functions):

cts <- tximport:::makeCountsFromAbundance(txi$counts, txi$abundance, txi$length, countsFromAbundance)

where countsFromAbundance is the type you would like to generate (either "scaledTPM" or "lengthScaledTPM").

ADD COMMENT • link 5.9 years ago Michael Love 42k

0

Entering edit mode

I've exported this function in the devel branch, in case others get stuck in this position, but the recommended way to generate them (for other users) would be with tximport(). This is safer, as tximport() knows exactly what it's dealing with in terms of each matrix.

ADD REPLY • link 5.9 years ago Michael Love 42k