Question: Get the countsFromAbundance as limma input
0
9 months ago by
Darklings0 wrote:

I am analyzing the TCGA data and doing the differential expression analysis for about 6,000 samples and 20,000 coding genes.

I was suggested to use DESeq2 for DE analysis, but the DESeq() takes an extremely long time with 6,000 samples, therefore I would like to use the limma-voom instead. In the tximport vignette, the scaled counts generated from abundance ("scaledTPM" or "lengthScaledTPM") are recommended to be used in limma-voom. However, I only have the tximport object generated from Salmon and its countsFromAbundance = "no". Thus in this case txi$abundance and txi$counts are the gene-level summarized TPM table and count table from the original Salmon .sf files.

My question is, can I use the gene-level TPM or count table to get counts from abundance? I have no access to the fastq files, and TPM can't be converted into CPM. Or any suggestions for other tools? Thanks!

modified 9 months ago by Gordon Smyth37k • written 9 months ago by Darklings0
Answer: Get the countsFromAbundance as limma input
2
9 months ago by
Michael Love23k
United States
Michael Love23k wrote:

I also use limma-voom for many analyses, including large sample size datasets.

You can generate gene-level counts-from-abundance from gene-level TPMs. I have an unexported function that can help you. I should probably export it, but for now you have to call it like this (the triple colon is for unexported functions):

cts <- tximport:::makeCountsFromAbundance(txi$counts, txi$abundance, txi\$length, countsFromAbundance)

where countsFromAbundance is the type you would like to generate (either "scaledTPM" or "lengthScaledTPM").

I've exported this function in the devel branch, in case others get stuck in this position, but the recommended way to generate them (for other users) would be with tximport(). This is safer, as tximport() knows exactly what it's dealing with in terms of each matrix.

Answer: Get the countsFromAbundance as limma input
0
9 months ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

For the TCGA data, I would consider using the limma-trend analysis pipeline. This is faster again than limma-voom, although it uses more memory. See Section 15 of the limma User's Guide.

Hi Smyth, thanks for your suggestion. Actually I tried both limma-trend and limma-voom, they didn't cost too much time. I found that if using limma-trend, the p-values in toptable are much higher and logFC are much smaller than those in limma-voom's.

The following is my code:

# limma-voom:
v0 <- voom(dge, design0, plot = T)
fit0 <- lmFit(v0, design0)
fit0 <- eBayes(fit0)
tt0 <- topTable(fit0, number=Inf, coef=2)

# limma-trend:
logCPM <- cpm(dge, log=T, prior.count=3)
fit1 <- lmFit(logCPM, design0)
fit1 <- eBayes(fit1, trend = T)
tt1 <- topTable(fit1, number=Inf, coef=2)

According to the limma user guide, limma-trend will usually work well if the ratio of the largest library size to the smallest is not more than about 3-fold.

And I found:

countsSum <- colSums(counts) # library size
max(countsSum) / min(countsSum)
[1] 657.6432

So maybe my dataset is unsuited for limma-trend...?

It would seem that the library sizes are indeed too different for limma-trend.

TCGA RNA-seq samples usually have very consistent library sizes.

I just drew the logCPM density plot, then I saw some weird lines in both raw & filtered plots (I have highlighted them with blue boxes):  https://drive.google.com/open?id=1qIvW-fiO3MC1QA_0-5putL307Brx7N6h

I filtered the data using the filterByExpr() function. Do these lines mean some samples had a very large proportion of lowly-expressed genes? then maybe I need to exclude these samples first?

I don't understand what you are plotting or why you think it is relevant to your analysis. You can't possibly get small log-cpm values like your plot seems to show from a voom analysis.

Anyway, Mike Love and I have answered your original question, so I'm closing the thread from my point of view.