Get the countsFromAbundance as limma input
2
0
Entering edit mode
mico • 0
@mico-15362
Last seen 14 months ago

I am analyzing the TCGA data and doing the differential expression analysis for about 6,000 samples and 20,000 coding genes.

I was suggested to use DESeq2 for DE analysis, but the DESeq() takes an extremely long time with 6,000 samples, therefore I would like to use the limma-voom instead. In the tximport vignette, the scaled counts generated from abundance ("scaledTPM" or "lengthScaledTPM") are recommended to be used in limma-voom. However, I only have the tximport object generated from Salmon and its countsFromAbundance = "no". Thus in this case txi$abundance and txi$counts are the gene-level summarized TPM table and count table from the original Salmon .sf files.

My question is, can I use the gene-level TPM or count table to get counts from abundance? I have no access to the fastq files, and TPM can't be converted into CPM. Or any suggestions for other tools? Thanks!

tximport limma deseq2 differential expression • 828 views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 23 hours ago
United States

I also use limma-voom for many analyses, including large sample size datasets.

You can generate gene-level counts-from-abundance from gene-level TPMs. I have an unexported function that can help you. I should probably export it, but for now you have to call it like this (the triple colon is for unexported functions):

cts <- tximport:::makeCountsFromAbundance(txi$counts, txi$abundance, txi$length, countsFromAbundance)

where countsFromAbundance is the type you would like to generate (either "scaledTPM" or "lengthScaledTPM").

ADD COMMENT
0
Entering edit mode

I've exported this function in the devel branch, in case others get stuck in this position, but the recommended way to generate them (for other users) would be with tximport(). This is safer, as tximport() knows exactly what it's dealing with in terms of each matrix.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 12 hours ago
WEHI, Melbourne, Australia

For the TCGA data, I would consider using the limma-trend analysis pipeline. This is faster again than limma-voom, although it uses more memory. See Section 15 of the limma User's Guide.

ADD COMMENT
0
Entering edit mode

Hi Smyth, thanks for your suggestion. Actually I tried both limma-trend and limma-voom, they didn't cost too much time. I found that if using limma-trend, the p-values in toptable are much higher and logFC are much smaller than those in limma-voom's.

The following is my code:

# limma-voom:
v0 <- voom(dge, design0, plot = T)
fit0 <- lmFit(v0, design0)
fit0 <- eBayes(fit0)
tt0 <- topTable(fit0, number=Inf, coef=2)

# limma-trend:
logCPM <- cpm(dge, log=T, prior.count=3)
fit1 <- lmFit(logCPM, design0)
fit1 <- eBayes(fit1, trend = T)
tt1 <- topTable(fit1, number=Inf, coef=2)

According to the limma user guide, limma-trend will usually work well if the ratio of the largest library size to the smallest is not more than about 3-fold. 

And I found:

countsSum <- colSums(counts) # library size
max(countsSum) / min(countsSum)
[1] 657.6432

So maybe my dataset is unsuited for limma-trend...?

ADD REPLY
0
Entering edit mode

It would seem that the library sizes are indeed too different for limma-trend.

TCGA RNA-seq samples usually have very consistent library sizes.

ADD REPLY
0
Entering edit mode

I just drew the logCPM density plot, then I saw some weird lines in both raw & filtered plots (I have highlighted them with blue boxes):  https://drive.google.com/open?id=1qIvW-fiO3MC1QA_0-5putL307Brx7N6h

I filtered the data using the filterByExpr() function. Do these lines mean some samples had a very large proportion of lowly-expressed genes? then maybe I need to exclude these samples first?

ADD REPLY
0
Entering edit mode

I don't understand what you are plotting or why you think it is relevant to your analysis. You can't possibly get small log-cpm values like your plot seems to show from a voom analysis.

Anyway, Mike Love and I have answered your original question, so I'm closing the thread from my point of view.

ADD REPLY

Login before adding your answer.

Traffic: 369 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6