Question: Get the countsFromAbundance as limma input
0
gravatar for mico
15 months ago by
mico0
mico0 wrote:

I am analyzing the TCGA data and doing the differential expression analysis for about 6,000 samples and 20,000 coding genes.

I was suggested to use DESeq2 for DE analysis, but the DESeq() takes an extremely long time with 6,000 samples, therefore I would like to use the limma-voom instead. In the tximport vignette, the scaled counts generated from abundance ("scaledTPM" or "lengthScaledTPM") are recommended to be used in limma-voom. However, I only have the tximport object generated from Salmon and its countsFromAbundance = "no". Thus in this case txi$abundance and txi$counts are the gene-level summarized TPM table and count table from the original Salmon .sf files.

My question is, can I use the gene-level TPM or count table to get counts from abundance? I have no access to the fastq files, and TPM can't be converted into CPM. Or any suggestions for other tools? Thanks!

ADD COMMENTlink modified 15 months ago by Gordon Smyth39k • written 15 months ago by mico0
Answer: Get the countsFromAbundance as limma input
2
gravatar for Michael Love
15 months ago by
Michael Love26k
United States
Michael Love26k wrote:

I also use limma-voom for many analyses, including large sample size datasets.

You can generate gene-level counts-from-abundance from gene-level TPMs. I have an unexported function that can help you. I should probably export it, but for now you have to call it like this (the triple colon is for unexported functions):

cts <- tximport:::makeCountsFromAbundance(txi$counts, txi$abundance, txi$length, countsFromAbundance)

where countsFromAbundance is the type you would like to generate (either "scaledTPM" or "lengthScaledTPM").

ADD COMMENTlink written 15 months ago by Michael Love26k

I've exported this function in the devel branch, in case others get stuck in this position, but the recommended way to generate them (for other users) would be with tximport(). This is safer, as tximport() knows exactly what it's dealing with in terms of each matrix.

ADD REPLYlink written 15 months ago by Michael Love26k
Answer: Get the countsFromAbundance as limma input
0
gravatar for Gordon Smyth
15 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

For the TCGA data, I would consider using the limma-trend analysis pipeline. This is faster again than limma-voom, although it uses more memory. See Section 15 of the limma User's Guide.

ADD COMMENTlink written 15 months ago by Gordon Smyth39k

Hi Smyth, thanks for your suggestion. Actually I tried both limma-trend and limma-voom, they didn't cost too much time. I found that if using limma-trend, the p-values in toptable are much higher and logFC are much smaller than those in limma-voom's.

The following is my code:

# limma-voom:
v0 <- voom(dge, design0, plot = T)
fit0 <- lmFit(v0, design0)
fit0 <- eBayes(fit0)
tt0 <- topTable(fit0, number=Inf, coef=2)

# limma-trend:
logCPM <- cpm(dge, log=T, prior.count=3)
fit1 <- lmFit(logCPM, design0)
fit1 <- eBayes(fit1, trend = T)
tt1 <- topTable(fit1, number=Inf, coef=2)

According to the limma user guide, limma-trend will usually work well if the ratio of the largest library size to the smallest is not more than about 3-fold. 

And I found:

countsSum <- colSums(counts) # library size
max(countsSum) / min(countsSum)
[1] 657.6432

So maybe my dataset is unsuited for limma-trend...?

ADD REPLYlink modified 15 months ago • written 15 months ago by mico0

It would seem that the library sizes are indeed too different for limma-trend.

TCGA RNA-seq samples usually have very consistent library sizes.

ADD REPLYlink written 15 months ago by Gordon Smyth39k

I just drew the logCPM density plot, then I saw some weird lines in both raw & filtered plots (I have highlighted them with blue boxes):  https://drive.google.com/open?id=1qIvW-fiO3MC1QA_0-5putL307Brx7N6h

I filtered the data using the filterByExpr() function. Do these lines mean some samples had a very large proportion of lowly-expressed genes? then maybe I need to exclude these samples first?

ADD REPLYlink written 15 months ago by mico0

I don't understand what you are plotting or why you think it is relevant to your analysis. You can't possibly get small log-cpm values like your plot seems to show from a voom analysis.

Anyway, Mike Love and I have answered your original question, so I'm closing the thread from my point of view.

ADD REPLYlink modified 15 months ago • written 15 months ago by Gordon Smyth39k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 376 users visited in the last hour