Question: Get the countsFromAbundance as limma input
0
gravatar for Darklings
11 months ago by
Darklings0
Darklings0 wrote:

I am analyzing the TCGA data and doing the differential expression analysis for about 6,000 samples and 20,000 coding genes.

I was suggested to use DESeq2 for DE analysis, but the DESeq() takes an extremely long time with 6,000 samples, therefore I would like to use the limma-voom instead. In the tximport vignette, the scaled counts generated from abundance ("scaledTPM" or "lengthScaledTPM") are recommended to be used in limma-voom. However, I only have the tximport object generated from Salmon and its countsFromAbundance = "no". Thus in this case txi$abundance and txi$counts are the gene-level summarized TPM table and count table from the original Salmon .sf files.

My question is, can I use the gene-level TPM or count table to get counts from abundance? I have no access to the fastq files, and TPM can't be converted into CPM. Or any suggestions for other tools? Thanks!

ADD COMMENTlink modified 11 months ago by Gordon Smyth38k • written 11 months ago by Darklings0
Answer: Get the countsFromAbundance as limma input
2
gravatar for Michael Love
11 months ago by
Michael Love24k
United States
Michael Love24k wrote:

I also use limma-voom for many analyses, including large sample size datasets.

You can generate gene-level counts-from-abundance from gene-level TPMs. I have an unexported function that can help you. I should probably export it, but for now you have to call it like this (the triple colon is for unexported functions):

cts <- tximport:::makeCountsFromAbundance(txi$counts, txi$abundance, txi$length, countsFromAbundance)

where countsFromAbundance is the type you would like to generate (either "scaledTPM" or "lengthScaledTPM").

ADD COMMENTlink written 11 months ago by Michael Love24k

I've exported this function in the devel branch, in case others get stuck in this position, but the recommended way to generate them (for other users) would be with tximport(). This is safer, as tximport() knows exactly what it's dealing with in terms of each matrix.

ADD REPLYlink written 11 months ago by Michael Love24k
Answer: Get the countsFromAbundance as limma input
0
gravatar for Gordon Smyth
11 months ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

For the TCGA data, I would consider using the limma-trend analysis pipeline. This is faster again than limma-voom, although it uses more memory. See Section 15 of the limma User's Guide.

ADD COMMENTlink written 11 months ago by Gordon Smyth38k

Hi Smyth, thanks for your suggestion. Actually I tried both limma-trend and limma-voom, they didn't cost too much time. I found that if using limma-trend, the p-values in toptable are much higher and logFC are much smaller than those in limma-voom's.

The following is my code:

# limma-voom:
v0 <- voom(dge, design0, plot = T)
fit0 <- lmFit(v0, design0)
fit0 <- eBayes(fit0)
tt0 <- topTable(fit0, number=Inf, coef=2)

# limma-trend:
logCPM <- cpm(dge, log=T, prior.count=3)
fit1 <- lmFit(logCPM, design0)
fit1 <- eBayes(fit1, trend = T)
tt1 <- topTable(fit1, number=Inf, coef=2)

According to the limma user guide, limma-trend will usually work well if the ratio of the largest library size to the smallest is not more than about 3-fold. 

And I found:

countsSum <- colSums(counts) # library size
max(countsSum) / min(countsSum)
[1] 657.6432

So maybe my dataset is unsuited for limma-trend...?

ADD REPLYlink modified 11 months ago • written 11 months ago by Darklings0

It would seem that the library sizes are indeed too different for limma-trend.

TCGA RNA-seq samples usually have very consistent library sizes.

ADD REPLYlink written 11 months ago by Gordon Smyth38k

I just drew the logCPM density plot, then I saw some weird lines in both raw & filtered plots (I have highlighted them with blue boxes):  https://drive.google.com/open?id=1qIvW-fiO3MC1QA_0-5putL307Brx7N6h

I filtered the data using the filterByExpr() function. Do these lines mean some samples had a very large proportion of lowly-expressed genes? then maybe I need to exclude these samples first?

ADD REPLYlink written 11 months ago by Darklings0

I don't understand what you are plotting or why you think it is relevant to your analysis. You can't possibly get small log-cpm values like your plot seems to show from a voom analysis.

Anyway, Mike Love and I have answered your original question, so I'm closing the thread from my point of view.

ADD REPLYlink modified 11 months ago • written 11 months ago by Gordon Smyth38k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 189 users visited in the last hour