Question: using limma for circular RNA analyses
0
6 months ago by
tan_pei_fang0 wrote:

Dear all, I am aligning my RNA-seq reads using bwa-mem and circular RNA quantification using sailfish-circ. I saw some publications using limma to analyze such datasets. However, I am getting strange results using LIMMA. Here is an example that is highly significant with limma with 39 samples. When inspecting this gene (see panel A), I noticed there were 12 samples that had zero counts (points were jittered). When I remove these 12 samples (see Panel B) and re-analyze using lm(), this gene is no longer significant. I have many other examples of significant hits that disappear after similar procedure. So I am wondering if limma is appropriate for circular RNA analysis. I would welcome any suggestions for other tools designed for analysis of such dataset.

Y axis label = log CPM

X axis label = case / control

Boxplot example is in the link provided:

https://imgur.com/a/Gue060o

limma edger R • 177 views
modified 6 months ago by Aaron Lun24k • written 6 months ago by tan_pei_fang0
Answer: using limma for circular RNA analyses
0
6 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

I'm not familiar with circular RNA-seq, but unless there's something radically different from standard RNA-seq, I'd say that your data set has some issues. The samples with negative log-CPMs are very different from the other replicates in the same group. This is usually a sign that something has gone wrong - label misassignment is the first thing that comes to mind, followed by a failure to remove low-quality samples. I mean... does the distribution of log-CPMs look Normal to you?

For this gene, the samples with negative log-CPMs are clearly different from those with positive log-CPMs, but you don't really explain why you decided to remove the former. For all we know, the non-zero samples are the wrong samples (e.g., due to PCR jackpot biases or something) and a near-zero expression is the correct assay value for this gene. You need to motivate your decision to remove samples with external information, either experimental (e.g., low RINs indicating low-quality samples) or across other genes (e.g., low total coverage). They should not be removed simply because they show up as negative log-CPMs for this gene.

Now, to answer your question. This gene shows up as DE in the full data set because (i) the group means are obviously quite different, as the second group has many more negative log-CPMs; and (ii) the inflation of the variance due to the negative log-CPMs is probably suppressed by empirical Bayes shrinkage, which avoids the loss of power that would otherwise accompany such variability in the replicates. Once you remove the negative log-CPMs, you reduce the difference in the group means, resulting in a larger p-value. If you end up deciding not to remove any samples, you may consider setting robust=TRUE in eBayes() to avoid effect ii and obtain a large p-value.

• circRNA detection can depend heavily on the library prep method for RNA-seq: do you have ribo-depleted samples?
• circRNA quantification can indeed deliver somewhat lower counts (or estimated counts, if you followed the sailfish-cir pipeline?), so it may happen that sometimes a circ transcript goes undetected in some samples. This would be the case in which you have low expressed circs...
• yet in this case, the logCPM values are quite high, so I'd reinforce Aaron's opinion that there might be something (at least) funky in your samples

Cheers,

Federico