Question: using limma for circular RNA analyses
gravatar for tan_pei_fang
8 months ago by
tan_pei_fang0 wrote:

Dear all, I am aligning my RNA-seq reads using bwa-mem and circular RNA quantification using sailfish-circ. I saw some publications using limma to analyze such datasets. However, I am getting strange results using LIMMA. Here is an example that is highly significant with limma with 39 samples. When inspecting this gene (see panel A), I noticed there were 12 samples that had zero counts (points were jittered). When I remove these 12 samples (see Panel B) and re-analyze using lm(), this gene is no longer significant. I have many other examples of significant hits that disappear after similar procedure. So I am wondering if limma is appropriate for circular RNA analysis. I would welcome any suggestions for other tools designed for analysis of such dataset.

Y axis label = log CPM

X axis label = case / control

Boxplot example is in the link provided:

limma edger R • 201 views
ADD COMMENTlink modified 8 months ago by Aaron Lun25k • written 8 months ago by tan_pei_fang0
Answer: using limma for circular RNA analyses
gravatar for Aaron Lun
8 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

I'm not familiar with circular RNA-seq, but unless there's something radically different from standard RNA-seq, I'd say that your data set has some issues. The samples with negative log-CPMs are very different from the other replicates in the same group. This is usually a sign that something has gone wrong - label misassignment is the first thing that comes to mind, followed by a failure to remove low-quality samples. I mean... does the distribution of log-CPMs look Normal to you?

For this gene, the samples with negative log-CPMs are clearly different from those with positive log-CPMs, but you don't really explain why you decided to remove the former. For all we know, the non-zero samples are the wrong samples (e.g., due to PCR jackpot biases or something) and a near-zero expression is the correct assay value for this gene. You need to motivate your decision to remove samples with external information, either experimental (e.g., low RINs indicating low-quality samples) or across other genes (e.g., low total coverage). They should not be removed simply because they show up as negative log-CPMs for this gene.

Now, to answer your question. This gene shows up as DE in the full data set because (i) the group means are obviously quite different, as the second group has many more negative log-CPMs; and (ii) the inflation of the variance due to the negative log-CPMs is probably suppressed by empirical Bayes shrinkage, which avoids the loss of power that would otherwise accompany such variability in the replicates. Once you remove the negative log-CPMs, you reduce the difference in the group means, resulting in a larger p-value. If you end up deciding not to remove any samples, you may consider setting robust=TRUE in eBayes() to avoid effect ii and obtain a large p-value.

ADD COMMENTlink modified 8 months ago • written 8 months ago by Aaron Lun25k

To add on Aaron's answer:

  • circRNA detection can depend heavily on the library prep method for RNA-seq: do you have ribo-depleted samples?
  • circRNA quantification can indeed deliver somewhat lower counts (or estimated counts, if you followed the sailfish-cir pipeline?), so it may happen that sometimes a circ transcript goes undetected in some samples. This would be the case in which you have low expressed circs...
    • yet in this case, the logCPM values are quite high, so I'd reinforce Aaron's opinion that there might be something (at least) funky in your samples



ADD REPLYlink written 8 months ago by Federico Marini110
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 192 users visited in the last hour