Question

I would like to perform differential expression analysis of lncRNAs using DESeq2 and edgeR.

John ▴ 10

@0ccfb76d

Hong Kong

Hi all,

I read an article titled "Poor Performance of Differential Gene Expression Analysis Tools for Long Non-coding RNA Sequencing Data" (https://pubmed.ncbi.nlm.nih.gov/30041657/). Its results show that many differential expression pipelines do not control the FDR well (Figure 4), and among those that control the FDR reasonably well, many have very low TPR. In an earlier search, I also came across a response from one of the authors of edgeR (https://www.biostars.org/p/9493810/), according to which edgeR is suitable for differential expression analysis of lncRNAs. I find it hard to tell which view is more accurate.

Furthermore, I have noticed that filterByExpr tends to filter out a large proportion of lncRNAs, although their low expression may simply be an intrinsic characteristic of lncRNAs. Should I filter the mRNA and lncRNA data separately or together?

lncRNA_data <- all_data[lncRNA_list,]

mRNA_data <- all_data[mRNA_list,]

lncRNA_filter <- filterByExpr(lncRNA_data)

mRNA_filter <- filterByExpr(mRNA_data)

or

all_filter <- filterByExpr(all_data)

I'm a bit confused now. First, I'm not sure which software is more suitable for differential expression analysis of lncRNAs. Second, I'm not clear whether I should analyze mRNAs and lncRNAs separately, or analyze them together and then separate the results at the end. Third, I'm not sure whether the same significance thresholds should be used for mRNAs and lncRNAs, that is, |log2FC| > 1 and FDR < 0.05.

All opinions and experiences are greatly appreciated!

diffGeneAnalysis lncRNA • 3.3k views

score 1 · Answer 1 · 2025-07-02

The Genome Biology paper that you link to recommends limma as the best performing method. If you're worried about the results of that paper, why not follow their recommendation?

I am the author of limma as well as of edgeR. I wrote the Biostars answer that you link to. I am also the author of the filterByExpr() function. My lab has analyzed over a thousand RNA-seq experiments over the past 20 years. We always include lncRNAs in our analyses and have never observed any problem in doing so. I have never seen any evidence that lncRNAs are systematically more noisy than mRNAs at the same read levels or that they need special treatment.

The only problem with lncRNAs is that they often have low read counts, and it is obviously going to be harder to get significant DE for low count genes than for those with higher counts. That is an intrinsic data limitation rather than an issue of performance of the DE methods, and the same issue is shared with mRNAs that have low counts, of which there always are many. In my opinion, the latest versions of limma (limma-voomLmFit) and edgeR (edgeR v4 QL) are both very reliable for low count genes and are also very robust to filtering (see https://doi.org/10.1093/nar/gkaf018 or https://doi.org/10.1101/2025.04.07.647659 ). I recommend the use of robust empirical Bayes (robust=TRUE) in both cases.
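As a rough illustration of the two pipelines mentioned above, here is a minimal sketch of edgeR quasi-likelihood and limma voomLmFit, each with robust=TRUE. The count matrix is simulated purely so the code runs on its own, and the two-group design, object names and `coef = 2` are assumptions for the example, not part of the original question.

```r
library(edgeR)  # also attaches limma

set.seed(1)
n_genes <- 2000
# Simulated counts for illustration only: gene-wise means span low to high expression
mu <- 2^runif(n_genes, 1, 8)
counts <- matrix(rnbinom(n_genes * 6, mu = mu, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))   # hypothetical two-group design
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# edgeR quasi-likelihood pipeline with robust empirical Bayes
y <- estimateDisp(y, design)
fit_ql <- glmQLFit(y, design, robust = TRUE)
res_ql <- glmQLFTest(fit_ql, coef = 2)
tab_ql <- topTags(res_ql, n = Inf)$table

# limma pipeline via voomLmFit, again with robust empirical Bayes
fit_v <- voomLmFit(y, design)
fit_v <- eBayes(fit_v, robust = TRUE)
tab_v <- topTable(fit_v, coef = 2, number = Inf)
```

Either table can then be used for downstream reporting; in a real analysis only one of the two pipelines would normally be run.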

You should analyse mRNA and lncRNAs together, not separately. There is no need to apply any logFC cutoff.
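One way to follow this advice is to fit a single model to all genes and split the results table by biotype only at the end. The sketch below assumes a hypothetical `biotype` annotation column and simulated counts; the FDR < 0.05 threshold is the one from the question, and no logFC cutoff is applied.

```r
library(edgeR)

set.seed(2)
n_genes <- 1000
# Simulated counts and a made-up biotype annotation, for illustration only
counts <- matrix(rnbinom(n_genes * 6, mu = 2^runif(n_genes, 2, 9), size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6)))
genes <- data.frame(
  gene_id = rownames(counts),
  biotype = sample(c("protein_coding", "lncRNA"), n_genes, replace = TRUE,
                   prob = c(0.7, 0.3))
)
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# One joint analysis: filtering, normalization and dispersion use all genes
y <- DGEList(counts = counts, group = group, genes = genes)
y <- y[filterByExpr(y, design), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)

# The FDR is computed once across all genes; the annotation columns are carried along
tab <- topTags(res, n = Inf)$table
tab$significant <- tab$FDR < 0.05   # FDR only, no logFC cutoff

# Split by biotype only for reporting
lnc_tab  <- tab[tab$biotype == "lncRNA", ]
mrna_tab <- tab[tab$biotype == "protein_coding", ]
```

Because the multiple-testing adjustment is done before the split, the lncRNA and mRNA tables share the same FDR scale.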

If you have human data with lots of samples, then the default settings of filterByExpr() are admittedly overly conservative. You could apply very little filtering, and limma-voomLmFit and edgeR4-QL will continue to work well. That would allow you to keep all the data in the analysis.
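For instance, lighter filtering could be sketched by lowering the count thresholds of filterByExpr(). The values `min.count = 1` and `min.total.count = 5` below are illustrative assumptions, not recommendations from this answer, and the counts are simulated with 20 samples per group.

```r
library(edgeR)

set.seed(3)
n_genes <- 3000
n_per_group <- 20
# Simulated counts with many low-expressed genes, for illustration only
mu <- 2^runif(n_genes, -2, 8)
counts <- matrix(rnbinom(n_genes * 2 * n_per_group, mu = mu, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(2 * n_per_group))))
group <- factor(rep(c("ctrl", "trt"), each = n_per_group))
design <- model.matrix(~ group)
y <- DGEList(counts = counts, group = group)

# Default filtering (min.count = 10, min.total.count = 15)
keep_default <- filterByExpr(y, design)

# Much lighter filtering; these thresholds are assumptions chosen for the example
keep_relaxed <- filterByExpr(y, design, min.count = 1, min.total.count = 5)

# Number of genes retained under each setting
n_kept <- c(default = sum(keep_default), relaxed = sum(keep_relaxed))
```

The relaxed filter retains at least as many genes as the default; the rest of the pipeline is then run unchanged on `y[keep_relaxed, , keep.lib.sizes = FALSE]`.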