Is it okay to increase filtering to get more significant Adjusted P Values for DEGs found using Limma?
Entering edit mode
Sara • 0
Last seen 5 months ago
United States


I am performing RNA-Seq Analysis on 6 samples: 3 from an infected group and 3 from control. I have performed alignment and quantification and removed ribosomal-RNA reads.

I use Edge-R to filter and the TMM-normalize my read counts. I typically remove all genes not expressed over 1 CPM in atleast 2 samples (smallest group - 1), as shown below. This filtering decreases my genes from 63k to 18k.

# Function to remove features not expressed over 1 CPM in at least 2 samples
keep <- rowSums(cpm(d)>1) >=2 
d <- d[keep,]
d <- calcNormFactors(d, method = "TMM")

I then perform the following in order to isolate DEGs using Limma:

design <- model.matrix(~d$samples$group) #model by infected vs control groups

# Calculate Weighted Likelihoods, prepare to be linearly modeled
v <- voomWithQualityWeights(d, design, plot = TRUE)

# fit linear model
vfit <- lmFit(v, design)
# Empirical Bayes Transform
vfit <- eBayes(vfit)
# Variance no longer dependent on mean

# Check # of up + down regulated genes
dt <- decideTests(vfit)

# TOP Differential expressed GENES
topTable(vfit, coef=2, = "P")
top <- topTable(vfit,coef=2,number=Inf,"P", adjust.method = "fdr")

conversions <- read.csv("./ENSG_symbol_conversions.csv")
top_gene_table <- left_join(tibble::rownames_to_column(top), conversions, by=c("rowname" = "ensembl_gene_id_version"))
head(top_gene_table, 40)

The issue is that I get no significant FDR adjusted p-values. However, I have 80+ significant p-values (although I know this can contain many false-positives). There is a lot of variation between my samples as can be seen in the image below:

enter image description here

However, When I change my filtering criteria to filter out genes not expressed over 5 CPM in at-least two samples (12.5k genes remaining), I get 13 Significant FDR adjusted p-values and the sample weights are shown above on the right and a slight decrease in variation is seen between some samples. Is there any reason that this extended filtering should be avoided?

Thanks, Sara

RNASeqPower edgeR DifferentialExpression limma RNASeq • 907 views
Entering edit mode
Last seen 15 hours ago
WEHI, Melbourne, Australia

Yes, it is fine to increase the filtering.

In fact, the limma documentation already advises you to do stronger filtering than your original strategy. The limma documentation advises to use filterByExpr(d, design). I wonder why you are not following the documented advice?

You could also try robust=TRUE when running eBayes(), which often gives more DE genes.

Entering edit mode

Hi Gordon Smyth ,

I was following direction from our local sequencing-center's pipeline. Even using basic filterbyExpr() and adding "robust=T" to ebayes() does not result in any significant DEGs on the FDR-adjusted p-value level.

The only thing that results in significant DEGs is further filtration of genes expressed over 5 CPM. So, this is more extensive filtration than I have seen in suggested tutorials, so are there any consequences to be concerned of when using deep filtration? I would not think so but I am a new bioinformatician and always like to hear from those more experienced.

Entering edit mode

As I said, it is fine to filter more deeply. The only real consequence is that you keep fewer genes and therefore lose the possibility of detecting DE at lower expression levels.


Login before adding your answer.

Traffic: 761 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6