Question

Is it okay to increase filtering to get more significant Adjusted P Values for DEGs found using Limma?

0

Entering edit mode

Sara • 0

@41d09ed8

Last seen 3 months ago

United States

Hello,

I am performing RNA-Seq Analysis on 6 samples: 3 from an infected group and 3 from control. I have performed alignment and quantification and removed ribosomal-RNA reads.

I use Edge-R to filter and the TMM-normalize my read counts. I typically remove all genes not expressed over 1 CPM in atleast 2 samples (smallest group - 1), as shown below. This filtering decreases my genes from 63k to 18k.

# Function to remove features not expressed over 1 CPM in at least 2 samples
keep <- rowSums(cpm(d)>1) >=2 
d <- d[keep,]
d <- calcNormFactors(d, method = "TMM")

I then perform the following in order to isolate DEGs using Limma:

design <- model.matrix(~d$samples$group) #model by infected vs control groups

# Calculate Weighted Likelihoods, prepare to be linearly modeled
v <- voomWithQualityWeights(d, design, plot = TRUE)

# fit linear model
vfit <- lmFit(v, design)
# Empirical Bayes Transform
vfit <- eBayes(vfit)
plotSA(vfit)
# Variance no longer dependent on mean

# Check # of up + down regulated genes
dt <- decideTests(vfit)
summary(dt)

# TOP Differential expressed GENES
topTable(vfit, coef=2, sort.by = "P")
top <- topTable(vfit,coef=2,number=Inf,sort.by="P", adjust.method = "fdr")
sum(top$adj.P.Val<0.05) 

conversions <- read.csv("./ENSG_symbol_conversions.csv")
top_gene_table <- left_join(tibble::rownames_to_column(top), conversions, by=c("rowname" = "ensembl_gene_id_version"))
head(top_gene_table, 40)

The issue is that I get no significant FDR adjusted p-values. However, I have 80+ significant p-values (although I know this can contain many false-positives). There is a lot of variation between my samples as can be seen in the image below:

enter image description here

However, When I change my filtering criteria to filter out genes not expressed over 5 CPM in at-least two samples (12.5k genes remaining), I get 13 Significant FDR adjusted p-values and the sample weights are shown above on the right and a slight decrease in variation is seen between some samples. Is there any reason that this extended filtering should be avoided?

Thanks, Sara

RNASeqPower edgeR DifferentialExpression limma RNASeq • 767 views

ADD COMMENT • link updated 11 months ago by Gordon Smyth 50k • written 11 months ago by Sara • 0

score 0 · Answer 1 · 2023-05-02

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Yes, it is fine to increase the filtering.

In fact, the limma documentation already advises you to do stronger filtering than your original strategy. The limma documentation advises to use filterByExpr(d, design). I wonder why you are not following the documented advice?

You could also try robust=TRUE when running eBayes(), which often gives more DE genes.

ADD COMMENT • link 11 months ago Gordon Smyth 50k

0

Entering edit mode

Hi Gordon Smyth ,

I was following direction from our local sequencing-center's pipeline. Even using basic filterbyExpr() and adding "robust=T" to ebayes() does not result in any significant DEGs on the FDR-adjusted p-value level.

The only thing that results in significant DEGs is further filtration of genes expressed over 5 CPM. So, this is more extensive filtration than I have seen in suggested tutorials, so are there any consequences to be concerned of when using deep filtration? I would not think so but I am a new bioinformatician and always like to hear from those more experienced.

ADD REPLY • link 11 months ago Sara • 0

1

Entering edit mode

As I said, it is fine to filter more deeply. The only real consequence is that you keep fewer genes and therefore lose the possibility of detecting DE at lower expression levels.

ADD REPLY • link 11 months ago Gordon Smyth 50k