Question

Is it okay to increase filtering to get more significant Adjusted P Values for DEGs found using Limma?

0

Entering edit mode

Sara • 0

@41d09ed8

Last seen 17 months ago

United States

Hello,

I am performing RNA-Seq Analysis on 6 samples: 3 from an infected group and 3 from control. I have performed alignment and quantification and removed ribosomal-RNA reads.

I use Edge-R to filter and the TMM-normalize my read counts. I typically remove all genes not expressed over 1 CPM in atleast 2 samples (smallest group - 1), as shown below. This filtering decreases my genes from 63k to 18k.

# Function to remove features not expressed over 1 CPM in at least 2 samples
keep <- rowSums(cpm(d)>1) >=2 
d <- d[keep,]
d <- calcNormFactors(d, method = "TMM")

I then perform the following in order to isolate DEGs using Limma:

design <- model.matrix(~d$samples$group) #model by infected vs control groups

# Calculate Weighted Likelihoods, prepare to be linearly modeled
v <- voomWithQualityWeights(d, design, plot = TRUE)

# fit linear model
vfit <- lmFit(v, design)
# Empirical Bayes Transform
vfit <- eBayes(vfit)
plotSA(vfit)
# Variance no longer dependent on mean

# Check # of up + down regulated genes
dt <- decideTests(vfit)
summary(dt)

# TOP Differential expressed GENES
topTable(vfit, coef=2, sort.by = "P")
top <- topTable(vfit,coef=2,number=Inf,sort.by="P", adjust.method = "fdr")
sum(top$adj.P.Val<0.05) 

conversions <- read.csv("./ENSG_symbol_conversions.csv")
top_gene_table <- left_join(tibble::rownames_to_column(top), conversions, by=c("rowname" = "ensembl_gene_id_version"))
head(top_gene_table, 40)

The issue is that I get no significant FDR adjusted p-values. However, I have 80+ significant p-values (although I know this can contain many false-positives). There is a lot of variation between my samples as can be seen in the image below:

enter image description here

However, When I change my filtering criteria to filter out genes not expressed over 5 CPM in at-least two samples (12.5k genes remaining), I get 13 Significant FDR adjusted p-values and the sample weights are shown above on the right and a slight decrease in variation is seen between some samples. Is there any reason that this extended filtering should be avoided?

Thanks, Sara

RNASeqPower edgeR DifferentialExpression limma RNASeq • 1.4k views

ADD COMMENT • link updated 2.2 years ago by Gordon Smyth 52k • written 2.2 years ago by Sara • 0

score 0 · Answer 1 · 2023-05-02

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 17 minutes ago

WEHI, Melbourne, Australia

Yes, it is fine to increase the filtering.

In fact, the limma documentation already advises you to do stronger filtering than your original strategy. The limma documentation advises to use filterByExpr(d, design). I wonder why you are not following the documented advice?

You could also try robust=TRUE when running eBayes(), which often gives more DE genes.

ADD COMMENT • link 2.2 years ago Gordon Smyth 52k

0

Entering edit mode

Hi Gordon Smyth ,

I was following direction from our local sequencing-center's pipeline. Even using basic filterbyExpr() and adding "robust=T" to ebayes() does not result in any significant DEGs on the FDR-adjusted p-value level.

The only thing that results in significant DEGs is further filtration of genes expressed over 5 CPM. So, this is more extensive filtration than I have seen in suggested tutorials, so are there any consequences to be concerned of when using deep filtration? I would not think so but I am a new bioinformatician and always like to hear from those more experienced.

ADD REPLY • link 2.2 years ago Sara • 0

1

Entering edit mode

As I said, it is fine to filter more deeply. The only real consequence is that you keep fewer genes and therefore lose the possibility of detecting DE at lower expression levels.

ADD REPLY • link 2.2 years ago Gordon Smyth 52k