Question

Question about the filterByExpr function inputs in edgeR


mohammedtoufiq91 (United States)

Hi,

I am using edgeR for RNA-seq analysis and wanted to check whether it is acceptable to include Gender (male and female) as a factor in the design passed to the filterByExpr function, in addition to the two-group comparison (Group A vs Group B).

I include Gender as a covariate in the design used for dispersion estimation and model fitting (design.2 below). I am not sure whether I should use design.1 or design.2 at the filtering stage.

Currently, I use the following approach to filter lowly expressed genes:

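library(edgeR)

# y is a DGEList; Group and Gender are factors describing the samples.
# Option 1: filter using a design that contains Group only
design.1 <- model.matrix(~0+Group)

keep <- filterByExpr(y, design.1)

table(keep)


# Option 2: filter using the design that also includes Gender as a covariate
design.2 <- model.matrix(~Gender+Group)

keep <- filterByExpr(y, design.2)

table(keep)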


Answer · 2026-01-13

Gordon Smyth (WEHI, Melbourne, Australia)

Either of these choices is perfectly acceptable from an edgeR analysis point of view. The choice has more to do with which is more relevant to your scientific purposes, and with which contrast you intend to test downstream.

I am guessing that Gender is more of a nuisance factor in your experiment and that your primary interest is to compare the groups. In that case, I would just use filterByExpr(y, group=Group). There is no need to make a design matrix at all.
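Why either choice is acceptable can be seen from how filterByExpr turns its inputs into a minimum sample size. The sketch below reflects our reading of the edgeR source and should be treated as an assumption rather than documented API: with group=, the minimum sample size is the smallest group size; with a design, it is taken as 1/max(hat(design)), the reciprocal of the largest leverage. The sample sheet and the helpers min_size_from_design() and min_size_from_group() are hypothetical; hat(), model.matrix() and table() are ordinary base R functions. filterByExpr applies further adjustments (for example for large groups, via its large.n and min.prop arguments) that are not reproduced here.

```r
# Hypothetical sample sheet (illustrative only): 6 samples in group A and
# 4 in group B, with Gender unevenly spread across the groups.
Group  <- factor(rep(c("A", "B"), times = c(6, 4)))
Gender <- factor(c("M", "M", "M", "M", "M", "F", "M", "M", "F", "F"))

design.1 <- model.matrix(~0 + Group)
design.2 <- model.matrix(~Gender + Group)

# Hypothetical helper: minimum sample size implied by a design matrix,
# using the leverage rule we assume filterByExpr applies, 1 / max(hat(design)).
# hat() is from the stats package, which R attaches by default.
min_size_from_design <- function(design) 1 / max(hat(design))

# Hypothetical helper: minimum sample size when group= is supplied,
# i.e. the size of the smallest group.
min_size_from_group <- function(group) min(table(group))

n.group   <- min_size_from_group(Group)
n.design1 <- min_size_from_design(design.1)
n.design2 <- min_size_from_design(design.2)

# design.1 reproduces the smallest group size (4, up to rounding error);
# design.2 can only give a value less than or equal to that.
print(c(group = n.group, design.1 = n.design1, design.2 = n.design2))
```

Under this assumed rule, design.1 and group=Group lead to the same minimum sample size. Adding Gender can only increase the leverages (the design.2 column space contains that of design.1), so design.2 gives a minimum sample size no larger than the smallest group, which makes the filter somewhat more inclusive when Gender is unbalanced across groups.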

In edgeR v4, the need for filtering is greatly reduced, so you could take a different and more inclusive approach. You could specify the minimum number of samples in which a gene must be expressed for a DE result to be of scientific interest to you. That might be the minimum group sample size, or it might be even fewer. Then just keep genes with a positive count in at least that many samples. I would recommend this sort of approach particularly for large-scale studies with many samples.


Gordon Smyth: Yes, Gender is more of a nuisance factor in our experiment, and our primary interest is to compare the groups. I will just use filterByExpr(y, group=Group).

Regarding the edgeR v4 filtering, I understand you to mean something like the following: keep only rows that have a count of at least 10 in a minimal number of samples, where the recommended minimal number of samples is the smallest group size. Both the count cutoff and the number of samples can be customized to suit the data.

keep <- rowSums(y$counts >= 10) >= 5

Reply by mohammedtoufiq91


Regarding edgeR v4 filtering, I didn't mention anything about a minimum count of 10; I said "a positive count". So a minimal filter would be:

keep <- rowSums(y$counts > 0) >= k

where k is the minimum number of samples in which a gene must be expressed for it to be of interest.
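A minimal base-R sketch of this filter on simulated counts, assuming two hypothetical groups of 6 and 4 samples and taking k as the smallest group size (it could also be set lower). The count matrix is random and only illustrates the mechanics; it is not real data.

```r
set.seed(42)

# Hypothetical groups: 6 samples in A, 4 in B
Group <- factor(rep(c("A", "B"), times = c(6, 4)))

# Simulated sparse counts, 2000 genes x 10 samples (random, illustrative only)
counts <- matrix(rnbinom(2000 * 10, mu = 2, size = 0.3), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("s", 1:10)))

# k: minimum number of samples in which a gene must be expressed to be of
# interest. Here the smallest group size; it could also be set lower.
k <- min(table(Group))

# Keep genes with a positive count in at least k samples
keep <- rowSums(counts > 0) >= k
print(table(keep))

counts.filtered <- counts[keep, , drop = FALSE]
```

With an edgeR DGEList y, the same logical vector would usually be applied as y[keep, , keep.lib.sizes=FALSE], which recomputes the library sizes from the retained genes.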

I do not recommend the filtering formula in your comment because a uniform cutoff of 10 reads does not take into account differences in library size between samples. Also, for edgeR v4, requiring at least 50 counts overall (10 counts in each of at least 5 samples) is more than is necessary.
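To make the library-size point concrete, here is a small base-R sketch with three made-up library sizes and two made-up genes. It compares the uniform cutoff of 10 reads with a cutoff expressed in counts per million (CPM) and scaled to the median library size. To our understanding this is the idea behind filterByExpr's min.count argument, but the code is a hand-written illustration, not edgeR's implementation.

```r
# Hypothetical library sizes (total reads per sample)
lib.size <- c(s1 = 5e6, s2 = 20e6, s3 = 40e6)

# A raw count of 10 means very different expression levels per library
cpm.of.10 <- 10 / lib.size * 1e6
cpm.of.10   # 2, 0.5 and 0.25 CPM

# Two hypothetical genes
counts <- rbind(geneX = c(10, 12, 10),   # expression falls as library size rises
                geneY = c(3, 11, 21))    # roughly constant expression (0.5-0.6 CPM)
colnames(counts) <- names(lib.size)

# Counts per million: divide each column (sample) by its library size
cpm <- t(t(counts) / lib.size * 1e6)

# Uniform raw-count cutoff of 10
rowSums(counts >= 10)        # geneX: 3, geneY: 2

# Library-size-aware cutoff: 10 reads at the median library size, on the CPM scale
cpm.cutoff <- 10 / median(lib.size) * 1e6   # 0.5 CPM
rowSums(cpm >= cpm.cutoff)                  # geneX: 2, geneY: 3
```

The uniform cutoff counts geneX as expressed in all three samples even though its expression in the largest library is eight times lower than in the smallest, and counts geneY as unexpressed in the smallest library even though its expression there is at least as high as in the others.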