Hi,
I am using edgeR for RNA-seq analysis and wanted to check: Is it acceptable to include Gender (Male and Female) as a factor in the design when using the filterByExpr function, in addition to the two-group comparison (Group A vs Group B)?
I include Gender as a covariate in the design used for dispersion estimation and model fitting, as in design.2 (see below). I am not sure whether I should use design.1 or design.2 at the filtering stage.
Currently, I use the following approach to filter lowly expressed genes:
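# Option 1: filter using a design that contains Group only
design.1 <- model.matrix(~0+Group)
keep <- filterByExpr(y, design.1)
table(keep)
# Option 2: filter using the design that also contains Gender
design.2 <- model.matrix(~Gender+Group)
keep <- filterByExpr(y, design.2)
table(keep)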

Gordon Smyth, yes, Gender is more of a nuisance factor in our experiment, and our primary interest is in comparing the groups. I will just use filterByExpr(y, group=Group).
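For anyone reading later, here is a rough sketch of how that could look in practice: filter on Group only, then still use Gender in the design for dispersion estimation and model fitting. The count matrix, sample sizes, and factor layout are simulated placeholders rather than real data, and the code assumes the edgeR package is installed.

```r
library(edgeR)

# Simulated placeholder data: 1000 genes x 6 samples (not real data)
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 20, size = 2), nrow = 1000, ncol = 6)
rownames(counts) <- paste0("gene", seq_len(1000))

# Assumed sample layout: 3 samples per group, genders alternating
Group  <- factor(rep(c("A", "B"), each = 3))
Gender <- factor(rep(c("Male", "Female"), times = 3))

y <- DGEList(counts = counts, group = Group)

# Filter using the factor of primary interest only
keep <- filterByExpr(y, group = Group)
table(keep)
y <- y[keep, , keep.lib.sizes = FALSE]

# Gender still enters the model used downstream
design.2 <- model.matrix(~Gender + Group)
y <- estimateDisp(y, design.2)
```

The filtering step and the model design are separate choices here: the filter only needs to know the group sizes, while the nuisance factor is handled in the linear model.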
Regarding the edgeR v4 filtering, I understand that you mean something like the following: keep only rows that have a count of at least 10 in a minimal number of samples. A recommendation for the minimal number of samples is the smallest group size. Both the count threshold and the number of samples can be customized to suit the data.
Regarding edgeR v4 filtering, I didn't mention anything about a minimum count of 10 but rather said "a positive count". So a minimalistic filter would keep genes that have a positive count in at least k samples, where k is the minimum number of samples to be of interest.
I do not recommend the filtering formula in your comment because the uniform cutoff of 10 doesn't take into account differences in library size between samples. Also, for edgeR v4, the requirement for at least 50 counts overall is more than is necessary.
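A minimal sketch of such a "positive count in at least k samples" filter is given below in base R. It is an illustration under stated assumptions, not necessarily the exact code from the original reply: the count matrix is simulated, and k is assumed here to be the smallest group size. With a DGEList y, the matrix would be y$counts.

```r
# Simulated placeholder counts: 1000 genes x 6 samples (not real data)
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 2, size = 1), nrow = 1000, ncol = 6)
Group <- factor(rep(c("A", "B"), each = 3))

# k: minimum number of samples of interest
# (assumption for this sketch: the smallest group size)
k <- min(table(Group))

# Keep genes with a positive count in at least k samples
keep <- rowSums(counts > 0) >= k
table(keep)

filtered <- counts[keep, , drop = FALSE]
```

Because the test is only whether a count is positive, this filter does not depend on a fixed count threshold that would behave differently across samples with different library sizes.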
Gordon Smyth, noted, thank you very much.