Hi,
I am working with RNA-Seq transcriptomics data and have a question about the thresholds used when filtering genes by expression in edgeR. I like the filterByExpr function, and by default I use the following for my experimental set-up:
keep <- filterByExpr(y, group=Treatment_Timepoint) ## Test case 1
table(keep)
Following the help page of filterByExpr, I ran the code below and obtained the same number of genes passing as in test case 1.
keep.test <- filterByExpr(y, group=Treatment_Timepoint,
min.count = 10, min.total.count = 15, large.n = 10, min.prop = 0.7) ## Test case 2
table(keep.test)
For test case 1, I would like to know: apart from y and group=Treatment_Timepoint, is filterByExpr using any other thresholds or cut-offs, or does it use the default arguments below when I run it?
min.count = 10
min.total.count = 15
large.n = 10
min.prop = 0.7
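For reference, as I read help("filterByExpr"), these defaults combine into one derived threshold that is easy to overlook: min.count is converted to a counts-per-million (CPM) cutoff relative to the median library size, and a gene must pass that cutoff in at least as many samples as the smallest group contains (relaxed only when every group is larger than large.n). The base-R sketch below restates that rule on a toy matrix. filter_sketch() and the data are hypothetical, and this is not edgeR's implementation: it uses raw column sums as library sizes and accepts only a group vector, not a design matrix.

```r
# Simplified sketch of the rule in help("filterByExpr"); NOT edgeR's source.
# filter_sketch() and the toy data are hypothetical illustrations.
filter_sketch <- function(counts, group, min.count = 10, min.total.count = 15,
                          large.n = 10, min.prop = 0.7) {
  lib.size <- colSums(counts)
  # Required number of samples: size of the smallest (non-empty) group,
  # relaxed only when even that group is larger than large.n
  sizes <- table(group)
  n <- min(sizes[sizes > 0])
  if (n > large.n) n <- large.n + (n - large.n) * min.prop
  # min.count is converted to a CPM cutoff at the median library size
  cpm.cutoff <- min.count / median(lib.size) * 1e6
  cpm <- t(t(counts) / lib.size) * 1e6
  keep.cpm <- rowSums(cpm >= cpm.cutoff) >= n - 1e-14
  # Each gene must also have at least min.total.count reads in total
  keep.total <- rowSums(counts) >= min.total.count - 1e-14
  keep.cpm & keep.total
}

group <- rep(c("A", "B"), each = 4)
counts <- rbind(
  gene1 = rep(50, 8),                     # expressed in all 8 samples
  gene2 = c(20, 20, 20, 20, 0, 0, 0, 0),  # expressed in 4 samples (group A)
  gene3 = c(20, 20, 20, 0, 0, 0, 0, 0)    # expressed in only 3 samples
)
# Filler row so that every toy library has exactly 1000 reads
counts <- rbind(counts, filler = 1000 - colSums(counts))
keep.toy <- filter_sketch(counts, group)
keep.toy  # gene1 TRUE, gene2 TRUE, gene3 FALSE, filler TRUE
```

With these toy libraries the cutoff works out to 10000 CPM, i.e. 10 reads per 1000-read library, and gene3 is dropped because it passes in only 3 samples while the smallest group has 4.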
dput(Sample.info)
#> Donor Treatment Timepoint Treatment_Timepoint
#> Sample.1 P1 Control 6hr Control_6hr
#> Sample.2 P2 Control 6hr Control_6hr
#> Sample.3 P3 Control 6hr Control_6hr
#> Sample.4 P4 Control 6hr Control_6hr
#> Sample.5 P1 High 6hr High_6hr
#> Sample.6 P2 High 6hr High_6hr
#> Sample.7 P3 High 6hr High_6hr
#> Sample.8 P4 High 6hr High_6hr
#> Sample.9 P1 Control 24hr Control_24hr
#> Sample.10 P2 Control 24hr Control_24hr
#> Sample.11 P3 Control 24hr Control_24hr
#> Sample.12 P4 Control 24hr Control_24hr
#> Sample.13 P1 High 24hr High_24hr
#> Sample.14 P2 High 24hr High_24hr
#> Sample.15 P3 High 24hr High_24hr
#> Sample.16 P4 High 24hr High_24hr
#> Sample.17 P1 Control 48hr Control_48hr
#> Sample.18 P2 Control 48hr Control_48hr
#> Sample.19 P3 Control 48hr Control_48hr
#> Sample.20 P4 Control 48hr Control_48hr
#> Sample.21 P1 High 48hr High_48hr
#> Sample.22 P2 High 48hr High_48hr
#> Sample.23 P3 High 48hr High_48hr
#> Sample.24 P4 High 48hr High_48hr
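The group sizes that filterByExpr works from can be checked directly. The base-R snippet below rebuilds the sample sheet printed above (reconstructed from the printed table, not from the original object) and tabulates Treatment_Timepoint: all six groups have 4 samples, so the smallest group size is 4.

```r
# Rebuild the printed sample sheet: 4 donors x 2 treatments x 3 timepoints
Sample.info <- data.frame(
  Donor = rep(paste0("P", 1:4), times = 6),
  Treatment = rep(rep(c("Control", "High"), each = 4), times = 3),
  Timepoint = rep(c("6hr", "24hr", "48hr"), each = 8),
  row.names = paste0("Sample.", 1:24)
)
Sample.info$Treatment_Timepoint <- paste(Sample.info$Treatment,
                                         Sample.info$Timepoint, sep = "_")

# Number of samples per Treatment_Timepoint group
group.sizes <- table(Sample.info$Treatment_Timepoint)
group.sizes
min(group.sizes)  # smallest group size: 4
```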
Best Regards,
Mohammed
Thank you James W. MacDonald.
I frequently use test case 1 for my analyses. I believe the default arguments are already set to optimal values for differential expression analysis, so I should not need to change them at all. Are the defaults right for my set-up? If not, what argument values would be best for my sample information?
Yes, your test case 1 code is correct. You do not need to change any preset arguments.
Gordon Smyth
Perfect, thank you very much. I will use the below:
Additionally, referring to help("filterByExpr"), the min.count and min.total.count concepts are clear. However, I am a bit confused about what the large.n = 10 and min.prop = 0.7 arguments actually do. Do they keep genes detected in at least 70% of the samples? How should I relate this to my sample information: does it consider the smallest group in Treatment_Timepoint? In any case, all groups have 4 samples each. My understanding is that it ensures at least 4 samples with a count of 10 or more, where 4 is chosen as the sample size of the smallest group. If a smaller group with only 2 samples existed in Treatment_Timepoint, it would instead require at least 2 samples with a count of 10 or more, where 2 is the sample size of the smallest group.
As the help page explains, the min.prop = 0.7 cutoff only comes into play when all the groups are larger than large.n = 10 in size. Your groups are all of size 4, and 4 is less than 10. Hence these arguments do not affect your experiment at all.
For your experiment, filterByExpr will simply keep genes that are expressed in at least 4 samples. If all your groups were of size 20 instead of 4, then filterByExpr would keep genes expressed in at least 17 samples (where 17 is 100% of the first 10 plus 70% of the extra 10). If all your groups were of size 100, then filterByExpr would keep genes expressed in at least 73 samples. In large-sample situations, a gene is usually still of interest if it is expressed in most of the samples for at least one group.
Gordon Smyth
For larger groups it is approximately 70%, but in my case filterByExpr will simply keep genes that are expressed in at least 4 samples. If I want to express this as a percentage, how should I calculate it?
The filtering is not based on a percentage. It makes no sense to try to express it that way.
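To see why no single percentage describes the filter, the rule behind the numbers 4, 17 and 73 can be written out. As I read help("filterByExpr"), with n the smallest group size, the required number of samples is n when n is at most large.n, and large.n + (n - large.n) * min.prop otherwise. The sketch below uses a hypothetical helper, min_samples(), which is not an edgeR function. It shows that the required fraction of the group falls from 100% to 85% to 73% as groups grow from 4 to 20 to 100, so it only approaches 70% for very large groups.

```r
# Hypothetical helper restating the sample-size rule in help("filterByExpr"):
# n samples if n <= large.n, otherwise large.n + (n - large.n) * min.prop
min_samples <- function(group.size, large.n = 10, min.prop = 0.7) {
  ifelse(group.size > large.n,
         large.n + (group.size - large.n) * min.prop,
         group.size)
}

sizes <- c(4, 20, 100)          # this experiment, then the two larger examples
required <- min_samples(sizes)  # 4, 17, 73
fraction <- required / sizes    # 1.00, 0.85, 0.73 of each group
data.frame(group.size = sizes, samples.required = required,
           fraction = round(fraction, 2))
```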