Question about setting threshold using filtering by expression on genes in edgeR
@mohammedtoufiq91-17679
United States

Hi,

I am working with RNA-Seq transcriptomics data and have a question about the thresholds used when filtering genes by expression with filterByExpr in edgeR. I like this function and, by default, I use the following for my experimental set-up:

keep <- filterByExpr(y, group=Treatment_Timepoint)  ## Test case 1
table(keep)

Following the help page for filterByExpr, I ran the code below with the arguments set explicitly and obtained the same number of genes passing the filter as in test case 1.

keep.test <- filterByExpr(y, group=Treatment_Timepoint,
             min.count = 10, min.total.count = 15, large.n = 10, min.prop = 0.7)    ## Test case 2
table(keep.test)

For test case 1, apart from y and group=Treatment_Timepoint, does filterByExpr use any other thresholds or cut-offs, or does it use the following argument values by default when I run it?

min.count = 10 
min.total.count = 15
large.n = 10
min.prop = 0.7
My sample information:

Sample.info

#>           Donor Treatment Timepoint Treatment_Timepoint
#> Sample.1     P1   Control       6hr         Control_6hr
#> Sample.2     P2   Control       6hr         Control_6hr
#> Sample.3     P3   Control       6hr         Control_6hr
#> Sample.4     P4   Control       6hr         Control_6hr
#> Sample.5     P1      High       6hr            High_6hr
#> Sample.6     P2      High       6hr            High_6hr
#> Sample.7     P3      High       6hr            High_6hr
#> Sample.8     P4      High       6hr            High_6hr
#> Sample.9     P1   Control      24hr        Control_24hr
#> Sample.10    P2   Control      24hr        Control_24hr
#> Sample.11    P3   Control      24hr        Control_24hr
#> Sample.12    P4   Control      24hr        Control_24hr
#> Sample.13    P1      High      24hr           High_24hr
#> Sample.14    P2      High      24hr           High_24hr
#> Sample.15    P3      High      24hr           High_24hr
#> Sample.16    P4      High      24hr           High_24hr
#> Sample.17    P1   Control      48hr        Control_48hr
#> Sample.18    P2   Control      48hr        Control_48hr
#> Sample.19    P3   Control      48hr        Control_48hr
#> Sample.20    P4   Control      48hr        Control_48hr
#> Sample.21    P1      High      48hr           High_48hr
#> Sample.22    P2      High      48hr           High_48hr
#> Sample.23    P3      High      48hr           High_48hr
#> Sample.24    P4      High      48hr           High_48hr
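For a reproducible example, the sample table above can be rebuilt in base R as below. This is a reconstruction from the printed table, not the original dput() output. Tabulating Treatment_Timepoint confirms that every group has 4 samples; when group is supplied, filterByExpr works from the size of the smallest group.

```r
# Rebuild the sample table shown above (24 samples: 4 donors x 2 treatments x 3 timepoints)
timepoint <- rep(c("6hr", "24hr", "48hr"), each = 8)
treatment <- rep(rep(c("Control", "High"), each = 4), times = 3)
donor     <- rep(paste0("P", 1:4), times = 6)

Sample.info <- data.frame(
  Donor = donor,
  Treatment = treatment,
  Timepoint = timepoint,
  Treatment_Timepoint = paste(treatment, timepoint, sep = "_"),
  row.names = paste0("Sample.", 1:24)
)

# Grouping vector in the form passed as group= in test cases 1 and 2
Treatment_Timepoint <- factor(Sample.info$Treatment_Timepoint)

# Number of samples per Treatment_Timepoint group (all 4 here)
table(Treatment_Timepoint)
```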

Best Regards,

Mohammed

Tags: filtering, RNA-Seq, CPM, edgeR, filterByExpr

@james-w-macdonald-5106
United States

This information is provided in the help page.

Usage:

     ## S3 method for class 'DGEList'
     filterByExpr(y, design = NULL, group = NULL, lib.size = NULL, ...)
     ## S3 method for class 'SummarizedExperiment'
     filterByExpr(y, design = NULL, group = NULL, lib.size = NULL, ...)
     ## Default S3 method:
     filterByExpr(y, design = NULL, group = NULL, lib.size = NULL,
                  min.count = 10, min.total.count = 15, large.n = 10, min.prop = 0.7, ...)

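The Default S3 method section shows the default values that are used if you do not specify something different. The DGEList and SummarizedExperiment methods pass any extra arguments (...) on to the default method, so these same defaults apply when y is a DGEList.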
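To make the role of these defaults concrete, here is a minimal base-R sketch loosely modeled on what filterByExpr.default() does when group is supplied (details may differ between edgeR versions). filter_by_expr_sketch() is a hypothetical name, not part of edgeR. It assumes library sizes are plain column sums (the DGEList method instead uses the effective library sizes stored in the DGEList) and omits the design-matrix branch. Calling it with the defaults and with the same values spelled out gives identical results, which is why test cases 1 and 2 agree.

```r
# Hypothetical base-R sketch of filterByExpr-style filtering with a group factor.
# Use edgeR::filterByExpr() for real analyses.
filter_by_expr_sketch <- function(counts, group, lib.size = NULL,
                                  min.count = 10, min.total.count = 15,
                                  large.n = 10, min.prop = 0.7) {
  counts <- as.matrix(counts)
  # Assumption: raw column sums as library sizes (no normalization factors)
  if (is.null(lib.size)) lib.size <- colSums(counts)

  # Minimum number of samples = size of the smallest non-empty group,
  # relaxed towards min.prop only when that size exceeds large.n
  n <- table(group)
  min.size <- min(n[n > 0])
  if (min.size > large.n) min.size <- large.n + (min.size - large.n) * min.prop

  # CPM cutoff equivalent to min.count reads at the median library size
  cpm.cutoff <- min.count / median(lib.size) * 1e6
  cpm <- t(t(counts) / lib.size) * 1e6

  tol <- 1e-14
  keep.cpm   <- rowSums(cpm >= cpm.cutoff) >= min.size - tol
  keep.total <- rowSums(counts) >= min.total.count - tol
  keep.cpm & keep.total
}

# Toy data: 5 genes x 12 samples (3 groups of 4), one Poisson mean per gene
set.seed(1)
group  <- factor(rep(c("A", "B", "C"), each = 4))
counts <- matrix(rpois(5 * 12, lambda = c(0, 2, 20, 50, 200)), nrow = 5)

keep1 <- filter_by_expr_sketch(counts, group)            # like test case 1
keep2 <- filter_by_expr_sketch(counts, group,
                               min.count = 10, min.total.count = 15,
                               large.n = 10, min.prop = 0.7)  # like test case 2
identical(keep1, keep2)  # TRUE
```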

Thank you, James W. MacDonald.

I frequently use test case 1 in my analyses. I believe the default arguments are already set to optimal values for differential expression analysis, so I do not need to change them at all.

Are the default arguments right for my analysis set-up? If not, what are the best argument values given my sample information?

Yes, your test case 1 code is correct. You do not need to change any preset arguments.

Gordon Smyth: perfect, thank you very much. I will use the following:

keep <- filterByExpr(y, group=Treatment_Timepoint)  ## Test case 1
table(keep)

Additionally, from help("filterByExpr") the min.count and min.total.count concepts are clear, but I am a bit confused about what the large.n = 10 and min.prop = 0.7 arguments actually do. Do they keep genes detected in at least 70% of the samples? How does this relate to my sample information: is it based on the smallest group in Treatment_Timepoint? All my groups have 4 samples each, so does this mean at least 4 samples must have a count of 10 or more, where 4 is the size of the smallest group?

Likewise, if Treatment_Timepoint had a smaller group with only 2 samples, would it require at least 2 samples with a count of 10 or more, since 2 would then be the size of the smallest group?

As the help page explains, the min.prop=0.7 cutoff only comes into play when all the groups are larger than large.n=10 in size. The help page says:

If all the group sizes are larger than large.n, then this is relaxed slightly, but with n always greater than min.prop of the smallest group size (70% by default).

Your groups are all of size 4, and 4 is less than 10. Hence these arguments do not affect your experiment at all.

For your experiment, filterByExpr will simply keep genes that are expressed in at least 4 samples.

If all your groups were of size 20 instead of 4, then filterByExpr would keep genes expressed in at least 17 samples (where 17 is equal to 100% of the first 10 plus 70% of the extra 10). If all your groups were of size 100, then filterByExpr would keep genes expressed in at least 73 samples. In large sample situations, a gene is usually still of interest if it is expressed in most of the samples for at least one group.
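The sample-count arithmetic above can be checked with a small base-R sketch. min_samples_required() is a hypothetical helper, not an edgeR function; it reproduces only the minimum-sample part of the rule (smallest group size, relaxed above large.n) and ignores the CPM and total-count cutoffs.

```r
# Hypothetical helper: minimum number of samples in which a gene must be expressed.
# The smallest group size is used as-is unless it exceeds large.n.
min_samples_required <- function(group.sizes, large.n = 10, min.prop = 0.7) {
  n <- min(group.sizes)
  if (n > large.n) n <- large.n + (n - large.n) * min.prop
  n
}

min_samples_required(rep(4, 6))            # this design, six groups of 4: 4
min_samples_required(c(2, 4, 4, 4, 4, 4))  # one group of 2: 2
min_samples_required(rep(20, 6))           # 10 + 10 * 0.7 = 17
min_samples_required(rep(100, 6))          # 10 + 90 * 0.7 = 73
```

Because the relaxation only applies above large.n, the required number is not a fixed fraction of the group size: 4 of 4, 17 of 20, 73 of 100.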

Gordon Smyth: for larger groups it is approximately 70%, whereas in my case filterByExpr simply keeps genes that are expressed in at least 4 samples. If I want to express this as a percentage, how should I calculate it?

The filtering is not based on a percentage. It makes no sense to try to express it that way.
