EdgeR filtering, gene expression, cpm cutoff
@ilovesuperheroes1993-17038
Last seen 22 months ago

I am running a DE analysis with edgeR. I have 8 biological replicates, in groups of 2 (1 normal and 1 diseased).

What I want to do is keep the genes for which the CPM is above 4 in at least 4 of the 8 samples, irrespective of group.

Could anyone provide me with the necessary code?

Thank you

edger Tutorial • 659 views
@hotz-hans-rudolf-3951
Last seen 6 months ago
Switzerland

See page 11 in the edgeR user guide (https://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf).

 

Regards, Hans-Rudolf


Yes, I have gone through that. Section 2.6 allows me to keep genes, but a minimum number of samples in each group must have cpm above the cutoff. I want to remove that restriction and apply the cutoff to a minimum of ANY 4 samples.

Let's say my cpm values for a gene are :

Sample 1, Group 1: cpm 7
Sample 2, Group 1: cpm 8
Sample 3, Group 2: cpm 5
Sample 4, Group 2: cpm 5
Sample 5, Group 3: cpm 1
Sample 6, Group 3: cpm 0
Sample 7, Group 4: cpm 15
Sample 8, Group 4: cpm 10

Say my cpm cutoff is 6. There are 4 samples with cpm above 6. So it should be retained even though in group 3, both samples have cpm below 6.

How do I modify the code given in Section 2.6 of the edgeR manual?


No, I think you've misread the User's Guide. The code in Section 2.6 selects genes with cpm above the cutoff in a minimum number of ANY of the samples. Just looking at the code you can see that the group membership is not used in constructing the filter.

Applying your filter is the obvious modification:

keep <- rowSums(cpm(y) > 4) >= 4
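As a quick sanity check, the same logic can be run on the example values given earlier in the thread. This is only a sketch in base R: the matrix `cpm_vals`, the gene name `geneA`, and the variables `cutoff` and `min_samples` are made up for illustration, with `cpm_vals` standing in for the output of `cpm(y)`.

```r
# Hypothetical CPM matrix holding the single example gene from the question
# (one row per gene, one column per sample); it stands in for cpm(y).
cpm_vals <- matrix(
  c(7, 8, 5, 5, 1, 0, 15, 10),
  nrow = 1,
  dimnames = list("geneA", paste0("Sample", 1:8))
)

cutoff      <- 6  # CPM cutoff used in the question's example
min_samples <- 4  # number of samples that must exceed the cutoff

# Count, per gene, how many samples exceed the cutoff.
# Group labels are never consulted, so the filter is group-agnostic.
n_above <- rowSums(cpm_vals > cutoff)
keep    <- n_above >= min_samples

print(n_above)
print(keep)
```

With a real DGEList `y`, the filter would then be applied as in the User's Guide, i.e. `y <- y[keep, , keep.lib.sizes=FALSE]`.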

 


Thank you. Yes I think I had misread it. I've pasted the section of the manual below:

> y$samples
        group lib.size norm.factors
Sample1     1 10880519            1
Sample2     1  9314747            1
Sample3     1 11959792            1
Sample4     2  7460595            1
Sample5     2  6714958            1
We filter out lowly expressed genes using the following commands:
> keep <- rowSums(cpm(y)>1) >= 2
> y <- y[keep, , keep.lib.sizes=FALSE]
Here, a CPM of 1 corresponds to a count of 6-7 in the smallest sample. A requirement for
expression in two or more libraries is used as the minimum number of samples in each group is two.
This ensures that a gene will be retained if it is only expressed in both samples in group 2. It is
also recommended to recalculate the library sizes of the DGEList object after the filtering though
the difference is usually negligible.  

See the sentence "This ensures that a gene will be retained if it is only expressed in both samples in group 2." I think I was confused between "if it is only expressed" and "only if it is expressed". Just the position of one word changes its meaning.
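For reference, the statement in the quoted passage that a CPM of 1 corresponds to a count of 6-7 in the smallest sample can be checked from the library sizes shown in `y$samples`. This is a rough sketch in base R; the variable names are made up for illustration, and it assumes all normalization factors are 1 (as in the quoted output), so the effective library size equals `lib.size`.

```r
# Library sizes copied from the y$samples output quoted above
lib_size <- c(Sample1 = 10880519, Sample2 = 9314747, Sample3 = 11959792,
              Sample4 = 7460595,  Sample5 = 6714958)

cpm_cutoff <- 1  # CPM threshold used in the User's Guide example

# CPM = count / library size * 1e6, so the raw count that corresponds to a
# given CPM is cutoff * library size / 1e6 (assuming norm.factors are all 1).
count_at_cutoff <- cpm_cutoff * lib_size / 1e6

# Name of the smallest library and its count equivalent at the cutoff
smallest <- names(which.min(lib_size))
print(smallest)
print(round(count_at_cutoff[[smallest]], 2))
```

The same arithmetic gives the raw count implied by any other cutoff, for example a CPM of 4, by changing `cpm_cutoff`.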

Thanks for your prompt reply.
