Question

threshold to filter lowly expressed genes

0

Entering edit mode

myprogramming2016 • 0

@myprogramming2016-9741

Last seen 8.7 years ago

Hello,

I would like to use edgeR for differential Gene expression analysis.

I have read counts data on 50 individuals in three biological replicates.

I would like to filter out lowly expressed genes. Is there a threshold to define express genes?

I was thinking to use CPM of >=2, and it should be in two of the three libraries.

How do I define R code?

Secondly, I would like to output TMM normalized counts to identify uniquely expressed genes in each of the individuals. How do I output the TMM counts in CPM?

Thanks in advance

edger rnaseq • 13k views

ADD COMMENT • link updated 2.0 years ago by Shaimaa Gamal • 0 • written 8.8 years ago by myprogramming2016 • 0

0

Entering edit mode

Thanks for sharing an updated workflow. It is very helpful.

Secondly, I do not have a group (control Vs treated). I would like to identify differentially expressed genes in a particular individual compared with a mean of all the individuals (Intercept). I would like to repeat it for all the individuals.

I was wondering if there is any tutorial or an R script.

Thanks

ADD REPLY • link 8.8 years ago myprogramming2016 • 0

0

Entering edit mode

Could you please reply to my post?

Thanks

ADD REPLY • link 8.8 years ago myprogramming2016 • 0

score 4 · Answer 1 · 2017-03-23

4

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 4 hours ago

WEHI, Melbourne, Australia

The section called "Filtering to remove low counts" in the workflow article

https://f1000research.com/articles/5-1438/v2

gives a clear rule of thumb for filtering that you could apply in your situation.

ADD COMMENT • link 8.8 years ago Gordon Smyth 53k

0

Entering edit mode

I think that the current wording is confusing. The explanation states "This ensures that a gene will be retained if it is expressed in both the libraries belonging to any of the six groups.". I read this as the two samples will belong to the same class, whereas the filtering is independent of samples' classes and the two samples with sufficient reads could belong to two different classes.

ADD REPLY • link 8.8 years ago Dario Strbenac ★ 1.6k

1

Entering edit mode

The form of the quoted sentence is "if A then B", but you are reading it to mean "if B then A".

You're not the first person to make the same interpretation, so I guess that making the reasoning longer and even more explicit may have been worthwhile. I'm always surprised by how much confusion the issue of filtering causes. However we are not planning to make changes to this published article. Making changes would require the whole article to be re-refereed.

ADD REPLY • link 8.8 years ago Gordon Smyth 53k

0

Entering edit mode

Indeed, after reading it again, I noticed that I interpreted it incorrectly.

ADD REPLY • link 8.8 years ago Dario Strbenac ★ 1.6k

0

Entering edit mode

How can I asses the performance of the filtering process? I plotted the raw data and the filtered data and it seems there is no change. enter image description here

cpm <- cpm(dge)
lcpm <- cpm(dge, log=TRUE)
L <- mean(dge$samples$lib.size) * 1e-6
M <- median(dge$samples$lib.size) * 1e-6
c(L, M)
lcpm.cutoff <- log2(10/M + 2/L)
library(RColorBrewer)
nsamples <- ncol(dge$counts)
col <- brewer.spectral(nsamples)
par(mfrow=c(1,2))
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.55), las=2, main="", xlab="")
title(main="A. Raw data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
den <- density(lcpm[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}


##FilterbyExpr
keep.exprs <- filterByExpr(dge, group = dge$samples$group)
dge <- dge[keep.exprs,, keep.lib.sizes=FALSE]
dim(dge)

lcpm <- cpm(dge, log=TRUE)
plot(density(lcpm[,1]), col=col[1], lwd=2, ylim=c(0,0.5), las=2, main="", xlab="")
title(main="B. Filtered data", xlab="Log-cpm")
abline(v=lcpm.cutoff, lty=3)
for (i in 2:nsamples){
den <- density(lcpm[,i])
lines(den$x, den$y, col=col[i], lwd=2)
}

ADD REPLY • link 2.0 years ago Shaimaa Gamal • 0