Filter by expression and keep probes in limma
Abdul ▴ 20


I am using limma to analyze Agilent, Illumina and Affymetrix expression microarray datasets. After normalization, I would like to filter out probes that are not expressed. For Illumina and Agilent arrays we use the type of approach shown in the R code below. Is there a way to apply a softer or more general filter instead of requiring a specific minimum number of arrays? Furthermore, I did not find any specific filter for Affymetrix arrays in the manual. Please assist.

Illumina Arrays:

y <- neqc(x)
## We keep probes that are expressed in at least three arrays according to a detection p-value threshold of 5%:
expressed <- rowSums(y$other$Detection < 0.05) >= 3
y <- y[expressed,]

Agilent Arrays:

y <- normalizeBetweenArrays(y, method="quantile")
## We will filter out control probes as indicated by the ControlType column:
Control <- y$genes$ControlType==1L

## We will also filter out probes with no Entrez Gene Id or Symbol
NoSymbol <- is.na(y$genes$Symbol)

## Finally, we will filter probes that don’t appear to be expressed. We keep probes that are above background on at least four arrays (because there are four replicates of each treatment):

IsExpr <- rowSums(y$other$gIsWellAboveBG > 0) >= 4

## Now we select the probes to keep in a new data object yfilt:
yfilt <- y[!Control & !NoSymbol & IsExpr, ]

Thank you,


R filter limma Normalization Microarray • 488 views

Keeping probes that are above background is a fraught exercise, particularly for Affymetrix arrays. You don't mention which Affy array(s) you are using, but do note that the random-primer-based arrays all have what Affy calls negative genomic probes, which are probes that are not complementary to any known sequence and have GC content that varies from 0% to 100%. I have posted plots on this support site in the past (which I won't redo for this post) showing that the binding of these probes goes from very low to saturated as GC content goes up.

In other words, as GC content goes up, binding goes up as well, regardless of any complementary sequences that the probes are meant to bind to. Because of this, there is no consistent measure of background binding, so you would need to exclude probes based on the expected background binding for probes with similar GC content. I suspect the same is true of Illumina and Agilent arrays, although given the greater probe length maybe it's not as bad.
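
To make the idea of a GC-matched background concrete, here is a rough sketch in R on simulated data. It is only an illustration and not a tested procedure for any particular array: the objects `gc`, `is_neg` and `E`, the 95th percentile cutoff and the simulated intensities are all assumptions made up for this example. On real data you would need to supply the GC content and the negative control flags from the array annotation yourself.

```r
set.seed(1)
n_probes <- 2000
n_arrays <- 6

# Hypothetical annotation: GC count per probe and a flag for negative control probes
gc     <- sample(5:20, n_probes, replace = TRUE)
is_neg <- rep(c(TRUE, FALSE), c(500, 1500))

# Simulated log2 intensities whose background rises with GC content
bg <- 4 + 0.3 * gc
E  <- matrix(rnorm(n_probes * n_arrays, mean = bg, sd = 0.5), n_probes, n_arrays)

# Give a subset of the non-control probes genuine signal above background
expressed_truth <- !is_neg & (seq_len(n_probes) %% 3 == 0)
E[expressed_truth, ] <- E[expressed_truth, ] + 3

# Threshold per GC value: 95th percentile of the average intensity
# of the negative control probes with that GC count (assumed cutoff)
avg          <- rowMeans(E)
thresh_by_gc <- tapply(avg[is_neg], gc[is_neg], quantile, probs = 0.95)

# Look up each probe's threshold by its own GC count, then filter
probe_thresh <- thresh_by_gc[as.character(gc)]
keep <- !is_neg & avg > probe_thresh
```

The point of the lookup by GC count is that a high-GC probe has to clear a higher bar than a low-GC probe. A GC value with no negative control probes would get an `NA` threshold, so sparse GC bins would need to be merged or smoothed before this could be used in practice.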

You could hypothetically use GCRMA to estimate the background, in which case you might be able to use a naive measure of background binding to exclude. I believe there are ways to specify pseudo-MM probes for GCRMA (since Affy stopped using the PM/MM probe design years ago), but it's been years since I analyzed Affy arrays, and even longer since I used GCRMA, so I could be mistaken.

James W. MacDonald thank you very much for the reply. All the points are well noted.

