Negative values after normalizing an Agilent microarray dataset with limma

Dear Community,

I would like to raise an "issue" I have come across while importing and pre-processing an Agilent microarray dataset in R. More specifically, part of my code is the following:

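library(limma)

# Sample annotation (SDRF file) for GSE12435
SDRF <- read.delim("GSE12435.sdrf.txt", check.names=FALSE, stringsAsFactors=FALSE)
# Read all single-channel (green-only) Agilent files whose names contain "GSM"
files <- list.files(pattern = "GSM")
dat <- read.maimages(files, source="agilent", green.only=TRUE)

# normexp background correction, then quantile normalization (returns log2 values)
y <- backgroundCorrect(dat, method="normexp")
y <- normalizeBetweenArrays(y, method="quantile")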

But when I inspected the range of values in my normalized dataset, I noticed something strange:

> range(y$E)
[1] -1.54555 18.77689
> mat <- y$E
> length(mat[mat<0])
[1] 39
> mat[mat<0]
 [1] -0.797841119 -1.545549855 -0.138353557 -0.797841119 -0.797841119 -0.797841119 -1.123788523
 [8] -0.138353557 -1.123788523 -1.545549855 -0.138353557 -0.797841119 -0.138353557 -1.545549855
... (remaining negative values omitted)

So how should I deal with these negative values? Do they have something to do with the background correction? A naive solution might be to add an offset, but what value should it be? In other words, is there a "general" approach to choosing the offset, so that I would not have to change it for other datasets with similar negative values? I have also attached a histogram of my normalized expression values.

Any help would be great!


There isn't an issue here. Those are log-transformed values, and any normalized intensity below 1 becomes negative after taking logs.
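A quick base-R check of that arithmetic, assuming the values in `y$E` are on the log2 scale (the scale limma's `normalizeBetweenArrays` returns for single-channel data). Back-transforming the range reported in the question shows that the most negative value corresponds to a small but positive intensity, not to anything invalid.

```r
# Range reported in the question, on the log2 scale
log_range <- c(-1.54555, 18.77689)

# Undo the log2 transform to get back to the intensity scale
raw_range <- 2^log_range
print(raw_range)  # roughly 0.34 and 4.5e5

# log2 is negative exactly when the unlogged intensity lies between 0 and 1
x <- c(0.25, 0.5, 0.99, 1, 2, 16)
stopifnot(identical(log2(x) < 0, x < 1))
```

So the 39 negative entries are simply probes whose normalized intensity ended up slightly below 1.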

Dear James,

thank you for your quick answer.

So these values should not pose a problem in downstream analysis? And I should not use an offset at all?

Well, IIRC, Gordon Smyth's group likes to use an offset of 50 for this type of data. The problem with low-expressing genes like these is that noise starts to dominate whatever signal may be there.
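To illustrate what an offset does, here is a base-R sketch with made-up intensities standing in for normexp-corrected values (which are positive). Adding an offset before taking logs bounds every value below by log2(offset), so nothing can go negative, and it compresses the spread among low, noise-dominated intensities far more than among high ones. In limma the offset is an argument of background correction, e.g. `backgroundCorrect(dat, method = "normexp", offset = 50)`; the value 50 comes from the answer above and is not presented here as a universal rule.

```r
# Made-up positive intensities, standing in for normexp-corrected values
bg_corrected <- c(0.3, 1, 5, 20, 1000, 50000)
offset <- 50

no_offset   <- log2(bg_corrected)
with_offset <- log2(bg_corrected + offset)

# With an offset, every log2 value lies above log2(offset) (about 5.64 for 50)
stopifnot(all(with_offset > log2(offset)))

# The spread among the three lowest intensities shrinks sharply ...
low <- 1:3
print(diff(range(no_offset[low])))    # about 4.06
print(diff(range(with_offset[low])))  # about 0.13

# ... while the highest intensity barely moves (about 0.0014)
print(with_offset[6] - no_offset[6])
```

The trade-off is that an offset damps fold changes for low-intensity probes, which is exactly where noise would otherwise inflate them.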

You might consider excluding any genes that are consistently low-expressing, on the assumption that they aren't really being expressed (genes that are low in only a subset of samples shouldn't be excluded, because they may well be very interesting). But from a statistical standpoint, the negative values don't pose any problem at all.
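A minimal base-R sketch of that filtering idea, assuming a log2 expression matrix like `y$E` (genes in rows, arrays in columns). The function `filter_low_expressed`, the threshold of 3 and `min_arrays = 3` are hypothetical choices for illustration; one reasonable choice for `min_arrays` is the size of the smallest experimental group, so that a gene expressed in only one condition survives the filter.

```r
# Hypothetical log2 expression matrix: 4 genes x 6 arrays
E <- rbind(
  geneA = c(8.1, 7.9, 8.3, 8.0, 8.2, 7.8),     # expressed on every array
  geneB = c(-0.8, 0.2, -1.1, 0.5, -0.1, 0.3),  # consistently low
  geneC = c(0.1, -0.5, 0.4, 9.0, 9.4, 8.8),    # low in a subset of arrays only
  geneD = c(2.0, 1.5, 2.5, 1.0, 2.2, 1.8)      # low everywhere, below threshold
)

# Keep a gene if it exceeds `threshold` on at least `min_arrays` arrays
filter_low_expressed <- function(E, threshold, min_arrays) {
  keep <- rowSums(E > threshold) >= min_arrays
  E[keep, , drop = FALSE]
}

# Illustrative settings: threshold 3 on the log2 scale, smallest group of 3 arrays
filtered <- filter_low_expressed(E, threshold = 3, min_arrays = 3)
print(rownames(filtered))  # "geneA" "geneC"
```

geneC is kept even though it is low on half the arrays, which matches the advice not to discard genes that are low only in a subset of samples.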

