Question: Negative values after normalizing an agilent microarray dataset with limma
3.8 years ago by
University of Salerno, Salerno, Italy
Konstantinos Yeles wrote:

Dear Community,

I would like to raise an "issue" I discovered while importing and pre-processing an Agilent microarray dataset in R. Specifically, part of my code is the following:

...
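library(limma)
files <- list.files(pattern = "GSM")  # one raw Agilent file per sample
# ... `files` are read into `dat` here (step not shown) ...
y <- backgroundCorrect(dat, method = "normexp")      # normexp background correction
y <- normalizeBetweenArrays(y, method = "quantile")  # quantile normalization; y$E holds log2 values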

But when inspecting the range of values in my normalized dataset, I noticed something strange:

> range(y$E)
[1] -1.54555 18.77689
> mat <- y$E
> length(mat[mat<0])
[1] 39
> mat[mat<0]
[1] -0.797841119 -1.545549855 -0.138353557 -0.797841119 -0.797841119 -0.797841119 -1.123788523
[8] -0.138353557 -1.123788523 -1.545549855 -0.138353557 -0.797841119 -0.138353557 -1.545549855
....(the rest of the negative values)

So how should I deal with these negative values? Do they have something to do with the background correction? A naive solution might be to add an offset, but of what value? In other words, is there a "general" offset that I would not need to change between different datasets with similar negative values? I also share a histogram of my normalized expression values:

https://www.dropbox.com/s/hokwmhh1ib2jo8n/histogram.png?dl=0

Any help would be great!

Konstantinos

modified 3.8 years ago by James W. MacDonald • written 3.8 years ago by Konstantinos Yeles
Answer: Negative values after normalizing an agilent microarray dataset with limma
3.8 years ago by
United States
James W. MacDonald wrote:

There isn't an issue here. Those are log2-transformed values (normalizeBetweenArrays log-transforms single-channel intensities after normalizing them), and any normalized value below 1 ends up negative after taking logs.
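A quick way to see this is to undo the log2 transform on the range reported in the question. The sketch below assumes the values in y$E are on the log2 scale; the `toy` vector is purely illustrative and not taken from the dataset.

```r
# Range of y$E as reported in the question (assumed to be log2 values)
log_range <- c(-1.54555, 18.77689)

# Undo the log2 transform to recover the normalized intensities:
# the minimum is a small positive intensity below 1, not a "negative expression"
raw_range <- 2^log_range
print(raw_range)

# Any positive intensity below 1 maps to a negative log2 value
toy <- c(0.25, 0.5, 1, 2, 1024)
print(log2(toy))  # -2 -1 0 1 10
```

So the negative entries simply correspond to normalized intensities between 0 and 1.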

Dear James,

So these values should not pose a problem in downstream analysis? And I should not use an offset anyway?


Well, IIRC, Gordon Smyth's group likes to use an offset of 50 for this type of data. The problem with low-expressing genes like these is that noise starts to dominate any signal that may be there.
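The effect of an offset can be sketched in base R. In limma the offset is an argument of the background correction step, e.g. `backgroundCorrect(dat, method = "normexp", offset = 50)`; the intensities below are made up for illustration, and the sketch skips the quantile normalization that would sit between background correction and the log transform.

```r
# Hypothetical background-corrected intensities (normexp output is positive)
bc <- c(0.3, 0.8, 2, 10, 100, 5000)

no_offset   <- log2(bc)       # dim probes go negative and spread out widely
with_offset <- log2(bc + 50)  # analogous to offset = 50 in backgroundCorrect()

print(round(cbind(bc, no_offset, with_offset), 2))

# Spread of the three dimmest probes on the log2 scale, with and without offset
cat("no offset:  ", diff(range(no_offset[1:3])), "\n")
cat("offset 50:  ", diff(range(with_offset[1:3])), "\n")
```

With the offset every value stays at or above log2(50), and the variability among the dimmest probes is strongly damped. The trade-off is that fold-changes for genuinely low-expressed genes are shrunk as well, which is why the offset is a choice rather than a correction that is always required.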

You might consider excluding genes that are consistently low-expressing, on the assumption that they aren't really being expressed (genes that are really low in only a subset of samples shouldn't be excluded, because those may well be very interesting). But from a statistical standpoint the negative values pose no problem at all.
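A minimal sketch of that kind of filter, assuming a log2 expression matrix like y$E. The cutoff of 5 and the requirement of at least 3 arrays (for example, the size of the smallest experimental group) are hypothetical choices, not values from this thread; in practice they depend on the platform and the design.

```r
# Hypothetical log2 expression matrix: 4 genes x 6 arrays
expr <- rbind(
  geneA = c(8.1, 8.4, 7.9, 8.2, 8.0, 8.3),    # clearly expressed everywhere
  geneB = c(0.2, -0.5, 0.1, 0.4, -1.1, 0.3),  # low in every sample
  geneC = c(0.1, -0.3, 0.2, 9.5, 9.8, 9.1),   # low in a subset only
  geneD = c(-1.5, 0.4, 7.0, 0.0, -0.8, 0.5)   # above cutoff in one sample
)

threshold   <- 5  # hypothetical log2 expression cutoff
min_samples <- 3  # hypothetical: size of the smallest group

# Keep genes above the cutoff in at least `min_samples` arrays, so genes that
# are low only in some samples (like geneC) survive the filter
keep <- rowSums(expr > threshold) >= min_samples
filtered <- expr[keep, , drop = FALSE]
print(rownames(filtered))  # "geneA" "geneC"
```

Requiring expression in at least as many arrays as the smallest group keeps genes that are switched on in only one condition, which matches the advice above not to drop genes that are low in just a subset of samples.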