Negative values after normalizing an agilent microarray dataset with limma
@konstantinos-yeles-8961

Dear Community,

I would like to address an "issue" I discovered while importing and pre-processing an Agilent microarray dataset in R. More specifically, part of my code is the following:

...
# read the sample annotation and the raw single-channel Agilent files
SDRF <- read.delim("GSE12435.sdrf.txt", check.names = FALSE, stringsAsFactors = FALSE)
files <- list.files(pattern = "GSM")
dat <- read.maimages(files, source = "agilent", green.only = TRUE)

# background-correct and quantile-normalize between arrays
y <- backgroundCorrect(dat, method = "normexp")
y <- normalizeBetweenArrays(y, method = "quantile")

But then, when inspecting the range of the values of my normalized dataset, I noticed something strange:

> range(y$E)
[1] -1.54555 18.77689
> mat <- y$E
> length(mat[mat<0])
[1] 39
> mat[mat<0]
 [1] -0.797841119 -1.545549855 -0.138353557 -0.797841119 -0.797841119 -0.797841119 -1.123788523
 [8] -0.138353557 -1.123788523 -1.545549855 -0.138353557 -0.797841119 -0.138353557 -1.545549855
....(the rest of the negative values)

Thus, how should I deal with these negative values? Does it have something to do with the background correction? Maybe a naive solution is to add an offset, but of what value? In other words, is there a "general" approach to choosing the offset, so I don't have to change it for other datasets with similar negative values? I also share a histogram of my normalized expression values:

https://www.dropbox.com/s/hokwmhh1ib2jo8n/histogram.png?dl=0

Any help would be great!

Konstantinos

limma agilent offset background correction microarray
@james-w-macdonald-5106

There isn't an issue here. Those are log-transformed values, and any normalized value that is < 1 will end up being negative after taking logs.
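A quick way to see this (just a toy illustration, not values from your dataset):

> log2(c(0.25, 0.5, 1, 2, 4))
[1] -2 -1  0  1  2

Any background-corrected intensity below 1 therefore maps to a negative value once the data are put on the log2 scale during normalization.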
 


Dear James,

Thank you for your quick answer. But shouldn't these values pose a problem in downstream analysis? And should I not use an offset anyway?


Well, I think Gordon Smyth's group likes to use an offset of 50, IIRC, for this type of data. The problem with low-expressing genes like that is that you start to have noise dominating any signal that may be there.

You might consider excluding any genes that are consistently low-expressing, on the assumption that they aren't really being expressed (genes that are really low in only a subset of samples shouldn't be excluded, because those may well be very interesting). But from a statistical standpoint the negative values don't pose any problem at all.
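For what it's worth, a sketch of how that could look with limma (the offset of 50 and the filtering thresholds below are illustrative choices you would want to tune for your own data):

# add an offset to the background-corrected intensities before normalization
y <- backgroundCorrect(dat, method = "normexp", offset = 50)
y <- normalizeBetweenArrays(y, method = "quantile")

# drop probes that are consistently low: keep those above an illustrative
# log2 cutoff in at least a handful of arrays
keep <- rowSums(y$E > 5) >= 3
y <- y[keep, ]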
