Negative values after normalizing an agilent microarray dataset with limma
University of Salerno, Salerno, Italy

Dear Community,

I would like to raise an "issue" I came across while importing and pre-processing an Agilent microarray dataset in R. More specifically, part of my code is the following:

library(limma)

# Sample annotation and raw single-channel Agilent files
SDRF <- read.delim("GSE12435.sdrf.txt", check.names=FALSE, stringsAsFactors=FALSE)
files <- list.files(pattern = "GSM")
dat <- read.maimages(files, source="agilent", green.only=TRUE)

# Background-correct, then quantile-normalize between arrays
y <- backgroundCorrect(dat, method="normexp")
y <- normalizeBetweenArrays(y, method="quantile")

But then, when inspecting the range of the normalized values, I noticed something strange:

> range(y$E)
[1] -1.54555 18.77689
> mat <- y$E
> length(mat[mat<0])
[1] 39
> mat[mat<0]
 [1] -0.797841119 -1.545549855 -0.138353557 -0.797841119 -0.797841119 -0.797841119 -1.123788523
 [8] -0.138353557 -1.123788523 -1.545549855 -0.138353557 -0.797841119 -0.138353557 -1.545549855
....(the rest of the negative values)

So, how should I deal with these negative values? Does it have something to do with the background correction? A naive solution might be to add an offset, but of what value? In other words, is there a "general" choice of offset, so that I don't have to change it for other datasets with similar negative values? I also share a histogram of my normalized expression values:

Any help would be greatly appreciated!


Tags: limma, agilent, offset, background correction, microarray

There isn't an issue here. Those are log2-transformed values, and any normalized intensity that is < 1 will end up being negative after taking logs.
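To see why intensities below 1 go negative on the log2 scale, a quick base-R check (the last line back-transforms one of the negative values from the question):

```r
# Any intensity in (0, 1) has a negative log2; values above 1 are positive.
log2(0.5)   # -1
log2(1)     #  0
log2(2)     #  1

# Back-transforming one of the reported negative values gives a small
# but perfectly valid intensity (about 0.34), not a nonsensical number.
2^(-1.545549855)
```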


Dear James,

thank you for your quick answer,

but shouldn't these values pose a problem in downstream analysis? And should I use an offset anyway?


Well, I think Gordon Smyth's group likes to use an offset of 50, IIRC, for this type of data. The problem with low-expressed genes like that is that noise starts to dominate whatever signal may be there.
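As a sketch, an offset can be supplied directly to limma's `backgroundCorrect()` via its `offset` argument (this assumes `dat` is the `EListRaw` object read in above; 50 is the commonly cited value, not a requirement). The base-R lines below illustrate why the offset only matters for dim probes:

```r
# Offset damps the variability of low intensities before logging
# (assumes `dat` from read.maimages() above; offset=50 is an assumption,
# not the only valid choice).
y <- backgroundCorrect(dat, method="normexp", offset=50)
y <- normalizeBetweenArrays(y, method="quantile")

# On the log2 scale the offset dominates tiny intensities but is
# negligible for bright probes:
log2(0.3)          # negative and noisy
log2(0.3 + 50)     # pulled well above zero by the offset
log2(10000 + 50)   # barely different from log2(10000)
```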

You might consider excluding any genes that are consistently low-expressed, on the assumption that they aren't really being expressed (genes that are low in only a subset of samples shouldn't be excluded, because those may well be very interesting). But from a statistical standpoint the negative values don't pose any problem at all.
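A minimal base-R sketch of that filtering idea, using simulated data and a made-up cutoff (for a real `EList` object `y`, the same logical index can be used as `y[keep, ]`):

```r
# Simulate a small log2 expression matrix: 6 probes x 8 samples
# (numbers are illustrative only).
set.seed(1)
mat <- matrix(rnorm(6 * 8, mean = 6, sd = 3), nrow = 6)

# Keep probes above a low "expressed" cutoff in at least half the
# samples; both the cutoff and the fraction are assumptions to tune.
cutoff <- 2
keep <- rowSums(mat > cutoff) >= ncol(mat) / 2
filtered <- mat[keep, , drop = FALSE]
```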

