Jose,
We are facing the same issue analyzing
array CGH data from pairs of single colour
NimbleGen chips. Since NimbleGen data formats
are new and fluid, most R and BioC packages
are still not able to routinely read in and process
the data. This has the beneficial effect of forcing
us to look at plots and re-examine the data and
correction / normalisation processing routines.
Our NimbleGen chips do not have MM for each PM, but
rather NimbleGen adds about a thousand "RANDOM"
probes with characteristics 'similar' to the probe set under
investigation. Thus we cannot do the usual MM subtraction
PM(i) - MM(i). Subtracting the median background, or similar
variants (including the "half" algorithm),
induces strong structural changes at the left edge
of the MA data, resembling the shape of the '<' left angle bracket.
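
To make the '<' shape concrete, here is a minimal sketch in Python (standard library only). It assumes the "half" method resets any background-subtracted intensity below 0.5 to 0.5, and the spot intensities are made-up illustrative values, not data from our chips or from Jose's arrays.

```python
import math

def background_correct(fg, bg, method="subtract"):
    """Background-correct one intensity.
    'subtract': plain fg - bg (can be zero or negative).
    'half': fg - bg, floored at 0.5 (assumed behaviour of the "half" method)."""
    diff = fg - bg
    if method == "half":
        return max(diff, 0.5)
    return diff

def ma_values(red, green):
    """Return (M, A) with M = log2(R/G) and A = mean log2 intensity,
    or None when either channel is not positive (the log is undefined)."""
    if red <= 0 or green <= 0:
        return None
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)

def corrected_ma(spot, method):
    """Apply background correction to both channels, then compute (M, A)."""
    r_fg, r_bg, g_fg, g_bg = spot
    return ma_values(background_correct(r_fg, r_bg, method),
                     background_correct(g_fg, g_bg, method))

# Hypothetical spots: (red foreground, red background,
#                      green foreground, green background)
spots = [
    (200.0, 136.0, 90.0, 120.0),    # green signal below its background
    (1000.0, 100.0, 500.0, 100.0),  # both channels above background
    (80.0, 100.0, 356.0, 100.0),    # red signal below its background
]

for spot in spots:
    print(corrected_ma(spot, "subtract"), corrected_ma(spot, "half"))
```

Under these assumptions, and before any normalisation, every spot with the green channel floored at 0.5 lies exactly on the line M = 2A + 2, and every spot with the red channel floored lies on M = -2A - 2. Those two lines are the arms of the '<'. With plain subtraction the same spots simply drop out, because the log is undefined.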
Irizarry et al (2003) p. 257 discuss this issue.
"Affymetrix also appears to have noticed that the linear scale is
not appropriate and, in the new version of their analysis algorithm
MAS 5.0, are now using a log scale measure. Specifically the MAS 5.0
signal (measure) is defined as
signal = Tukey Biweight{log(PM(j) - CT(j))}
with CT(j) a quantity derived from the MM that is never bigger than
its PM pair. See Hubbell (2001) for more details.
Each of these measures rely upon the difference PM - MM with
the intention of correcting for non-specific binding. However, the
exploratory analysis presented in Section 3 suggests that the
MM may be detecting signal as well as non-specific binding.
Some researchers (Naef et al, 2001) propose expression measures
based only on the PM."
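
For readers unfamiliar with the signal measure quoted above, here is a rough sketch of a one-step Tukey biweight applied to log(PM - CT). The tuning constants c = 5 and eps = 0.0001, the use of log base 2, and the function names are my assumptions for illustration. CT is taken as given here, since its derivation from the MM is described in Hubbell (2001) and not in the passage quoted.

```python
import math
import statistics

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that downweights points
    far from the median, measured in units of the median absolute deviation."""
    values = list(values)
    centre = statistics.median(values)
    mad = statistics.median([abs(v - centre) for v in values])
    weights = []
    for v in values:
        u = (v - centre) / (c * mad + eps)
        # Bisquare weight: zero for points with |u| >= 1
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mas5_like_signal(pm, ct):
    """Tukey biweight of log2(PM(j) - CT(j)) over a probe set.
    Assumes CT(j) < PM(j) for every j, so each log is defined."""
    return tukey_biweight(math.log2(p - c_j) for p, c_j in zip(pm, ct))

# Hypothetical probe set: PM intensities and matching CT values
pm = [320.0, 290.0, 410.0, 2500.0, 305.0]
ct = [64.0, 34.0, 154.0, 452.0, 49.0]
print(mas5_like_signal(pm, ct))
```

The point of the sketch is only that CT is built to stay below PM, so the log never fails, and that the biweight keeps a single aberrant probe (the fourth one here) from dominating the probe-set summary.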
I have not yet come across journal articles investigating
specific binding to the MM probes (references would be
appreciated), but this may be part of the issue. Perhaps it is GC
related? We will be investigating this issue.
NimbleGen has adopted the RMA algorithm currently
used by Affymetrix for background correction and
normalization. Bioconductor has the gcrma package, which
also uses GC information - a review of gcrma correction
is available at
http://bioinf.ncl.ac.uk:16080/support/courses/genespring/RMA%20comparison%20with%20MAS5.pdf
Not subtracting MM values appears to have much merit.
One obvious benefit is that data are not lost to the
artifact of being unable to take the logarithm
of a negative number.
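
A small sketch of that data loss, using invented PM and MM intensities (not real chip data): any probe pair with PM <= MM has no defined log after subtraction, whereas a PM-only measure keeps every probe.

```python
import math

def log2_or_none(x):
    """log2 of a positive value; None where the log is undefined."""
    return math.log2(x) if x > 0 else None

# Hypothetical perfect-match and mismatch intensities for five probe pairs
pm = [120.0, 85.0, 300.0, 60.0, 45.0]
mm = [90.0, 110.0, 150.0, 60.0, 70.0]

pm_only = [log2_or_none(p) for p in pm]
pm_minus_mm = [log2_or_none(p - m) for p, m in zip(pm, mm)]

lost_pm_only = sum(v is None for v in pm_only)
lost_pm_minus_mm = sum(v is None for v in pm_minus_mm)
print(lost_pm_only, lost_pm_minus_mm)
```

In this toy example three of the five pairs are lost after PM - MM, and none are lost with PM alone.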
Apparently mismatch probes are doing more than had
been originally thought by many. Background correction is
still an evolving issue. Your observations illustrate the
continued importance of plotting data and thinking about
the available algorithms and their effects. Perhaps RMA
or GCRMA algorithms will produce more reasonable results?
Your latest set of arrays and your a-priori knowledge
should help sort out part of this puzzle. Let us know what
you discover about improved correction / normalisation
methods that allow your known DE genes to show themselves.
Reference:
Irizarry R., et al. (2003). "Exploration, normalization,
and summaries of high density oligonucleotide array probe
level data". Biostatistics 4(2), pp. 249-264.
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney at bccrc.ca
tel: 604-675-8000 x7561
BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch on behalf of
J.delasHeras@ed.ac.uk
Sent: Wed 8/2/2006 6:16 AM
To: BioC mailing list
Subject: [BioC] strange effect of "half" bkg substraction in limma

I use limma to analyse my 2-colour cDNA arrays. I usually either
simply subtract background (method "subtract"), or don't correct for
background at all (for a number of reasons I will not go into now).

In one of my latest sets of arrays, I was fortunate enough to know
some of the expected genes to be differentially expressed a priori
(from previous experiments and RT-PCR confirmation).

I subtracted the background, as I did for a similar set of arrays
(same experiments on a different cell line), and looked for the genes
I knew to be differentially expressed. They were not in the list.
Actually, they gave me NA when I looked for them on my normalised
data object.

The reason for this, I found out, was that I was having slides with
higher background than usual (especially on Cy3 channel), and the
local background for that group of genes was higher than the actual
signal measured on one of the channels. This gave me a negative
intensity value after bkg subtraction... and that's where the problem
lies.

Okay... so I looked at how many spots had negative values after
subtraction in at least one channel. Lots. I expect lots of spots to
show no signal in either channel, so it's not surprising. But a good
number will probably have no signal only on one channel. These are
actually the genes I am mainly after: those that show no expression
before my treatment, but get activated to some degree after the
treatment.

I decided to convert the negative intensities to some arbitrary
number that wouldn't give me trouble. I decided to avoid a value
between 0 and 1 (logs would be negative or zero) and chose 1.5. Just
because.

I then used the RG data, corrected that way, to continue. I
normalised within arrays (print-tip loess) and between arrays
(scale). Then I applied the linear model as usual plus eBayes on
that. Then I looked to see what happened to my group of known genes.
They were not eliminated this time, that's good. But they were not
marked as DE (using FDR <= 0.05). In fact... every spot had FDR above
0.9!!!

I thought maybe I had made a mistake in the correction... so I
quickly repeated the procedure using the "half" method to subtract
background. This is essentially what I did before, but substituting
negative values by 0.5, rather than 1.5.

Same thing!

You can see the MA plots for one set of data, using either no
background correction, the "subtract" method, the "half" method, or
my own correction choosing "0.5" (so it's the same as "half"... I
only put it to make sure my method did what it was supposed to do):

http://mcnach.com/misc/maplots1.png

It seems that using the "half" method flattens all the differences,
after normalisation... I am guessing this is some effect of the
normalisation procedure... I used "half" on another set of data once,
without this effect... the data was already "flattish" where all the
M values were no bigger than 2.5, and the background was pretty low
generally.

Any ideas about what's happening here?

Incidentally, I took the data and re-analysed it without any
background correction at all. The MA plot for the same set looks like
this:

http://mcnach.com/misc/maplots2.png

which is nice... I expect a relatively large number of genes to be
upregulated, and many to be activated (going from no signal or almost
nothing, to a clearly detectable signal), and these show nicely along
the top upwards diagonal of the diamond-shaped plot (where genes that
have signal only on my treatment are expected to cluster). My known
genes show up also on that diagonal, and their relative position also
fits nicely with the results obtained by RT (more strongly
reactivated genes show higher on the diagonal). The FDR values also
appeared reasonable.

Surprisingly (to me), not removing background, even when I had some
slides that didn't look so good, gives pretty solid results.

Jose

--
Dr. Jose I. de las Heras                    Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology  Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology      Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK