Question

Strange effect of "half" background subtraction in Limma

J.delasHeras@ed.ac.uk ★ 1.9k

United Kingdom

I use Limma to analyse my 2-colour cDNA arrays. I usually either simply subtract background (method "subtract"), or don't correct for background at all (for a number of reasons I will not go into now).

In one of my latest sets of arrays, I was fortunate enough to know a priori some of the genes expected to be differentially expressed (from previous experiments and RT-PCR confirmation). I subtracted the background, as I did for a similar set of arrays (same experiments on a different cell line), and looked for the genes I knew to be differentially expressed. They were not in the list. Actually, they gave me NA when I looked for them in my normalised data object.

The reason for this, I found out, was that these slides had higher background than usual (especially on the Cy3 channel), and the local background for that group of genes was higher than the actual signal measured on ONE of the channels. This gave me a negative intensity value after background subtraction... and that's where the problem lies.

Okay... so I looked at how many spots had negative values after subtraction in at least one channel. Lots. I expect lots of spots to show no signal in either channel, so it's not surprising. But a good number will probably have no signal on only one channel. These are actually the genes I am mainly after: those that show no expression before my treatment, but get activated to some degree after the treatment.

I decided to convert the negative intensities to some arbitrary number that wouldn't give me trouble. I decided to avoid a value between 0 and 1 (logs would be negative or zero) and chose 1.5. Just because. I then used the RG data, corrected that way, to continue. I normalised within arrays (print-tip loess) and between arrays (scale). Then I applied the linear model as usual plus eBayes on that.

Then I looked to see what happened to my group of known genes. They were not eliminated this time, that's good. BUT they were not marked as DE (using FDR <= 0.05). In fact... EVERY spot had FDR above 0.9!!!

I thought maybe I had made a mistake in the correction... so I quickly repeated the procedure using the "half" method to subtract background. This is essentially what I did before, but substituting negative values by 0.5, rather than 1.5. Same thing!

You can see the MA plots for one set of data, using either no background correction, the "subtract" method, the "half" method, or my own correction choosing "0.5" (so it's the same as "half"... I only included it to make sure my method did what it was supposed to do):

http://mcnach.com/MISC/MAplots1.png

It seems that using the "half" method flattens all the differences after normalisation... I am guessing this is some effect of the normalisation procedure... I used "half" on another set of data once, without this effect... that data was already "flattish", with all the M values no bigger than 2.5, and the background was pretty low generally.

Any ideas about what's happening here?

Incidentally, I took the data and re-analysed it without any background correction at all. The MA plot for the same set looks like this:

http://mcnach.com/MISC/MAplots2.png

which is nice... I expect a relatively large number of genes to be upregulated, and many to be activated (going from no signal or almost nothing, to a clearly detectable signal), and these show nicely along the top upwards diagonal of the diamond-shaped plot (where genes that have signal only on my treatment are expected to cluster). My known genes also show up on that diagonal, and their relative position also fits nicely with the results obtained by RT (more strongly reactivated genes show higher on the diagonal). The FDR values also appeared reasonable.

Surprisingly (to me), not removing background, even when I had some slides that didn't look so good, gives pretty solid results.

Jose

--
Dr. Jose I. de las Heras
Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Phone: +44 (0)131 6513374
Fax: +44 (0)131 6507360
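
The arithmetic behind the NA values and the floored values described above can be sketched in a few lines. This is a rough illustration in plain Python, not limma code: the spot intensities are made up, and `background_correct` and `ma_values` are hypothetical helper names. One assumption worth flagging: as far as I understand limma's "half" method, it resets any background-subtracted intensity below 0.5 (not only negative ones) to 0.5, and that is what the sketch does.

```python
import math

def background_correct(fg, bg, method="subtract", floor=0.5):
    """Background-correct one spot intensity (illustrative helper, not limma)."""
    if method == "none":
        return fg
    diff = fg - bg
    if method == "subtract":
        return diff
    if method == "half":
        # Anything below the floor, including negatives, is reset to the floor.
        return max(diff, floor)
    raise ValueError(f"unknown method: {method}")

def ma_values(red, green):
    """Return (M, A) on the log2 scale, or (None, None) if a channel is not positive."""
    if red <= 0 or green <= 0:
        return None, None  # the log is undefined, so the spot ends up as NA
    m = math.log2(red) - math.log2(green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a

# Hypothetical spot: clear Cy5 signal, Cy3 foreground just below its local background.
spot = {"R": 900.0, "Rb": 300.0, "G": 380.0, "Gb": 400.0}

results = {}
for method in ("none", "subtract", "half"):
    r = background_correct(spot["R"], spot["Rb"], method)
    g = background_correct(spot["G"], spot["Gb"], method)
    results[method] = ma_values(r, g)
    print(method, results[method])

# Same idea with the arbitrary floor of 1.5 instead of 0.5.
m_custom, a_custom = ma_values(
    background_correct(spot["R"], spot["Rb"], "half", floor=1.5),
    background_correct(spot["G"], spot["Gb"], "half", floor=1.5),
)
print("floor 1.5", (m_custom, a_custom))
```

For this made-up spot the uncorrected M is a little above 1, plain subtraction gives no value at all, and the floored versions give M above 8, with the exact figure set by the arbitrary floor rather than by the data. The sketch does not reproduce the normalisation or FDR behaviour; it only shows where the extreme log-ratios come from.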

limma convert • 1.2k views

updated 17.7 years ago by Steven McKinney ▴ 310 • written 17.7 years ago by J.delasHeras@ed.ac.uk ★ 1.9k

score 0 · Answer 1 · 2006-08-02

Jose,

We are facing the same issue analyzing array CGH data from pairs of single colour NimbleGen chips. Since NimbleGen data formats are new and fluid, most R and BioC packages are still not able to routinely read in and process the data. This has the beneficial effect of forcing us to look at plots and re-examine the data and the correction / normalisation processing routines.

Our NimbleGen chips do not have an MM for each PM; instead NimbleGen adds about a thousand "RANDOM" probes with characteristics 'similar' to the probe set under investigation. Thus we cannot do the usual MM subtraction PM(i) - MM(i). Subtracting the median background, or similar variants (including the "half" algorithm), induces strong structural changes at the left edge of the MA data, resembling the shape of the '<' left angle bracket.

Irizarry et al (2003) p. 257 discuss this issue:

"Affymetrix also appears to have noticed that the linear scale is not appropriate and, in the new version of their analysis algorithm MAS 5.0, are now using a log scale measure. Specifically the MAS 5.0 signal (measure) is defined as

signal = Tukey Biweight{log(PM(j) - CT(j))}

with CT(j) a quantity derived from the MM that is never bigger than its PM pair. See Hubbell (2001) for more details. Each of these measures rely upon the difference PM - MM with the intention of correcting for non-specific binding. However, the exploratory analysis presented in Section 3 suggests that the MM may be detecting signal as well as non-specific binding. Some researchers (Naef et al, 2001) propose expression measures based only on the PM."

I have not yet come across journal articles investigating specific binding to the MM probes (references would be appreciated), but this may be part of the issue. Perhaps it is GC related? We will be investigating this issue.

NimbleGen has adopted the RMA algorithm currently used by Affymetrix for background correction and normalization. Bioconductor has the gcrma package, which also uses GC information. A review of gcrma correction is available at

http://bioinf.ncl.ac.uk:16080/support/courses/genespring/RMA%20comparison%20with%20MAS5.pdf

Not subtracting MM values appears to have much merit. One obvious benefit is that data are not lost because of the artificial phenomenon of not being able to take the logarithm of a negative number. Apparently mismatch probes are doing more than many had originally thought.

Background correction is still an evolving issue. Your observations illustrate the continued importance of plotting data and thinking about the available algorithms and their effects. Perhaps the RMA or GCRMA algorithms will produce more reasonable results? Your latest set of arrays and your a priori knowledge should help sort out part of this puzzle. Let us know what you discover about improved correction / normalisation methods that allow your known DE genes to show themselves.

Reference:
Irizarry R., et al. (2003). "Exploration, normalization, and summaries of high density oligonucleotide array probe level data". Biostatistics Vol 4 (2), pp. 249 - 264

Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney at bccrc.ca
tel: 604-675-8000 x7561

BCCRC Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3 Canada
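
The spreading at the left edge of the MA plot mentioned above can be mimicked with a small simulation. This is only a sketch under stated assumptions, written in plain Python rather than R: every number (background level, noise, number of spots, random seed) is invented, spots are assumed to carry no true signal in either channel, and `simulate_spread` is a hypothetical name.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def simulate_spread(n_spots=2000, background=400.0, noise_sd=30.0, floor=0.5):
    """Spread (SD) of M for non-expressed spots, uncorrected vs floored subtraction."""
    m_none, m_floored = [], []
    for _ in range(n_spots):
        # Foreground and local background both sit at the background level plus noise.
        r_fg = background + random.gauss(0, noise_sd)
        g_fg = background + random.gauss(0, noise_sd)
        r_bg = background + random.gauss(0, noise_sd)
        g_bg = background + random.gauss(0, noise_sd)

        # No correction: the ratio of two similar numbers stays close to 1.
        m_none.append(math.log2(r_fg / g_fg))

        # Subtract, then floor at 0.5: what is left is noise around zero.
        r = max(r_fg - r_bg, floor)
        g = max(g_fg - g_bg, floor)
        m_floored.append(math.log2(r / g))
    return statistics.stdev(m_none), statistics.stdev(m_floored)

sd_none, sd_floored = simulate_spread()
print(f"SD of M, no correction:   {sd_none:.2f}")
print(f"SD of M, subtract + floor: {sd_floored:.2f}")
```

With these invented settings the log-ratios of empty spots stay tight without correction and fan out widely once the background is subtracted and floored, which is the kind of '<' shape one would expect at low A. Whether this fully accounts for the behaviour seen on the real slides is a separate question that the sketch does not answer.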