GCRMA/RMA bimodal distribution

Matthew Hannah

Hi,

This has been mentioned before in the context of rma, where it was put down to an artifact of BG (background) correction:

http://files.protsuggest.org/biocond/html/5066.html

I was very surprised to see that gcrma also gives a very pronounced bimodal distribution. When comparing samples, the relative positions of the two peaks may obviously influence the observed expression changes. Would such peak shifts be more likely between divergent samples? If anyone wants to comment on that... ;-)

This example uses 12 chips (biological replicates), but I first noticed the effect with 3 and 6 chips using rma.

Hope the attachment works.

Cheers,
Matt

[Attachment scrubbed by the list: gcrma_dist.png (image/png, 6633 bytes), https://stat.ethz.ch/pipermail/bioconductor/attachments/20040825/083e56a5/gcrma_dist.png]
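One way to put numbers on the peak positions Matt describes is to find the local maxima of each chip's kernel density estimate and compare them across chips. Below is a minimal base R sketch under stated assumptions: `density_modes()` is a hypothetical helper (not part of affy or gcrma), and the two "chips" are simulated mixtures, not data from this thread. With real data the input would be a matrix such as `exprs(eset)`.

```r
# Hypothetical helper (not from affy/gcrma): x positions of the highest
# local maxima of a kernel density estimate, returned in increasing order.
# Extra arguments (e.g. bw) are passed on to density().
density_modes <- function(x, n_modes = 2, ...) {
  d <- density(x, ...)            # Gaussian kernel, default bandwidth
  y <- d$y
  i <- 2:(length(y) - 1)
  # interior grid points that rise above the left neighbour and
  # do not fall below the right neighbour
  peaks <- i[y[i] > y[i - 1] & y[i] >= y[i + 1]]
  peaks <- peaks[order(y[peaks], decreasing = TRUE)]   # tallest first
  sort(d$x[head(peaks, n_modes)])
}

# Simulated log-scale values for two chips (not real data): a low and a
# high component, with the upper peak of chip2 shifted up by 0.5
set.seed(1)
n <- 5000
chip1 <- c(rnorm(n, 3, 1), rnorm(n, 8, 1.5))
chip2 <- c(rnorm(n, 3, 1), rnorm(n, 8.5, 1.5))
expr <- cbind(chip1, chip2)

# One column per chip: row 1 = lower peak, row 2 = upper peak
modes <- apply(expr, 2, density_modes)
round(modes, 2)
# Distance between the two peaks on each chip
modes[2, ] - modes[1, ]
```

The peak positions depend on the smoothing bandwidth, so it is worth repeating the summary with a few values, for example `apply(expr, 2, density_modes, bw = 0.3)`, before reading much into small shifts.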

James W. MacDonald

Matthew Hannah wrote:
> Hi,
>
> Sorry for including the developers, but I guess you are the only ones
> that will be able to answer this, (and I'm not sure BioC accepts .docs).
> I saw a comment from Jean addressing the same question but couldn't find
> the reply he referred to.
>
> https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-August/005769.html
>
> It seems the mouse chip exprs values have a double peak after gcrma
> (looking at a density plot).

I don't understand the concern about the distribution of expression values on a given chip (maybe I am missing something?). Is there something inherently wrong with a bimodal distribution, or are you simply assuming that the distribution of expression values on a chip should be semi-normal (or at least unimodal)?

Note that every statistical test I can think of (assuming columns = samples and rows = genes) is done row by row, so I could see a concern if the row data were not unimodal, but what assumption applies to the columns?

Best,

Jim

--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
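Jim's point that tests are run gene by gene can be checked directly: a per-gene t statistic depends only on that gene's row, so changing the other rows (and with them each array's overall distribution) leaves it untouched. A minimal sketch on simulated data; `row_t()` is a hypothetical helper around base R's `t.test()`, not a function from any Bioconductor package.

```r
# Hypothetical helper: pooled-variance two-sample t statistic for every row
# (gene) of a genes x arrays matrix; `group` is a logical vector over arrays
row_t <- function(m, group) {
  apply(m, 1, function(x)
    unname(t.test(x[group], x[!group], var.equal = TRUE)$statistic))
}

set.seed(42)
group <- rep(c(TRUE, FALSE), each = 4)              # 4 vs 4 arrays
m <- matrix(rnorm(200 * 8, mean = 6), nrow = 200)   # 200 genes, unimodal columns

# Replace genes 101-200 with values around 12: every column becomes bimodal,
# while genes 1-100 are left untouched
m_bimodal <- m
m_bimodal[101:200, ] <- rnorm(100 * 8, mean = 12)

t_unimodal <- row_t(m, group)
t_bimodal <- row_t(m_bimodal, group)

# The statistics for genes 1-100 are identical in both versions
identical(t_unimodal[1:100], t_bimodal[1:100])
```

Moving from unimodal to bimodal columns changed nothing for the genes whose own values were unchanged, which is the sense in which a per-row test makes no assumption about the columns.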

S Peri

Dear Group,

I am trying to compute the statistics for MA and volcano plots. I adapted the following function from R code in a Bioconductor workshop:

> stat.fx<-function(x,cl){
+ index_ACC = x[cl==0]
+ index_NC=x[cl==3]
+ tmp<-t.test(x[index_NC],x[index_ACC],var.equal=TRUE)
+ c(mean(tmp$estimate),-diff(tmp$estimate,tmp$statistic,tmp$p.value))
+ }
> scores <-stat.fx(adrexp,adr.cl)
Error in diff.default(tmp$estimate, tmp$statistic, tmp$p.value) :
        `lag' and `differences' must be integers >= 1

Here index_ACC is the cancer population, index_NC is the normal population, and adrexp <- exprs(justRMA()).

Why am I getting this error? Is it because my expression set contains log fold changes? Please suggest a solution.

Thanks,
ps
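The error comes from the last line of `stat.fx()`: the closing parenthesis of `diff()` is misplaced, so `tmp$statistic` and `tmp$p.value` are passed to `diff()` as its `lag` and `differences` arguments. The function also subsets `x` twice (it takes the values for each class and then uses those values as indices), and it is called on the whole matrix instead of once per gene. A corrected sketch, assuming as in the post that class labels 0 and 3 mark the ACC (cancer) and NC (normal) arrays; the simulated matrix only stands in for `exprs(justRMA())`.

```r
# Per-gene statistics for MA and volcano plots.
# x: one gene's log2 expression values; cl: class label for each array,
# with 0 = ACC (cancer) and 3 = NC (normal), as in the original post
stat.fx <- function(x, cl) {
  nc  <- x[cl == 3]   # subset once: these are values, not indices
  acc <- x[cl == 0]
  tmp <- t.test(nc, acc, var.equal = TRUE)
  c(A = mean(tmp$estimate),             # average of the two group means
    M = unname(-diff(tmp$estimate)),    # mean(NC) - mean(ACC), a log2 fold change
    t = unname(tmp$statistic),
    p = tmp$p.value)
}

# Simulated stand-in for exprs(justRMA()): 100 genes x 6 arrays
set.seed(7)
adrexp <- matrix(rnorm(600, mean = 7), nrow = 100)
adr.cl <- c(0, 0, 0, 3, 3, 3)

# Apply per gene (row); t() puts genes in rows and statistics in columns
scores <- t(apply(adrexp, 1, stat.fx, cl = adr.cl))
head(scores)
```

justRMA() returns log2-scale expression values rather than fold changes, so the M column above is itself a log2 fold change (positive when NC is higher, following the argument order of the original t.test() call). An MA plot is then `plot(scores[, "A"], scores[, "M"])` and a volcano plot `plot(scores[, "M"], -log10(scores[, "p"]))`.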

Zhijin Wu

> Sorry for including the developers, but I guess you are the only ones
> that will be able to answer this, (and I'm not sure BioC accepts .docs).
> I saw a comment from Jean addressing the same question but couldn't find
> the reply he referred to.

The original question I got was about gcrma producing a bimodal distribution from probe intensities that have a unimodal distribution. My answer was that this "change" is not necessarily surprising. For example, suppose the "true log signal" comes from a bimodal distribution:

logS <- c(rnorm(1000, 3, 1), rnorm(1000, 8, 2))
# you will see this has two peaks
par(mfrow = c(1, 2))
plot(density(logS))
# if the background, log(non-specific binding), comes from
logB <- rnorm(2000, 6, 1)
# then when you plot the density of the convolution on the log scale,
plot(density(log(exp(logS) + exp(logB))))
# you see only one peak, and this would be "before gcrma".
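Why the background hides the lower peak can be made concrete: on the log scale the observed value log(S + B) is always at least max(log S, log B) and at most log(2) above it, so for probes whose signal sits well below the background it is essentially log B. A small sketch reusing Zhijin's simulation; the stable log-sum-exp helper `log_add()` and the "within 0.1 of log B" summary are illustrative additions, not part of gcrma.

```r
set.seed(1)
logS <- c(rnorm(1000, 3, 1), rnorm(1000, 8, 2))  # bimodal "true" log signal
logB <- rnorm(2000, 6, 1)                         # log non-specific binding

# Numerically stable log(exp(a) + exp(b)) (log-sum-exp)
log_add <- function(a, b) {
  m <- pmax(a, b)
  m + log1p(exp(-abs(a - b)))
}
logPM <- log_add(logS, logB)   # same values as log(exp(logS) + exp(logB))

# Bounds: max(logS, logB) <= logPM <= max(logS, logB) + log(2)
upper_gap <- logPM - pmax(logS, logB)
range(upper_gap)

# Share of probes from the low-signal component whose observed value is
# within 0.1 of log B, i.e. where the signal is effectively swamped
low <- seq_len(1000)
mean(abs(logPM[low] - logB[low]) < 0.1)
```

Because most of the low component collapses onto the background distribution, the observed density shows one peak; a background correction that removes B is what lets the second peak reappear.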

Matthew Hannah

Hi,

Sorry for including the developers, but I guess you are the only ones who can answer this (and I'm not sure BioC accepts .doc attachments). I saw a comment from Jean addressing the same question but couldn't find the reply he referred to:

https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-August/005769.html

It seems the mouse chip exprs values have a double peak after gcrma (looking at a density plot).

As I'd received no response, I've been doing some investigating (see attached). Basically, gcrma gives a single-peaked distribution only for the human U95 chips (was it optimised on these?). Double peaks in the exprs estimates appear for the following chips, from least to most pronounced: U133A, Drosgenome1, ATH1.

To a lesser extent this also occurs with RMA: U133A has a single wide peak, and the distributions get worse in the order Drosgenome1, U95, ATH1 (the last two have obvious double peaks).

From what has been said, this is likely to be a problem of BG correction. I don't know whether there are options to change this for RMA, but GCRMA has tuning factors, and I don't know whether the ad hoc estimate (rather than the full model) is causing this. Turning off the optical correction had no effect.

I wanted to experiment with GCRMA to see whether the distribution changes with the tuning factors, but currently both gcrma and justGCRMA fail because they cannot find gcrma.bg.transformation (see below), and I'm not sure how k should be specified.

I know people should look more closely at their data, but with the ease of just(GC)RMA and RMAExpress, I know a lot of people simply compute expression measures for different chip types without looking at the density of the returned values. These people may be working with data that are skewed in some way.

I guess each chip type will need its BG correction optimised for RMA and GCRMA to give better estimates of true expression levels and changes. I really hope this can be fixed, as RMA and GCRMA seem to be really useful expression measures, and it would be a shame to have to find alternative methods just because they are not optimised for your chip type.

Thanks in advance,
Matt

R devel 2.0, win2k
affy 1.5.2 (I know it's not the latest, but getBioC is not working for me at the moment)
gcrma 1.1.0
<<exprs_meas_comp.doc>>

> esetgcrma_slow <- gcrma(raw,fast=FALSE)
Computing affinities.Done.
Adjusting for optical effect.........Done.
Adjusting for non-specific binding.Error in bg.adjust.fullmodel(pms[, i], mms[, i], pm.affinities, mm.affinities,  :
        couldn't find function "gcrma.bg.transformation"
> esetgcrma_slow <- justGCRMA(fast=FALSE)
Computing affinities..Done.
Adjusting for optical effect..........Done.
Adjusting for non-specific binding.Error in bg.adjust.fullmodel(pms[, i], mms[, i], pm.affinities, mm.affinities,  :
        couldn't find function "gcrma.bg.transformation"
> esetgcrma_k4 <- justGCRMA(k=4*fast+0.5*(1-fast))
Computing affinities..Done.
Adjusting for optical effect..........Done.
Adjusting for non-specific binding.Error in gcrma.bg.transformation.fast(pms, bhat, var.y, k = k) :
        Object "fast" not found
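The last error (Object "fast" not found) appears to be an R argument-evaluation issue rather than a gcrma bug: an expression written as a default in a function's signature is evaluated inside the function, where `fast` exists, but an argument supplied in the call is evaluated in the caller's environment, where there is no `fast`. The toy function below is hypothetical and only mimics that pattern; the general fix is to pass a plain number for `k` (or compute it from your own `fast` value) rather than an expression that refers to the function's internal `fast`.

```r
# Hypothetical stand-in for a function whose default `k` depends on `fast`
toy_bg <- function(fast = TRUE, k = 6 * fast + 0.25 * (1 - fast)) {
  k  # a default is evaluated here, inside the function, where `fast` exists
}

toy_bg()               # default with fast = TRUE: returns 6
toy_bg(fast = FALSE)   # default re-evaluated lazily: returns 0.25

# Supplying the same kind of expression from outside fails, because the
# argument is evaluated in the caller's environment, which has no `fast`
res <- tryCatch(toy_bg(k = 4 * fast + 0.5 * (1 - fast)),
                error = function(e) conditionMessage(e))
res

# Fix: decide on `fast` yourself and pass a value that is already computed
fast <- TRUE
toy_bg(fast = fast, k = 4 * fast + 0.5 * (1 - fast))  # returns 4
```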

Naomi Altman

I have used RMA and MAS on ATH arrays, and the distributions are bimodal (both probe-wise and for probesets). Setting a threshold of about 0.05 on the MAS p-values removes the lower peak, but, like others on this list, I do not take those p-values too seriously.

I am not sure why I should care about the bimodality. The methods I use, such as t-tests and limma, require normality within each gene across arrays, and (possibly) a distribution for the gene variances, but they say nothing about the distribution of the different genes on the same array.

--Naomi

(Replying to Matthew Hannah's message of 8/31/2004, 06:06 PM +0200.)

Naomi S. Altman, Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
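Naomi's point can be checked by simulation: if every gene is normal across arrays and there is no true group difference, per-gene t-tests give roughly 5% of p-values below 0.05 even though each array's values are strongly bimodal. A minimal sketch on simulated data (half the genes "unexpressed" around 3, half "expressed" around 9); it uses ordinary pooled-variance t-tests from base R's `t.test()`, not limma's moderated statistics.

```r
set.seed(2004)
n_genes <- 2000
n_arrays <- 8
group <- rep(1:2, each = n_arrays / 2)

# Each gene has its own mean (bimodal across genes) plus normal noise across
# arrays; no gene has a true group difference
gene_mean <- c(rnorm(n_genes / 2, 3, 0.5), rnorm(n_genes / 2, 9, 1))
expr <- gene_mean + matrix(rnorm(n_genes * n_arrays, sd = 0.3), nrow = n_genes)

# Every array pools both kinds of gene, so each column is bimodal
plot(density(expr[, 1]), main = "values on array 1")

pvals <- apply(expr, 1, function(x)
  t.test(x[group == 1], x[group == 2], var.equal = TRUE)$p.value)
mean(pvals < 0.05)  # close to the nominal 0.05
```

The bimodality across genes never enters any single test; only the normality and variance of each gene's own row do.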
