Group-wise gcrma normalization


Jun Yin ▴ 30

@jun-yin-2690

Last seen 9.6 years ago

Hi all,

I have a problem normalizing my Affymetrix microarray data. We are using the Affymetrix Zebrafish Genome Array. The experimental design includes three treatments, A, C and G, with three biological replicates each: A1, A2, A3, C1, C2, C3 and G1, G2, G3.

A1, C1 and G1 were from the first batch (that microarray experiment was performed earlier). A2, A3, C2, C3, G2 and G3 were from the second batch. We have a very strong batch effect. If I use gcrma to normalize the data, the only effect I can see is the batch effect; e.g. in the hierarchical clustering, A1, C1 and G1 cluster together. No matter what comparison I use, I cannot get any differentially expressed genes from the data, presumably because the batch effect (or background noise between batches) swamps everything.

By accident, I used gcrma to normalize the three replicates from each treatment separately, and the results changed dramatically. This is what I did:

    library(affy)                        # ReadAffy(), exprs()
    library(gcrma)                       # gcrma()
    data <- ReadAffy()
    data1.gcrma <- gcrma(data[, 1:3])    # A samples
    data2.gcrma <- gcrma(data[, 4:6])    # C samples
    data3.gcrma <- gcrma(data[, 7:9])    # G samples
    # exprs() takes one object at a time, so extract each matrix before binding
    data.gcrma.exprs <- cbind(exprs(data1.gcrma), exprs(data2.gcrma), exprs(data3.gcrma))

Then all the batch effects were gone, and the variance within each group/treatment was dramatically reduced. But then I realized that gcrma/rma uses median polish to summarize the probe set value, which iteratively subtracts row medians and column medians. The probe set signal is calculated by adding the global median to the column effect, so it depends heavily on the original column median of the probe set. Normalizing the groups separately has probably introduced an artifact.

The most interesting thing is that the genes we expected were in the gene list generated by the group-wise gcrma normalization. So I wonder whether there is any reason this group-wise gcrma could be acceptable. I am rather desperate to decide whether to use the data or discard everything. Because the batch effect is so strong and the sample size is so small, no normalization has worked so far (gcrma, rma, mas5, loess/quantile/contrasts/scale normalization).

Thanks in advance.

Jun Yin
Ph.D. student in U.C.D.
2009-02-25
Bioinformatics Laboratory
Conway Institute
University College Dublin
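The concern about median polish can be illustrated with a small sketch in base R. It is not the gcrma implementation: it only applies `medpolish()` from the stats package to a simulated probes-by-arrays matrix for a single probe set, once on all arrays together and once on two subsets of arrays. The data, the effect sizes and the helper name `summarize` are assumptions made for illustration.

```r
# Toy log2 intensities for one probe set: 5 probes (rows) x 6 arrays (columns).
# The numbers are simulated; they are not from the zebrafish data.
set.seed(1)
probe_effect <- c(-1.0, -0.5, 0, 0.5, 1.0)
array_effect <- c(8.0, 8.1, 7.9, 9.0, 9.1, 8.9)
pm <- outer(probe_effect, array_effect, "+") +
  matrix(rnorm(30, sd = 0.2), nrow = 5)

# Hypothetical helper: summarize a set of arrays by median polish,
# taking expression = overall effect + column (array) effect.
summarize <- function(x) {
  fit <- medpolish(x, trace.iter = FALSE)
  fit$overall + fit$col
}

joint <- summarize(pm)                  # all six arrays polished together
separate <- c(summarize(pm[, 1:3]),     # arrays 1-3 polished alone
              summarize(pm[, 4:6]))     # arrays 4-6 polished alone

# Per-array difference between the two strategies.
print(round(joint - separate, 3))
```

Because the row (probe) effects are estimated from whichever arrays are included in the fit, the summaries for the same array can differ between the joint and the separate run. In the real pipeline the background correction and quantile normalization steps are also run per subset, which this sketch does not cover.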

Microarray Normalization Clustering zebrafish probe gcrma • 1.3k views

updated 15.2 years ago by Kasper Daniel Hansen ★ 6.5k • written 15.2 years ago by Jun Yin ▴ 30


Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 9 months ago

United States

On Feb 25, 2009, at 14:57, Jun Yin wrote:

> By accident, I used gcrma to normalize the three replicates from
> each treatment separately. [...] It probably introduced artifact
> if I normalize different groups separately.

I would doubt the results from a group-wise normalization. You will artificially make the samples look more homogeneous within each group and therefore get more differential expression.

You could try to model the batch effect in limma, but it is not clear that you can get rid of it. And even if you model the batch effect, it is not clear that you will get much differential expression.

If you are able to, I would recommend redoing the experiment. That decision of course depends on how many resources you would need to spend on it.

Kasper
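A minimal sketch of what modelling the batch effect could look like, assuming the column order A1..A3, C1..C3, G1..G3 described in the question. The expression matrix here is simulated noise standing in for the jointly normalized data, the variable names are made up for illustration, and the limma step only runs if the package is installed.

```r
# Sample layout from the question: replicate 1 of each treatment is batch 1,
# replicates 2 and 3 are batch 2.
treatment <- factor(rep(c("A", "C", "G"), each = 3))
batch <- factor(rep(c("b1", "b2", "b2"), times = 3))

# One mean per treatment plus one additive batch term.
design <- model.matrix(~ 0 + treatment + batch)
print(design)

# Placeholder expression matrix (simulated noise, 100 genes x 9 arrays).
# In practice this would be exprs() of all nine arrays normalized together.
set.seed(1)
expr <- matrix(rnorm(100 * 9, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100),
                               paste0(treatment, rep(1:3, times = 3))))

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)
  contrasts <- limma::makeContrasts(CvsA = treatmentC - treatmentA,
                                    levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contrasts))
  print(limma::topTable(fit2, coef = "CvsA", number = 5))
}
```

With nine arrays and four coefficients this leaves only five residual degrees of freedom, and batch 1 contains a single array per treatment, which is consistent with the caution above that modelling the batch may not rescue much differential expression.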

written 15.2 years ago by Kasper Daniel Hansen ★ 6.5k
