Array normalisation with Limma: would this be reasonable?
@jdelasherasedacuk-1189
United Kingdom
Hi,

I am having trouble normalising my data properly. Briefly, I have a number of 2-colour cDNA arrays. Every slide is hybridised to 1) a reference sample (non-transfected RNA from a cell line), and 2) a transfected sample (from the same cell line). So the question is transfection vs. non-transfection. So far so good.

What's the problem? The transfection is of a plasmid that will "activate" expression of many genes (it's a fusion protein between a DNA-binding domain that would target many gene promoters, especially silenced ones, and a potent transactivator domain). This means that a large proportion of genes are differentially expressed, with most going from no or very little expression to a clearly detectable level.

As a result, loess normalisation doesn't work very well. Actually, it works "too well". On the raw data, if you plot Cy3 vs Cy5 (logged), there's the usual diagonal with the bulk of the data, and then a (usually) large spike with low Cy3 and varying Cy5 (parallel to the Cy5 axis), or vice versa, depending on how the transfection was labelled. (See http://mcnach.com/MISC/RG_scatterplots.png). But after print-tip group loess, the spike gets severely distorted, pulled towards the bulk of the data on the diagonal, and this results in a clear underestimation of the number of real DE genes.

I'm exploring alternatives, and I had an idea. It seems a bit "rough", so I wonder what more experienced people think.

This is the idea: I can identify most of the spots on the "spike" by virtue of their having just about background signal on one channel, and decent signal on the other. This I can do on the raw data, either by looking at the foreground and background intensities on each slide, or at the signal-to-noise ratio (SNR) that Genepix produces. Once these are located, I can assign zero weight to them, which means that the normalisation (loess) is applied using only the bulk of the spots, which mostly don't change that much. My hope is that this would remove the distortion of the spike due to loess, but would still be adequate to "balance" the Cy3 and Cy5 channels appropriately.

I have experimented with different values for the 'span' parameter in loess, from the default 0.3 up to 1.0. The higher the span, the smaller the distortion, although the angle of the spike varies and it's still not quite right.

In the light of what the raw data scatterplots look like (attachment), does anyone have objections to my "solution"?

I realise that the best would be to have a set of control spots for these arrays, but unfortunately I don't have that luxury. I have identified a small set of genes that do not change expression, consistently across experiments, even when done in another cell line. But these are only 7 genes, which cover the effective range of A values, and I don't think that 7 genes is enough (when I tried limma's normalisation method 'control' it gave me an error that appears to be due to too few spots being used as controls).

I'd be grateful for any comments.

Thanks!

Jose

--
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
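To make the zero-weight idea concrete, here is a minimal sketch in Python with NumPy. It is not limma code and makes several assumptions: a weighted quadratic fit of M on A (via `numpy.polyfit`) stands in for loess, print-tip groups are ignored, the foreground/background thresholds `low` and `high` are hypothetical, and the slide is simulated. The function names are made up for illustration.

```python
import numpy as np

def spike_weights(fg_r, bg_r, fg_g, bg_g, low=1.5, high=3.0):
    """Weight 0 for spots near background in one channel but clearly above
    it in the other, weight 1 otherwise. `low` and `high` are hypothetical
    foreground/background ratio thresholds, not recommended values."""
    r_ratio = fg_r / bg_r
    g_ratio = fg_g / bg_g
    spike = ((r_ratio < low) & (g_ratio > high)) | ((g_ratio < low) & (r_ratio > high))
    return np.where(spike, 0.0, 1.0)

def normalise(M, A, weights, degree=2):
    """Estimate the intensity-dependent trend of M on A from the weighted
    spots only, then subtract it from every spot (including zero-weight ones).
    A polynomial is used here as a simple stand-in for loess."""
    used = weights > 0
    coef = np.polyfit(A, M, degree, w=weights)
    # Choice made for this sketch: do not extrapolate the curve beyond the
    # A range of the spots that were actually used in the fit.
    A_eval = np.clip(A, A[used].min(), A[used].max())
    return M - np.polyval(coef, A_eval)

# Simulated slide: a bulk of unchanged genes with a dye bias of 0.4 (log2),
# plus a "spike" of genes at background in green and expressed in red.
rng = np.random.default_rng(0)
n_bulk, n_spike, bg = 1700, 300, 100.0
log_g = rng.uniform(8, 14, n_bulk)
sig_g = np.concatenate([2 ** log_g, rng.uniform(5, 30, n_spike)])
sig_r = np.concatenate([
    2 ** (log_g + 0.4 + rng.normal(0, 0.15, n_bulk)),
    2 ** rng.uniform(9, 14, n_spike),
])
bg_all = np.full(n_bulk + n_spike, bg)

M = np.log2(sig_r / sig_g)          # log-ratio (background-subtracted)
A = 0.5 * np.log2(sig_r * sig_g)    # average log-intensity

w = spike_weights(sig_r + bg, bg_all, sig_g + bg, bg_all)
is_spike = w == 0

M_weighted = normalise(M, A, w)               # spike excluded from the fit
M_plain = normalise(M, A, np.ones_like(w))    # every spot used in the fit
```

Comparing `M_weighted[is_spike]` with `M_plain[is_spike]` shows how much of the spike's log-ratio the all-spots fit absorbs, while `M_weighted[~is_spike]` should be centred close to zero.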
@jdelasherasedacuk-1189
United Kingdom
Not sure if this would be of interest to anyone, but the approach I described in my original message seemed to work pretty well. I used weights to exclude the "spike" from print-tip loess normalisation, and didn't use the weights to fit the linear model. The scatter plots before/after normalisation look very reasonable, and the results I am getting make sense.

Jose