Question

marray, weights and normalizations

Henning Redestig

Hi,

I am trying to use the Lapointe et al., PNAS 2004 data set from SMD, which consists of 112 arrays. As I understand it, these arrays are not limma-compliant, because the spots in the raw files are not directly in spotting order (some spots have been left out). I therefore decided to use the marray package, which seems to be capable of handling this kind of formatting. Importing the data with read.SMD() seems to work, and image() can plot the spots in spatial order, which indicates that the spotting-order information has been kept.

Problems arise when I try to normalize the data using maNormMain(), because I want to weight the spots based on their flags. Setting w either to the weights vector or to NULL, I get the MA-plots I have provided. They show a strong dependence between A and M in the lower intensity range when weights are used (the lines are lowess fits per print tip). Could anyone enlighten me as to why this is the case? Isn't the whole point of the normalization to remove any dependence between A and M?

The weights vector was set to 1 for flag = 0, 0.1 for flag <= -50 and 0.01 for flag <= -75 (GenePix flagging conventions; the weights were chosen arbitrarily).

Thanks very much for any help.

Tags: marray · posted 2005-04-18 by Henning Redestig
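The flag-based weighting scheme in the question can be written down as a small function. This is a Python/NumPy sketch, not marray code. The helper name `flags_to_weights` is hypothetical. Giving full weight to positive flags is an assumption, because the question does not mention them. As written, "flag <= -50" also matches -75, so the sketch tests the stricter -75 threshold first.

```python
import numpy as np

def flags_to_weights(flags):
    """Spot weights from GenePix flags, following the scheme in the question.

    flag <= -75        -> 0.01
    -75 < flag <= -50  -> 0.1
    anything else      -> 1.0  (assumption: covers flag 0 and any positive flags)
    The -75 test comes first because -75 also satisfies flag <= -50.
    """
    flags = np.asarray(flags)
    return np.where(flags <= -75, 0.01, np.where(flags <= -50, 0.1, 1.0))

weights = flags_to_weights([0, -50, -75, -100, 100])
print(weights)
```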

Answer · 2005-04-20

>Date: Mon, 18 Apr 2005 12:13:03 +0200
>From: Henning Redestig <redestig@mpimp-golm.mpg.de>
>Subject: [BioC] marray, weights and normalizations..
>To: bioconductor@stat.math.ethz.ch
>
>Hi,
>
>I am trying to use the Lapointe et al, PNAS 2004 data set from SMD
>consisting of 112 arrays. These are not as I understand it LIMMA
>compliant since the spots in the raw files are not directly in the
>spotting order (some spots have been left out)

This is correct. Limma will do "loess" normalization for you but not print-tip-loess on such data.

> and therefore I decided
>to use the marray package which seem to be capable of handling even this
>kind of formatting.
>Using read.SMD() to import the data seems to work and image() can plot
>the spots in spatial order indicating that the spotting order
>information has been kept.
>
>Problem arise when I try to normalize the data using maNormMain() as I
>wish to weight the spots based on their flags. Setting w to the weights
>vector or NULL I get MA-plots as provided indicating a strong dependence
>between A and M in the lower intensity range when weights are used
>(lines are lowess fitted lines per print tip). Could anyone enlighten me
>as to why this is the case? Isnt the whole point of the normalization to
>remove any dependence between A and M?

Yes it is, but it won't do so if you tell it to ignore all the low intensity spots, which is what you're doing when you set the weights to zero. In my opinion, filtering low intensity spots is against the spirit of loess normalization. If you want to filter low intensity spots, you should do it post normalization.

Gordon
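A toy example illustrates Gordon's point. The sketch below is a simplified weighted local-linear (loess-style) smoother written in Python/NumPy. It is not the marray or limma implementation. It has no robustness iterations and no print-tip grouping, and the choice to build each neighbourhood from positive-weight spots only is an assumption of this sketch. The data are synthetic: M carries a linear bias below A = 10 and none above. Giving the faint spots zero weight leaves that low-intensity trend in the normalized values.

```python
import numpy as np

def weighted_loess_fit(a, m, w, span=0.3):
    """Simplified weighted local-linear smoother of M on A.

    Each fitted value comes from a weighted least-squares line through the
    span * n nearest positive-weight spots. The weights are a tricube kernel
    multiplied by the prior spot weights w. Zero-weight spots still receive
    a fitted value, but they never influence any fit.
    """
    a, m, w = (np.asarray(x, dtype=float) for x in (a, m, w))
    keep = w > 0
    a_fit, m_fit, w_fit = a[keep], m[keep], w[keep]
    k = min(keep.sum(), max(3, int(np.ceil(span * keep.sum()))))
    fitted = np.empty_like(a)
    for i, a0 in enumerate(a):
        d = np.abs(a_fit - a0)
        h = max(np.sort(d)[k - 1], 1e-12)              # window half-width
        kernel = np.clip(1 - (d / h) ** 3, 0, None) ** 3
        ww = kernel * w_fit
        X = np.column_stack([np.ones_like(a_fit), a_fit - a0])
        beta = np.linalg.solve(X.T @ (X * ww[:, None]), X.T @ (ww * m_fit))
        fitted[i] = beta[0]                              # local line evaluated at a0
    return fitted

# Synthetic array: linear dye bias in M below A = 10, none above.
a = np.linspace(6, 16, 201)
m = np.where(a < 10, 0.5 * (10 - a), 0.0)

w_all = np.ones_like(a)                      # every spot counts
w_faint_zero = np.where(a < 10, 0.0, 1.0)    # faint spots "flagged" to weight 0

m_norm_all = m - weighted_loess_fit(a, m, w_all)
m_norm_zero = m - weighted_loess_fit(a, m, w_faint_zero)

low = a < 8
print("mean |M|, A < 8, all weights 1:     ", np.abs(m_norm_all[low]).mean())
print("mean |M|, A < 8, faint spots weight 0:", np.abs(m_norm_zero[low]).mean())
```

With all weights equal to 1, the curve follows the low-intensity bias and the normalized M-values there are essentially zero. With zero weights, the curve at low A is extrapolated from the brighter spots, so the bias survives normalization.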

Answer · 2005-04-21

Gordon Smyth, WEHI, Melbourne, Australia

>Date: Mon, 18 Apr 2005 12:13:03 +0200
>From: Henning Redestig <redestig@mpimp-golm.mpg.de>
>Subject: [BioC] marray, weights and normalizations..
>To: bioconductor@stat.math.ethz.ch
>
>Hi,
>
>I am trying to use the Lapointe et al, PNAS 2004 data set from SMD
>consisting of 112 arrays. These are not as I understand it LIMMA
>compliant since the spots in the raw files are not directly in the
>spotting order (some spots have been left out)
> and therefore I decided
>to use the marray package which seem to be capable of handling even this
>kind of formatting.
>Using read.SMD() to import the data seems to work and image() can plot
>the spots in spatial order indicating that the spotting order
>information has been kept.
>
>Problem arise when I try to normalize the data using maNormMain() as I
>wish to weight the spots based on their flags.

As far as I know, maNormMain() only handles spot weights on a single-array basis. I assume you are aware of that already.

Gordon

> This is correct. Limma will do "loess" normalization for you but not print-tip-loess on such data.

I remedied this by "padding out" the raw file with NAs for the spots that were systematically missing from the last column of each block. This allowed me to do print-tip loess in limma, which does not show the same strange behaviour as marray when weights are used.

Another issue has come up, though, which I have now seen on several datasets. It is related to using zero weights. I get many more outliers in the distribution, with M-values ranging between, e.g., -200 and 200. I don't see this effect when I use weights of, say, 0.1 instead (for all negatively GenePix-flagged genes). Is this to be expected, or am I doing something wrong?

> As far as I know, maNormMain() only handles spot weights on a
> single-array basis. I assume you are aware of that already.

Yes, I was aware of that, since you have pointed it out on this list previously :)

Thanks for the reply!
/Henning

Reply by Henning Redestig
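Henning's workaround of padding the raw data out to the full print layout can be sketched as follows. This is a hypothetical Python/NumPy helper (`pad_to_full_grid`), not limma's reader. It assumes 1-based block/row/column coordinates and a full-grid order of block, then row, then column, with the column varying fastest. Check that ordering against the layout your software actually expects. Once every block has the same number of positions, each spot's print-tip (block) index follows from its position. That is presumably why the padding made print-tip loess possible.

```python
import numpy as np

def pad_to_full_grid(block, row, col, values, n_blocks, n_rows, n_cols):
    """Return (padded_values, print_tip) in full layout order.

    Spots absent from the raw file stay NaN, so every block has the same
    n_rows * n_cols positions and print-tip groups line up with blocks.
    """
    block, row, col = (np.asarray(x, dtype=int) - 1 for x in (block, row, col))
    spots_per_block = n_rows * n_cols
    padded = np.full(n_blocks * spots_per_block, np.nan)
    index = block * spots_per_block + row * n_cols + col
    padded[index] = values
    print_tip = np.repeat(np.arange(1, n_blocks + 1), spots_per_block)
    return padded, print_tip

# Toy layout: 2 blocks of 2 x 3 spots, with the last column missing in each block.
block = [1, 1, 1, 1, 2, 2, 2, 2]
row = [1, 1, 2, 2, 1, 1, 2, 2]
col = [1, 2, 1, 2, 1, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

padded, print_tip = pad_to_full_grid(block, row, col, values, 2, 2, 3)
print(padded)
print(print_tip)
```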

At 05:28 PM 21/04/2005, Henning Redestig wrote:
>Another issue has occured though which I have seen on several datasets now
>related to using zero weights. Distributionally I get a whole lot more
>outliers leading to M values ranging between e.g. -200, 200 an effect I
>cant see when using weights of say, 0.1 instead (for all negatively
>GenePix flagged genes). Is this to be expected or am I doing something wrong?

Yes, it is to be expected. The meaning of zero weights is that there is no penalty for the loess line not fitting these spots, so it is to be expected that it tends to fit these spots poorly, hence the large normalized M-values. Of course, if you really believe that a spot should get zero weight, you shouldn't care what M-value it gets, because it will play no role in the analysis. Because of this, the plotMA() function in limma hides spots with zero weights by default.

Btw, I am very much opposed to the idea that a spot should be considered poor quality merely because it is faint, e.g., because it gets a GenePix -50 flag. If a gene is not expressed in a particular sample, a faint spot is exactly what you want to observe.

Gordon

Reply by Gordon Smyth
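Gordon says that a zero-weight spot "will play no role in the analysis". A weighted mean, the simplest weighted least-squares summary, makes that concrete. The numbers below are made up for illustration; the extreme values only mimic the -200 to 200 range Henning reports. The last step, hiding zero-weight spots, imitates in Python the idea behind the plotMA() default Gordon mentions. It is not limma code.

```python
import numpy as np

# Made-up normalized M-values for replicate spots of one gene. The last two
# spots have weight 0 and extreme values like those reported in the thread.
m_norm = np.array([0.42, 0.38, 0.45, 200.0, -200.0])
w = np.array([1.0, 1.0, 1.0, 0.0, 0.0])

# Weighted mean: the simplest weighted least-squares summary.
weighted_mean = np.average(m_norm, weights=w)

# Replace the zero-weight M-values with anything else: the summary is unchanged.
m_other = m_norm.copy()
m_other[w == 0] = [-5.0, 5.0]
unchanged = np.isclose(np.average(m_other, weights=w), weighted_mean)

# For display, keep only spots that carry weight (hide zero-weight spots).
shown = m_norm[w > 0]

print(weighted_mean, unchanged, shown)
```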