Pairs plots in lumi, plots look different?

Julien Bauer ▴ 20

Hello,

I work for a microarray facility at Cambridge University, and we have been using the Illumina platform for a while now. Lumi is a great package, but I noticed something rather odd: when I use the plot function "pairs" on my data and then run it again, the plots look different. Some points are shifted or change location. The rest stay the same; it is just the look of the graphs that changes. My guess is that this is caused by the auto-scaling of the graph, but I would like to be sure. I looked in the mailing list archive and in the vignette but could not find the answer.

Thanks in advance for your help,
Julien Bauer

Microarray graph • 1.5k views

updated 16.0 years ago by Pan Du ★ 1.2k • written 16.0 years ago by Julien Bauer ▴ 20

Matthias Kohl ▴ 160

Hello,

the default pairs plot uses a random subset of the data:

    pairs(x, ..., logMode = TRUE, subset = 5000)

See

    library(lumi)
    ?"pairs-methods"

Hence, each call of pairs leads to different results. By setting the random seed or the argument "subset" appropriately, you can obtain identical plots for each call.

Best regards,
Matthias

--
Dr. Matthias Kohl
www.stamats.de
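The effect described in this answer can be reproduced without lumi. The following is a minimal sketch in base R, not lumi's actual implementation: `subsample_rows` is a hypothetical helper that mimics drawing a random subset of rows before plotting, and the data are simulated. It shows that two unseeded draws differ, that setting the seed immediately before each call gives identical subsets, and that a subset size at least as large as the data removes the randomness altogether.

```r
# Hypothetical stand-in for the subsampling step; this is NOT lumi's code.
subsample_rows <- function(x, subset = 5000) {
  n <- nrow(x)
  if (subset >= n) return(x)             # nothing to subsample
  x[sample(n, subset), , drop = FALSE]   # random rows, new draw on every call
}

# Simulated expression matrix: 20000 probes x 3 arrays (made-up data).
expr <- matrix(rnorm(20000 * 3), ncol = 3,
               dimnames = list(NULL, c("A", "B", "C")))

# Two unseeded draws pick different rows, so the plots differ.
draw1 <- subsample_rows(expr)
draw2 <- subsample_rows(expr)
differs <- !identical(draw1, draw2)

# Setting the seed immediately before each call gives the same rows.
set.seed(42); draw3 <- subsample_rows(expr)
set.seed(42); draw4 <- subsample_rows(expr)
same <- identical(draw3, draw4)

# A subset at least as large as the data disables subsampling entirely.
all_rows <- subsample_rows(expr, subset = nrow(expr))

pdf(NULL)                  # discard graphics output in non-interactive runs
pairs(draw3, pch = ".")
dev.off()
```

With lumi itself, the equivalent would presumably be to call `set.seed()` right before `pairs()`, or to pass a `subset` value that covers all rows; check `?"pairs-methods"` for the behaviour of the installed version.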

written 16.0 years ago by Matthias Kohl ▴ 160

Pan Du ★ 1.2k

Yes, as Matthias mentioned, we use a random subset to increase the efficiency of plotting. To avoid variation between plots, I have added a "seed" parameter to these plot functions. Please check the latest development version, lumi 1.7.3. Thanks for using lumi!

Best regards,
Pan

written 16.0 years ago by Pan Du ★ 1.2k

Please don't put the seed inside functions; it may mess up the random number stream. If someone wants reproducible plots, you should either increase the number of points or let them set the seed themselves.

Kasper
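The concern about the random number stream can be illustrated in a few lines of base R. This is a sketch only: `sample_with_internal_seed` is a hypothetical function that fixes the seed internally, and it is not taken from lumi or any other package. Because `set.seed()` acts on the session-wide generator, every random number the caller draws after the function returns is affected.

```r
# Hypothetical helper that fixes the seed internally (illustration only).
sample_with_internal_seed <- function(x, subset = 100, seed = 123) {
  set.seed(seed)                      # side effect: rewinds the global RNG
  invisible(x[sample(length(x), subset)])
}

x <- rnorm(1000)

# The caller draws a random number after each call to the helper.
sample_with_internal_seed(x)
after1 <- runif(1)
sample_with_internal_seed(x)
after2 <- runif(1)

# The two supposedly independent draws are identical, because the helper
# reset the stream to the same state both times.
identical(after1, after2)   # TRUE
```

Any simulation or resampling that runs after such a call would silently reuse the same random numbers, which is the kind of side effect the reply warns about.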

written 16.0 years ago by Kasper Daniel Hansen ★ 6.5k

The "seed" is a function parameter. Users can easily change it. Thanks.

Pan

written 16.0 years ago by Pan Du ★ 1.2k

I have two comments on this, a general one and a specific one.

I'll start with the specific: in this case you are providing a pairs plot. Presumably you subsample the data points to avoid overplotting. Depending on what you want to use the plot for, this may be quite OK, but the users need to know this! Clearly in this case the user was surprised to see it (perhaps it is highlighted on the help page, I don't know). For certain things, especially QC, I would personally prefer to plot all points (perhaps using a smoother, as Wolfgang suggested). If users start interpreting these plots without knowing that they see only a fraction of the data, they are likely to misinterpret them. Setting the seed just addresses the symptom (that the plots are not "reproducible"), not the underlying problem: these plots may not be suitable for whatever the original poster had in mind (otherwise he would not care that they look different). What in my opinion should be done instead is:

1) highlight it in the help page
2) provide a title on the plot like "based on 5000 samples" so that people do not get confused
3) not set the seed

And now for the general comment (I guess there are two points in the following): I believe it is very misleading to set the seed in essentially any package (see below for one special case, though). The seed is essentially a global variable, and when you mess with it, other parts of the analysis may be affected. If an analysis method depends on random sampling, the conclusions (or the method) should take this into account. That means the conclusions should be completely unaffected by whatever random numbers were generated. If that is not the case, the analysis is flawed. It can be fixed by fixing the method, by increasing the number of samples, or finally by adjusting the conclusions of the analysis. In most cases, setting the seed for reproducibility (as was done in gcrma, see an older post on the email list) just hides the problem and, worse, typically leaves users unaware that they need to take the effect of the randomness into account. So my points are:

1) any conclusion based on random sampling should be invariant to this sampling
2) setting the seed affects a global variable, which you should never do

Now, some people have a seed parameter in their function. If this parameter has a default value like

    ..., seed = 123, ...

I believe it is very dangerous, based on the stuff above. If the default of the seed parameter is to not set a seed (perhaps by doing something like)

    ..., seed = NULL, ... or ..., seed = FALSE, ...

you might as well not include it. There is not much difference between

    set.seed(123)
    myFunc()

and

    myFunc(seed = 123)

Finally, I can think of only one case where a package might have a good reason to play with the seed: if you are trying to provide an update method for a resampling-based method, like

    update(bootstrapObject, additionalSample = 1000)

and even then it needs to be done with great care.

Kasper
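The recommendations in this reply (label the plot with the sample size, do not set the seed inside the function) could look roughly like the sketch below. `pairs_subsampled` and its arguments are hypothetical names chosen for illustration; this is base R on simulated data and does not reflect how lumi implements its `pairs` method.

```r
# Hypothetical wrapper; the name and arguments are illustrative only.
# Note: `main` is set here, so callers must not pass `main` through `...`.
pairs_subsampled <- function(x, subset = 5000, ...) {
  n <- nrow(x)
  if (!is.null(subset) && subset < n) {
    idx <- sample(n, subset)   # uses the caller's RNG stream; no set.seed()
    main <- sprintf("Based on %d of %d randomly sampled rows",
                    as.integer(subset), n)
  } else {
    idx <- seq_len(n)          # plot everything
    main <- sprintf("All %d rows", n)
  }
  pairs(x[idx, , drop = FALSE], main = main, pch = ".", ...)
  invisible(main)              # return the label so it can be inspected
}

# Simulated expression matrix: 20000 probes x 3 arrays (made-up data).
expr <- matrix(rnorm(20000 * 3), ncol = 3,
               dimnames = list(NULL, c("A", "B", "C")))

pdf(NULL)                      # discard graphics output in non-interactive runs
label_sub <- pairs_subsampled(expr)                 # subsampled, labelled
label_all <- pairs_subsampled(expr, subset = NULL)  # all points

set.seed(123)                  # reproducibility is left to the caller
pairs_subsampled(expr)
dev.off()
```

The title makes the subsampling visible on the plot itself, and a user who wants identical plots calls `set.seed()` explicitly, so the effect on the global random number stream is their own deliberate choice.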

written 16.0 years ago by Kasper Daniel Hansen ★ 6.5k

Hi Kasper,

It is a good idea to indicate clearly that the plots are based on random sampling when subsetting is used, so that I do not need to set the random seed. I will also add smoothScatter as an option in the functions. Thanks for your comments!

Best,
Pan

written 16.0 years ago by Pan Du ★ 1.2k

Dear Pan,

as another option, you could explore the "smoothScatter" function from the "geneplotter" package. The examples in its manual page show how to use it as a panel function for "pairs".

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
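Using a density smoother as a panel function might look like the sketch below. It assumes a current R installation, where (to my knowledge) `smoothScatter` is available in the base `graphics` package and relies on the recommended `KernSmooth` package; at the time of this thread it was provided by geneplotter. The data are simulated, and `smooth_panel` is a hypothetical name.

```r
# Simulated data standing in for an expression matrix (made-up values).
expr <- matrix(rnorm(20000 * 3), ncol = 3,
               dimnames = list(NULL, c("A", "B", "C")))
expr[, 2] <- expr[, 1] + rnorm(20000, sd = 0.3)   # make two arrays correlated

# Panel function: density shading computed from ALL points, so no random
# subsampling is needed and repeated calls give the same picture.
smooth_panel <- function(x, y, ...) {
  smoothScatter(x, y, nrpoints = 0, add = TRUE, ...)
}

pdf(NULL)                      # discard graphics output in non-interactive runs
pairs(expr, panel = smooth_panel, gap = 0.2)
dev.off()
```

Because every point contributes to the density estimate, this avoids both the overplotting that motivated subsampling and the run-to-run variation the original question asked about.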

written 16.0 years ago by Wolfgang Huber ★ 13k
