outlier probes detection
Andrea Grilli ▴ 240
@andrea-grilli-4664
Last seen 8.8 years ago
Italy, Bologna, Rizzoli Orthopaedic Ins…
Dear all,

I'm analysing 40 HGU133plus2 arrays. Looking at their surfaces with the "affyPLM" package, I've seen a couple of arrays with small scratches and one more with a small bubble. I don't want to exclude these arrays (by Murphy's law, two of the three belong to the class with fewer samples), so I'd like to detect the affected probes and exclude only those. I was thinking of some outlier detection method, but because I'm new to this problem I don't know whether that is the right approach or which packages would be appropriate (I did some research but have no clear idea).

Any help is really appreciated,
andrea

Dr. Andrea Grilli
andrea.grilli at ior.it
phone 051/63.66.756
Laboratory of Experimental Oncology, Development of Biomolecular Therapies unit
Rizzoli Orthopaedic Institute, Codivilla Putti Research Center
via di Barbiano 1/10, 40136 - Bologna - Italy
hgu133plus2 hgu133plus2 • 1.6k views
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Last seen 4 hours ago
Wageningen University, Wageningen, the …
Hi Andrea,

If the affected area is relatively small (less than 5-10% of the total area) we usually ignore these scratches/bubbles. Each probeset consists of multiple probes, and the robust summarization methods usually used within RMA (median polish or M-estimator) handle such outliers pretty well. Alternatively, the package 'Harshlight' offers options to correct for various types of artefacts.
http://www.bioconductor.org/packages/2.10/bioc/html/Harshlight.html

Regards,
Guido

---------------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld at wur.nl
internet: http://nutrigene.4t.com
http://scholar.google.com/citations?user=qFHaMnoAAAAJ
http://www.researcherid.com/rid/F-4912-2010
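To illustrate why a robust summary tolerates a single damaged probe, here is a minimal sketch in base R. It is not RMA itself: it only applies `medpolish()` from the stats package to one simulated probeset (11 probes, 40 arrays, log2 scale), and every number in it (effect sizes, noise level, the size of the artefact) is an assumption chosen for illustration.

```r
# Toy probeset: rows are probes, columns are arrays (all values are simulated).
set.seed(1)
n_probes <- 11
n_arrays <- 40
probe_effect <- rnorm(n_probes, sd = 0.5)
array_effect <- rnorm(n_arrays, sd = 1)
y <- 8 + outer(probe_effect, array_effect, "+") +
  matrix(rnorm(n_probes * n_arrays, sd = 0.1), n_probes, n_arrays)

# Simulate a scratch: one probe on array 5 reads far too bright.
y_bad <- y
y_bad[3, 5] <- y_bad[3, 5] + 4

# Per-array expression estimate from median polish (overall + column effect).
summarize_mp <- function(m) {
  fit <- medpolish(m, trace.iter = FALSE)
  fit$overall + fit$col
}

# How far does the estimate for array 5 move because of the artefact?
shift_mp   <- abs(summarize_mp(y_bad)[5] - summarize_mp(y)[5])
shift_mean <- abs(mean(y_bad[, 5]) - mean(y[, 5]))  # plain average, not robust

cat("shift with median polish:", round(shift_mp, 3), "\n")
cat("shift with plain mean:   ", round(shift_mean, 3), "\n")
```

The plain mean moves by the artefact size divided by the number of probes, while the median-polish estimate should move far less. This is the behaviour the reply relies on when the damaged area is small.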
Hi Guido,

thank you for your reply. I checked the Harshlight package you suggested. Although it detects outlier arrays (and not outlier probes), it works well for my case, because it reports the percentage of the array affected by defects, which is better than a simple visual evaluation.

I have one question about how the package evaluates these defects. Because the calculation is intensive, I first split the study into two groups (20 and 20 arrays) and later ran all 40 chips together: according to the package output, one array should be excluded only in the second case. Does the evaluation of the defects also depend on the set of arrays a chip is analysed with? I skimmed the corresponding paper but didn't find any information about that.

I also checked the solution proposed by Okko (thank you for your suggestion), but since it is a bigger change of approach I'll need more time to evaluate it.

Andrea
Hi Andrea,

I have to admit that it has been a while since I actively used Harshlight. However, AFAIK Harshlight does detect outlier probes as well. You can have Harshlight automatically correct these outlier probes by replacing their (outlier) values either with 'NA' or with the median value across all arrays. See '?Harshlight' for more details:

"na.sub: If TRUE, the intensity values of the input affyBatch that are affected by defects will be changed in NA. If FALSE, the values will be substituted with the median of the intensity values of the other chips."

If you have a brief look at Figure 2 of the Harshlight paper (http://www.biomedcentral.com/1471-2105/6/294) you will see that a representative image of all arrays is obtained by creating a 'median image'. Each individual array is then compared to this median image, and defects are identified as deviations from it using a set of criteria (again, see ?Harshlight for more details). Thus, the set of arrays analysed together will (slightly) affect the outcome of Harshlight.

If I were you I would analyse all arrays from an experiment together (in your case all 40), have Harshlight replace all outlier values by the median, and then continue with normalization using e.g. (GC)RMA.

HTH,
Guido
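The median-image idea can be sketched in a few lines of base R. This is a deliberately simplified toy and not Harshlight's actual algorithm, which applies further image-analysis criteria to the deviations. The chip size, noise level, fixed cutoff and the helper names `median_image()` and `flag_defects()` are all assumptions made for illustration.

```r
# Toy data: 2500 cells per chip, 40 chips, log2 scale (all values simulated).
set.seed(2)
n_cells  <- 2500
n_arrays <- 40
base  <- rnorm(n_cells, mean = 8, sd = 1)
chips <- base + matrix(rnorm(n_cells * n_arrays, sd = 0.1), n_cells, n_arrays)

# Simulate a bright blemish covering 100 cells (4% of the area) on chip 7.
defect <- 1:100
chips[defect, 7] <- chips[defect, 7] + 3

# Reference image: the per-cell median across the chips analysed together.
median_image <- function(m) apply(m, 1, median)

# Flag cells whose deviation from the reference exceeds a (hypothetical) cutoff.
flag_defects <- function(m, cutoff = 1) {
  error_image <- m - median_image(m)  # row i is compared with reference cell i
  abs(error_image) > cutoff
}

# Percentage of flagged cells per chip when all 40 chips are analysed together.
pct_all <- 100 * colMeans(flag_defects(chips))
cat("flagged area on chip 7 (%):", pct_all[7], "\n")

# The reference depends on which chips are included: compare 40 chips vs. 20.
ref_shift <- mean(abs(median_image(chips) - median_image(chips[, 1:20])))
cat("mean change of the reference image:", signif(ref_shift, 3), "\n")
```

Because the reference image is computed from the chips in the batch, it changes a little when the batch changes, so a chip close to a threshold can be flagged in one run and not in another. In practice the package does this work: something along the lines of `Harshlight(abatch, na.sub = FALSE)` on the full 40-array AffyBatch followed by `rma()` matches the advice above, but check `?Harshlight` for the exact arguments in your installed version.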
Yes, now I see that point. Thank you so much for your help,
Andrea
Hi Andrea,

I'm happy to help you with this issue; just let me know if you need any further details regarding the farms algorithm or its assessment. If I recall correctly, the probes of a probe-set are spatially distributed over the whole HGU133plus2 array surface, so I don't expect small scratches or bubbles to have an impact on the farms summarization, as they will affect only very few probes within a probe-set.

Cheers,
Okko

--
djork clevert | gleimstr. 13a | d-10437 berlin
e: okko at clevert.de
p: +49.30.4432 4702
f: +49.30.6883 5307
Djork Clevert ▴ 210
@djork-clevert-422
Last seen 9.6 years ago
Hi Andrea,

I suggest using the farms package to summarize your data. FARMS is a probabilistic latent variable model that decomposes the data variance into signal variance and noise variance. Thus, up to five probe outliers per probe-set will not impair the summarization, as they will be explained as noise. Another nice feature is FARMS' informative/non-informative (I/NI) call, which allows you to select the probe-sets that are relevant for your experiment.

Check out:
farms: http://bioinformatics.oxfordjournals.org/content/22/8/943.abstract
I/NI: http://bioinformatics.oxfordjournals.org/content/23/21/2897.abstract

Cheers,
Okko
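For orientation, a FARMS summarization followed by I/NI filtering might look roughly like the sketch below. The wrapper name `summarize_with_farms()` is hypothetical, and the calls `qFarms()`, `INIcalls()` and `getI_Eset()` are given as I recall them from the farms package documentation, so please verify them against the version you have installed. The input is assumed to be an AffyBatch, for example one read with `ReadAffy()` from the affy package.

```r
# Hypothetical wrapper around the farms workflow; nothing runs until it is called.
summarize_with_farms <- function(abatch) {
  if (!requireNamespace("farms", quietly = TRUE)) {
    stop("the 'farms' package is required for this sketch")
  }
  library(farms)              # attaches farms and the packages it depends on

  eset <- qFarms(abatch)      # quantile normalization + FARMS summarization
  ini  <- INIcalls(eset)      # informative / non-informative call per probe-set

  list(
    all         = eset,            # every probe-set
    informative = getI_Eset(ini)   # only probe-sets called informative
  )
}
```

A call such as `res <- summarize_with_farms(abatch)` would then give the full expression set in `res$all` and the filtered one in `res$informative`.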