Statistics for Diagnostic Microarrays


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378


Hi,

Obviously the greatest use for microarrays is in gene expression studies, but increasingly scientists wish to use microarrays for a variety of diagnostic studies, which centre more on "Is it there or not?" type questions than on "How much of it is there?". Does anyone know of any statistical tools or software that can be used specifically for diagnostic microarrays?

Thanks,
Mick


ADD COMMENT • link updated 20.8 years ago by A.J. Rossini ▴ 810 • written 20.8 years ago by michael watson IAH-C ★ 3.4k


Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675


Dear Mick,

I think there is a gold mine of opportunities for statistics in this field. With more and more companies advertising disease-specific chips, there are still questions to be answered, namely:

a) gene selection: Only several hundred or a few thousand genes are going to be selected for their discriminating ability.

b) normalisation: The assumption that the majority (90-95%) of the genes are unchanged will not hold here. If you are going to use "housekeeping" genes, which ones should you use, and how? So far, the main normalisation methods (justifiably) ignore housekeeping genes, as they vary from sample to sample.

c) multiple spots: If you are going to spot, say, 2000 genes, then you can spot 10 of each at random positions on the chip. This affects not only the normalisation (highly correlated spots) but also the analysis (is there a better approach than averaging?).

d) classification: How does one assign the probability that a patient has a disease given the expression profile of thousands of genes? I think we may require pattern recognition techniques or machine learning approaches, and a large enough learning set.

e) better classification: Is the diagnostic chip better than existing tests (if any), and is it cost-efficient?

Sorry for pointing out more questions than answers, but I feel that more people should be asking these before buying or designing boutique arrays.

I think what people are currently doing is using microarrays as a filtering tool, along with other knowledge, to obtain a marker gene/protein that they can easily test for. The relevant keyword is metabolomics.

HTH, Adai.
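Point (c) above asks whether there is a better approach than averaging replicate spots. One simple candidate is the median, which is less sensitive to a single bad spot. The sketch below is only an illustration in Python (the thread does not name a language); the function name `summarise_replicates` and the intensities are hypothetical, and it does nothing about the correlation between replicates that the post also raises.

```python
from statistics import mean, median


def summarise_replicates(intensities, method="median"):
    """Collapse the replicate spot intensities of one gene into one value."""
    if not intensities:
        raise ValueError("need at least one replicate spot")
    if method == "mean":
        return mean(intensities)
    if method == "median":
        return median(intensities)
    raise ValueError(f"unknown method: {method}")


# Ten hypothetical log2 intensities for one gene spotted at random positions.
# The last value stands in for a spot artefact (dust, scratch, etc.).
spots = [8.1, 8.0, 8.2, 7.9, 8.1, 8.0, 8.3, 8.1, 8.0, 14.5]

mean_value = summarise_replicates(spots, "mean")      # pulled up by the artefact
median_value = summarise_replicates(spots, "median")  # stays near the bulk of spots
```

With these made-up numbers the mean is dragged towards the artefact while the median stays with the other nine spots, which is the usual argument for a robust summary.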

ADD COMMENT • link 20.8 years ago Adaikalavan Ramasamy ★ 1.8k


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378


Actually, a lot of the work for pattern recognition is already there - from classical statistics and from use with proteomics data:

ADD COMMENT • link 20.8 years ago michael watson IAH-C ★ 3.4k


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378


I'll try not to accidentally press "Send" this time!

Actually, a lot of the work for pattern recognition is already there, from classical statistics and from use with proteomics data:

http://bioinformatics.med.yale.edu/proteomics/BioSupp1.html

Strangely, I am not so worried about this. What does seem to be missing, as you have pointed out, are things like normalisation techniques, and also a way of answering the question "is this a real signal or just background noise?". Some diagnostic uses of microarrays simply ask whether an mRNA is present in the sample or not. This could be a problem with low-copy mRNAs. Presumably, to answer this question, one must have a robust distribution of the values a spot takes when the mRNA is ABSENT, and then compare the normalised observed intensity against this distribution to decide whether that mRNA is there or not.

Again, thoughts and comments are appreciated.

Mick
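The present/absent idea in the last paragraph can be sketched as an empirical p-value: the observed intensity is ranked against reference intensities from spots where the target is known to be absent. This is a minimal Python sketch under stated assumptions, not an established method from the thread: the function names, the use of negative-control spots as the "absent" reference, the add-one correction, and the 0.05 cut-off are all illustrative choices.

```python
def empirical_p_value(observed, absent_intensities):
    """Share of 'absent' reference values that are >= the observed value.

    One is added to numerator and denominator so the estimate is never
    exactly zero with a finite reference set (an assumed convention).
    """
    if not absent_intensities:
        raise ValueError("need reference intensities from absent targets")
    at_least = sum(1 for x in absent_intensities if x >= observed)
    return (at_least + 1) / (len(absent_intensities) + 1)


def call_present(observed, absent_intensities, alpha=0.05):
    """Call the target present if the empirical p-value is at most alpha."""
    return empirical_p_value(observed, absent_intensities) <= alpha


# Hypothetical normalised intensities from 39 negative-control spots.
negative_controls = list(range(100, 139))

p_bright = empirical_p_value(150, negative_controls)  # above every control
p_dim = empirical_p_value(120, negative_controls)     # inside the control range
```

The smallest p-value this scheme can return is 1 / (n + 1), so the number of reference spots limits how strict the cut-off can be; with 39 controls the floor is 0.025.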

ADD COMMENT • link 20.8 years ago michael watson IAH-C ★ 3.4k


A.J. Rossini ▴ 810

@aj-rossini-209


Sure, but then you've got a high-dimensional "diagnostic statistics" problem; these are still not fully worked out, though see Margaret Pepe's recent book on the topic for a start.

best,
-tony

--
Anthony Rossini, Research Associate Professor
Biomedical and Health Informatics, University of Washington
Biostatistics, SCHARP/HVTN, Fred Hutchinson Cancer Research Center

ADD COMMENT • link 20.8 years ago A.J. Rossini ▴ 810
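For readers new to "diagnostic statistics", the basic single-test quantities are sensitivity, specificity, and the positive predictive value obtained from Bayes' theorem; the last of these is one answer to the earlier question of how to assign a probability of disease given a positive result. The Python sketch below uses the standard textbook definitions and is not taken from the book mentioned above; the counts and the prevalence are invented for illustration.

```python
def sensitivity(true_pos, false_neg):
    """P(test positive | disease present)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """P(test negative | disease absent)."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(sens, spec, prevalence):
    """P(disease present | test positive), by Bayes' theorem."""
    true_pos_rate = sens * prevalence
    false_pos_rate = (1 - spec) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Hypothetical validation study: 100 diseased and 200 healthy samples.
sens = sensitivity(true_pos=90, false_neg=10)    # 0.9
spec = specificity(true_neg=180, false_pos=20)   # 0.9

# Assumed prevalence of 1% in the population being screened.
ppv = positive_predictive_value(sens, spec, prevalence=0.01)
```

With these made-up numbers a test that is 90% sensitive and 90% specific still gives a positive predictive value of only about 8% at 1% prevalence, which is why accuracy on a balanced study set says little about usefulness in the clinic.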


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378


Of course I agree: things are certainly not clear-cut in this area! However, I would like to see the simpler problem of normalisation for diagnostic arrays solved first :-)

ADD COMMENT • link 20.8 years ago michael watson IAH-C ★ 3.4k


Surely you mean "more fundamental" and not "simpler".

I agree with you that there is very good work in classical pattern recognition. Building a classifier that works well on one dataset is not too difficult. What is more difficult is making it generalise to the whole population or to sub-populations. Rigorous testing on a large enough sample is important if microarrays are to become part of the standard diagnostic kit. If this were a drug, there would be various organisations policing its testing and sales.

ADD REPLY • link 20.8 years ago Adaikalavan Ramasamy ★ 1.8k
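The gap between fitting one dataset and generalising is usually estimated by holding samples out. The sketch below shows leave-one-out cross-validation around a nearest-centroid classifier in plain Python. Both the classifier and the toy two-gene profiles are assumptions chosen for brevity, not something proposed in the thread, and a real study would also need an independent test set.

```python
from math import dist


def centroid(rows):
    """Column-wise mean of a list of equal-length expression profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Return the label whose class centroid is closest to the sample."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(rows)
    return min(centroids, key=lambda label: dist(centroids[label], sample))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample in turn, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-gene profiles for three healthy and three diseased samples.
profiles = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
            [3.0, 3.2], [2.9, 3.1], [3.2, 2.8]]
labels = ["healthy"] * 3 + ["disease"] * 3

loo_accuracy = leave_one_out_accuracy(profiles, labels)
```

If genes are selected using the class labels, that selection has to be repeated inside each fold; selecting on the full dataset first makes the cross-validated accuracy optimistic.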


A.J. Rossini ▴ 810

@aj-rossini-209


Adaikalavan Ramasamy writes:

> If this was a drug, there are various organisations policing its testing and sales.

At least in the USA, the FDA regulates some diagnostic tests (and, as of 1-2 years ago, was starting to evaluate some proposed tests using high-throughput gene expression technologies).

best,
-tony

ADD COMMENT • link 20.8 years ago A.J. Rossini ▴ 810
