Question

Combining replicate spots in CGH data


João Fadista


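Dear all,

I was wondering if there are other methods for combining replicate spots other than the average or the median. I am asking this with CGH data analysis in mind, because I do not know how, or whether, we can take advantage of the genomic structure of array CGH data when combining replicate spots.

For the sake of argument, here are two hypothetical examples:

- Combining replicate spots in a different way depending on which region of the chromosome or genome they come from;
- Giving more weight to spots that we know to be more reliable.

Something like this, if you know what I mean.

Best regards

João Fadista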


Answer 1 · 2006-12-06

On Wednesday 06 December 2006 11:12, João Fadista wrote:

> Dear all,
>
> I was wondering if there are other methods for combining replicate spots other than the average or the median. I am asking this in concern with CGH data analysis because I do not know how, and if, we can take advantage of the genomic structure of the array CGH data for combining replicate spots.
>
> For the sake of the argument I put below two hypothetical examples:
> - Combining replicate spots in a different way depending on what region of the chromosome or genome they are;
> - Or give more weight to spots that we know that have more reliability.

I don't think there are any such methods included in Bioconductor. However, you can aggregate the data however you see fit, though it will mean writing some code to do so.

Sean
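Aggregating the data yourself, as Sean suggests, could look like the following minimal Python sketch, which targets the second example in the question (giving more weight to more reliable spots). It assumes you already have a per-spot variance estimate to serve as the reliability measure; that input, the function name, and the tuple layout are hypothetical. Replicates of the same clone are combined by inverse-variance weighting, which reduces to the ordinary mean when all variances are equal.

```python
from collections import defaultdict


def combine_replicates(spots):
    """Combine replicate spots per clone by inverse-variance weighting.

    spots: iterable of (clone_id, log_ratio, variance) tuples. The variance
        is a hypothetical per-spot reliability estimate: smaller means more
        reliable, so the spot receives a larger weight.
    Returns {clone_id: (weighted_mean, variance_of_weighted_mean)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # clone -> [sum of w*x, sum of w]
    for clone, log_ratio, variance in spots:
        if variance <= 0:
            raise ValueError("each spot needs a positive variance estimate")
        weight = 1.0 / variance
        totals[clone][0] += weight * log_ratio
        totals[clone][1] += weight
    # Weighted mean, plus the variance of that mean under independence.
    return {clone: (swx / sw, 1.0 / sw) for clone, (swx, sw) in totals.items()}


if __name__ == "__main__":
    spots = [("clone_A", 0.40, 0.01), ("clone_A", 0.10, 0.04), ("clone_B", -0.50, 0.02)]
    print(combine_replicates(spots))
```

Region-dependent rules (the first example in the question) could be layered on top by choosing the variance estimate or weighting scheme per genomic region; how to do that sensibly is a modelling decision the thread leaves open.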

Answer 2 · 2006-12-07


Ramon Diaz


On Wednesday 06 December 2006 17:12, João Fadista wrote:

> I was wondering if there are other methods for combining replicate spots other than the average or the median. [...]

Dear Joao,

This is nothing elaborate; just a couple of thoughts.

1. I assume you mean true replicate spots. In other words, these are the exact same DNA piece, and they map to exactly the same location in the chromosome.

2. Ideally, I'd like a method that can deal with replicate spots without even asking you to take the mean or the median. One problem I find with means or medians is that, if you do not have the exact same number of replicates for all locations, then you are estimating a value whose variance differs across locations (for instance, the mean of n independent replicates with common variance σ² has variance σ²/n).

I think (non-homogeneous) HMMs and related techniques are suited to dealing with an arbitrary (and varying) number of replicate spots: at location "t" you happen to have more than one observation, and you are fitting a model where those observed log ratios come from an emission function, and so on. By not taking means, medians, or any other summary, you do not violate assumptions related to the variance of the emission functions. In other words, conditional on being in state "k", your log ratios are, say, ~ N(mu, sigma).

(I'll admit we have a "hidden agenda" here, with our RJaCGH package :-).

R.

--
Ramón Díaz-Uriarte
Bioinformatics
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
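Ramon's second point can be made concrete with the forward algorithm for a Gaussian HMM in which each genomic location carries its own list of replicate log ratios. Assuming replicates are conditionally independent given the hidden state, the emission term for a location is the product of the per-replicate normal densities (a sum in log space), so no mean or median is ever taken. This is a minimal Python sketch under that assumption, not RJaCGH's code; the function names are hypothetical, and the transition matrix is held constant for brevity, whereas a non-homogeneous model would supply a different matrix at each step.

```python
import math


def safe_log(p):
    """Log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_normal_pdf(x, mu, sigma):
    """Log density of a normal with mean mu and standard deviation sigma at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(observations, init, trans, mus, sigmas):
    """Log-likelihood of a Gaussian HMM whose locations may each carry a
    different number of replicate log ratios.

    observations: list over genomic locations; each entry is a non-empty
        list of the replicate log ratios observed at that location.
    init: initial state probabilities (length K).
    trans: K x K transition matrix (constant here for brevity).
    mus, sigmas: per-state emission mean and standard deviation.
    """
    K = len(init)

    def log_emission(replicates, k):
        # Replicates are assumed conditionally independent given state k,
        # so their log densities add up; no summary statistic is needed.
        return sum(log_normal_pdf(x, mus[k], sigmas[k]) for x in replicates)

    # Initialisation with the replicates at the first location.
    alpha = [safe_log(init[k]) + log_emission(observations[0], k) for k in range(K)]
    # Recursion: sum over the previous state, then add this location's emission.
    for replicates in observations[1:]:
        alpha = [
            logsumexp([alpha[j] + safe_log(trans[j][k]) for j in range(K)])
            + log_emission(replicates, k)
            for k in range(K)
        ]
    # Termination: marginalise over the hidden state at the last location.
    return logsumexp(alpha)


if __name__ == "__main__":
    # One, three and two replicates at three consecutive locations.
    obs = [[0.05], [0.02, -0.01, 0.04], [0.61, 0.55]]
    print(forward_loglik(obs, init=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]],
                         mus=[0.0, 0.58], sigmas=[0.1, 0.1]))
```

Because every replicate enters the likelihood individually, a location with more replicates simply contributes more evidence, rather than a summary value whose variance silently depends on the replicate count.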


Dear Ramon,

Thanks for the insights about the replicate spots.

About the RJaCGH package, I would like to know the main features of your heterogeneous HMM algorithm. I am asking because I would like to compare it with the only other heterogeneous HMM algorithm I know of that was designed for CGH analysis. This algorithm is implemented in the snapCGH package and is called BioHMM. It incorporates the distance between clones into the model, assigning a higher probability of a state change to clones that are farther apart on a chromosome.

Best regards

João Fadista
Ph.D. student
Danish Institute of Agricultural Sciences
Research Centre Foulum
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele
Phone: +45 8999 1900
Direct: +45 8999 8999
E-mail: Joao.Fadista at agrsci.dk
Web: http://www.agrsci.org
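The distance-dependent idea João describes can be sketched as a transition matrix recomputed for each pair of neighbouring clones, which is what makes such an HMM non-homogeneous. The exponential form, the `scale` and `p_max` parameters, and the even split of the leaving probability over the other states are hypothetical choices for illustration only; they are not BioHMM's actual parameterization, which this thread does not describe.

```python
import math


def distance_transition_matrix(n_states, distance, scale, p_max=0.5):
    """Hypothetical distance-dependent transition matrix.

    The probability of leaving the current state grows from 0 (distance 0)
    towards p_max as the gap between two clones increases. The leaving
    probability is split evenly over the other states, so rows sum to 1.
    """
    if distance < 0 or scale <= 0:
        raise ValueError("distance must be >= 0 and scale must be > 0")
    if n_states < 2:
        return [[1.0]]
    p_change = p_max * (1.0 - math.exp(-distance / scale))
    off_diagonal = p_change / (n_states - 1)
    return [
        [1.0 - p_change if i == j else off_diagonal for j in range(n_states)]
        for i in range(n_states)
    ]


def transitions_for_positions(positions, n_states, scale, p_max=0.5):
    """One transition matrix per consecutive pair of sorted clone positions (bp)."""
    return [
        distance_transition_matrix(n_states, b - a, scale, p_max)
        for a, b in zip(positions, positions[1:])
    ]


if __name__ == "__main__":
    # Two nearby clones, then a large gap: the second matrix favours a change.
    for m in transitions_for_positions([0, 1_000, 501_000], n_states=3, scale=100_000.0):
        print([round(p, 4) for p in m[0]])
```

Capping the change probability at `p_max` keeps distant clones from being forced into a state change; it only makes one more plausible a priori.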


On Thursday 07 December 2006 13:55, João Fadista wrote:

> About the RJaCGH package, I would like to know what are the main features of your heterogeneous HMM algorithm. [...]
>
> This algorithm is implemented in snapCGH package and it is called BioHMM. It incorporates the distance between clones into the model assigning a higher probability of state change to clones that are a larger distance apart on a chromosome.

We use a Bayesian model fitted with MCMC and reversible jump, and incorporate uncertainty via Bayesian Model Averaging.

There are several differences with BioHMM. First, because we use MCMC, BioHMM is a lot faster than RJaCGH. However, RJaCGH provides posterior probabilities of alteration. Also, we use reversible jump (instead of an AIC-based approach as in BioHMM) to deal with the problem of the unknown number of hidden states. I'd say these are the main differences. There are also some other differences in how the non-homogeneous part is implemented, but I'd say these are minor compared to the previous ones.

Further details, comparisons with BioHMM (and other methods), etc., are provided in the tech. report available from COBRA (http://biostats.bepress.com/cobra/ps/art9/) or from my web page (http://www.ligarto.org/rdiaz/Papers/rjhmm-report-plus-sup-mat.pdf).

Best,

R.
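Bayesian model averaging over the number of hidden states can be illustrated with a short Python sketch. In a reversible-jump sampler, the posterior probability of the model with k states is commonly estimated by the fraction of iterations the chain spends in that model; each model's per-location probabilities of alteration are then averaged with those weights. This is a generic sketch under that assumption, not RJaCGH's implementation, and the function names and input structures are hypothetical.

```python
from collections import Counter


def model_probabilities(k_trace):
    """Estimate P(k | data) as the fraction of MCMC iterations spent in the
    model with k hidden states (the usual reversible-jump estimate)."""
    if not k_trace:
        raise ValueError("empty MCMC trace")
    counts = Counter(k_trace)
    n = len(k_trace)
    return {k: c / n for k, c in counts.items()}


def bma_prob_altered(k_trace, prob_altered_given_k):
    """Model-averaged probability of alteration at each location.

    prob_altered_given_k: {k: [P(altered at location t | k, data) for each t]}
        (hypothetical input; every k visited in k_trace must have an entry).
    Returns the list sum_k P(k | data) * P(altered at t | k, data).
    """
    weights = model_probabilities(k_trace)
    n_locations = len(next(iter(prob_altered_given_k.values())))
    averaged = [0.0] * n_locations
    for k, w in weights.items():
        for t, p in enumerate(prob_altered_given_k[k]):
            averaged[t] += w * p
    return averaged


if __name__ == "__main__":
    trace = [2, 3, 3, 3]  # the sampler spent 1/4 of iterations at k=2, 3/4 at k=3
    per_model = {2: [0.0, 1.0], 3: [0.4, 0.8]}
    print(bma_prob_altered(trace, per_model))
```

Averaging in this way carries the uncertainty about the number of states into the final per-location probabilities instead of conditioning on a single selected model, which is the contrast Ramon draws with an AIC-based choice.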
