Call for comments on analyzing aCGH data with a huge number of probes on a single chromosome

pingzhao Hu ▴ 210

@pingzhao-hu-685

Sean,

Thanks! The goal is to identify copy number variation in normal human samples. I have tried CBS, cghFLasso (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxm013v1), our own method (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxl035v1), and other methods.

Pingzhao

At 11:45 AM 4/4/2008, Sean Davis wrote:
> On Fri, Apr 4, 2008 at 11:38 AM, pingzhao Hu <phu at sickkids.ca> wrote:
> > Hi All,
> >
> > I have a question about analyzing aCGH data with a huge number of probes on a single chromosome. We have a set of customized NimbleGen aCGH data from human samples. Each sample has 40 million probes; even a single chromosome has >3M probes.
> >
> > I tried some R-based and Matlab-based aCGH analysis software to analyze just a single chromosome in a single sample on our supercomputer, without success. Some software just shows error messages (though it works fine for small data sets), and some cannot complete the analysis even after 1-2 days of CPU time.
> >
> > I am wondering whether anyone on the list has experience analyzing aCGH data at this scale. If you have, can you share some of your experience with me?
> >
> > Would it be a good idea to first divide the chromosome into small pieces (say, 10,000 probes each) and then run the algorithm on each piece?
>
> What are the goals of the analysis? What types of samples (cancer, comparative genomics, normal DNA)? And what methods have you tried?
>
> Sean

========================================
Pingzhao Hu
Statistical Analysis Facility
The Centre for Applied Genomics (TCAG)
The Hospital for Sick Children Research Institute
MaRS Centre - East Tower
101 College Street, Room 15-705
Toronto, Ontario, M5G 1L7, Canada
Tel.: (416) 813-7654 x6016
Email: phu at sickkids.ca
Web: http://www.tcag.ca/statisticalAnalysis.html
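On the idea of dividing a chromosome into pieces: below is a minimal Python sketch of how the index ranges for such pieces could be generated, assuming the probes are already sorted by genomic position. The function name `chunk_ranges`, the `overlap` parameter, and its default of 500 probes are illustrative assumptions and were not proposed in the thread. The overlap is there so that a breakpoint falling near the edge of one piece also appears well inside the neighbouring piece.

```python
def chunk_ranges(n_probes, chunk_size=10_000, overlap=500):
    """Yield half-open (start, end) index ranges that cover 0..n_probes.

    Consecutive ranges share `overlap` probes (an assumption of this
    sketch, not something taken from the thread).
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    start = 0
    while start < n_probes:
        end = min(start + chunk_size, n_probes)
        yield start, end
        if end == n_probes:
            break  # the last piece reached the end of the chromosome
        start += step


# Example: walk over the pieces of a chromosome with 3 million probes.
# A segmentation routine of your choice would be called on each slice.
for start, end in chunk_ranges(3_000_000):
    pass  # e.g. segment(log_ratios[start:end]), where `segment` is hypothetical
```

One caveat with this approach: a segment that spans a piece boundary is cut in two, so the per-piece results would still need to be merged afterwards, and any noise estimate computed per piece may differ from a chromosome-wide one.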

aCGH • 1.6k views


William Shannon ▴ 280

@william-shannon-1787

I routinely use process control methods for analyzing aCGH data. I began using them with NimbleGen data, where the number of probes overwhelmed the available R code, which was designed for significantly less dense arrays. Process control can run through NimbleGen data in a matter of minutes for a chromosome and in a few hours for a large number of samples (I use SAS for this, however).

Basically, the expression level for a copy number of 2 can be considered 'in control' and any amplified or deleted region 'out of control'. These methods have been developed and applied very productively over the last 50 years.

The second step is to select the regions called as 'special cause' (process control jargon) and score each one by the ratio of the MSE of the called region to the MSE of the adjoining in-control regions, where the MSE is calculated around the expression level for a normal 2 copies.

I would be happy to send a manuscript if you email me at the address below.

Thanks,

Bill Shannon, PhD
Associate Professor of Biostatistics in Medicine
Washington University School of Medicine
wshannon at wustl.edu

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
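To make the process control idea concrete, here is a rough Python sketch. It is not the method from Bill Shannon's manuscript, which is not reproduced in this thread. It assumes log2 ratios centred at 0 for two copies, a Shewhart-style limit of `k` robust standard deviations, and a minimum run length; the names `out_of_control_runs` and `mse_ratio` and all parameter defaults are made up for illustration. Only the scoring rule (MSE of the called region divided by MSE of the adjoining probes, both taken around the two-copy level) follows the description above.

```python
from statistics import median


def robust_sigma(values, center=0.0):
    """MAD-based estimate of the noise SD around `center` (assumed choice)."""
    return 1.4826 * median(abs(v - center) for v in values)


def out_of_control_runs(values, center=0.0, k=3.0, min_run=3):
    """Return half-open (start, end) runs of at least `min_run` consecutive
    probes lying beyond k * sigma on the same side of `center`."""
    limit = k * robust_sigma(values, center)
    runs, start, sign = [], None, 0
    for i, v in enumerate(values):
        d = v - center
        s = 1 if d > limit else (-1 if d < -limit else 0)
        if s != sign:
            # The previous run (if any) ends here; keep it if long enough.
            if sign != 0 and i - start >= min_run:
                runs.append((start, i))
            start, sign = i, s
    if sign != 0 and len(values) - start >= min_run:
        runs.append((start, len(values)))
    return runs


def mse(values, center=0.0):
    """Mean squared error of `values` around `center`."""
    return sum((v - center) ** 2 for v in values) / len(values)


def mse_ratio(values, run, flank=50, center=0.0):
    """Score a called run: MSE of the run over MSE of up to `flank` probes
    on each side, both computed around the two-copy level `center`."""
    start, end = run
    flanking = values[max(0, start - flank):start] + values[end:end + flank]
    if not flanking:
        raise ValueError("no flanking probes available")
    denom = mse(flanking, center)
    return float("inf") if denom == 0 else mse(values[start:end], center) / denom
```

A single pass over the probes is enough for the calling step, which is consistent with the speed reported above. Note that this sketch treats every flanking probe as in control, even if it belongs to another called run; a more careful version would exclude those.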


Sean Davis 21k

@sean-davis-490

On Fri, Apr 4, 2008 at 12:09 PM, pingzhao Hu <phu at sickkids.ca> wrote:
> The goal is to identify copy number variation in normal human samples. I have tried CBS, cghFLasso (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxm013v1), our own method (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxl035v1), and other methods.

You probably have a few options. First, you could try "smoothing" the data with a moving-window average or something similar, to reduce noise and reduce the number of probes. I think NimbleGen does this for the data they return to customers when they run CGH as a service. With the reduced-dimensionality data, you could then apply your method of choice. Obviously, you lose resolution doing this.

Another alternative is an algorithm called "stepgram", developed by Doron Lipson. It is used in the CGHAnalytics commercial package available from Agilent (where it is called ADM-1). It is also available as a Windows executable from here:

http://bioinfo.cs.technion.ac.il/stepgram/

I have an R package that uses that algorithm that, unfortunately, I am not allowed to distribute. That said, it is by far the fastest algorithm that I have tested for CGH analysis. For comparison, for 200k probes, Stepgram runs in 4 seconds, aCGH in about 50 seconds, and DNAcopy (CBS) and GLAD in about 400 seconds each.

Hope that helps,

Sean
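The window-averaging option can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses non-overlapping windows of a fixed number of probes, assumes the probes are sorted by position, and the function name `window_average` and its output layout are invented here. It is not a description of what NimbleGen does to the data it returns.

```python
def window_average(positions, log_ratios, window=10):
    """Average log ratios over consecutive, non-overlapping windows.

    Returns a list of (first_position, last_position, mean_log_ratio),
    one entry per window; the last window may hold fewer probes.
    """
    if len(positions) != len(log_ratios):
        raise ValueError("positions and log_ratios must have equal length")
    if window < 1:
        raise ValueError("window must be at least 1")
    reduced = []
    for i in range(0, len(positions), window):
        pos = positions[i:i + window]
        val = log_ratios[i:i + window]
        reduced.append((pos[0], pos[-1], sum(val) / len(val)))
    return reduced


# Example: 7 probes reduced to 3 windows of up to 3 probes each.
summary = window_average(list(range(100, 107)), [0, 0, 0, 1, 1, 1, 4], window=3)
```

With a window of w probes the data shrink by a factor of about w, and if the probe noise is roughly independent the standard deviation of each averaged value drops by about the square root of w. The cost, as noted above, is resolution: an aberration much shorter than one window is diluted by the normal probes averaged with it.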


pingzhao Hu ▴ 210

@pingzhao-hu-685

Sean,

Thanks, this is really helpful! I just tested the chromosome with 3.5M probes in a single sample, and it took less than 20 minutes to get the job done.

Dr. Shannon, thank you very much as well for your useful comments!

Have a great weekend.

Pingzhao
