Circular binary segmentation for allele-specific CN data
Christine Ho ▴ 20
@christine-ho-4226
Last seen 9.7 years ago
Good afternoon!

I hope this question is not redundant. I've tried searching the mailing list archives and doing a Google search.

I've just finished using aroma.affymetrix to produce allele-specific copy number estimates. So, right now, I've got the allele frequency data, i.e. what the vignettes call "fracB" data. I would like to run circular binary segmentation (CBS) on this to find breakpoints (so I can identify regions of LOH), but it seems that all of the related packages on Bioconductor just segment aCGH data.

So, I was wondering: are the segmentation algorithms in these packages (e.g., snapCGH) able to handle any dataset, or are they specific to aCGH data? If they are specific to aCGH data, would anyone happen to know where I can obtain code (or better yet, a package) for running CBS on any data set?

Thank you for your time! I really appreciate it.

Best,
Christine Ho
Graduate student in Statistics at UC Berkeley
E-mail: cho at stat.berkeley.edu
aCGH • 1.5k views
@kasper-daniel-hansen-2979
Last seen 10 months ago
United States
CBS just segments any kind of vector data using a conceptually simple algorithm; there is no reason why you couldn't run it on whatever data you have. Of course, your data might behave a bit differently with respect to scale, outliers, etc., all of which can affect the quality of the segmentation. Often the devil is in the details.

If you look at the structure of the CNA objects in DNAcopy, it ought to be simple to see how you can put whatever you have in there. It is essentially a vector of positions and a matrix of data.

Having said all of this, you should ask on the aroma.affymetrix mailing list; I am sure there are people there with experience running CBS on allele-specific copy number data.

Kasper

On Thu, Aug 19, 2010 at 3:54 PM, Christine Ho <cho at stat.berkeley.edu> wrote:
> [...]
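To make the "vector of positions and a matrix of data" point concrete, here is a minimal sketch in R of running DNAcopy on an arbitrary numeric signal. The data are simulated and the variable names (`position`, `signal`, `sample1`) are assumptions made up for illustration; only `CNA()`, `smooth.CNA()` and `segment()` come from DNAcopy itself.

```r
# Sketch only: CBS via DNAcopy on a simulated, non-aCGH numeric signal.
library(DNAcopy)

set.seed(1)
n <- 300
position <- sort(sample.int(1e6, n))              # hypothetical marker positions
signal <- c(rnorm(100, mean = 0,   sd = 0.05),    # three made-up segments
            rnorm(100, mean = 0.3, sd = 0.05),
            rnorm(100, mean = 0,   sd = 0.05))

# A CNA object is essentially the data plus chromosome and position vectors.
# data.type = "logratio" here just means "continuous values".
cna <- CNA(genomdat  = signal,
           chrom     = rep(1, n),
           maploc    = position,
           data.type = "logratio",
           sampleid  = "sample1")

cna <- smooth.CNA(cna)             # optional: damp single-point outliers
fit <- segment(cna, verbose = 0)   # circular binary segmentation
segments <- fit$output             # data frame with one row per segment
print(segments)
```

Whether the default settings of `segment()` suit a given data type is a separate question; the scale and outlier caveats above still apply.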
@oosting-j-path-412
Last seen 9.7 years ago
The problem with segmenting fracB data is that it does not behave as a constant value plus noise within segments, which violates the assumption made by segmentation algorithms. In normal samples, neighboring SNPs can have fracB values of 0 (for the AA genotype), 0.5 (AB) or 1 (BB).

In the ideal situation you can filter out the uninformative SNPs (AA, BB) using a paired normal sample. Then you have to transform the fracB data from the informative heterozygous SNPs so that it changes in one direction when genomic rearrangements occur; after that you can apply CBS to the remaining data.

Corver et al., Cancer Res 2008; 68(24), December 15, 2008, describe a method to transform fracB-like data so it can be segmented.

Jan
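A rough sketch of the two steps Jan describes, in base R. The input vectors are made up, and folding around 0.5 with `abs()` is just one simple way to get a signal that moves in a single direction; it is an assumption for illustration, not necessarily the transformation used by Corver et al.

```r
# Hypothetical inputs: tumor fracB values and genotype calls from the paired normal.
fracB_tumor <- c(0.02, 0.48, 0.97, 0.55, 0.12, 0.91, 0.99, 0.45)
normal_geno <- c("AA", "AB", "BB", "AB", "AB", "AB", "BB", "AB")

# Step 1: keep only SNPs that are heterozygous in the normal sample.
is_het <- normal_geno == "AB"

# Step 2: fold around 0.5 so allelic imbalance always increases the value
# (0 = balanced, 0.5 = complete loss of one allele).
mirrored <- abs(fracB_tumor[is_het] - 0.5)
print(mirrored)
```

The folded values, together with the positions of the retained SNPs, are what would then be handed to a segmentation routine.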
Hi everyone,

Thank you for your constructive replies.

I followed Kasper's suggestion of using the DNAcopy package. My initial problem with aroma.affymetrix was just that I could not figure out how to get my data into the correct format for its CbsModel function. But DNAcopy does the trick nicely with its segment() function.

Jan, thanks for the help with the fracB data. Weeks ago, I ran TumorBoost to improve the signal-to-noise ratio in the fracB data. Then I used aroma.affymetrix to extract all of the informative SNPs (AB, BB). After this, I subtracted 0.5 from these allele frequencies and took the maximum of this difference and 0 (so I only retained the bands above 0.5). I ran CBS on this, even with the noise at 0 (after doing the subtraction), since not everything that "should" be at 0.5 is actually at 0.5.

I'm comparing the regions it found with data from exome capture, and it seems to have performed satisfactorily.

Thanks again! I appreciate all of the help I've received on this mailing list.

Christine

On Aug 23, 2010, at 12:49 AM, <j.oosting at lumc.nl> wrote:
> [...]
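The subtract-and-clip step Christine describes can be written in one line of base R. The `fracB` values below are made up for illustration and stand in for the allele frequencies left after filtering.

```r
# Hypothetical fracB values for the SNPs kept after filtering.
fracB <- c(0.48, 0.52, 0.95, 0.08, 0.55, 0.90, 0.45, 0.12)

# Subtract 0.5 and clip at 0: values at or below 0.5 become 0,
# values above 0.5 keep their distance from 0.5.
upper_band <- pmax(fracB - 0.5, 0)
print(round(upper_band, 2))
```

Note that this keeps only the upper band, so SNPs whose fracB falls below 0.5 all collapse to 0; that is the "noise at 0" mentioned in the post.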