DNACopy Question what is passed to the maploc argument.


Nikolas Balanis ▴ 30

@nikolas-balanis-6680

Last seen 9.6 years ago

Sorry to keep posting this. I am a very new user of Bioconductor and I can't seem to find any clarification on this anywhere. This is somewhat related to the question linked below; I just want some more clarification on using the CNA function. This is probably very elementary, so I apologize in advance.

https://stat.ethz.ch/pipermail/bioconductor/2007-September/019138.html

In an Agilent data set, the SystematicName column is composed of elements of the form chr3:175483690-175485000. For this element the chromosome number is 3, so the vector of these numbers for all elements is passed as chrom in the CNA function. But what is the numeric position passed to maploc in the CNA function?

From the documentation, the maploc argument is "the locations of marker on the genome. Vector of length same as the number of rows of genomdat. This has to be numeric." So, according to the DNAcopy documentation, it has to be numeric. Is it the first value (175483690), the second value (175485000), the average of the two (175484345), or something else that I am missing completely?

DNAcopy DNAcopy • 1.8k views

ADD COMMENT • link updated 9.7 years ago by Valerie Obenchain ★ 6.8k • written 9.7 years ago by Nikolas Balanis ▴ 30


Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.3 years ago

United States

Hi Nikolas,

Have you tried contacting the authors of DNAcopy? I've cc'd Venkatraman.

I don't have experience with this package, but after reading Sean's response in the 2007 post and looking at CNA(), my understanding is that 'maploc' should be the start position of the target sequence; in your case, 175483690.

As an FYI, to see the source code for any (non-generic) function, just type its name. This is sometimes useful if you have questions about how a variable is being handled ...

    > CNA
    function (genomdat, chrom, maploc, data.type = c("logratio",
        "binary"), sampleid = NULL, presorted = FALSE)
    {
        if (is.data.frame(genomdat))
            genomdat <- as.matrix(genomdat)
        if (!is.numeric(genomdat))
            stop("genomdat must be numeric")
        if (!is.numeric(maploc))
            stop("maploc must be numeric")
    ...

Valerie
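Splitting a SystematicName such as chr3:175483690-175485000 into a chromosome label and a numeric start position can be sketched in base R as follows. This is a minimal sketch, not the package's own method: the input vector, the `probes` data frame, and the regular expression are assumptions made for illustration, and entries that do not match the `chr<label>:<start>-<end>` pattern simply come out as NA.

```r
# Hypothetical Agilent-style SystematicName values (illustrative only).
systematic_name <- c("chr3:175483690-175485000",
                     "chr3:175490000-175490059",
                     "chrX:1000-1059")

# Capture chromosome label, start and end with one regular expression.
pattern <- "^chr([^:]+):([0-9]+)-([0-9]+)$"
parts <- regmatches(systematic_name, regexec(pattern, systematic_name))

# Each element of 'parts' is c(full match, chrom, start, end).
probes <- data.frame(
  chrom = vapply(parts, function(p) p[2], character(1)),
  start = as.numeric(vapply(parts, function(p) p[3], character(1))),
  end   = as.numeric(vapply(parts, function(p) p[4], character(1))),
  stringsAsFactors = FALSE
)

# CNA() stops unless maploc is numeric; here the probe start is used,
# following the suggestion in the reply above.
maploc <- probes$start
stopifnot(is.numeric(maploc))
```

The resulting `probes$chrom` and `maploc` vectors are what would then be handed to the `chrom` and `maploc` arguments of `CNA()`, alongside a numeric `genomdat` with one row per probe.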

ADD COMMENT • link 9.7 years ago Valerie Obenchain ★ 6.8k


Hi Nikolas,

Here is what the help for CNA says:

    maploc: the locations of marker on the genome. Vector of length same
            as the number of rows of genomdat. This has to be numeric.

That is, it is the genomic position at which the copy number information is being measured.

In the link that you provided, the probe on the array being referred to is chr3:175483690-175483749. Thus it measures the copy number information in a small interval encompassing the probe. However, as Valerie pointed out, maploc is a numeric variable used to represent this interval. Thus any value in the interval is valid for maploc, as long as two different (overlapping) probes do not get the same position to represent them.

Venkat

--
Venkatraman E. Seshan, Ph.D. | Attending Biostatistician
Director of Biostatistics Computer-Intensive Support Services
Department of Epidemiology and Biostatistics | MSKCC
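The condition in the reply above (any value inside the probe interval, provided no two probes share the same position) can be checked before calling `CNA()`. The following is a minimal sketch in base R; the `probes` data frame and its coordinates are hypothetical, and the midpoint is only one of the valid choices mentioned in the thread.

```r
# Hypothetical probe intervals; the first two probes overlap on chr3.
probes <- data.frame(
  chrom = c("3", "3", "3"),
  start = c(175483690, 175483700, 175490000),
  end   = c(175483749, 175483759, 175490059),
  stringsAsFactors = FALSE
)

# One representative position per probe: the (floored) interval midpoint.
probes$maploc <- floor((probes$start + probes$end) / 2)

# Every representative position must lie inside its own probe interval.
inside <- all(probes$maploc >= probes$start & probes$maploc <= probes$end)

# TRUE if any (chrom, maploc) pair is used by more than one probe.
has_clash <- anyDuplicated(probes[, c("chrom", "maploc")]) > 0
```

If `has_clash` were TRUE, one option would be to switch to start positions or to offset the clashing probes within their intervals; which remedy is appropriate depends on the array design.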

ADD REPLY • link 9.7 years ago SeshanV@mskcc.org ▴ 40


Thank you, this is kind of what I suspected. Basically, I could map the positions to 1, 2, 3, 4, 5, etc. and segmentation would still work; the values only determine the relative positions of the probes used during segmentation. Obviously, for it to be biologically relevant, I want to choose a unique position within the chr3:175483690-175485000 interval so that I can map it to a reference genome after segmentation.

This gets to my next question. Assuming you use the start position (for this element, 175483690), each segment produced by this program runs from one probe start position to another probe start position. When you visualize the segment file along with a reference genome, in IGV for example, you lose the knowledge that each segment actually extends a little further at its end, up to the stop position of its last probe. In this case, each segment is shorter by the length of its final probe. The same would be true if you used the stop position: you would lose the knowledge that there is some more sequence at the beginning of the segment.

If this is correct then I fully understand it. Please let me know if this is confusing.
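The idea that positions could be replaced by 1, 2, 3, ... without changing how probes are ordered can be illustrated in base R. This is only a sketch with made-up coordinates: it shows that within-chromosome ranks preserve probe order, which is the property the thread says matters; it does not run DNAcopy itself.

```r
# Hypothetical probes on two chromosomes (start positions are made up).
chrom <- c("3", "3", "3", "5", "5")
start <- c(175483690, 175490000, 175500000, 1000, 2000)

# Replace each position by its rank within its own chromosome.
index <- ave(start, chrom, FUN = rank)

# The probe ordering is the same whether real positions or ranks are used.
same_order <- identical(order(chrom, start), order(chrom, index))
```

The ranks are enough to keep neighbouring probes adjacent, but they cannot be mapped back to genomic coordinates, which is why a real position inside each probe interval is the more useful choice.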

ADD REPLY • link 9.7 years ago Nikolas Balanis ▴ 30


As you have figured, technically any numeric value that preserves the probe ordering along a chromosome is valid. You only need the actual value to map it back to the genome.

Since you don't observe anything between the end of one probe and the start of the next, you can only say that the break happened somewhere in between. And once you take the measurements themselves into account, you can only say that the change is somewhere in the vicinity.

Venkat
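The point that a break can only be placed somewhere between two neighbouring probes can be made concrete by looking up probe end coordinates after segmentation. The sketch below is an illustration under stated assumptions: `probes` and `segs` are hypothetical tables with made-up coordinates on a single chromosome, `segs` borrows the `chrom`, `loc.start`, and `loc.end` column names from the data frame that DNAcopy's `segment()` returns, and start positions are assumed to have been used as maploc.

```r
# Hypothetical probe table, sorted by position on one chromosome.
probes <- data.frame(
  chrom = "3",
  start = c(100, 300, 500, 700),
  end   = c(159, 359, 559, 759),
  stringsAsFactors = FALSE
)

# Hypothetical segment table; loc.start and loc.end are probe start
# positions because start positions were used as maploc.
segs <- data.frame(
  chrom     = c("3", "3"),
  loc.start = c(100, 500),
  loc.end   = c(300, 700),
  stringsAsFactors = FALSE
)

# Look up the end coordinate of the last probe in each segment.
key_probe <- paste(probes$chrom, probes$start)
key_seg   <- paste(segs$chrom, segs$loc.end)
segs$probe.end <- probes$end[match(key_seg, key_probe)]

# Between consecutive segments on the same chromosome, the change point
# lies somewhere between the end of the last probe of one segment and
# the start of the first probe of the next.
gap <- data.frame(
  from = head(segs$probe.end, -1),
  to   = tail(segs$loc.start, -1)
)
```

With several chromosomes, the `gap` step would need to be applied per chromosome so that the last segment of one chromosome is not paired with the first segment of the next.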

ADD REPLY • link 9.7 years ago SeshanV@mskcc.org ▴ 40
