Query

0

Entering edit mode

Yogesh Kumar ▴ 10

@yogesh-kumar-3944

Last seen 11.3 years ago

Respected Sir/Madam, Good morning, I just started my PhD in Next generation sequencing datasets.. I have one query , can you help me to handle this query. we have sequence of 1MB region from Human chr 22 (haploid). we want to cut the length of max 450-500bp ( given use Normal distribution, Mean 500 and SD=?). *I want to know how can we generate reads from our sequence* ( paired end of 75bp at both ends, Assuming perfect reads). we want to do simulation of NGS run and reassemble it again without using reference sequence. Can we use R for simulation work also? if yes please provide me which package we can use for it and possible send me R script too. 1. To establish simulation of reassembling sequence from NGS data. This will build from re-assembling a simple sequence of 1 Mb with no repeats in the haploid state, to inclusion of genetic variation and polyploidy. -simulate a NGS run from a 1 Mb segment of human with little/no repeats. Average fragment size 500 bp with normal distribution. Paired end with 75 bp reads. Assume perfect sequencing. Check out other simulation methods - align the reads back to the 1 Mb sequence. How much variation in coverage - reassemble the reads WITHOUT using the reference sequence. http://maq.sourceforge.net/maq-manpage.shtml#7 I think we can use above link but not be able to work out . can you check it and send me solution Regards Yogesh Kumar University of Otago Dunedin, NZ 0064226149242 [[alternative HTML version deleted]]

Sequencing Sequencing • 1.6k views

ADD COMMENT • link updated 15.8 years ago by Martin Morgan 25k • written 15.8 years ago by Yogesh Kumar ▴ 10

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 11 months ago

United States

On 02/23/2010 09:43 PM, Yogesh Kumar wrote: > Respected Sir/Madam, > > Good morning, I just started my PhD in Next generation sequencing datasets.. > I have one query , can you help me to handle this query. > > we have sequence of 1MB region from Human chr 22 (haploid). we want to cut > the length of max 450-500bp ( given use Normal distribution, Mean 500 and > SD=?). *I want to know how can we generate reads from our sequence* ( > paired end of 75bp at both ends, Assuming perfect reads). we want to do > simulation of NGS run and reassemble it again without using reference > sequence. Can we use R for simulation work also? if yes please provide me > which package we can use for it and possible send me R script too. Good morning -- >From here http://bioconductor.org/packages/release/BiocViews.html The BSgenome software package and the appropriate BSgenome annotation (e.g., BSgenome.Hsapiens.UCSC.hg19) would be a good starting point for reference sequence. The Biostrings and IRanges software packages, especially 'Views', coupled with standard R random number generators e.g., rnorm would be a good place to simulate reads. These could be written as fasta files. Basic code might be library(BSgenome.Hsapiens.UCSC.hg19) N <- 1e8 # 10M reads start <- 5e7 + runif(N, 0, 1e7) # start of fragment, genome coords width <- rnorm(N, 500, 50) # width of fragment v <- Views(Hsapiens[["chr1"]], start=start, width=width) Each element of v is a fragment r1 <- narrow(v, 1, 80) r2 <- narrow(v, width(v) - 80 + 1, width(v)) r1 and r2 are the pairs of each read. The 10M reads didn't quite fit in my 8Gb of RAM, so likely I'd need to generate reads in batches of, e.g., 5M reads. Good luck. Martin > > 1. To establish simulation of reassembling sequence from NGS data. This will > build from re-assembling a simple sequence of 1 Mb with no repeats in the > haploid state, to inclusion of genetic variation and polyploidy. > -simulate a NGS run from a 1 Mb segment of human with little/no > repeats. Average fragment size 500 bp with normal distribution. Paired end > with 75 bp reads. Assume perfect sequencing. Check out other simulation > methods > - align the reads back to the 1 Mb sequence. How much variation in > coverage > - reassemble the reads WITHOUT using the reference sequence. > > > > > http://maq.sourceforge.net/maq-manpage.shtml#7 > > I think we can use above link but not be able to work out . can you check it > and send me solution > > Regards > Yogesh Kumar > University of Otago > Dunedin, NZ > 0064226149242 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD COMMENT • link 15.8 years ago Martin Morgan 25k

0

Entering edit mode

I would say at this point that having a problem and then asking a mailing list how to solve it does not constitute a PhD :) I hope you attribute the code below to Martin in your thesis. Like many people on this list, I learned R all by myself, and the ability to learn new technologies is a valuable skill you will need when you become a researcher. Mailing lists are there to help, but one would hope you'd spent a few weeks trying to do it yourself in R before asking the mailing list :) ________________________________________ From: bioconductor-bounces@stat.math.ethz.ch [bioconductor- bounces@stat.math.ethz.ch] On Behalf Of Martin Morgan [mtmorgan@fhcrc.org] Sent: 24 February 2010 14:53 To: Yogesh Kumar Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Query On 02/23/2010 09:43 PM, Yogesh Kumar wrote: > Respected Sir/Madam, > > Good morning, I just started my PhD in Next generation sequencing datasets.. > I have one query , can you help me to handle this query. > > we have sequence of 1MB region from Human chr 22 (haploid). we want to cut > the length of max 450-500bp ( given use Normal distribution, Mean 500 and > SD=?). *I want to know how can we generate reads from our sequence* ( > paired end of 75bp at both ends, Assuming perfect reads). we want to do > simulation of NGS run and reassemble it again without using reference > sequence. Can we use R for simulation work also? if yes please provide me > which package we can use for it and possible send me R script too. Good morning -- >From here http://bioconductor.org/packages/release/BiocViews.html The BSgenome software package and the appropriate BSgenome annotation (e.g., BSgenome.Hsapiens.UCSC.hg19) would be a good starting point for reference sequence. The Biostrings and IRanges software packages, especially 'Views', coupled with standard R random number generators e.g., rnorm would be a good place to simulate reads. These could be written as fasta files. Basic code might be library(BSgenome.Hsapiens.UCSC.hg19) N <- 1e8 # 10M reads start <- 5e7 + runif(N, 0, 1e7) # start of fragment, genome coords width <- rnorm(N, 500, 50) # width of fragment v <- Views(Hsapiens[["chr1"]], start=start, width=width) Each element of v is a fragment r1 <- narrow(v, 1, 80) r2 <- narrow(v, width(v) - 80 + 1, width(v)) r1 and r2 are the pairs of each read. The 10M reads didn't quite fit in my 8Gb of RAM, so likely I'd need to generate reads in batches of, e.g., 5M reads. Good luck. Martin > > 1. To establish simulation of reassembling sequence from NGS data. This will > build from re-assembling a simple sequence of 1 Mb with no repeats in the > haploid state, to inclusion of genetic variation and polyploidy. > -simulate a NGS run from a 1 Mb segment of human with little/no > repeats. Average fragment size 500 bp with normal distribution. Paired end > with 75 bp reads. Assume perfect sequencing. Check out other simulation > methods > - align the reads back to the 1 Mb sequence. How much variation in > coverage > - reassemble the reads WITHOUT using the reference sequence. > > > > > http://maq.sourceforge.net/maq-manpage.shtml#7 > > I think we can use above link but not be able to work out . can you check it > and send me solution > > Regards > Yogesh Kumar > University of Otago > Dunedin, NZ > 0064226149242 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 15.8 years ago michael watson IAH-C ★ 3.4k

0

Entering edit mode

michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378

Last seen 11.3 years ago

I believe maq has the facility to simulate NGS reads http://maq.sourceforge.net/ ________________________________________ From: bioconductor-bounces@stat.math.ethz.ch [bioconductor- bounces@stat.math.ethz.ch] On Behalf Of Yogesh Kumar [yogesh.srmc@gmail.com] Sent: 24 February 2010 05:43 To: bioconductor at stat.math.ethz.ch Subject: [BioC] Query Respected Sir/Madam, Good morning, I just started my PhD in Next generation sequencing datasets.. I have one query , can you help me to handle this query. we have sequence of 1MB region from Human chr 22 (haploid). we want to cut the length of max 450-500bp ( given use Normal distribution, Mean 500 and SD=?). *I want to know how can we generate reads from our sequence* ( paired end of 75bp at both ends, Assuming perfect reads). we want to do simulation of NGS run and reassemble it again without using reference sequence. Can we use R for simulation work also? if yes please provide me which package we can use for it and possible send me R script too. 1. To establish simulation of reassembling sequence from NGS data. This will build from re-assembling a simple sequence of 1 Mb with no repeats in the haploid state, to inclusion of genetic variation and polyploidy. -simulate a NGS run from a 1 Mb segment of human with little/no repeats. Average fragment size 500 bp with normal distribution. Paired end with 75 bp reads. Assume perfect sequencing. Check out other simulation methods - align the reads back to the 1 Mb sequence. How much variation in coverage - reassemble the reads WITHOUT using the reference sequence. http://maq.sourceforge.net/maq-manpage.shtml#7 I think we can use above link but not be able to work out . can you check it and send me solution Regards Yogesh Kumar University of Otago Dunedin, NZ 0064226149242 [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 15.8 years ago michael watson IAH-C ★ 3.4k

0

Entering edit mode

Ok you already found maq. Do you have access to a (preferably 64 bit) Linux or Unix computer? ________________________________________ From: bioconductor-bounces@stat.math.ethz.ch [bioconductor- bounces@stat.math.ethz.ch] On Behalf Of michael watson (IAH-C) [michael.watson@bbsrc.ac.uk] Sent: 24 February 2010 07:57 To: Yogesh Kumar; bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Query I believe maq has the facility to simulate NGS reads http://maq.sourceforge.net/ ________________________________________ From: bioconductor-bounces@stat.math.ethz.ch [bioconductor- bounces@stat.math.ethz.ch] On Behalf Of Yogesh Kumar [yogesh.srmc@gmail.com] Sent: 24 February 2010 05:43 To: bioconductor at stat.math.ethz.ch Subject: [BioC] Query Respected Sir/Madam, Good morning, I just started my PhD in Next generation sequencing datasets.. I have one query , can you help me to handle this query. we have sequence of 1MB region from Human chr 22 (haploid). we want to cut the length of max 450-500bp ( given use Normal distribution, Mean 500 and SD=?). *I want to know how can we generate reads from our sequence* ( paired end of 75bp at both ends, Assuming perfect reads). we want to do simulation of NGS run and reassemble it again without using reference sequence. Can we use R for simulation work also? if yes please provide me which package we can use for it and possible send me R script too. 1. To establish simulation of reassembling sequence from NGS data. This will build from re-assembling a simple sequence of 1 Mb with no repeats in the haploid state, to inclusion of genetic variation and polyploidy. -simulate a NGS run from a 1 Mb segment of human with little/no repeats. Average fragment size 500 bp with normal distribution. Paired end with 75 bp reads. Assume perfect sequencing. Check out other simulation methods - align the reads back to the 1 Mb sequence. How much variation in coverage - reassemble the reads WITHOUT using the reference sequence. http://maq.sourceforge.net/maq-manpage.shtml#7 I think we can use above link but not be able to work out . can you check it and send me solution Regards Yogesh Kumar University of Otago Dunedin, NZ 0064226149242 [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 15.8 years ago michael watson IAH-C ★ 3.4k

Login before adding your answer.