Accessing next gen sequence data remotely via Bioconductor

Ruppert Valentino ▴ 270

@ruppert-valentino-1376

Last seen 9.6 years ago

Hello,

I am trying to access next gen sequencing data remotely via R/Bioconductor, but I can't seem to send queries to it the way I can with biomaRt. I tried Rsamtools, but even with that I found no way to query the sequence file directly.

What I am trying to do is get sequence data for specific regions (e.g. chromosome 5, 150100000 to 150101000) from http://www.1000genomes.org/ samples such as NA19240. However, there doesn't seem to be any tool to do this easily.

In the Rsamtools documentation they mention that they initially downloaded the data using samtools view bamfile.

Does anyone know of a way to access next gen sequence data remotely without having to download it locally? If so, I would appreciate it if you could email me the R script for that.

Thanks

Sequencing Rsamtools • 834 views

Updated 13.4 years ago by Martin Morgan 25k • written 13.4 years ago by Ruppert Valentino ▴ 270

Martin Morgan 25k

@martin-morgan-1513

Last seen 4 days ago

United States

On 12/17/2010 07:23 AM, Ruppert Valentino wrote:

> Does anyone know of a way to access next gen sequence data remotely
> without having to download them locally, if so I would appreciate it
> if they email me the R script for that.

Pointing to the BAM URL as the 'file' argument to scanBam will first download the index and then perform the query. It is better to download the index ('.bai') file once and then call scanBam(remoteUrl, localIndex).

It also makes sense to do the arithmetic about the volume of data to be downloaded: if you're going to download most of the data anyway, it is far better to use the 'aspera' plugin provided by 1000genomes to pull the BAM files down quickly and then do local access.

The basic workflow is sketched in the Rsamtools vignette; look for na19240url.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
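For readers who want something concrete, here is a minimal sketch of the "download the index, then query the remote BAM" approach described above. It is not taken from the Rsamtools vignette. The helper name `fetch_region`, its arguments, and the example.org URLs are hypothetical placeholders, and the exact form expected for the `index` argument (with or without the '.bai' suffix) should be checked against ?scanBam for your Rsamtools version.

```r
# Sketch only: query one region of a remote BAM file using a locally cached index.
# `bam_url` and `index_url` are placeholders supplied by the caller (assumptions).
fetch_region <- function(bam_url, index_url, seqname, start, end,
                         cache_dir = tempdir()) {
  # Download the small '.bai' index once and reuse it for later queries.
  local_bai <- file.path(cache_dir, basename(index_url))
  if (!file.exists(local_bai)) {
    download.file(index_url, local_bai, mode = "wb")
  }

  # Region of interest; `seqname` must match the BAM header (e.g. "5" vs "chr5").
  which <- GenomicRanges::GRanges(seqname, IRanges::IRanges(start, end))
  param <- Rsamtools::ScanBamParam(which = which,
                                   what = c("rname", "pos", "seq"))

  # ?scanBam documents `index` as the index path without the '.bai' suffix.
  index_stem <- sub("\\.bai$", "", local_bai)
  Rsamtools::scanBam(bam_url, index = index_stem, param = param)
}

# Hypothetical call (placeholder URLs, not real 1000 Genomes paths):
# reads <- fetch_region("https://example.org/NA19240.bam",
#                       "https://example.org/NA19240.bam.bai",
#                       "5", 150100000, 150101000)
# reads[[1]]$seq   # sequences of reads overlapping the region
```

Only the reads overlapping the requested range are transferred, so this pays off for a handful of small regions. If the sequence names are unknown, scanBamHeader on the same file should list them.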
