Get genomic sequences
@johannes-waage-3852
Hi all,

Is there a way to fetch genomic sequences via Bioconductor directly? (I am using Galaxy, but I would like to automate this.)

I tried rtracklayer and biomaRt. rtracklayer doesn't seem to have an interface for fetching sequences, and biomaRt only seems to fetch sequences for a subset of gene IDs, whereas I just need to fetch the sequence of a genomic range:

fetchSequence(chr, strand, start, end) -> sequence

Any suggestions?

Thank you in advance!!

Best regards,
JW, Uni. of Copenhagen
biomaRt rtracklayer
@cei-abreu-goodger-4433
Mexico
Hi Johannes,

I thought `getSequence` from the biomaRt package should allow you to do what you want, but I can't get it to work for genomic coordinates either:

```r
library(biomaRt)
ens <- useMart("ensembl", "hsapiens_gene_ensembl")
getSequence(chromosome = 1, start = 10000, end = 11000,
            seqType = "genomic", mart = ens)
```

```
Error in getSequence(chromosome = 1, start = 10000, end = 11000, seqType = "genomic", :
  Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode. Choose either gene_exon, transcript_exon, transcript_exon_intron, gene_exon_intron, cdna, coding, coding_transcript_flank, coding_gene_flank, transcript_flank, gene_flank, peptide, 3utr or 5utr

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] tools     stats     graphics  grDevices datasets  utils     methods
[8] base

other attached packages:
[1] biomaRt_2.2.0 Biobase_2.6.0

loaded via a namespace (and not attached):
[1] RCurl_1.2-1 XML_2.6-0
```

Cheers,
Cei
@johannes-waage-3852
Ah, simple but perfect. Thanks!

/JW

(In reply to Paul Leo, Mon, Jan 11, 2010, 11:15 AM.)
Paul Leo
@paul-leo-2092
If you have a BED file, then all you need are the BSgenome.* packages to get the sequences:

```r
library("BSgenome.Mmusculus.UCSC.mm9")
all.genomic <- getSeq(Mmusculus, the.chrom, starts, ends)
```

where

```
> the.chrom[1:5]
[1] "chr1" "chr1" "chr1" "chr1" "chr1"
> starts[1:5]
[1] 3187526 3487463 3777276 4144186 4274111
> ends[1:5]
[1] 3187790 3487763 3777555 4144499 4274416
```

etc. Then, to count matches to the MYB motif in each region:

```r
myb.dna <- DNAString("YAACKG")  # motif with IUPAC codes: Y = C/T, K = G/T
length(all.genomic)
# One view per region, so the motif search is vectorised over all regions
system.time(x <- XStringViews(all.genomic, "DNAString"))
x.labels <- paste(the.chrom, starts, ends, sep = ":")
names(x) <- x.labels
###################### forward counts #################
# fixed = FALSE lets the IUPAC codes in the motif match their bases;
# the subject must be an XStringViews object to vectorise
all.matches <- matchPattern(myb.dna, x, max.mismatch = 0, fixed = FALSE)
the.cov <- coverage(all.matches)
# Summed coverage per region / motif length = number of matches per region
counts <- aggregate(the.cov, start = start(x), end = end(x), FUN = sum) / length(myb.dna)
######################################################
```

(In reply to Johannes Waage, "[BioC] Get genomic sequences", Mon, 11 Jan 2010 10:56:21 +0100.)
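For comparison, here is a minimal sketch of the `fetchSequence(chr, strand, start, end)` helper asked for in the question, built only on Biostrings (`subseq()`, `reverseComplement()`, `vcountPattern()`). The helper name, its extra `genome` argument and the toy sequences are hypothetical and for illustration only. With a real genome you would pass a BSgenome object such as `Mmusculus`, which supports the same `genome[[chr]]` indexing. The last step counts forward-strand YAACKG matches per region with `vcountPattern()` rather than with the views/coverage route above.

```r
## Minimal sketch: fetchSequence() is a hypothetical helper, not a
## Bioconductor function. Requires the Biostrings package.
library(Biostrings)

## genome: a BSgenome object (e.g. Mmusculus) or any named DNAStringSet.
## start/end are 1-based and inclusive.
fetchSequence <- function(genome, chr, strand, start, end) {
  s <- subseq(genome[[chr]], start = start, end = end)  # slice one chromosome
  if (strand == "-") reverseComplement(s) else s       # minus strand: reverse complement
}

## Hypothetical two-chromosome toy "genome" so the sketch runs without a BSgenome package
toy <- DNAStringSet(c(chr1 = "ACGTTGCAACTGA", chr2 = "GGGAAACCC"))

as.character(fetchSequence(toy, "chr1", "+", 2, 5))  # "CGTT"
as.character(fetchSequence(toy, "chr1", "-", 2, 5))  # "AACG"

## Several regions at once, in the style of the.chrom/starts/ends above
the.chrom <- c("chr1", "chr2")
strands   <- c("+", "+")
starts    <- c(1, 1)
ends      <- c(13, 9)
seqs <- DNAStringSet(mapply(
  function(chr, strand, start, end)
    as.character(fetchSequence(toy, chr, strand, start, end)),
  the.chrom, strands, starts, ends
))

## Forward-strand matches of the IUPAC motif YAACKG in each region;
## fixed = FALSE lets Y (C/T) and K (G/T) match their bases
counts <- vcountPattern("YAACKG", seqs, fixed = FALSE)
counts  # one match in chr1 (CAACTG), none in chr2
```

Coordinates here are 1-based and inclusive, as in `getSeq()`. Raw BED start coordinates are 0-based, so add 1 before calling the helper (importing the BED file with rtracklayer's `import()` already yields 1-based ranges). Current BSgenome releases also accept a `GRanges` in `getSeq()`, e.g. `getSeq(Mmusculus, GRanges(the.chrom, IRanges(starts, ends), strand = "+"))`, which fetches all regions in one call and reverse-complements minus-strand ranges.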