Hi Galina,
With biomaRt you can currently specify only an upstream or a downstream
flank in a single query, not both, so you'll need at least two queries
to do this. The help page (?getSequence) explains that seqType
"gene_exon_intron" gives the exons plus introns of a gene. Note that
seqType "gene_exon_intron" already includes the 5' and 3' UTRs flanking
the coding region. If you also want to include the promoter region in
this query, you can set upstream=4000. If you need the sequence
downstream of the transcribed region, you'll have to do a second query
and match up the results of both queries.
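A minimal R sketch of the two-query approach, under several assumptions: `mart` is a Mart object for the human Ensembl dataset (like `human` in the question); each getSequence() result is a data.frame holding the ID column plus one sequence column; seqType "gene_flank" with downstream= returns the flank downstream of the whole gene (UTRs included); and both results are reported in the gene's own orientation, so pasting gene-plus-upstream and then the downstream flank gives the full region. The helpers fetch_gene_with_flanks() and combine_flanks() are hypothetical, not part of biomaRt.

```r
# Hypothetical helpers (not part of biomaRt) sketching the two-query approach.

# Match the two query results on the gene ID and append the downstream flank.
# Assumes each data.frame has the ID column plus exactly one sequence column.
combine_flanks <- function(up, down, id_col = "ensembl_gene_id") {
  up_seq   <- setdiff(names(up), id_col)[1]
  down_seq <- setdiff(names(down), id_col)[1]
  a <- data.frame(id = up[[id_col]], gene_up = up[[up_seq]],
                  stringsAsFactors = FALSE)
  b <- data.frame(id = down[[id_col]], down_flank = down[[down_seq]],
                  stringsAsFactors = FALSE)
  # merge() matches rows by ID, so the two results may come back in any order
  m <- merge(a, b, by = "id")
  out <- data.frame(id = m$id, sequence = paste0(m$gene_up, m$down_flank),
                    stringsAsFactors = FALSE)
  names(out)[1] <- id_col
  out
}

# Two biomaRt queries: exons + introns (with UTRs) plus the upstream flank,
# then the downstream flank only. Assumes seqType "gene_flank" is available.
fetch_gene_with_flanks <- function(ids, mart, flank = 4000) {
  up <- biomaRt::getSequence(id = ids, type = "ensembl_gene_id",
                             seqType = "gene_exon_intron",
                             upstream = flank, mart = mart)
  down <- biomaRt::getSequence(id = ids, type = "ensembl_gene_id",
                               seqType = "gene_flank",
                               downstream = flank, mart = mart)
  combine_flanks(up, down)
}
```

For example, fetch_gene_with_flanks("ENSG00000128714", mart = human) would run both queries and return one combined sequence per gene ID. Matching on the ID rather than on row order guards against the two queries returning genes in different orders.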
Cheers,
Steffen
----- Original Message -----
From: "Glazko, Galina" <galina_glazko@urmc.rochester.edu>
Date: Thursday, February 7, 2008 12:48 pm
Subject: [BioC] boiomaRt 'getSequence' question
To: bioconductor at stat.math.ethz.ch
> Dear all,
>
> I have a list of Ensembl gene IDs and I need to get the corresponding
> sequences together with the 5' upstream (4000 bp) and 3' downstream
> (4000 bp) regions and all introns.
>
> I know I can probably do this using a combination of commands:
>
> Tmp1 <- getSequence(id = "ENSG00000128714", type = "ensembl_gene_id",
>                     seqType = "coding_gene_flank", upstream = 4000,
>                     mart = human)
>
> Tmp2 <- getSequence(id = "ENSG00000128714", type = "ensembl_gene_id",
>                     seqType = "coding_gene_flank", downstream = 4000,
>                     mart = human)
>
> Tmp3 <- getSequence(id = "ENSG00000128714", type = "ensembl_gene_id",
>                     seqType = "cdna", mart = human)
>
> and then concatenate Tmp1, Tmp2 and Tmp3, but I am not sure that the
> 'cdna' seqType will give me the introns...
>
> Also, I hope there is a simpler way to get all these sequences with
> just one command and the right 'seqType' specification.
>
> Could someone please clarify this for me?
>
> Thank you!
>
> Best regards
> Galina
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi Galina,
Yes, this is possible; however, you can only retrieve one sequence at a
time this way, and you'll need the RMySQL package installed.
Here's how to do it:
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", mysql = TRUE)
getSequence(chromosome = 10, start = 200000, end = 200010, mart = ensembl)
You'll get:
chromosome start end sequence
1 10 2e+05 200010 TGTGTTCCCCT
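Because each coordinate query returns a single region, several regions have to be fetched one call at a time. A minimal R sketch, assuming `regions` is a data.frame with chromosome, start and end columns and that each getSequence() call returns a one-row data.frame like the output above. The optional `flank` argument is an assumption of this sketch, not a biomaRt feature: it simply widens each window (e.g. flank = 4000 adds 4000 bp on both sides of a gene). fetch_regions() is hypothetical, and its `fetch` argument exists only so the loop can be tried without a database connection.

```r
# Hypothetical helper (not part of biomaRt): one getSequence() call per region.
fetch_regions <- function(regions, mart, flank = 0,
                          fetch = function(chr, s, e)
                            biomaRt::getSequence(chromosome = chr, start = s,
                                                 end = e, mart = mart)) {
  rows <- lapply(seq_len(nrow(regions)), function(i) {
    # Widen the window by `flank` bp on each side, without going below 1
    s <- max(1, regions$start[i] - flank)
    e <- regions$end[i] + flank
    fetch(regions$chromosome[i], s, e)
  })
  # Stack the one-row results into a single data.frame
  do.call(rbind, rows)
}
```

For example, fetch_regions(regions, mart = ensembl, flank = 4000) would issue one query per row. The sketch does not check that end + flank stays within the chromosome length.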
Cheers,
Steffen
----- Original Message -----
From: "Glazko, Galina" <galina_glazko@urmc.rochester.edu>
Date: Thursday, February 7, 2008 4:02 pm
Subject: RE: [BioC] boiomaRt 'getSequence' question
To: Steffen Durinck <sdurinck at="" lbl.gov="">
> Steffen,
>
> Thank you very much!
> But I also have chromosomal coordinates.
> Is it possible to give just the chromosome number and coordinates
> instead of the gene ID, and then retrieve the entire sequence? Is there
> a 'seqType' appropriate for this?
> Thank you!
>
> Best regards
> Galina
>
>
> ________________________________
>
> From: Steffen Durinck [mailto:SDurinck at lbl.gov]
> Sent: Thu 2/7/2008 5:57 PM
> To: Glazko, Galina
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] boiomaRt 'getSequence' question