reverse complement or no reverse complement on biomaRt / biomart.org
Tefina Paloma ▴ 220
@tefina-paloma-3676
Last seen 9.6 years ago
Dear list,

Looking at the VEGFC gene (located on the reverse strand) on the biomart website and querying the 5' UTR and the flanking sequence yields the following:

http://www.ensembl.org/Homo_sapiens/Transcript/Export?db=core;g=ENSG00000150630;output=fasta;r=4:177604691-177713895;strand=feature;t=ENST00000280193;param=utr5;genomic=5_flanking;_format=HTML

Doing the same in R yields essentially the same result, the only difference being that the flanking sequence is given as its reverse complement:

library(biomaRt)
library(Biostrings)

ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

vegfc_fs = getSequence(id = c("ENST00000280193"),
                       type = "ensembl_transcript_id",
                       seqType = "transcript_flank", upstream = 3000,
                       mart = ensembl)

vegfc_utr = getSequence(id = c("ENST00000280193"),
                        type = "ensembl_transcript_id",
                        seqType = "5utr", mart = ensembl)

As the gene is located on the reverse strand, one would probably be interested in the reverse complement of the sequence returned by Ensembl/biomart.

Although it's nice that the flanking sequence is already reverse complemented in R, it should be documented somehow. And the question arises: why does biomaRt return the reverse complement of the flanking sequence but not of the UTR?

I would appreciate any hints!
Thanks a lot in advance,
Best,
Tefina

> sessionInfo()
R version 2.9.1 (2009-06-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Biostrings_2.12.8 IRanges_1.2.3 biomaRt_2.0.0

loaded via a namespace (and not attached):
[1] Biobase_2.4.1 RCurl_0.98-1 XML_2.5-3
biomaRt biomaRt • 1.9k views
@james-w-macdonald-5106
Last seen 4 hours ago
United States
Hi Tefina,

Tefina Paloma wrote:
> Although it's nice that the flanking sequence is already reverse
> complemented in R, it should be somehow documented.

The flanking sequence isn't reverse complemented in R; it is reported exactly as it is received from the Biomart server.

I am a bit confused here as well; AFAICT, the sequences for the 5' flank and the 5' UTR are identical from all sources (Ensembl, Biomart and biomaRt).

5' flank:
Ensembl  ccgccgccagcgcccccgccgcagcgcccgcggcccggctcctctcactt
Biomart  CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT
biomaRt  CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT

5' UTR:
Ensembl  CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC
Biomart  CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC
biomaRt  CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
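For anyone who wants to repeat this kind of comparison programmatically rather than by eye, here is a minimal sketch in base R. It assumes nothing beyond base R; `revcomp()` and `relation()` are hypothetical helper names made up for this illustration (they are not part of biomaRt or Biostrings), and the two sequences are the 50-base 5' flank strings quoted above.

```r
# Base R only; no Bioconductor packages are assumed here.

# Hypothetical helper: reverse complement of a single DNA string.
revcomp <- function(x) {
  # Complement every base (upper and lower case), then reverse the order.
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: describe how two sequences relate, ignoring case.
relation <- function(a, b) {
  a <- toupper(a)
  b <- toupper(b)
  if (identical(a, b)) {
    "identical"
  } else if (identical(revcomp(a), b)) {
    "reverse complement"
  } else {
    "different"
  }
}

# The 5' flank as reported by Ensembl (lower case) and Biomart (upper case).
ensembl_flank <- "ccgccgccagcgcccccgccgcagcgcccgcggcccggctcctctcactt"
biomart_flank <- "CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT"

relation(ensembl_flank, biomart_flank)           # "identical"
relation(revcomp(ensembl_flank), biomart_flank)  # "reverse complement"
```

If two exports of the same region come back as "reverse complement" rather than "identical", one of the tools has reported the other strand, which is the situation described in the original question.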
Tefina Paloma ▴ 220
@tefina-paloma-3676
Last seen 9.6 years ago
James W. MacDonald <jmacdon at ...> writes:

Hi,

I simply exported the sequence using the "export data" button on the left-hand side of the biomart page.

I do not usually do this, but I wanted to compare the different ways of exporting the sequence to make sure that I always get the same sequence.

Best,
Tefina
Tefina Paloma wrote:
> if I just export the sequence by using the "export data" button on the
> left hand side of the biomart page.

That isn't a very precise description of what you did. If I go to biomart.org, choose GRCh37 as the dataset, filter on HGNC symbol using VEGFC, then go to Attributes, click the Sequences radio button, choose Flank (Transcript), check the Upstream flank box and enter 50 in the box to the right, I get the following sequence:

>ENSG00000150630|ENST00000280193
CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT

This is identical to what I get from biomaRt, as well as from Ensembl.

In addition, I don't see an 'export data' button on the left-hand side of the biomart page. Is it possible that you are using something other than biomart.org?

--
James W. MacDonald
Tefina Paloma ▴ 220
@tefina-paloma-3676
Last seen 9.6 years ago
Thanks a lot for your help; with BLAT everything is really clear to me now. But indeed, this is a rather bad bug in the export wizard.

Best,
Tefina
Tefina Paloma ▴ 220
@tefina-paloma-3676
Last seen 9.6 years ago
Hi,

So, regardless of whether the gene is on the forward or on the reverse strand, the "start coordinates" for any kind of sequence are always lower than the "end coordinates".

E.g. for Gene ENSG00000150630, Transcript ENST00000280193 and Exon ENSE00001153189:
Start: 177,713,319
End: 177,713,895

So, if the gene is on the reverse strand, the sequence actually runs from the "end coordinates" to the "start coordinates".

Am I right?

Thanks a lot for all your help,
Tefina
Tefina Paloma wrote:
> So, if the gene is on the reverse strand, the sequence actually goes from the
> "end coordinates" to the "start coordinates".
>
> Am I right?

Yes. For instance, the six exons for VEGFC have the following coordinates:

EXON  CHR  STRAND  START      END
1     4    -       177713319  177713895
2     4    -       177650687  177650900
3     4    -       177648932  177649122
4     4    -       177632653  177632804
5     4    -       177608975  177609081
6     4    -       177608341  177608674

Since we are on the reverse strand, the coordinates for each exon are to the left of the preceding exon (assuming the conventional orientation of the chromosome, a la the UCSC genome browser). Since transcription moves from right to left for this gene, by definition the end coordinate of an exon is really the start of the sequence for that exon, and the start coordinate is really the end of the sequence for that exon.

In other words, base coordinates are always counted on the forward strand, regardless of the strand the gene is on.

Best,
Jim
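To make the coordinate convention concrete, here is a small base R sketch. It is only an illustration under stated assumptions: `revcomp()` and `feature_seq()` are hypothetical helpers, not biomaRt or Biostrings functions, and `toy` is a made-up 16-base "forward strand" rather than real chromosome 4 sequence. Coordinates are taken to be 1-based and inclusive, with start <= end, as in the exon table above.

```r
# Base R only; a toy illustration, not real genome data.

# Hypothetical helper: reverse complement of a single DNA string.
revcomp <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: sequence of a feature in the direction it is read.
# chrom  : forward-strand sequence as a single string
# start  : 1-based, inclusive forward-strand start (always <= end)
# end    : 1-based, inclusive forward-strand end
# strand : "+" or "-"
feature_seq <- function(chrom, start, end, strand) {
  stopifnot(start <= end, strand %in% c("+", "-"))
  fwd <- substr(chrom, start, end)  # bases as counted on the forward strand
  if (strand == "-") revcomp(fwd) else fwd
}

toy <- "ATGGCCTTAACGTAGC"  # made-up forward strand, positions 1..16

feature_seq(toy, 3, 8, "+")  # "GGCCTT"
feature_seq(toy, 3, 8, "-")  # "AAGGCC"
```

For the "-" feature the first base returned is the complement of the base at position 8 (the "end" coordinate) and the last is the complement of the base at position 3 (the "start" coordinate), which is the right-to-left reading described above.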