reverse complement or no reverse complement on biomaRt / biomart.org
Tefina Paloma ▴ 220
@tefina-paloma-3676
Last seen 10.2 years ago
James W. MacDonald <jmacdon at ...> writes:

> The flanking sequence isn't reverse complemented in R, it is reported
> exactly as it is received from the Biomart server.
>
> I am a bit confused here as well; AFAICT, the sequence for the 5' flank
> and UTR are identical from all sources (Ensembl, Biomart and biomaRt).
>
> 5' flank:
>
> Ensembl
> ccgccgccagcgcccccgccgcagcgcccgcggcccggctcctctcactt
>
> Biomart
> CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT
>
> biomaRt
> CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT
>
> 5'UTR
>
> Ensembl
> CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC
>
> Biomart
> CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC
>
> biomaRt
> CACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGGTCCTTCCACC
>
> Best,
>
> Jim

Dear Jim,

Do you know if these sequences are sense or antisense?
If you export the sequence via biomart (via the webpage), you get the following:

>ENST00000280193 utr5:KNOWN_protein_coding
CGGGGAAGGGGAGGGAGGAGGGGGACGAGGGCTCTGGCGGGTTTGGAGGGGCTGAACATC
GCGGGGTGTTCTGGTGTCCCCCGCCCCGCCTCTCCAAAAAGCTACACCGACGCGGACCGC
GGCGGCGTCCTCCCTCGCCCTCGCTTCACCTCGCGGGCTCCGAATGCGGGGAGCTCGGAT
GTCCGGTTTCCTGTGAGGCTTTTACCTGACACCCGCCGCCTTTCCCCGGCACTGGCTGGG
AGGGCGCCCTGCAAAGTTGGGAACGCGGAGCCCCGGACCCGCTCCCGCCGCCTCCGGCTC
GCCCAGGGGGGGTCGCCGGGAGGAGCCCGGGGGAGAGGGACCAGGAGGGGCCCGCGGCCT
CGCAGGGGCGCCCGCGCCCCCACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGG
TCCTTCCACC

>5' Flanking sequence chromosome:GRCh37:4:177713896:177713945:1
AAGTGAGAGGAGCCGGGCCGCGGGCGCTGCGGCGGGGGCGCTGGCGGCGG

So, in contrast to the web-view, the flanking sequence is reverse complemented.
Basically it is just a problem of correct definition and assignment.
So which sequences are sense and which are antisense.

Best,
Tefina
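The claim that the exported flank is the reverse complement of the one reported by Ensembl, Biomart and biomaRt can be checked directly on the two 50-base sequences quoted in this post. The following is a minimal base R sketch; the helper name `reverse_complement` and the variable names are made up for illustration and are not part of biomaRt.

```r
# Reverse-complement a DNA string using base R only.
# `reverse_complement` is a hypothetical helper, not a biomaRt function.
reverse_complement <- function(dna) {
  # Swap each base for its complement, preserving case.
  complemented <- chartr("ACGTacgt", "TGCAtgca", dna)
  # Reverse the order of the characters and glue them back together.
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# 5' flank as reported by Ensembl, Biomart and biomaRt (from the thread).
biomart_flank <- "CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT"

# 5' flank as produced by the web export (from the thread).
web_export_flank <- "AAGTGAGAGGAGCCGGGCCGCGGGCGCTGCGGCGGGGGCGCTGGCGGCGG"

# TRUE if the web export is the reverse complement of the biomaRt result.
flank_is_revcomp <- identical(reverse_complement(biomart_flank), web_export_flank)
print(flank_is_revcomp)
```

A check like this only tells you that the two strings describe the same 50 bases on opposite strands; it does not say which of them is the sense strand for the transcript.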
biomaRt biomaRt • 1.4k views
@james-w-macdonald-5106
Last seen 3 days ago
United States
Hi Tefina,

Tefina Paloma wrote:
> Dear Jim,
>
> Do you know if these sequences are sense or antisense?
> If you export the sequence via biomart (via the webpage), you get the following:
>
>> ENST00000280193 utr5:KNOWN_protein_coding
> CGGGGAAGGGGAGGGAGGAGGGGGACGAGGGCTCTGGCGGGTTTGGAGGGGCTGAACATC
> GCGGGGTGTTCTGGTGTCCCCCGCCCCGCCTCTCCAAAAAGCTACACCGACGCGGACCGC
> GGCGGCGTCCTCCCTCGCCCTCGCTTCACCTCGCGGGCTCCGAATGCGGGGAGCTCGGAT
> GTCCGGTTTCCTGTGAGGCTTTTACCTGACACCCGCCGCCTTTCCCCGGCACTGGCTGGG
> AGGGCGCCCTGCAAAGTTGGGAACGCGGAGCCCCGGACCCGCTCCCGCCGCCTCCGGCTC
> GCCCAGGGGGGGTCGCCGGGAGGAGCCCGGGGGAGAGGGACCAGGAGGGGCCCGCGGCCT
> CGCAGGGGCGCCCGCGCCCCCACCCCTGCCCCCGCCAGCGGACCGGTCCCCCACCCCCGG
> TCCTTCCACC
>
>> 5' Flanking sequence chromosome:GRCh37:4:177713896:177713945:1
> AAGTGAGAGGAGCCGGGCCGCGGGCGCTGCGGCGGGGGCGCTGGCGGCGG

How are you getting this? I get the same thing as the web service, as I noted yesterday:

> getSequence(id = c("ENST00000280193"), type = "ensembl_transcript_id", seqType = "transcript_flank", upstream = 50, mart = mart)
                                    transcript_flank ensembl_transcript_id
1 CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT       ENST00000280193

> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-09-21 r49780)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
 [1] biomaRt_2.1.0        annaffy_1.17.2       affycoretools_1.17.4
 [4] KEGG.db_2.3.0        GO.db_2.3.0          RSQLite_0.7-2
 [7] DBI_0.2-4            AnnotationDbi_1.7.17 affy_1.23.9
[10] Biobase_2.5.6

loaded via a namespace (and not attached):
 [1] affyio_1.13.5        annotate_1.23.2      Biostrings_2.13.50
 [4] Category_2.11.4      gcrma_2.17.2         genefilter_1.25.7
 [7] GOstats_2.11.3       graph_1.23.6         GSEABase_1.7.3
[10] IRanges_1.3.89       limma_2.19.4         preprocessCore_1.7.9
[13] RBGL_1.21.12         RCurl_0.98-1         splines_2.10.0
[16] survival_2.35-7      tools_2.10.0         XML_2.5-1
[19] xtable_1.5-5

Best,

Jim

> So, in contrast to the web-view, the flanking sequence is reverse complemented.
> Basically it is just a problem of correct definition and assignment.
> So which sequences are sense and which are antisense.
>
> Best,
> Tefina
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
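The `getSequence()` call in this reply relies on a `mart` object whose creation is not shown. As a rough sketch of how the 5' flank and the 5' UTR could be fetched together, the wrapper below may help. Its name, `fetch_flank_and_utr`, is hypothetical, and the `useMart()` arguments in the usage comment are an assumption about a typical setup, not what was run in the thread. A current Ensembl release uses a newer assembly than GRCh37, so the returned sequences may differ from those quoted here.

```r
# Hypothetical wrapper around biomaRt::getSequence(); not part of biomaRt.
# `mart` is assumed to be a Mart object for a human Ensembl gene dataset.
fetch_flank_and_utr <- function(mart, transcript_id, upstream = 50) {
  # Upstream flank of the transcript, as in the call shown in the thread.
  flank <- biomaRt::getSequence(
    id = transcript_id,
    type = "ensembl_transcript_id",
    seqType = "transcript_flank",
    upstream = upstream,
    mart = mart
  )
  # 5' UTR of the same transcript.
  utr <- biomaRt::getSequence(
    id = transcript_id,
    type = "ensembl_transcript_id",
    seqType = "5utr",
    mart = mart
  )
  # The sequence column of each result is named after the seqType requested.
  list(flank = flank[["transcript_flank"]], utr5 = utr[["5utr"]])
}

# Usage (needs the biomaRt package and network access, so it is left commented):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# fetch_flank_and_utr(mart, "ENST00000280193", upstream = 50)
```

Fetching both pieces in one place makes it easier to compare them against a web export for the same transcript.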
Tefina Paloma ▴ 220
@tefina-paloma-3676
Last seen 10.2 years ago
I apologize for giving such a vague description.

If I go to the Ensembl page, click on "Mine Ensembl with BioMart", choose as database Ensembl 56, Homo sapiens, GRCh37, filter by ENST00000280193, click on results, and then click on ENST00000280193, I get to the transcript summary for this transcript. Then, on the left-hand side, I click on "export data" and choose "fasta format", "feature strand", "flanking sequence" and "5utr". By doing this, I get the sequences I posted.

Best,
Tefina
Tefina Paloma wrote:
> I apologize for giving such a vague description.
>
> If I go to the ensemble page, click on "Mine Ensembl with BioMart", chose as
> database ensembl 56, Homo Sapiens, GRCh37, filter by ENST00000280193, click on
> results, and then click on the ENST00000280193 which leads to the transcript
> summary of this ensembl. Then, on the left hand side, I click on "export data"
> and I chose "fasta format", "feature strand", "flanking sequence" and "5utr".
> By doing this, I get the sequences I posted.

That looks like a bug in that export wizard. It doesn't matter which strand you choose for the 5' UTR, you get the (correct) reverse strand regardless. However, if you choose either feature strand or reverse strand, you get the forward strand (the reverse complement of the correct 5' flanking region). If you choose forward strand, you still get the forward strand, but it is off the 3' end of the gene.

Try pasting this into blat (http://genome.ucsc.edu/cgi-bin/hgBlat) to see what I mean.

> biomaRt
CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT
> forward
GTAGTTTTCGTTCATCATGTAAGATGATAATGGACTGAACTTAGCAGCTT
> feature
AAGTGAGAGGAGCCGGGCCGCGGGCGCTGCGGCGGGGGCGCTGGCGGCGG

So the only correct results here are from biomaRt.

Best,

Jim

> Best,
> Tefina
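Short of running BLAT, the three sequences listed in this reply can be compared against each other as plain strings. The sketch below, in base R, labels each export-wizard result relative to the biomaRt flank as identical, reverse complement, or neither. The helper names `reverse_complement` and `classify_against` are invented for this example. A label of "neither" only means the string is not the same 50-base window on either strand; locating it (for instance off the 3' end of the gene) still needs an alignment tool.

```r
# Hypothetical helpers for comparing short DNA strings; base R only.
reverse_complement <- function(dna) {
  complemented <- chartr("ACGTacgt", "TGCAtgca", dna)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# Label a candidate sequence relative to a reference sequence.
classify_against <- function(candidate, reference) {
  if (identical(candidate, reference)) {
    "identical"
  } else if (identical(candidate, reverse_complement(reference))) {
    "reverse complement"
  } else {
    "neither"
  }
}

# Reference: the 5' flank returned by biomaRt (from the thread).
reference <- "CCGCCGCCAGCGCCCCCGCCGCAGCGCCCGCGGCCCGGCTCCTCTCACTT"

# Export-wizard results for the two strand settings quoted in the thread.
exported <- c(
  forward = "GTAGTTTTCGTTCATCATGTAAGATGATAATGGACTGAACTTAGCAGCTT",
  feature = "AAGTGAGAGGAGCCGGGCCGCGGGCGCTGCGGCGGGGGCGCTGGCGGCGG"
)

# Named character vector with one label per strand setting.
strand_check <- vapply(exported, classify_against, character(1),
                       reference = reference)
print(strand_check)
```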
Tefina Paloma ▴ 220
@tefina-paloma-3676
Last seen 10.2 years ago
Thanks a lot for your quick help, I am really thankful.

Best,
Tefina