Is retrieving exon sequences with biomaRt a random process?
@straubhaar-juerg-391
I am using the following code to retrieve the exon sequences of gene Tcfap2b with GeneID:21419. There are 8 exons for this gene.

    for (i in sequence(50)) {
      x <- getSequence(id=21419, type="entrezgene", seqType="gene_exon", mart=ensembl)
      if (is.null(x)) print('NULL result')
      if (!is.null(x)) print("Correct result")
    }

This gives 44 NULL results and 6 correct results. 'Correct' means getSequence() outputs the sequences of the exons.

    > sessionInfo()
    R version 2.8.1 (2008-12-22)
    x86_64-pc-linux-gnu

    locale:
    C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] biomaRt_1.16.0

    loaded via a namespace (and not attached):
    [1] RCurl_0.94-0 XML_1.99-0 tools_2.8.1

Thank you,
Juerg Straubhaar, Umass Med School
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Dear Jürg,

thank you for the feedback! Can you send us a reproducible example? That would help us figure out what is going on. In the example you posted, what is the object "ensembl" and how did you generate it?

I tried the following example, which is as similar to yours as I could think of. I could not reproduce your problem, i.e. I got consistent (non-random) results, as shown below:

    library("biomaRt")
    ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    res = lapply(sequence(50), function(i)
      getSequence(id=21419, type="entrezgene", seqType="gene_exon", mart=ensembl))

R 2.8.1
==========

    > res
    [[1]]
    NULL

    [[2]]
    NULL

    [[3]]
    NULL

    .... (46 more times NULL)

    [[50]]
    NULL

    > sessionInfo()
    R version 2.8.1 (2008-12-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] biomaRt_1.16.0

    loaded via a namespace (and not attached):
    [1] RCurl_0.92-0 XML_1.99-0

Today's R + bioC devel
======================

    > res
    [[1]]
    [1] gene_exon  entrezgene
    <0 rows> (or 0-length row.names)

    [[2]]
    [1] gene_exon  entrezgene
    <0 rows> (or 0-length row.names)

    [[3]]
    [1] gene_exon  entrezgene
    <0 rows> (or 0-length row.names)

    .... (46 times the same)

    [[50]]
    [1] gene_exon  entrezgene
    <0 rows> (or 0-length row.names)

Also, when using a different Entrez Gene ID, I get a non-trivial result, e.g. with

    > g = getSequence(id=1499, type="entrezgene", seqType="gene_exon", mart=ensembl)
    > str(g)
    'data.frame': 23 obs. of 2 variables:
     $ gene_exon : chr "GTGGTGGTTAATAAGGCTGCAGTTATGGTCCATCAGCTTTCTAAAAAGGAAGCTTCCAGACACGCTATCATGCGTTCTCCTCAGATGGTGTCTGCTATTGTACGTACCATGCAGAATACAAATGATG"| __truncated__ "GCCGGTGGCGGCAGGATACAGCGGCTTCTGCGCGACTTATAAGAGCTCCTTGTGCGGCGCCATTTTAAGCCTCTCGGTCTGTGGCAGCAGCGTTGGCCCGGCCCCGGGAGCGGAGAGCGAGGGGAGG"| __truncated__ "GGTATTTGAAGTATACCATACAACTGTTTTGAAAATCCAGCGTGGACAATGGCTACTCAAG" "CTGCTTTATTCTCCCATTGAAAACATCCAAAGAGTAGCTGCAGGGGTCCTCTGTGAACTTGCTCAGGACAAGGAAGCTGCAGAAGCTATTGAAGCTGAGGGAGCCACAGCTCCTCTGACAGAGTTAC"| __truncated__ ...
     $ entrezgene: int 1499 1499 1499 1499 1499 1499 1499 1499 1499 1499 ...

    > sessionInfo()
    R version 2.10.0 Under development (unstable) (2009-04-12 r48319)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices datasets utils methods base

    other attached packages:
    [1] biomaRt_1.99.9

    loaded via a namespace (and not attached):
    [1] RCurl_0.94-1 XML_2.3-0 tools_2.10.0

--
Wolfgang Huber
EMBL-EBI
http://www.ebi.ac.uk/huber

Attachment: biomaRt-Straubhaar.txt
URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20090413/8db958c1/attachment.txt
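The two runs in this reply report "no hit" differently: biomaRt 1.16.0 returns NULL, while biomaRt 1.99.9 returns a data frame with zero rows. A loop that only tests is.null() would therefore count an empty data frame as a "correct result". The sketch below is one possible way to tally repeated queries so that both kinds of empty result, and outright errors, are counted separately. The helper names `classify_result`, `tally_outcomes` and `fake_query` are hypothetical and not part of biomaRt; `fake_query` is an offline stand-in so the example runs without a network connection.

```r
# Classify a getSequence()-style return value.
# Assumption: "no hit" appears either as NULL (biomaRt 1.16.0 in this thread)
# or as a data frame with zero rows (biomaRt 1.99.9 in this thread).
classify_result <- function(x) {
  if (is.null(x)) {
    "empty (NULL)"
  } else if (is.data.frame(x) && nrow(x) == 0L) {
    "empty (0 rows)"
  } else {
    "has rows"
  }
}

# Call a zero-argument query function n times and tally the outcomes.
# Errors are caught and counted on their own, so a failed request is not
# mistaken for an empty result.
tally_outcomes <- function(query, n = 50L) {
  outcomes <- vapply(
    seq_len(n),
    function(i) tryCatch(classify_result(query()), error = function(e) "error"),
    character(1)
  )
  table(outcomes)
}

# Hypothetical stand-in for the real query: always returns an empty data frame
# with the two columns seen in the output above.
fake_query <- function() {
  data.frame(gene_exon = character(0), entrezgene = integer(0))
}

counts <- tally_outcomes(fake_query, n = 5L)
print(counts)
```

With a live mart object, the same tally could be run on the call from this thread by passing `function() getSequence(id=21419, type="entrezgene", seqType="gene_exon", mart=ensembl)` as `query`. A mix of "has rows" and empty or error outcomes across identical calls would point to the server or connection rather than to the query itself.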