"start" filter in biomaRt
Shi, Tao ▴ 720
@shi-tao-199
Last seen 9.4 years ago
Hi list,

In the following code, I'm using 'biomaRt' to retrieve all the genes that start beyond 1000000 on chromosome 17. However, I'm not expecting the first gene, which starts at 853510, to be in the resulting list. It seems the "start" filter is not just a simple ">=". More explanations, please!

Thanks!

...Tao

    > library(biomaRt)
    Loading required package: RCurl
    > mart = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    Checking attributes and filters ... ok
    > tmp2 <- getBM(attributes=c("ensembl_gene_id","start_position","end_position"),
    +               filters = c("chromosome_name", "start"),
    +               values=list("17", "1000000"), mart = mart)
    > tmp2[order(tmp2$start_position),][1:10,]
         ensembl_gene_id start_position end_position
    1135 ENSG00000159842         853510      1029881
    1442 ENSG00000205899        1120603      1121504
    690  ENSG00000184811        1129707      1151031
    1134 ENSG00000209456        1156460      1156529
    452  ENSG00000108953        1194595      1250267
    451  ENSG00000167193        1272190      1306294
    1133 ENSG00000197879        1314232      1342633
    450  ENSG00000132376        1344622      1366719
    689  ENSG00000174238        1368037      1412835
    449  ENSG00000167703        1424448      1478880
    > sessionInfo()
    R version 2.7.0 (2008-04-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] biomaRt_1.14.0 RCurl_0.9-3

    loaded via a namespace (and not attached):
    [1] XML_1.95-2
@steffenstatberkeleyedu-2907
Last seen 10.2 years ago
Hi Tao,

If you add the strand attribute (see below), you'll notice that the first gene lies on the reverse strand. In Ensembl, all coordinates are given on the forward strand. The start position of the first gene is in fact 1029881, as it lies on the reverse strand, and this is why it was returned.

    mart = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    # same query as before, with the strand attribute added
    g = getBM(c("ensembl_gene_id","start_position","end_position","strand"),
              filters = c("chromosome_name","start"),
              values=list(17,1000000), mart = mart)
    # sort by start_position (column 2) and show the first ten rows
    ord=order(g[,2])
    g=g[ord,]
    g[1:10,]
         ensembl_gene_id start_position end_position strand
    1135 ENSG00000159842         853510      1029881     -1
    1442 ENSG00000205899        1120603      1121504      1
    690  ENSG00000184811        1129707      1151031      1
    1134 ENSG00000209456        1156460      1156529     -1
    452  ENSG00000108953        1194595      1250267     -1
    451  ENSG00000167193        1272190      1306294     -1
    1133 ENSG00000197879        1314232      1342633     -1
    450  ENSG00000132376        1344622      1366719     -1
    689  ENSG00000174238        1368037      1412835     -1
    449  ENSG00000167703        1424448      1478880     -1

cheers,
Steffen
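The strand explanation above can be checked locally without querying BioMart again. The sketch below is only an illustration: it copies the first five rows of the output above by hand into a data frame, and the names `strand_aware_start`, `cutoff` and `g_strict` are made up for this example (they are not biomaRt attributes). It assumes that on the reverse strand a gene begins at its `end_position`, as described above, and then shows one way to keep only genes whose forward-strand `start_position` is at or beyond the cutoff.

```r
# First five rows of the result shown above, copied by hand (not re-queried).
g <- data.frame(
  ensembl_gene_id = c("ENSG00000159842", "ENSG00000205899", "ENSG00000184811",
                      "ENSG00000209456", "ENSG00000108953"),
  start_position  = c(853510, 1120603, 1129707, 1156460, 1194595),
  end_position    = c(1029881, 1121504, 1151031, 1156529, 1250267),
  strand          = c(-1, 1, 1, -1, -1),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: where the gene begins when read along its own
# strand. Forward strand: start_position. Reverse strand: end_position.
g$strand_aware_start <- ifelse(g$strand == 1, g$start_position, g$end_position)

# Post-filter locally if you want the forward-strand start_position itself
# to be at or beyond the cutoff.
cutoff <- 1000000
g_strict <- g[g$start_position >= cutoff, ]

print(g[, c("ensembl_gene_id", "strand", "strand_aware_start")])
print(g_strict$ensembl_gene_id)
```

With this post-filter, ENSG00000159842 drops out because its forward-strand `start_position` (853510) is below the cutoff, even though its strand-aware start (1029881) is above it.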
Shi, Tao ▴ 720
@shi-tao-199
Last seen 9.4 years ago
Hi Steffen,

Thanks!

...Tao

----- Original Message ----
From: "steffen@stat.Berkeley.EDU" <steffen@stat.berkeley.edu>
To: "Shi, Tao" <shidaxia at yahoo.com>
Cc: bioconductor at stat.math.ethz.ch
Sent: Monday, August 25, 2008 12:33:29 PM
Subject: Re: [BioC] "start" filter in biomaRt