Mao Jianfeng
Dear listers, Sean and Steve,
I posted a similar question on this list before, but I am still
confused, so I will try to describe my question in more detail to
make it clearer. Please read all six sections below.
Thanks a lot. This is not a student's homework. This list is my
only way to get help with R and Bioconductor. I have learned both
on my own, in a somewhat isolated environment, so any help from you
is very valuable to me.
Jian-Feng,
(1) The genomic variant data I need to annotate:
# SNPs,chromosome,start,end
SNP_1,1,43,43
SNP_2,2,56,56
(2) What I want to get (the annotation). There may be multiple terms
for a given annotation column. They should either be combined in one
cell or placed in separate rows of the same column. Either way, the
genomic positions should stay with their specific annotations.
# SNPs,chromosome,start,end,annotation_term
SNP_1,1,43,43,go_1:go_3
SNP_2,2,56,56,go_100:go_1000
or
# SNPs,chromosome,start,end,go_term
SNP_1,1,43,43,go_1
SNP_1,1,43,43,go_3
SNP_2,2,56,56,go_100
SNP_2,2,56,56,go_1000
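To show that the two layouts above carry the same information, here is a minimal sketch in base R that collapses the one-row-per-term form into the one-cell-per-SNP form. The data frame `long` and its values are just the placeholder example from this section, not real annotation output.

```r
# Placeholder long-format annotation: one row per SNP/term pair.
long <- data.frame(
  SNPs       = c("SNP_1", "SNP_1", "SNP_2", "SNP_2"),
  chromosome = c(1, 1, 2, 2),
  start      = c(43, 43, 56, 56),
  end        = c(43, 43, 56, 56),
  go_term    = c("go_1", "go_3", "go_100", "go_1000"),
  stringsAsFactors = FALSE
)

# Collapse to one row per SNP, joining its terms with ":".
wide <- aggregate(
  go_term ~ SNPs + chromosome + start + end,
  data = long,
  FUN  = function(x) paste(unique(x), collapse = ":")
)
names(wide)[names(wide) == "go_term"] <- "annotation_term"
wide <- wide[order(wide$SNPs), ]
print(wide)
```

Either layout would be fine for me, as long as each position keeps its own terms.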
(3) I was told that the biomaRt package has this functionality.
(4) What I have tried using the biomaRt package:
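library(biomaRt)

# List the available marts, then connect to the plant mart and pick datasets.
listMarts()
plant <- useMart("plant_mart_7")
alyr <- useDataset("alyrata_eg_gene", mart = plant)
atha <- useDataset("athaliana_eg_gene", mart = plant)

# Inspect the attributes and filters available for the alyrata dataset.
listAttributes(alyr)
listFilters(alyr)

# Ten single-base query positions on chromosome 1 (start == end).
chr <- rep(1, 10)
start <- c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)
end <- start

# Retrieve gene coordinates, gene IDs and GO linkage types for the query.
# "end_position" is included because it appears in the output shown in (5).
getBM(
  attributes = c("chromosome_name", "start_position", "end_position",
                 "ensembl_gene_id", "go_biological_process_linkage_type"),
  filters = c("chromosome_name", "start", "end"),
  values = list(chr, start, end),
  mart = alyr,
  uniqueRows = TRUE
)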
(5) What I got:
   chromosome_name start_position end_position ensembl_gene_id                go_biological_process_linkage_type
 1               1          48875        49123 Al_scaffold_0001_16
 2               1          72255        72617 Al_scaffold_0001_21
 3               1          10652        11944 Al_scaffold_0001_4             IEA
 4               1          82573        83367 fgenesh1_pg.C_scaffold_1000018
 5               1          87206        90301 fgenesh1_pg.C_scaffold_1000020
 6               1          29681        31614 fgenesh1_pm.C_scaffold_1000009
 7               1          51526        52636 fgenesh1_pm.C_scaffold_1000016
 8               1          78367        80505 fgenesh1_pm.C_scaffold_1000020
 9               1          35461        39593 fgenesh2_kg.1__12__AT1G02120.1
10               1          39949        42531 fgenesh2_kg.1__13__AT1G02110.1
11               1          46396        48761 fgenesh2_kg.1__19__AT1G02090.1
12               1          55814        56468 fgenesh2_kg.1__20__AT1G02070.1
13               1          74785        76652 fgenesh2_kg.1__23__AT1G02065.1
14               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1 IEA
15               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1 IEA
16               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1 IEA
17               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1 IEA
18               1           3311         6198 fgenesh2_kg.1__2__AT1G02190.2  IEA
19               1           3311         6198 fgenesh2_kg.1__2__AT1G02190.2  IEA
20               1           9512        10567 fgenesh2_kg.1__3__AT1G02180.1
21               1          12552        13416 fgenesh2_kg.1__5__AT1G02160.2
22               1             47         2523 scaffold_100001.1              IEA
23               1             47         2523 scaffold_100001.1              IEA
24               1           7429         7630 scaffold_100003.1
25               1          13702        15386 scaffold_100007.1
26               1          15665        19464 scaffold_100008.1              IEA
27               1          19692        20609 scaffold_100009.1
28               1          24515        27497 scaffold_100010.1
29               1          33055        34772 scaffold_100013.1              IEA
30               1          33055        34772 scaffold_100013.1              IEA
31               1          33055        34772 scaffold_100013.1              IEA
32               1          33055        34772 scaffold_100013.1              IEA
33               1          33055        34772 scaffold_100013.1              IEA
34               1          33055        34772 scaffold_100013.1              IEA
35               1          43130        46178 scaffold_100016.1
36               1          49553        51020 scaffold_100018.1              IEA
37               1          49553        51020 scaffold_100018.1              IEA
38               1          57579        57871 scaffold_100022.1
39               1          58865        72177 scaffold_100023.1
(6) My problem is that I cannot link the genomic positions I queried
to their specific annotations.
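To make the linkage I am after concrete, here is a minimal sketch in base R of the kind of result I would like. It is only an illustration under assumptions: the SNP names are made up, the `genes` data frame is three rows copied by hand from the output in (5) rather than a live biomaRt result, the column name `go_linkage_type` is my own shorthand, and I assume that positions and gene coordinates use the same inclusive coordinate system.

```r
# Three of the queried positions (SNP names are hypothetical).
snps <- data.frame(
  SNPs       = c("SNP_1", "SNP_2", "SNP_3"),
  chromosome = 1,
  pos        = c(33, 999, 10000),
  stringsAsFactors = FALSE
)

# Three gene rows copied by hand from the output in (5).
genes <- data.frame(
  chromosome_name = 1,
  start_position  = c(47, 3311, 9512),
  end_position    = c(2523, 6198, 10567),
  ensembl_gene_id = c("scaffold_100001.1",
                      "fgenesh2_kg.1__2__AT1G02190.2",
                      "fgenesh2_kg.1__3__AT1G02180.1"),
  go_linkage_type = c("IEA", "IEA", ""),
  stringsAsFactors = FALSE
)

# Pair every SNP with every gene on the same chromosome,
# then keep only pairs where the SNP lies inside the gene interval.
pairs <- merge(snps, genes, by.x = "chromosome", by.y = "chromosome_name")
hits  <- pairs[pairs$pos >= pairs$start_position &
               pairs$pos <= pairs$end_position, ]

# Left-join back onto the SNPs so positions without a gene are kept (as NA).
keep   <- c("SNPs", "ensembl_gene_id", "start_position",
            "end_position", "go_linkage_type")
linked <- merge(snps, hits[, keep], by = "SNPs", all.x = TRUE)
print(linked)
```

In this sketch, position 33 falls in no gene and stays as NA, while positions 999 and 10000 each pick up the gene that contains them. The all-pairs merge is only reasonable for small inputs; for many SNPs an interval-overlap tool such as `findOverlaps` from the GenomicRanges package may be a better fit.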
--
Jian-Feng, Mao
the Institute of Botany,
Chinese Academy of Botany,