remove evidence Codes


Juan Fernández Tajes ▴ 190


Dear List,

I have a list of gene symbols that looks like this:

> head(mylist)
$cluster.1
[1] "HSP90AB1" "INMT" "CKB" "NR2E1" "ME3" "FAM162A" "KIRREL2"

$cluster.2
 [1] "ENSG00000212860" "TRADD" "C1QBP" "KIAA1967" "ENSG00000137379" "MAP3K3" "TNFRSF1B" "BAG2"
 [9] "ENSG00000212866" "RIPK3" "EPRS" "HSPA6" "HSPA5" "IKBKG" "TBK1" "TRAF2"
[17] "MAP3K7" "NFKB1" "MAP3K14" "HSPA1A" "MAP3K7IP2" "HSPBP1" "NFKB2" "DNAJA1"
[25] "TNFRSF1A" "TRAF3IP2" "NFKBIA" "HSPA9" "ENSG00000183311" "TUBB" "TUBA3D" "TANK"
[33] "ENSG00000215292" "REL" "MAP3K1" "HSPA1B" "HSPA8" "NFKBIB" "PGAM5" "EEF1A2"
[41] "MAP3K8" "CLTC" "RCN2" "MAP3K7IP1" "RARS" "TRAF1" "TUBA3C" "HSPA1L"
[49] "MYO1D" "NOD1" "HSP90AA2" "CAD" "RELB" "AIFM1" "TUBB2B" "RIPK2"
[57] "CDC37" "IKBKB" "ERLIN1" "RIPK1" "TNIP2" "STUB1" "TUBB4" "HSPA2"
[65] "CHUK" "DNAJC3" "CCDC50" "SLC25A5" "NFKBIE" "AK3" "TICAM1" "TIMM50"
[73] "ANKRD17" "OTUD7B" "TNFAIP3" "RPS27L" "TRPC4AP" "TUBB6" "DNAJC6" "PXMP2"
[81] "FLJ25006"

$cluster.3
 [1] "ACTB" "PFN1" "XPO6" "VASP" "ZYX" "PFN2" "DIAPH1" "APBB1IP" "DIAPH2" "PARVG" "ENAH" "PCYT1B" "PFN4" "CNN2" "NSMAF" "PFN3"
[17] "LMOD1"

$cluster.4
 [1] "UBB" "HERC3" "KLRK1" "ULBP1" "RAET1E" "MICA" "HCST" "ENSG00000184444"
 [9] "ENSG00000206449" "ULBP2" "ZNF385A" "ULBP3" "RAET1G"

$cluster.5
[1] "YWHAZ" "SLAIN2" "ZC3H13" "C12orf51" "PGLYRP1" "ATL3"

$cluster.6
[1] "ACTG1" "EPS8L3" "PARVG" "TMSB4Y" "B3GALT1" "UGT1A6"

I want to extract the GO terms for every cluster (i.e. each list component), but excluding some of them based on their evidence codes (such as IEA or NR).

The code I'm using is the following:

e2s <- toTable(org.Hs.egSYMBOL)
# map the symbols in each cluster to Entrez gene IDs
p <- lapply(mylist, function(x) {y <- e2s$gene_id[e2s$symbol %in% x]; return(y)})
# subset the GO map by Entrez ID (a Bimap is subset with [ ], not called as a function)
entrezIDs <- lapply(p, function(x) {org.Hs.egGO[x]})
list.GO <- lapply(entrezIDs, function(x){toTable(x)})

With this approach I get a list of data.frames (list.GO) from which I can exclude the evidence codes afterwards. However, I would like to know whether there is any way to exclude the evidence codes before mapping my Entrez IDs to GO terms. My final aim is to calculate a semantic similarity index within the clusters.

Many thanks for the help

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] annotate_1.40.0 GO.db_2.10.1 org.Hs.eg.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0
[8] BiocGenerics_0.8.0 pheatmap_0.7.7 RColorBrewer_1.0-5 plyr_1.8

loaded via a namespace (and not attached):
[1] grid_3.0.2 IRanges_1.20.5 stats4_3.0.2 tools_3.0.2 XML_3.95-0.2 xtable_1.7-1

---
Juan Fernandez Tajes, PhD
Grupo Xenomar
Área de Genética
Facultad de Ciencias A Zapateira
Universidad de A Coruña
Spain
Tlf - +34 981 16700
Email: Jfernandezt@udc.es


updated 10.4 years ago by Vincent J. Carey, Jr. 6.7k • written 10.4 years ago by Juan Fernández Tajes ▴ 190


Vincent J. Carey, Jr. 6.7k


United States

You can use the select interface to move from symbols to GO terms directly if you like; see below, with input based on your cluster.5 (a vector named 'dem'). If you really want to exclude by evidence code earlier, I believe you'll have to use the SQLite facilities underlying select, and there is documentation in the OrganismDbi vignette, IIRC.

> library(Homo.sapiens)
> dem
[1] "YWHAZ" "SLAIN2" "ZC3H13" "C12orf51" "PGLYRP1" "ATL3"
> demsel = select(org.Hs.eg.db, keytype="SYMBOL", keys=dem, columns="GO")
Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows
> dim(demsel)
[1] 63  4
> demsel[c(1:3,61:63),]
   SYMBOL         GO EVIDENCE ONTOLOGY
1   YWHAZ GO:0002553      IEA       BP
2   YWHAZ GO:0005515      IPI       MF
3   YWHAZ GO:0005654      TAS       CC
61   ATL3 GO:0016021      IDA       CC
62   ATL3 GO:0042802      IDA       MF
63   ATL3 GO:0051260      IDA       BP

> sessionInfo()
R Under development (unstable) (2013-12-01 r64371)
Platform: x86_64-apple-darwin10.8.0/x86_64 (64-bit)

locale:
[1] en_US.US-ASCII/en_US.US-ASCII/en_US.US-ASCII/C/en_US.US-ASCII/en_US.US-ASCII

attached base packages:
[1] parallel stats graphics grDevices datasets utils tools methods base

other attached packages:
 [1] Homo.sapiens_1.1.2       TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
 [3] org.Hs.eg.db_2.10.1      GO.db_2.10.1
 [5] RSQLite_0.11.4           DBI_0.2-7
 [7] OrganismDbi_1.5.2        GenomicFeatures_1.15.4
 [9] GenomicRanges_1.15.11    XVector_0.3.3
[11] IRanges_1.21.13          AnnotationDbi_1.25.9
[13] Biobase_2.23.3           BiocGenerics_0.9.1
[15] BiocInstaller_1.13.3     weaver_1.29.1
[17] codetools_0.2-8          digest_0.6.4

loaded via a namespace (and not attached):
 [1] biomaRt_2.19.1           Biostrings_2.31.3   bitops_1.0-6       BSgenome_1.31.7
 [5] GenomicAlignments_0.99.5 graph_1.41.1        RBGL_1.39.1        RCurl_1.95-4.1
 [9] Rsamtools_1.15.14        rtracklayer_1.23.3  stats4_3.1.0       XML_3.98-1.1
[13] zlibbioc_1.9.0

On Sat, Dec 7, 2013 at 7:17 AM, jfertaj <jfernandezt@udc.es> wrote:
> Dear List,
>
> I have a list of gene.symbol, that looks like that
> [...]
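For the simpler route of filtering after the lookup, a minimal sketch in base R might look like the following. The helper name drop_evidence is hypothetical (it is not part of AnnotationDbi), and the example data frame is just the six rows of select() output printed in the answer, typed in by hand so the snippet runs without any annotation package installed.

```r
# Hypothetical helper (an assumption, not an AnnotationDbi function):
# drop annotation rows whose evidence code is listed in `codes`.
# `column` is "EVIDENCE" for select() output; toTable() on a subset of
# org.Hs.egGO names the column "Evidence" instead.
drop_evidence <- function(df, codes = c("IEA", "NR"), column = "EVIDENCE") {
  stopifnot(is.data.frame(df), column %in% names(df))
  keep <- !(df[[column]] %in% codes)   # rows with NA evidence are kept
  df[keep, , drop = FALSE]
}

# Six rows copied by hand from the select() output shown in the answer.
demsel_example <- data.frame(
  SYMBOL   = c("YWHAZ", "YWHAZ", "YWHAZ", "ATL3", "ATL3", "ATL3"),
  GO       = c("GO:0002553", "GO:0005515", "GO:0005654",
               "GO:0016021", "GO:0042802", "GO:0051260"),
  EVIDENCE = c("IEA", "IPI", "TAS", "IDA", "IDA", "IDA"),
  ONTOLOGY = c("BP", "MF", "CC", "CC", "MF", "BP"),
  stringsAsFactors = FALSE
)

filtered <- drop_evidence(demsel_example)
print(filtered)

# With org.Hs.eg.db installed, the same helper could be applied per cluster
# (untested sketch; `mylist` is the list of symbol vectors from the question):
# library(org.Hs.eg.db)
# list.GO <- lapply(mylist, function(syms) {
#   drop_evidence(select(org.Hs.eg.db, keys = syms,
#                        keytype = "SYMBOL", columns = "GO"))
# })
```

This does not exclude the codes before the mapping, as the question asked; it only removes them from the result afterwards, which should give the same set of GO terms for a downstream semantic similarity calculation. Entries in the clusters that are not gene symbols (for example the ENSG identifiers) will not match the SYMBOL keytype and may need separate handling.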

