Remove evidence codes
@juan-fernandez-tajes-5273
Dear List,

I have a list of gene symbols that looks like this:

> head(mylist)
$cluster.1
[1] "HSP90AB1" "INMT" "CKB" "NR2E1" "ME3" "FAM162A" "KIRREL2"

$cluster.2
 [1] "ENSG00000212860" "TRADD" "C1QBP" "KIAA1967" "ENSG00000137379" "MAP3K3" "TNFRSF1B" "BAG2"
 [9] "ENSG00000212866" "RIPK3" "EPRS" "HSPA6" "HSPA5" "IKBKG" "TBK1" "TRAF2"
[17] "MAP3K7" "NFKB1" "MAP3K14" "HSPA1A" "MAP3K7IP2" "HSPBP1" "NFKB2" "DNAJA1"
[25] "TNFRSF1A" "TRAF3IP2" "NFKBIA" "HSPA9" "ENSG00000183311" "TUBB" "TUBA3D" "TANK"
[33] "ENSG00000215292" "REL" "MAP3K1" "HSPA1B" "HSPA8" "NFKBIB" "PGAM5" "EEF1A2"
[41] "MAP3K8" "CLTC" "RCN2" "MAP3K7IP1" "RARS" "TRAF1" "TUBA3C" "HSPA1L"
[49] "MYO1D" "NOD1" "HSP90AA2" "CAD" "RELB" "AIFM1" "TUBB2B" "RIPK2"
[57] "CDC37" "IKBKB" "ERLIN1" "RIPK1" "TNIP2" "STUB1" "TUBB4" "HSPA2"
[65] "CHUK" "DNAJC3" "CCDC50" "SLC25A5" "NFKBIE" "AK3" "TICAM1" "TIMM50"
[73] "ANKRD17" "OTUD7B" "TNFAIP3" "RPS27L" "TRPC4AP" "TUBB6" "DNAJC6" "PXMP2"
[81] "FLJ25006"

$cluster.3
 [1] "ACTB" "PFN1" "XPO6" "VASP" "ZYX" "PFN2" "DIAPH1" "APBB1IP" "DIAPH2" "PARVG" "ENAH" "PCYT1B" "PFN4" "CNN2" "NSMAF" "PFN3"
[17] "LMOD1"

$cluster.4
[1] "UBB" "HERC3" "KLRK1" "ULBP1" "RAET1E" "MICA" "HCST" "ENSG00000184444"
[9] "ENSG00000206449" "ULBP2" "ZNF385A" "ULBP3" "RAET1G"

$cluster.5
[1] "YWHAZ" "SLAIN2" "ZC3H13" "C12orf51" "PGLYRP1" "ATL3"

$cluster.6
[1] "ACTG1" "EPS8L3" "PARVG" "TMSB4Y" "B3GALT1" "UGT1A6"

I want to extract the GO terms for every cluster (i.e. each list component), but excluding some of them based on their evidence codes (such as IEA or NR).

The code I'm using is the following:

e2s <- toTable(org.Hs.egSYMBOL)
p <- lapply(mylist, function(x) {y <- e2s$gene_id[e2s$symbol %in% x]; return(y)})
entrezIDs <- lapply(p, function(x) {org.Hs.egGO[x]})
list.GO <- lapply(entrezIDs, function(x){toTable(x)})

With this approach I get a list of data.frames (list.GO) from which I can exclude the evidence codes afterwards. However, I would like to know whether there is any way to exclude the evidence codes before mapping my Entrez IDs to GO terms. My final aim is to calculate a semantic similarity index within each cluster.

Many thanks for the help

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] annotate_1.40.0 GO.db_2.10.1 org.Hs.eg.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0
[8] BiocGenerics_0.8.0 pheatmap_0.7.7 RColorBrewer_1.0-5 plyr_1.8

loaded via a namespace (and not attached):
[1] grid_3.0.2 IRanges_1.20.5 stats4_3.0.2 tools_3.0.2 XML_3.95-0.2 xtable_1.7-1

---
Juan Fernandez Tajes, PhD
Grupo Xenomar - Área de Genética
Facultad de Ciencias - A Zapateira
Universidad de A Coruña
Spain
Tlf - +34 981 16700
Email: Jfernandezt@udc.es
@vincent-j-carey-jr-4
You can use the select interface to move from symbols to GO terms directly if you like; see below, with input based on your cluster.5 (a vector named 'dem'). If you really want to exclude by evidence code earlier, I believe you'll have to use the SQLite facilities underlying select; there is documentation in the OrganismDbi vignette, IIRC.

> library(Homo.sapiens)
> dem
[1] "YWHAZ" "SLAIN2" "ZC3H13" "C12orf51" "PGLYRP1" "ATL3"
> demsel = select(org.Hs.eg.db, keytype="SYMBOL", keys=dem, columns="GO")
Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows
> dim(demsel)
[1] 63 4
> demsel[c(1:3,61:63),]
   SYMBOL         GO EVIDENCE ONTOLOGY
1   YWHAZ GO:0002553      IEA       BP
2   YWHAZ GO:0005515      IPI       MF
3   YWHAZ GO:0005654      TAS       CC
61   ATL3 GO:0016021      IDA       CC
62   ATL3 GO:0042802      IDA       MF
63   ATL3 GO:0051260      IDA       BP

> sessionInfo()
R Under development (unstable) (2013-12-01 r64371)
Platform: x86_64-apple-darwin10.8.0/x86_64 (64-bit)

locale:
[1] en_US.US-ASCII/en_US.US-ASCII/en_US.US-ASCII/C/en_US.US-ASCII/en_US.US-ASCII

attached base packages:
[1] parallel stats graphics grDevices datasets utils tools methods base

other attached packages:
 [1] Homo.sapiens_1.1.2 TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
 [3] org.Hs.eg.db_2.10.1 GO.db_2.10.1
 [5] RSQLite_0.11.4 DBI_0.2-7
 [7] OrganismDbi_1.5.2 GenomicFeatures_1.15.4
 [9] GenomicRanges_1.15.11 XVector_0.3.3
[11] IRanges_1.21.13 AnnotationDbi_1.25.9
[13] Biobase_2.23.3 BiocGenerics_0.9.1
[15] BiocInstaller_1.13.3 weaver_1.29.1
[17] codetools_0.2-8 digest_0.6.4

loaded via a namespace (and not attached):
 [1] biomaRt_2.19.1 Biostrings_2.31.3 bitops_1.0-6 BSgenome_1.31.7
 [5] GenomicAlignments_0.99.5 graph_1.41.1 RBGL_1.39.1 RCurl_1.95-4.1
 [9] Rsamtools_1.15.14 rtracklayer_1.23.3 stats4_3.1.0 XML_3.98-1.1
[13] zlibbioc_1.9.0

On Sat, Dec 7, 2013 at 7:17 AM, jfertaj <jfernandezt@udc.es> wrote:
> I want to extract the GO terms for every cluster (i.e. each list component)
> but excluding some of them based on their evidence codes (such as IEA or NR).
> [...]
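Since select() returns an EVIDENCE column alongside each GO term, dropping unwanted codes after the lookup is a one-line subset. The sketch below is only an illustration of that step, not a way to filter before the mapping: it rebuilds the six demsel rows printed in the answer as a plain data.frame so it runs without any annotation package, and filter_evidence and drop_codes are made-up names, not part of AnnotationDbi.

```r
# Rows copied from the six demsel rows printed in the answer above.
demsel <- data.frame(
  SYMBOL   = c("YWHAZ", "YWHAZ", "YWHAZ", "ATL3", "ATL3", "ATL3"),
  GO       = c("GO:0002553", "GO:0005515", "GO:0005654",
               "GO:0016021", "GO:0042802", "GO:0051260"),
  EVIDENCE = c("IEA", "IPI", "TAS", "IDA", "IDA", "IDA"),
  ONTOLOGY = c("BP", "MF", "CC", "CC", "MF", "BP"),
  stringsAsFactors = FALSE
)

# Evidence codes to exclude (the two named in the question).
drop_codes <- c("IEA", "NR")

# Hypothetical helper: keep rows whose evidence code is not excluded.
# Rows with a missing GO term are dropped as well.
filter_evidence <- function(tab, exclude) {
  keep <- !is.na(tab$GO) & !(tab$EVIDENCE %in% exclude)
  tab[keep, , drop = FALSE]
}

filtered <- filter_evidence(demsel, drop_codes)

# Group the surviving GO IDs by gene symbol.
go_by_symbol <- split(filtered$GO, filtered$SYMBOL)
```

With the real annotation package loaded, the same helper could presumably be applied per cluster, e.g. lapply(mylist, function(x) filter_evidence(select(org.Hs.eg.db, keys=x, keytype="SYMBOL", columns="GO"), drop_codes)). That line is untested here, and clusters containing identifiers that are not gene symbols (such as the ENSG entries) may need cleaning first.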