Question

GO enrichment

Guest User ★ 13k

Hello everyone,

I am very new to bioinformatics, and I hope someone can give me an answer and some suggestions. I am trying to use the goseq package to get GO enrichment for my data, which comes from an organism that is not built in. However, no category passes the adjusted p-value cutoff:

    > enriched.GO = unsorted_L14.15_S_GO.wall$category[p.adjust(unsorted_L14.15_S_GO.wall$over_represented_pvalue, method="BH") < 0.05]
    > head(enriched.GO)
    character(0)

I prepared the data as follows:

    # create LengthData
    > unsorted_L14.15_S_LengthData <- unsorted_L14.15_S_gene2length
    > unsorted_L14.15_S_id <- as.vector(unsorted_L14.15_S_gene2length[,1])
    > unsorted_L14.15_S_length <- as.numeric(unsorted_L14.15_S_gene2length[,2])
    > unsorted_L14.15_S_LengthData <- structure(unsorted_L14.15_S_length, .names=unsorted_L14.15_S_id)

    # PWF = fitting the probability weighting function
    unsorted_L14.15_S_pwf = nullp(unsorted_genesL14.15_S, bias.data=unsorted_L14.15_S_length, plot.fit=TRUE)
    unsorted_L14.15_S_pwf = nullp(unsorted_genesL14.15_S, bias.data=unsorted_L14.15_S_LengthData, plot.fit=TRUE)

    > head(unsorted_L14.15_S_pwf)
                 DEgenes bias.data       pwf
    Cucsa.000210       0      1512 0.5013243
    Cucsa.000250       0       405 0.5182944
    Cucsa.000270       0       258 0.5205436

    > unsorted_L14.15_S_GO.wall <- goseq(unsorted_L14.15_S_pwf, gene2cat=unsorted_L14.15_S_gene2go, test.cats=c("GO:CC", "GO:BP", "GO:MF"), method="Wallenius", repcnt=2000, use_genes_without_cat=TRUE)
    Using manually entered categories.
    Calculating the p-values...

    > head(unsorted_L14.15_S_GO.wall)
          category over_represented_pvalue under_represented_pvalue numDEInCat numInCat
    594 GO:0043565            0.0001000255                0.9999945         17       18
    177 GO:0005515            0.0079055773                0.9933285        618     1162
    380 GO:0008565            0.0088243286                1.0000000          7        7

Some categories, such as GO:0043565, look like they could be significantly enriched: according to the result, 17 of the 18 genes assigned to this category are DE genes. But when I went back to check my gene list table, I found only 7 DE genes out of 18 genes for this category. So I don't know what I have done wrong.

I hope someone can help me with this. Thank you so much.

Regards,
warin

-- output of sessionInfo():

    > sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] GO.db_2.14.0          org.Hs.eg.db_2.14.0   RSQLite_0.11.4
     [4] DBI_0.2-7             AnnotationDbi_1.26.0  GenomeInfoDb_1.0.2
     [7] Biobase_2.24.0        BiocGenerics_0.10.0   edgeR_3.6.4
    [10] limma_3.20.8          goseq_1.16.2          geneLenDataBase_1.0.0
    [13] BiasedUrn_1.06.1

    loaded via a namespace (and not attached):
     [1] BBmisc_1.7              BSgenome_1.32.0         BatchJobs_1.2
     [4] BiocParallel_0.6.1      Biostrings_2.32.0       GenomicAlignments_1.0.2
     [7] GenomicFeatures_1.16.2  GenomicRanges_1.16.3    IRanges_1.22.9
    [10] Matrix_1.1-4            RCurl_1.95-4.1          Rcpp_0.11.2
    [13] Rsamtools_1.16.1        XML_3.98-1.1            XVector_0.4.0
    [16] biomaRt_2.20.0          bitops_1.0-6            brew_1.0-6
    [19] checkmate_1.1           codetools_0.2-8         digest_0.6.4
    [22] fail_1.2                foreach_1.4.2           grid_3.1.0
    [25] iterators_1.0.7         lattice_0.20-29         mgcv_1.8-0
    [28] nlme_3.1-117            plyr_1.8.1              rtracklayer_1.24.2
    [31] sendmailR_1.1-2         stats4_3.1.0            stringr_0.6.2
    [34] tools_3.1.0             zlibbioc_1.10.0

-- Sent via the guest posting facility at bioconductor.org.
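One possible reason for the `character(0)` result is the multiple-testing correction itself, not a mistake in the setup. The rough sketch below takes the three p-values shown in the question and pads them with filler values of 1 up to 594 entries. Both the count of 594 and the filler values are assumptions: the row label 594 only suggests at least that many categories were tested, and the real remaining p-values are not shown in the post.

```r
# Three smallest over-represented p-values copied from the goseq output above
shown_p <- c(0.0001000255, 0.0079055773, 0.0088243286)

# ASSUMPTION: 594 tested categories; the other p-values are unknown,
# so they are filled with 1 purely for illustration
n_categories <- 594
all_p <- c(shown_p, rep(1, n_categories - length(shown_p)))

# Benjamini-Hochberg adjustment, as in the question
adjusted <- p.adjust(all_p, method = "BH")

# Smallest adjusted p-value under these assumptions
print(min(adjusted))

# Number of categories passing the 0.05 cutoff under these assumptions
print(sum(adjusted < 0.05))
```

Under these assumptions even the best category ends up just above 0.05 after adjustment, so an empty result would be expected. With the real p-value vector the numbers may differ.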

GO Category goseq • 1.8k views

score 0 · Answer 1 · 2014-08-10

Hello all,

I have checked the number of DE genes assigned to the GO category again, and I found that the result from goseq is correct. I apologize to everyone who tried to help me.

Regards,
warin

-- output of sessionInfo():

    sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils

    other attached packages:
    [1] GO.db_2.14.0         org.Hs.eg.db_2.14.0
    [4] DBI_0.2-7            AnnotationDbi_1.26.0

-- Sent via the guest posting facility at bioconductor.org.
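For anyone wanting to repeat this kind of check, here is a minimal sketch of recounting the DE genes per category by hand so the numbers can be compared with `numDEInCat` and `numInCat`. The objects `de_vector` and `gene2go` are hypothetical toy stand-ins for the named 0/1 gene vector and the two-column gene-to-category table; they are not the data from the post.

```r
# HYPOTHETICAL toy inputs standing in for the objects in the post
de_vector <- c(geneA = 1, geneB = 0, geneC = 1, geneD = 1, geneE = 0)
gene2go <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneC", "geneD", "geneE", "geneA"),
  category = c("GO:0043565", "GO:0005515", "GO:0043565", "GO:0043565",
               "GO:0005515", "GO:0005515", "GO:0043565"),
  stringsAsFactors = FALSE
)

# Drop duplicated gene/category rows so each gene counts once per category
gene2go <- unique(gene2go)

# Keep only genes that are present in the tested gene vector
gene2go <- gene2go[gene2go$gene %in% names(de_vector), ]

# Genes per category, and DE genes per category
num_in_cat <- table(gene2go$category)
num_de_in_cat <- tapply(de_vector[gene2go$gene] == 1, gene2go$category, sum)

print(num_in_cat)
print(num_de_in_cat)
```

A mismatch with a manual tally often comes from counting against a different gene list or a different set of DE calls than the one passed to `nullp`, so it may help to run the recount on exactly the same objects.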