edgeR : results from goana and kegga functions show opposite sig. p-value
Entering edit mode
Last seen 5 days ago
United States


I am running DE and FE analysis using edgeR for matched samples at two time points. The issue I have is interpreting the output from GO and KEGG analysis. For ex: a pathway from GO has a sig. P.Up value, while the same pathway in KEGG shows as sig. P.Down value:

    Term    Ont N   Up  Down    **P.Up**    P.Down
GO:0050853  B cell receptor signaling pathway   BP  104 66  31  **1.73E-06**    0.988896595
    Pathway N   Up  Down    P.Up    **P.Down**
path:hsa04662   B cell receptor signaling pathway   72  14  48  0.999960966 **4.49E-06**

Can you please let me know if I am missing something.

Please see the code below for your reference.

Time <- as.factor(Samples$group)
y <- DGEList(counts=Counts, group=Time,genes=as.data.frame(row.names(Counts)))
y$genes$entrezid <- mapIds(org.Hs.eg.db, y$genes$`row.names(Counts)`, keytype="SYMBOL", column="ENTREZID")

y$samples$Patient <-  Samples$patient 

# drop without gene symbols
y <- y[!is.na(y$genes$entrezid), ]

# design matrix
design <- model.matrix(~0+Patient+group, data=y$samples)

# filter out lowly expressed genes 
keep <- filterByExpr(y,group=y$samples$group)
y <- y[keep,,keep.lib.sizes=FALSE]

# calculate normalized factors
y <- calcNormFactors(y)

# estimate dispersion
y <- estimateDisp(y,design, robust=TRUE)

#To perform quasi-likelihood F-tests:
fit <- glmQLFit(y,design,robust=TRUE)

ql <- glmQLFTest(fit)

# GO analysis
go <- goana(ql,geneid = "entrezid",species="Hs")
go_out <- topGO(go,n=Inf)

# KEGG analysis
keg <- kegga(ql,geneid = "entrezid",species="Hs")
kegg_out <- topKEGG(keg, n=Inf)

sessionInfo( )
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] matrixStats_0.62.0   umap_0.2.8.0         org.Hs.eg.db_3.14.0  AnnotationDbi_1.56.2 IRanges_2.28.0      
 [6] S4Vectors_0.32.4     Biobase_2.54.0       BiocGenerics_0.40.0  edgeR_3.36.0         limma_3.50.3        
[11] stringi_1.7.6        data.table_1.14.2    forcats_0.5.1        stringr_1.4.0        dplyr_1.0.8         
[16] purrr_0.3.4          readr_2.1.2          tidyr_1.2.0          tibble_3.1.6         ggplot2_3.3.5       
[21] tidyverse_1.3.1      magick_2.7.3         tesseract_5.0.0     

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           fs_1.5.2               lubridate_1.8.0        bit64_4.0.5           
 [5] httr_1.4.2             GenomeInfoDb_1.30.1    tools_4.1.3            backports_1.4.1       
 [9] utf8_1.2.2             R6_2.5.1               DBI_1.1.2              colorspace_2.0-3      
[13] withr_2.5.0            tidyselect_1.1.2       bit_4.0.4              curl_4.3.2            
[17] compiler_4.1.3         textshaping_0.3.6      cli_3.3.0              rvest_1.0.2           
[21] xml2_1.3.3             labeling_0.4.2         scales_1.2.0           askpass_1.1           
[25] rappdirs_0.3.3         systemfonts_1.0.4      digest_0.6.29          XVector_0.34.0        
[29] pkgconfig_2.0.3        dbplyr_2.1.1           fastmap_1.1.0          rlang_1.0.2           
[33] readxl_1.4.0           rstudioapi_0.13        RSQLite_2.2.12         farver_2.1.0          
[37] generics_0.1.2         jsonlite_1.8.0         RCurl_1.98-1.6         magrittr_2.0.3        
[41] GO.db_3.14.0           GenomeInfoDbData_1.2.7 Matrix_1.4-1           Rcpp_1.0.8.3          
[45] munsell_0.5.0          fansi_1.0.3            reticulate_1.24        lifecycle_1.0.1       
[49] zlibbioc_1.40.0        grid_4.1.3             blob_1.2.3             crayon_1.5.1          
[53] lattice_0.20-45        Biostrings_2.62.0      haven_2.5.0            splines_4.1.3         
[57] hms_1.1.1              KEGGREST_1.34.0        locfit_1.5-9.5         pillar_1.7.0          
[61] reprex_2.0.1           glue_1.6.2             modelr_0.1.8           png_0.1-7             
[65] vctrs_0.4.1            tzdb_0.3.0             cellranger_1.1.0       gtable_0.3.0          
[69] openssl_2.0.0          assertthat_0.2.1       cachem_1.0.6           broom_0.8.0           
[73] RSpectra_0.16-0        ragg_1.2.2             memoise_2.0.1          statmod_1.4.36        
[77] ellipsis_0.3.2

Thanks Sharvari

edgeR • 207 views
Entering edit mode
Last seen 8 hours ago
United States

It is not a valid assumption that a GO term and a KEGG term that appear to be the same are actually the same.

> library(org.Hs.eg.db)
## Get GO terms for B-cell pathway
> gns.GO <- mapIds(org.Hs.eg.db, "GO:0050853", "ENTREZID","GOALL", multiVals = "list")[[1]]
'select()' returned 1:many mapping between keys and columns
## remove any dups
> gns.GO <- unique(gns.GO)

## Now get KEGG genes
> gns.KEGG <- read.table("http://rest.kegg.jp/link/pathway/hsa")
> gns.KEGG <- split(gns.KEGG[,1], gns.KEGG[,2])[["path:hsa04662"]]
> gns.KEGG <- gsub("hsa:", "", gns.KEGG)
## remove any dups
> gns.KEGG <- unique(gns.KEGG)

## how many genes are in both groups?
> length(intersect(gns.GO, gns.KEGG))
[1] 17

## how many are in the GO term, but not KEGG?
> length(setdiff(gns.GO, gns.KEGG))
[1] 114

## How many are in the KEGG term but not GO?
> length(setdiff(gns.KEGG, gns.GO))
[1] 65

Only about 20% of the KEGG genes are in the GO term, and about 13% of the GO genes are in the KEGG pathway.

Entering edit mode

Thank you so much!


Login before adding your answer.

Traffic: 233 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6