Finding enriched pathways from a gene list.
omarrafiqued ▴ 50
@omarrafiqued-21833
Last seen 5 months ago
India

I have a gene list and now I want to use GO or KEGG for the enrichment analysis of the top differentially expressed genes. The problem I am facing is with the data set. The data set is a matrix with approximately 570 samples and 12000 genes. The sample names are in the standard format, e.g. "TCGA-3C-AALK-01A-11R-A41B-07". I get this. But the gene names are something I don't understand. For example, the first five genes in the data set are named "1", "87769", "144568", "2", "53947"... I don't know if they are Entrez IDs or some other format of gene naming. Could someone please clarify this? Furthermore, could someone provide R code to do enrichment analysis using the above naming format? For clarification, I have provided the first 100 gene names in the data set below:

 [1] "1"         "87769"     "144568"    "2"         "53947"     "65985"     "51166"
 [8] "79719"     "22848"     "57505"     "80755"     "16"        "60496"     "132949"
[15] "10157"     "26574"     "9625"      "18"        "10349"     "79963"     "26154"
[22] "650655"    "19"        "20"        "21"        "24"        "23461"     "23460"
[29] "10347"     "10351"     "10350"     "23456"     "5243"      "5244"      "10058"
[36] "11194"     "23457"     "89845"     "85320"     "4363"      "1244"      "8714"
[43] "10257"     "10057"     "730013"    "368"       "6833"      "10060"     "215"
[50] "225"       "5825"      "5826"      "6059"      "9619"      "9429"      "83451"
[57] "26090"     "84945"     "25864"     "84836"     "116236"    "84696"     "11057"
[64] "171586"    "63874"     "51099"     "57406"     "79575"     "10152"     "25890"
[71] "51225"     "27"        "3983"      "84448"     "22885"     "28"        "26"
[78] "29"        "80325"     "25841"     "30"        "10449"     "31"        "32"
[85] "80724"     "84129"     "27034"     "34"        "36"        "35"        "37"
[92] "176"       "9744"      "23527"     "116983"    "38"        "39"        "64746"
[99] "79777"     "91452"


Thanks.

microarray limma kegg ENTREZ enrichment analysis • 3.4k views

They do appear to be Entrez IDs; however, please state the exact source of your data (and check there yourself) so that this can be confirmed.

For the enrichment work itself, you could use:

• topGO
• KEGGprofile

Both are Bioconductor packages and both accept Entrez IDs.


Do the R libraries you listed require an internet connection?

With the following file name.


Thanks. Then yes, they are likely Entrez IDs. As for your other question, I believe they require an Internet connection. Can you not check that yourself?


@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

The row.names you show do appear to be human Entrez Gene Ids.
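One quick way to check this yourself is sketched below. It only assumes the first seven row names from the question. The first step is a plain sanity check that every identifier is an integer-like string, which is what Entrez Gene Ids look like. The second step, which runs only if the AnnotationDbi and org.Hs.eg.db packages happen to be installed, looks the identifiers up as human Entrez Ids and returns gene symbols. If most of them map to sensible symbols, that supports the Entrez interpretation, although it is not proof.

```r
# First seven row names from the question
ids <- c("1", "87769", "144568", "2", "53947", "65985", "51166")

# Entrez Gene Ids are plain positive integers stored as strings
all_numeric <- all(grepl("^[0-9]+$", ids))
print(all_numeric)

# Optional lookup, only attempted if the annotation packages are installed
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  symbols <- AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                                   keys = ids,
                                   column = "SYMBOL",
                                   keytype = "ENTREZID")
  print(symbols)  # identifiers that are not known Entrez Ids come back as NA
}
```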

You added a limma tag to your question and I am guessing that you have used limma before. So you would know that there are several pathway analysis functions provided by limma and they all work with Entrez Gene Ids. Of all the gene set testing functions provided by limma (roast, fry, camera, wilcoxGST, goana and kegga), only kegga requires an internet connection. The kegga internet requirement is unavoidable because of KEGG's licensing restrictions.

For example, if the top row of genes in your question was your gene list, a GO analysis could be done by

library(limma)
Genes <- c("1","87769","144568","2","53947","65985","51166")
g <- goana(Genes)
topGO(g)


provided you have the Bioconductor limma, GO.db and org.Hs.eg.db packages installed.
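The KEGG version follows the same pattern. The sketch below is only an illustration using the same seven Ids: it calls kegga and topKEGG from limma, and because kegga has to download pathway annotation from KEGG, the call is wrapped so that it is skipped quietly when limma is not installed or there is no internet connection. The wrapping is my own addition for robustness and is not needed in an interactive session.

```r
# Same example gene list as for the GO analysis
Genes <- c("1", "87769", "144568", "2", "53947", "65985", "51166")

if (requireNamespace("limma", quietly = TRUE)) {
  # kegga() fetches pathway definitions online, so it fails without a connection
  k <- tryCatch(limma::kegga(Genes, species = "Hs"),
                error = function(e) NULL)
  if (!is.null(k)) {
    print(limma::topKEGG(k))  # top pathways ranked by p-value
  } else {
    message("kegga() could not reach KEGG; check the internet connection.")
  }
}
```

With only seven genes the results will not mean much; a real gene list from a differential expression analysis is needed for anything interpretable.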

There are plenty of examples showing how to do pathway enrichment analyses in the context of a limma or edgeR differential expression analysis, for example

Of course you can't do an enrichment analysis until you have a gene list and you won't have a gene list until you undertake a differential expression analysis. At the moment you haven't mentioned any analysis.
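To make that concrete, here is a minimal sketch of going from an expression matrix to a gene list with limma. Everything in it is hypothetical: the matrix is simulated, the row names are made-up integer strings standing in for Entrez Ids, and the two-group design (normal vs tumour) is an assumption, not something taken from your data. For real data you would substitute your own normalised matrix and sample grouping.

```r
set.seed(1)

# Hypothetical data: 200 genes x 6 samples, rows named with made-up Id strings
n_genes <- 200
y <- matrix(rnorm(n_genes * 6), nrow = n_genes,
            dimnames = list(as.character(seq_len(n_genes)),
                            paste0("S", 1:6)))
group <- factor(rep(c("normal", "tumour"), each = 3))

# Shift the first 20 genes in the tumour samples so something is DE
y[1:20, group == "tumour"] <- y[1:20, group == "tumour"] + 3

# Column 2 of the design is the tumour vs normal comparison
design <- model.matrix(~ group)

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(y, design))
  # Keep genes with adjusted p-value below 0.05
  tab <- limma::topTable(fit, coef = 2, number = Inf, p.value = 0.05)
  de_genes <- rownames(tab)  # this is the gene list to pass to goana() or kegga()
  print(head(de_genes))
  # With real Entrez Gene Ids as row names, the fit can also be passed directly:
  # g <- limma::goana(fit, coef = 2, species = "Hs")
}
```

Passing the fitted model object to goana rather than a bare vector of Ids has the advantage that all the tested genes are used as the background universe automatically.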


I can't thank you enough for this answer.