Finding enriched pathways from a gene list.
omarrafiqued ▴ 50
@omarrafiqued-21833
Last seen 9 months ago
India

I have a gene list and want to use GO or KEGG for enrichment analysis of the top differentially expressed genes. The problem I am facing is with the data set. It is a matrix with approximately 570 samples and 12000 genes. The sample names are in the standard format, e.g. "TCGA-3C-AALK-01A-11R-A41B-07", which I understand. But the gene names are something I don't understand. For example, the first five genes in the data set are named "1", "87769", "144568", "2", "53947", and so on. I don't know whether they are Entrez IDs or some other gene naming format. Could someone please clarify this? Furthermore, could someone provide R code to do an enrichment analysis using the above naming format? For reference, the first 100 gene names in the data set are given below.

   [1] "1"         "87769"     "144568"    "2"         "53947"     "65985"     "51166"    
   [8] "79719"     "22848"     "57505"     "80755"     "16"        "60496"     "132949"   
  [15] "10157"     "26574"     "9625"      "18"        "10349"     "79963"     "26154"    
  [22] "650655"    "19"        "20"        "21"        "24"        "23461"     "23460"    
  [29] "10347"     "10351"     "10350"     "23456"     "5243"      "5244"      "10058"    
  [36] "11194"     "23457"     "89845"     "85320"     "4363"      "1244"      "8714"     
  [43] "10257"     "10057"     "730013"    "368"       "6833"      "10060"     "215"      
  [50] "225"       "5825"      "5826"      "6059"      "9619"      "9429"      "83451"    
  [57] "26090"     "84945"     "25864"     "84836"     "116236"    "84696"     "11057"    
  [64] "171586"    "63874"     "51099"     "57406"     "79575"     "10152"     "25890"    
  [71] "51225"     "27"        "3983"      "84448"     "22885"     "28"        "26"       
  [78] "29"        "80325"     "25841"     "30"        "10449"     "31"        "32"       
  [85] "80724"     "84129"     "27034"     "34"        "36"        "35"        "37"       
  [92] "176"       "9744"      "23527"     "116983"    "38"        "39"        "64746"    
  [99] "79777"     "91452"   

Thanks.

microarray limma kegg ENTREZ enrichment analysis • 5.6k views

They do appear to be Entrez IDs; however, please state the exact source of your data (and check there yourself) to help confirm this.
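One quick way to sanity-check this locally is sketched below. It assumes the row names are held in a character vector (here called `ids`, a name chosen for illustration) and that they are human genes. The first check uses only base R; the symbol lookup runs only if the org.Hs.eg.db annotation package is already installed, and it needs no internet connection.

```r
# First few row names from the question, held in a character vector
ids <- c("1", "87769", "144568", "2", "53947")

# Entrez Gene IDs are plain positive integers, so every name should be all digits
looks_numeric <- grepl("^[0-9]+$", ids)
all(looks_numeric)

# Optional: translate to gene symbols if the annotation package is installed.
# IDs that map to a symbol are very likely genuine human Entrez Gene IDs.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  symbols <- AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                                   keys = ids,
                                   column = "SYMBOL",
                                   keytype = "ENTREZID")
  print(symbols)
}
```

If most IDs come back as `NA`, the row names are probably not human Entrez Gene IDs and the data source should be checked again.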

For the enrichment work itself, you could use:

  • topGO
  • KEGGprofile

Both are Bioconductor packages and both accept Entrez IDs.


Thanks for the answer.

Do the R packages you listed above require an internet connection?

I downloaded the data from the following link:

http://gdac.broadinstitute.org/runs/stddata_201504_02/data/BRCA/20150402/

With the following file name.

gdac.broadinstitute.orgBRCA.MergernaseqilluminahiseqrnasequnceduLevel3geneexpression_data.Level3.2015040200.0.0.tar.gz


Thanks. Then yes, they are likely Entrez IDs. As for your other question, I believe they require an internet connection. Can you not check that yourself?


I have very limited access to the internet...so I had to ask...Thanks for the answers.

@gordon-smyth
Last seen 6 minutes ago
WEHI, Melbourne, Australia

The row.names you show do appear to be human Entrez Gene Ids.

You added a limma tag to your question and I am guessing that you have used limma before. So you would know that there are several pathway analysis functions provided by limma and they all work with Entrez Gene Ids. Of all the gene set testing functions provided by limma (roast, fry, camera, wilcoxGST, goana and kegga), only kegga requires an internet connection. The kegga internet requirement is unavoidable because of KEGG's licensing restrictions.

For example, if the top row of genes in your question was your gene list, a GO analysis could be done by

library(limma)
# Entrez Gene Ids from the top row of the question
Genes <- c("1","87769","144568","2","53947","65985","51166")
g <- goana(Genes)
topGO(g)

provided you have the Bioconductor limma, GO.db and org.Hs.eg.db packages installed.
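For kegga, which does need a connection, one possible workaround when internet access is limited is to fetch the KEGG tables once while online and reuse them later. This is only a sketch and not something suggested in the answer above: the helper names `cache_kegg_links` and `kegga_offline` and the file name `kegg_hsa.rds` are made up for illustration, and whether keeping a local copy is acceptable under KEGG's terms is something to check for yourself. It relies on the `gene.pathway` and `pathway.names` arguments of `kegga`.

```r
# Hypothetical helper: run once while online to save the KEGG tables locally
cache_kegg_links <- function(file = "kegg_hsa.rds") {
  kegg <- list(
    # Data frame linking Entrez Gene Ids to KEGG pathway IDs for human ("hsa")
    gene.pathway  = limma::getGeneKEGGLinks(species.KEGG = "hsa"),
    # Data frame of pathway IDs and their descriptions
    pathway.names = limma::getKEGGPathwayNames(species.KEGG = "hsa",
                                               remove.qualifier = TRUE)
  )
  saveRDS(kegg, file)
  invisible(file)
}

# Hypothetical helper: later, without a connection, reuse the saved tables
kegga_offline <- function(genes, file = "kegg_hsa.rds") {
  kegg <- readRDS(file)
  k <- limma::kegga(genes, species = "Hs",
                    gene.pathway  = kegg$gene.pathway,
                    pathway.names = kegg$pathway.names)
  limma::topKEGG(k)
}
```

Usage would be `cache_kegg_links()` once with a connection, then `kegga_offline(Genes)` afterwards. The saved tables go stale as KEGG is updated, so they are worth refreshing whenever a connection is available.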

There are plenty of examples showing how to do pathway enrichment analyses in the context of a limma or edgeR differential expression analysis.

Of course you can't do an enrichment analysis until you have a gene list and you won't have a gene list until you undertake a differential expression analysis. At the moment you haven't mentioned any analysis.
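To illustrate the order of operations, here is a minimal sketch of a differential expression fit followed by goana on the fitted model. Everything about the data is an assumption: the matrix `expr` is simulated log-expression with real Entrez Gene Ids as row names, the two-group factor `group` is invented, and 100 genes are shifted artificially so that something is called differentially expressed. For real RNA-seq counts such as the TCGA data, a count-aware pipeline (for example voom or edgeR) would come before this step. It requires the limma, GO.db and org.Hs.eg.db packages.

```r
library(limma)
library(org.Hs.eg.db)

set.seed(1)

# Simulated stand-in for the real data: 2000 genes x 6 samples of log-expression,
# with genuine Entrez Gene Ids taken from the annotation package as row names
entrez <- head(keys(org.Hs.eg.db, keytype = "ENTREZID"), 2000)
expr <- matrix(rnorm(2000 * 6), nrow = 2000,
               dimnames = list(entrez, paste0("S", 1:6)))
group <- factor(rep(c("normal", "tumour"), each = 3))

# Artificially shift the first 100 genes in the tumour group
expr[1:100, group == "tumour"] <- expr[1:100, group == "tumour"] + 4

# Differential expression: linear model plus empirical Bayes moderation
design <- model.matrix(~ group)
fit <- eBayes(lmFit(expr, design))

# GO enrichment of the genes that are DE for the tumour vs normal coefficient;
# goana takes the Entrez Gene Ids from the row names of the fit
go <- goana(fit, coef = 2, species = "Hs")
topGO(go, number = 10)
```

Passing the fitted model object rather than a hand-made gene list lets goana use all tested genes as the background universe, which is usually what is wanted.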


I can't thank you enough for this answer.
