Question: Problem finding "Locus Link ID" in annotation file
4 weeks ago, modarzi10 wrote:

Hi,

The annotation file I use for TCGA data (linked below) has no "LocusLinkID" attribute for genes. However, as the code below from the WGCNA tutorial shows, interfacing the network analysis with other data such as functional annotation and gene ontology requires "LocusLinkID":

# Read in the probe annotation
# Match probes in the data set to the probe IDs in the annotation file
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# Get the corresponding Locus Link IDs
allLLIDs = annot$LocusLinkID[probes2annot];
# Choose interesting modules
intModules = c("brown", "red", "salmon")
for (module in intModules)
{
  # Select module probes
  modGenes = (moduleColors==module)
  # Get their entrez ID codes
  modLLIDs = allLLIDs[modGenes];
  # Write them into a file
  fileName = paste("LocusLinkIDs-", module, ".txt", sep="");
  write.table(as.data.frame(modLLIDs), file = fileName,
              row.names = FALSE, col.names = FALSE)
}
# As background in the enrichment analysis, we will use all probes in the analysis.
fileName = paste("LocusLinkIDs-all.txt", sep="");
write.table(as.data.frame(allLLIDs), file = fileName,
            row.names = FALSE, col.names = FALSE)

I use "gene_id" instead of "substanceBXH", but I have no idea what to use for "LocusLinkID". I would appreciate it if anybody could share a suggestion for solving this problem.

Best Regards,

PS: my annotation file can be found at the link below:

https://github.com/cpreid2/gdc-rnaseq-tool/blob/master/Gene_Annotation/gencode.v22.genes.txt

4 weeks ago, James W. MacDonald wrote:

Although often asked about around here, WGCNA isn't actually a Bioconductor package. It's a CRAN package. Questions about CRAN packages should be asked at R-help@r-project.org.

Your main problem is that you are following a tutorial without understanding it enough to apply it to your own data. The basic idea is to take the IDs from whatever data you have, and then map them to other IDs from a particular annotation service (and wow - LocusLink? that's a blast from the past). Anyway, matching is just something that you can do with base R, using match, or there is probably some spiffy way to do that using the tidyverse as well. But again, how to do basic things with R is an R-help question, not a Bioconductor one.

Dear Dr. MacDonald,

Hello, Thanks for your comment. You are right.
WGCNA is not a Bioconductor package, and I framed my question wrongly. My actual problem is converting Ensembl IDs to Entrez IDs. I have Ensembl IDs and gene symbols, and from this information I would like to retrieve Entrez IDs. At the moment I do this with the "biomaRt" package. My data set has 56390 genes, but with the code below I get only 19457 rows back, and some of those have no Entrez ID either.

library(biomaRt)
DF=read.csv("df3.csv")
> dim(DF)
[1] 56390 55
> head(DF$gene_id)
[1] "ENSG00000000003.13" "ENSG00000000005.5"  "ENSG00000000419.11" "ENSG00000000457.12"
[5] "ENSG00000000460.15" "ENSG00000000938.11"

mart= useMart("ENSEMBL_MART_ENSEMBL")
ensembl <- useDataset("hsapiens_gene_ensembl", mart)
ensembl_gene_id=DF$gene_id

x=getBM(attributes= c("hgnc_symbol","entrezgene","gene_biotype"),
        filters="ensembl_gene_id_version",
        values=ensembl_gene_id, mart=ensembl)

I would appreciate it if you could share your comments with me.

Best Regards,


There is usually little profit in trying to convert from Ensembl to Entrez. There are any number of differences between what the two annotation services think are the set of known genes, for myriad reasons, and trying to naively convert will simply show you just how extensive those differences are.

My general recommendation is to stick with one annotation service to limit these technicalities, which are usually unimportant to the analysis at hand.

If you do insist on mapping between them, do note that biomaRt will return results in random order, so you need to return the filter column in the attributes so you can reorder.
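
As a rough sketch of that reordering step, the base R snippet below aligns a result table to the order of the query IDs with match. It does not contact BioMart: the data frame res is a small hand-written stand-in for what getBM might return, and the names ids, res, and aligned are only illustrative. The commented-out call reuses the attribute and filter names from the post above, which may differ between Ensembl releases.

```r
# Query IDs, in the order they appear in the expression data
ids <- c("ENSG00000000003.13", "ENSG00000000005.5", "ENSG00000000419.11")

# A real query would include the filter column among the attributes, e.g.
# (not run here, needs network access; names taken from the post above):
#   res <- getBM(attributes = c("ensembl_gene_id_version", "hgnc_symbol",
#                               "entrezgene", "gene_biotype"),
#                filters = "ensembl_gene_id_version",
#                values = ids, mart = ensembl)

# Hand-written stand-in for the returned table (an assumption for illustration):
# rows come back in a different order and one query ID is missing entirely.
res <- data.frame(
  ensembl_gene_id_version = c("ENSG00000000419.11", "ENSG00000000003.13"),
  hgnc_symbol = c("DPM1", "TSPAN6"),
  entrezgene = c(8813L, 7105L),
  stringsAsFactors = FALSE
)

# For each query ID, find the row of res that holds it (NA if it was not returned)
idx <- match(ids, res$ensembl_gene_id_version)

# Reorder, producing one row per query ID; unmatched IDs become rows of NA
aligned <- res[idx, ]
aligned$ensembl_gene_id_version <- ids  # keep the query ID even for unmatched rows
rownames(aligned) <- NULL
```

Note that match only picks the first matching row, so if the service returns several rows for one ID (one-to-many mappings), the extra rows are silently dropped here; merge on the ID column would be one way to keep them.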