Question: Problem for finding "Locus Link ID" from Annotation file
modarzi wrote (4 weeks ago):

Hi,

The annotation file I use for TCGA data (linked below) has no "LocusLinkID" attribute for genes. However, as the code below from the WGCNA tutorial shows, I need "LocusLinkID" to interface the network analysis with other data such as functional annotation and gene ontology:

# Read in the probe annotation
annot = read.csv(file = "GeneAnnotation.csv");
# Match probes in the data set to the probe IDs in the annotation file
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# Get the corresponding Locus Link IDs
allLLIDs = annot$LocusLinkID[probes2annot];
# Choose interesting modules
intModules = c("brown", "red", "salmon")
for (module in intModules)
{
  # Select module probes
  modGenes = (moduleColors==module)
  # Get their entrez ID codes
  modLLIDs = allLLIDs[modGenes];
  # Write them into a file
  fileName = paste("LocusLinkIDs-", module, ".txt", sep="");
  write.table(as.data.frame(modLLIDs), file = fileName,
              row.names = FALSE, col.names = FALSE)
}
# As background in the enrichment analysis, we will use all probes in the analysis.
fileName = paste("LocusLinkIDs-all.txt", sep="");
write.table(as.data.frame(allLLIDs), file = fileName,
            row.names = FALSE, col.names = FALSE)

I use "gene_id" in place of "substanceBXH", but I don't know what to use for "LocusLinkID".

I would appreciate any suggestions for solving this problem.

Best Regards,
Mohammad Darzi

 

PS: my annotation file can be found at the link below:

https://github.com/cpreid2/gdc-rnaseq-tool/blob/master/Gene_Annotation/gencode.v22.genes.txt

James W. MacDonald wrote (4 weeks ago):

Although often asked about around here, WGCNA isn't actually a Bioconductor package. It's a CRAN package. Questions about CRAN packages should be asked at R-help@r-project.org.

Your main problem is that you are following a tutorial without understanding it well enough to apply it to your own data. The basic idea is to take the IDs from whatever data you have and map them to other IDs from a particular annotation service (and wow - LocusLink? that's a blast from the past). Anyway, matching is something you can do with base R, using match, or there is probably some spiffy way to do it using the tidyverse as well.
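
As a minimal sketch of the matching step described above, the toy example below uses base R `match` to look up IDs in an annotation table. The data frame `annot`, its column names `gene_id` and `entrez_id`, and every ID value shown are assumptions made up for illustration; substitute the columns of whatever annotation source you actually use.

```r
# Toy annotation table; column names and values are illustrative only.
annot <- data.frame(
  gene_id   = c("ENSG00000000419", "ENSG00000000003", "ENSG00000000005"),
  entrez_id = c("8813", "7105", "64102"),
  stringsAsFactors = FALSE
)

# IDs as they appear in the expression data; the last one is a
# deliberately fake ID with no entry in the annotation table.
probes <- c("ENSG00000000003", "ENSG00000000005", "ENSG99999999999")

# match() gives, for each probe, the row of annot holding that ID
# (NA when the ID is absent from the table).
probes2annot <- match(probes, annot$gene_id)

# Indexing with those positions yields IDs aligned with `probes`.
allLLIDs <- annot$entrez_id[probes2annot]

# Count the probes that have no annotation entry.
n_missing <- sum(is.na(probes2annot))

print(data.frame(probes, allLLIDs))
```

Because the result is aligned with `probes`, it can be subset by module membership in the same way the tutorial subsets `allLLIDs`.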

But again, how to do basic things with R is an R-help question, not Bioconductor.

modarzi replied (4 weeks ago):

Dear Dr. MacDonald,

Hello,

Thanks for your comment. You are right: WGCNA is not a Bioconductor package, and I framed my question incorrectly. My actual problem is converting Ensembl IDs to Entrez IDs. I have Ensembl IDs and gene symbols, and from this information I would like to retrieve Entrez IDs. I currently do this with the "biomaRt" package. My data contain 56390 genes, but the code below returns only 19457 Entrez IDs, and some of the returned genes have no Entrez ID.

library(biomaRt)
DF = read.csv("df3.csv")
dim(DF)
# [1] 56390    55
head(DF$gene_id)
# [1] "ENSG00000000003.13" "ENSG00000000005.5"  "ENSG00000000419.11" "ENSG00000000457.12"
# [5] "ENSG00000000460.15" "ENSG00000000938.11"

mart= useMart("ENSEMBL_MART_ENSEMBL")
ensembl <- useDataset("hsapiens_gene_ensembl", mart)
ensembl_gene_id=DF$gene_id

x=getBM(attributes= c("hgnc_symbol","entrezgene","gene_biotype"),
      filters="ensembl_gene_id_version",
      values=ensembl_gene_id, mart=ensembl)

I would appreciate your comments.

Best Regards,

James W. MacDonald replied (4 weeks ago):

There is usually little profit in trying to convert from Ensembl to Entrez. There are any number of differences between what the two annotation services think are the set of known genes, for myriad reasons, and trying to naively convert will simply show you just how extensive those differences are.

My general recommendation is to stick with one annotation service to limit these technicalities, which are usually unimportant to the analysis at hand.

If you do insist on mapping between them, do note that biomaRt does not return results in the order of the input values, so you need to include the filter column in the attributes so that you can reorder the result.
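
The reordering can be sketched as follows. To keep the example runnable offline, the data frame `x` is a hand-made stand-in for a `getBM` result whose `attributes` included the filter column `ensembl_gene_id_version`; its rows and Entrez values are made up for illustration, and keeping only the first hit per gene is just one possible policy, not a recommendation.

```r
# Input IDs in the order of the expression data.
ids <- c("ENSG00000000003.13", "ENSG00000000005.5", "ENSG00000000419.11")

# Stand-in for a getBM() result (values are illustrative only).
# Rows are in a different order than `ids`, one gene is absent,
# and one gene maps to two Entrez IDs.
x <- data.frame(
  ensembl_gene_id_version = c("ENSG00000000419.11",
                              "ENSG00000000003.13",
                              "ENSG00000000003.13"),
  entrezgene = c(8813L, 7105L, 9999999L),
  stringsAsFactors = FALSE
)

# Keep the first row per gene so each input ID has at most one hit.
x_unique <- x[!duplicated(x$ensembl_gene_id_version), ]

# Align the result to the input order; genes with no hit become NA.
entrez_in_order <- x_unique$entrezgene[match(ids, x_unique$ensembl_gene_id_version)]

print(data.frame(ids, entrez_in_order))
```

With a real query, the same alignment would apply to the data frame returned by `getBM` once the filter column is added to `attributes`.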
