Question

Problem finding "LocusLink ID" from an annotation file

modarzi

Hi,

The annotation file I am using for TCGA data (linked below) does not contain a "LocusLinkID" attribute for genes. However, as the WGCNA tutorial code below shows, I need "LocusLinkID" to interface the network analysis with other data such as functional annotation and gene ontology:

# Read in the probe annotation
annot = read.csv(file = "GeneAnnotation.csv");
# Match probes in the data set to the probe IDs in the annotation file
# (datExpr and moduleColors come from earlier steps of the tutorial)
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# Get the corresponding LocusLink IDs (LocusLink IDs are today's Entrez Gene IDs)
allLLIDs = annot$LocusLinkID[probes2annot];
# Choose interesting modules
intModules = c("brown", "red", "salmon")
for (module in intModules)
{
  # Select module probes
  modGenes = (moduleColors==module)
  # Get their Entrez ID codes
  modLLIDs = allLLIDs[modGenes];
  # Write them into a file
  fileName = paste("LocusLinkIDs-", module, ".txt", sep="");
  write.table(as.data.frame(modLLIDs), file = fileName,
              row.names = FALSE, col.names = FALSE)
}
# As background in the enrichment analysis, we will use all probes in the analysis.
fileName = "LocusLinkIDs-all.txt";
write.table(as.data.frame(allLLIDs), file = fileName,
            row.names = FALSE, col.names = FALSE)

I use "gene_id" in place of "substanceBXH", but I have no idea what to use for "LocusLinkID".

I would appreciate any suggestions for solving this problem.

Best Regards,
Mohammad Darzi

PS: my annotation file can be found at the link below:

https://github.com/cpreid2/gdc-rnaseq-tool/blob/master/Gene_Annotation/gencode.v22.genes.txt

wgcna package TCGA LocusLinkID genecode ensembl

Written 6.4 years ago by modarzi

Answer 1 · 2018-10-19

James W. MacDonald

Although often asked about around here, WGCNA isn't actually a Bioconductor package. It's a CRAN package. Questions about CRAN packages should be asked at R-help@r-project.org.

Your main problem is that you are following a tutorial without understanding it well enough to apply it to your own data. The basic idea is to take the IDs from whatever data you have, and then map them to other IDs from a particular annotation service (and wow, LocusLink? That's a blast from the past). Anyway, matching is something you can do with base R, using match(), or there is probably some spiffy way to do it with the tidyverse as well.
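A minimal base-R sketch of that matching step. The IDs (gene1, gene2, gene3) and the Entrez values are hypothetical toy data. In the asker's case the probe IDs would be the "gene_id" values used as expression column names; the annotation file linked in the question has no Entrez column, so the second annotation column here is an assumption that some other source would have to supply.

```r
# Hypothetical IDs used as column names of the expression data
probes <- c("gene1", "gene2", "gene3")

# Hypothetical annotation table: one ID column plus an Entrez ID column
# (the real annotation would need to supply the second column)
annot <- data.frame(
  gene_id   = c("gene2", "gene1"),
  entrez_id = c(102L, 101L),
  stringsAsFactors = FALSE
)

# For each probe, the row of 'annot' with the same ID, or NA if absent
probes2annot <- match(probes, annot$gene_id)

# Entrez IDs aligned with 'probes'; NA where the annotation has no entry
allIDs <- annot$entrez_id[probes2annot]
```

Here probes2annot is c(2, 1, NA) and allIDs is c(101, 102, NA): gene3 has no annotation row, so it gets NA instead of shifting the other IDs out of alignment.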

But again, how to do basic things with R is an R-help question, not a Bioconductor one.

Dear Dr. MacDonald,

Hello,

Thanks for your comment. You are right: WGCNA is not a Bioconductor package, and I framed my question wrongly. My actual problem is converting Ensembl IDs to Entrez IDs. I have Ensembl IDs and gene symbols, and based on these I would like to retrieve Entrez IDs. I am currently doing this with the "biomaRt" package. I have 56390 genes, but with the code below I get only 19457 Entrez IDs, and some genes have no Entrez ID at all.

library(biomaRt)
DF = read.csv("df3.csv")
dim(DF)
# [1] 56390    55
head(DF$gene_id)
# [1] "ENSG00000000003.13" "ENSG00000000005.5"  "ENSG00000000419.11" "ENSG00000000457.12"
# [5] "ENSG00000000460.15" "ENSG00000000938.11"

mart = useMart("ENSEMBL_MART_ENSEMBL")
ensembl <- useDataset("hsapiens_gene_ensembl", mart)
ensembl_gene_id = DF$gene_id

x = getBM(attributes = c("hgnc_symbol", "entrezgene", "gene_biotype"),
          filters = "ensembl_gene_id_version",
          values = ensembl_gene_id, mart = ensembl)
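One thing that may contribute to the low count (an assumption, not something confirmed in this thread): the GENCODE v22 IDs carry version suffixes from an older Ensembl release, and the "ensembl_gene_id_version" filter only matches when the version in the queried mart is identical. A sketch of stripping the suffix so the unversioned "ensembl_gene_id" filter can be used instead, using the first three IDs shown in the output above. The getBM() call is left commented out because it needs biomaRt and network access; the "entrezgene_id" attribute name is an assumption for recent Ensembl releases.

```r
# First few GENCODE v22 IDs from the output above
ids <- c("ENSG00000000003.13", "ENSG00000000005.5", "ENSG00000000419.11")

# Drop the trailing ".<version>" so IDs match regardless of version
ids_nover <- sub("\\.[0-9]+$", "", ids)

# Sketch of the query (requires biomaRt and network access; not run here).
# The filter column is also requested as an attribute so results can be
# matched back to the input. Recent Ensembl releases call the Entrez
# attribute "entrezgene_id"; older ones used "entrezgene". Check with
# listAttributes(ensembl).
# x <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "entrezgene_id"),
#            filters = "ensembl_gene_id", values = unique(ids_nover),
#            mart = ensembl)
```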

I would appreciate any comments you could share.

Best Regards,

Reply by modarzi, 6.4 years ago

There is usually little profit in trying to convert from Ensembl to Entrez. There are any number of differences between what the two annotation services think are the set of known genes, for myriad reasons, and trying to naively convert will simply show you just how extensive those differences are.

My general recommendation is to stick with one annotation service to limit these technicalities, which are usually unimportant to the analysis at hand.

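If you do insist on mapping between them, note that biomaRt returns results in no particular order (not the order of your input IDs), and it can return several rows, or none, for a given ID. You therefore need to include the filter column among the attributes so you can match the results back to your input and reorder them.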
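A base-R sketch of that reordering step. The data frame is a hypothetical stand-in for getBM() output (IDs and values are made up), with column names assumed to mirror the requested attribute names. It includes one input with no result and one input with two results.

```r
# Input IDs, in the order of the rows of the expression data (hypothetical)
input_ids <- c("ENSG_A", "ENSG_B", "ENSG_C")

# Hypothetical getBM()-style result: out of input order, ENSG_B missing,
# ENSG_A mapped to two Entrez IDs
res <- data.frame(
  ensembl_gene_id = c("ENSG_C", "ENSG_A", "ENSG_A"),
  entrezgene_id   = c(300L, 100L, 101L),
  stringsAsFactors = FALSE
)

# Number of extra rows caused by one-to-many mappings
n_extra <- sum(duplicated(res$ensembl_gene_id))

# Realign to the input order via the returned filter column. match() picks
# the first row for a duplicated ID, so additional mappings are dropped;
# unmatched inputs become NA
entrez_for_input <- res$entrezgene_id[match(input_ids, res$ensembl_gene_id)]
```

Here entrez_for_input is c(100, NA, 300) and n_extra is 1. Taking the first match is only one possible policy; whether to keep, drop, or collapse the extra mappings depends on the downstream analysis.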

Reply by James W. MacDonald, 6.4 years ago