Hi,
I want to use WGCNA for my RNA-seq data set. based on WGCNA tutorial, for providing output file for gene ontology analysis I have to run below code that is based on MicroArray data set:
# Read in the probe annotation
GeneAnnotation=read.csv(file="GeneAnnotation.csv")
# Match probes in the data set to those of the annotation file
probes = names(datExprFemale)
probes2annot = match(probes,GeneAnnotation$substanceBXH)
# data frame with gene significances (cor with the traits)
datGS.Traits=data.frame(cor(datExprFemale,datTraits,use="p"))
names(datGS.Traits)=paste("cor",names(datGS.Traits),sep=".")
datOutput=data.frame(ProbeID=names(datExprFemale),
GeneAnnotation[probes2annot,],moduleColorsFemale,datKME,datGS.Traits)
# save the results in a comma delimited file
write.table(datOutput,"FemaleMouseResults.csv",row.names=F,sep=",")
Asumption my expression file is datExprSTLMS. How can I customize that code for RNA-seq?
I appreciate if anybody share his/her comment with me.
Best Regrds,
Mohammad
Dear Dr. Langfelder
Thanks for your comment. for better understanding your mean I present example of my RNA-seq(9 case) as below:
> head(datExprSTLMS02,3)
case[1] case[2] case[3] case[4] case[5] case[6] case[7] case[8] case[9]
ENSG00000000003 15.65229 16.47096 17.831792 15.68051 16.828615 16.11393 15.67620 15.52688 16.91999
ENSG00000000005 17.35388 16.49920 8.138663 16.15056 9.519406 19.95802 12.73388 14.13277 11.21169
ENSG00000000419 19.01195 19.60623 19.425689 19.29729 18.552686 19.13429 19.28964 19.17179 19.22746
and head() for annotation file is:
based on that information, I appreciate if you share your comment with me and give me solution for rewrite the code about output file for gene ontology analysis in WGCNA tutorial code.
P.S: I know how to convert Ensemble gene_id to gene name via BiomaRt package.
Best Regards,
Mohammad
Change the line
probes2annot = match(probes,GeneAnnotation$substanceBXH)
to
probes2annot = match(probes,GeneAnnotation$gene_id)
You may also want to change
ProbeID=names(datExprFemale),
GeneAnnotation[probes2annot,]
to just
GeneAnnotation[probes2annot, c("gene_id", "gene_name")]
and you can add whichever columns from GeneAnnotation that you want in the output table.
Dear Professor Langfelder
Thanks for your comment. As you know, in WGCNA analysis for Gene Enrichment Analysis and also GO study I need "Locus Link ID". based on below code I have to have "Locus Link ID" but in my annotation file(gencode.v22.genes.csv) I don't have this field:
I deeply appreciate if you share your comment with me.
Best Regards,
Mohammad
LocusLinkID is another name for Entrez ID. You can change all references to LocusLinkID to Entrez, and you will need the Entrez column in your gene annotation table.
For this particular piece of code, however, you can use any gene identifier you like (or your enrichment software can use). You could use symbol or even the Ensembl IDs.
Dear Professor Langfelder
Thanks for your comment. Now, I used Ensemble gene_id instead Probes and alternate of Locus Link ID used Gene_Symbol. I have to say that I tried to mapping my Ensemble Gene IDs to Entrez IDs by "biomaRt" package but unfortunately my all gene types didn't have one to one mapping. In fact I have 56930 Ensemble IDs but I could find just 35950 Entrez IDs. because of this experience, I used Gene Symbol instead of Entrez ID. So, for GO analysis I ran "GOenrichmentAnalysis" function based on below syntax:
But I got this Error:
Could you please share your comment with me for solving this problem?
Best Regards,
Mohammad
Would you please be so kind and read the help page for GOenrichmentAnalysis? The identifiers must be Entrez IDs. Missing Entrez IDs are removed from the enrichment analysis.