problem for Using RNA-seq data-set in WGCNA
1
0
Entering edit mode
modarzi ▴ 10
@modarzi-16296
Last seen 4.2 years ago

Hi,

I want to use WGCNA for my RNA-seq data set. based on WGCNA tutorial, for providing output file for gene ontology analysis I have to run below code that  is based on MicroArray data set:

# Read in the probe annotation
GeneAnnotation=read.csv(file="GeneAnnotation.csv")
# Match probes in the data set to those of the annotation file
probes = names(datExprFemale)
probes2annot = match(probes,GeneAnnotation$substanceBXH)
# data frame with gene significances (cor with the traits)
datGS.Traits=data.frame(cor(datExprFemale,datTraits,use="p"))
names(datGS.Traits)=paste("cor",names(datGS.Traits),sep=".")
datOutput=data.frame(ProbeID=names(datExprFemale),
                     GeneAnnotation[probes2annot,],moduleColorsFemale,datKME,datGS.Traits)
# save the results in a comma delimited file
write.table(datOutput,"FemaleMouseResults.csv",row.names=F,sep=",")

Asumption my expression file is datExprSTLMS. How can I customize that code for RNA-seq?

I appreciate if anybody share his/her comment with me.

 

Best Regrds,

Mohammad

WGCNA RNA-seq • 3.1k views
ADD COMMENT
3
Entering edit mode
@peter-langfelder-4469
Last seen 9 weeks ago
United States

It depends what your RNA seq data is summarized to. If it summarized to genes, you don't necessarily need any probe to gene mapping, that is, you can drop all parts of code referring to GeneAnnotation.

As an aside, I (and my collaborators) like to have some basic information about the genes (e.g., if the gene ID is Ensembl ID, I like to have the gene symbol, name and perhaps Entrez ID). Where to get it depends really on what identifiers your RNA seq is annotated by.

ADD COMMENT
0
Entering edit mode

Dear Dr. Langfelder

Thanks for your comment. for better understanding your mean I present example of my RNA-seq(9 case)  as below:

> head(datExprSTLMS02,3)
                 case[1]  case[2]   case[3]  case[4]   case[5]  case[6]  case[7]  case[8]  case[9]
ENSG00000000003 15.65229 16.47096 17.831792 15.68051 16.828615 16.11393 15.67620 15.52688 16.91999
ENSG00000000005 17.35388 16.49920  8.138663 16.15056  9.519406 19.95802 12.73388 14.13277 11.21169
ENSG00000000419 19.01195 19.60623 19.425689 19.29729 18.552686 19.13429 19.28964 19.17179 19.22746

and head() for annotation file is:

> head(GeneAnnotation_STLMS,2)
  seqname source feature start    end score strand frame         gene_id    gene_name gene_status                          gene_type          havana_gene level
1    chr1 HAVANA    gene 11869  14409     .      +     . ENSG00000223972      DDX11L1       KNOWN transcribed_unprocessed_pseudogene OTTHUMG00000000961.2     2
2    chr1 HAVANA    gene 89295 133723     .      -     . ENSG00000238009 RP11-34P13.7       NOVEL                            lincRNA OTTHUMG00000001096.2     2
   tag full_length exon_length exon_num        first_exon         last_exon canonical_transcript    one_transcript one_transcript_start one_transcript_end
1 <NA>        2541        1735        9 ENSE00002234944.1 ENSE00001863096.1    ENST00000456328.2 ENST00000456328.2                11869              14409
2 <NA>       44429        3726       17 ENSE00003748456.1 ENSE00001846804.1    ENST00000466430.4 ENST00000466430.4                89295             120932

based on that information, I appreciate if you share your comment with me and give me solution for rewrite the code about output file for gene ontology analysis in WGCNA tutorial code.

P.S: I know how to convert Ensemble gene_id to gene name via BiomaRt package.

Best Regards,

Mohammad

ADD REPLY
1
Entering edit mode

Change the line

probes2annot = match(probes,GeneAnnotation$substanceBXH)

to

probes2annot = match(probes,GeneAnnotation$gene_id)

 

You may also want to change

ProbeID=names(datExprFemale),
GeneAnnotation[probes2annot,]

to just

GeneAnnotation[probes2annot, c("gene_id", "gene_name")]

and you can add whichever columns from GeneAnnotation that you want in the output table.

 

 

ADD REPLY
0
Entering edit mode

Dear Professor Langfelder

Thanks for your comment. As you know, in WGCNA analysis for  Gene Enrichment Analysis and also GO study I need "Locus Link ID". based on below code I have to have "Locus Link ID" but in my annotation file(gencode.v22.genes.csv) I don't have this field:

annot = read.csv(file = "GeneAnnotation.csv");

# Match probes in the data set to the probe IDs in the annotation file 
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# Get the corresponding Locuis Link IDs
allLLIDs = annot$LocusLinkID[probes2annot];
# $ Choose interesting modules
intModules = c("brown", "red", "salmon")
for (module in intModules)
{
  # Select module probes
  modGenes = (moduleColors==module)
  # Get their entrez ID codes
  modLLIDs = allLLIDs[modGenes];
  # Write them into a file
  fileName = paste("LocusLinkIDs-", module, ".txt", sep="");
  write.table(as.data.frame(modLLIDs), file = fileName,
              row.names = FALSE, col.names = FALSE)
}
# As background in the enrichment analysis, we will use all probes in the analysis.
fileName = paste("LocusLinkIDs-all.txt", sep="");
write.table(as.data.frame(allLLIDs), file = fileName,
            row.names = FALSE, col.names = FALSE)

I deeply appreciate if you share your comment with me.

Best Regards,

Mohammad

ADD REPLY
1
Entering edit mode

LocusLinkID is another name for Entrez ID. You can change all references to LocusLinkID to Entrez, and you will need the Entrez column in your gene annotation table.

For this particular piece of code, however, you can use any gene identifier you like (or your enrichment software can use). You could use symbol or even the Ensembl IDs.

ADD REPLY
0
Entering edit mode

Dear Professor Langfelder

Thanks for your comment. Now, I used Ensemble gene_id instead Probes and alternate of Locus Link ID used Gene_Symbol. I have to say that I tried to mapping my Ensemble Gene IDs to Entrez IDs by "biomaRt" package but unfortunately my all gene types didn't have one to one mapping. In fact I have 56930 Ensemble IDs but I could find just 35950 Entrez IDs. because of this experience, I used Gene Symbol instead of Entrez ID. So, for GO analysis I ran "GOenrichmentAnalysis" function based on below syntax:

GOenr = GOenrichmentAnalysis(bwModuleColors, allSymbol, organism = "human", nBestP = 10);

But I got this Error:

Error in GOenrichmentAnalysis(bwModuleColors_STLMS, allSymbol, organism = "human",  : 
None of the supplied gene identifiers map to the GO database.
Please make sure you have specified the correct organism (default is human).

Could you please share your comment with me for solving this problem?

Best Regards,

Mohammad

 

ADD REPLY
0
Entering edit mode

Would you please be so kind and read the help page for GOenrichmentAnalysis? The identifiers must be Entrez IDs. Missing Entrez IDs are removed from the enrichment analysis.

ADD REPLY

Login before adding your answer.

Traffic: 544 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6