Question: WGCNA with distance matrix only from newick data.
yifangt20 wrote, 13 months ago:

Hello group!

I am trying to use the WGCNA package to find modules among my candidate genes. I have no expression data, only a distance matrix derived from a sequence alignment; the distance matrix was converted from a Newick file with a Python script.

The WGCNA package does not require gene expression data. Rather, many functions apply directly to an adjacency matrix (or, conversely, a distance matrix), e.g. dissTOM, networkConcepts, flashClust.
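To illustrate that a TOM-based dissimilarity needs nothing but an adjacency matrix, here is a minimal base-R sketch of the unsigned topological overlap formula. The function name tom_dissimilarity is hypothetical; in practice WGCNA's own TOMsimilarity() would normally be used. The sketch assumes the adjacency is symmetric with entries in [0, 1].

```r
# Hypothetical helper (not a WGCNA function): unsigned TOM dissimilarity
# computed from an adjacency matrix alone, no expression data needed.
# Assumes 'adj' is symmetric with entries in [0, 1].
tom_dissimilarity <- function(adj) {
  stopifnot(is.matrix(adj), isSymmetric(unname(adj)))
  a <- adj
  diag(a) <- 0                      # ignore self-connections
  k <- rowSums(a)                   # connectivity of each node
  shared <- a %*% a                 # sum over u of a_iu * a_uj (diag is 0, so u != i, j)
  min_k <- outer(k, k, pmin)        # min(k_i, k_j)
  tom <- (shared + a) / (min_k + 1 - a)
  diag(tom) <- 1                    # a node overlaps fully with itself
  1 - tom                           # dissimilarity, analogous to 'dissTOM' in the tutorial
}
```

The result can then be clustered with hclust(as.dist(tom_dissimilarity(adj)), method = "average"), the same way the tutorial clusters dissTOM.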

Regarding your question: intramodular hub genes are equivalent to module eigengenes (as shown in Horvath and Dong 2008).

Thus, you can simply represent a module by the most highly connected intramodular node.

Toward this end, you can use the following function from the link
> intramodularConnectivity(adjMat, colors, scaleByMax = FALSE)

But I am still not clear about the next steps; following the tutorial, I am stuck at this step:

# MEList = moduleEigengenes(datExpr, colors = dynamicColors)
> MEList <- intramodularConnectivity(distMatrix, colors, scaleByMax = FALSE)
> MEs = MEList$eigengenes
# Calculate dissimilarity of module eigengenes
> MEDiss <- 1-cor(MEs);
# Cluster module eigengenes
> METree <- hclust(as.dist(MEDiss), method = "average");
> MEDissThres = 0.25
# Plot the cut line into the dendrogram
> abline(h=MEDissThres, col = "red")
# Call an automatic merging function
> merge = mergeCloseModules(distMatrix, dynamicColors, cutHeight = MEDissThres, verbose = 3)
> mergedColors = merge$colors;
# Eigengenes of the new merged modules:
> mergedMEs = merge$newMEs;
> plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut", "Merged dynamic"), dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)
> # dev.off()

To be more specific, my questions are:

1) How do I get an adjacency matrix from the distance matrix for the intramodularConnectivity() function, given that, again, I do not have expression data?

2) If the distance matrix can be used as the adjacency matrix, how do I get "eigengenes" from the intramodularConnectivity() function, which outputs a four-column data frame without "eigengenes", and feed them to the next steps?

3) How do I handle the mergeCloseModules() step, which needs a datExpr object when I only have a distance matrix?

I am aware that the whole issue is using a distance matrix as input instead of expression data. I would appreciate help from anyone with experience in a similar scenario; I think this might be useful for phylogenetic studies.

Thanks a lot!
Yifang

Peter Langfelder (United States) wrote, 13 months ago:

If you only have the distance matrix, you cannot get the expression profile of anything, be it the eigengene or the intramodular hub genes. But you can identify which genes are the intramodular hub genes. Run intramodularConnectivity on the adjacency matrix and, for each module, identify the gene (or the few genes) with the highest intramodular connectivity. Again, the result will be the index of the columns/rows in the adjacency matrix, not the actual expression profiles.
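The hub-gene step described above can be sketched in base R. With WGCNA itself one would call intramodularConnectivity(adjMat, colors) and, within each module, take the rows with the largest kWithin; the hypothetical helper hub_genes below re-implements only the kWithin part so the logic is visible. It assumes adj is a symmetric adjacency matrix with entries in [0, 1] and modules holds one label per gene (for example, the output of cutreeDynamic).

```r
# Hypothetical helper (not a WGCNA function): hub genes from an adjacency
# matrix and module labels only. Assumes 'adj' is symmetric with entries in
# [0, 1] and 'modules' has one label per row/column of 'adj'.
hub_genes <- function(adj, modules, n_top = 1) {
  stopifnot(nrow(adj) == ncol(adj), nrow(adj) == length(modules))
  a <- adj
  diag(a) <- 0                      # self-adjacency does not count
  k_within <- numeric(length(modules))
  for (m in unique(modules)) {
    idx <- which(modules == m)
    # Intramodular connectivity (cf. WGCNA's kWithin): sum of adjacencies
    # to the other genes in the same module
    k_within[idx] <- rowSums(a[idx, idx, drop = FALSE])
  }
  # For each module, return row/column indices ranked by k_within
  lapply(split(seq_along(modules), modules), function(idx) {
    ranked <- idx[order(k_within[idx], decreasing = TRUE)]
    ranked[seq_len(min(n_top, length(ranked)))]
  })
}
```

As noted above, the result is a set of row/column indices, not expression profiles; if adj carries gene names as row names, rownames(adj)[hub_genes(adj, dynamicMods)[["1"]]] would give the hub gene of module 1. A label reserved for unassigned genes (e.g. 0 or "grey") has no meaningful hub and can be dropped from the result.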

You cannot use mergeCloseModules without expression data.

You won't be able to relate modules to any traits using correlation of module representatives (eigengenes or hub genes) with traits, since that requires the actual expression profiles.

If you have some kind of measure of dependence/association between your genes and traits, you can average it within each module to define a measure of module relatedness to a trait.
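A minimal sketch of that averaging, assuming gene_assoc is a hypothetical numeric vector holding one gene-trait association value per gene and modules holds the module labels in the same gene order. The function name module_trait_relatedness is also hypothetical.

```r
# Hypothetical helper: module-level relatedness to a trait, defined as the
# mean of a per-gene association measure within each module.
# 'gene_assoc' and 'modules' are assumed to be in the same gene order.
module_trait_relatedness <- function(gene_assoc, modules) {
  stopifnot(length(gene_assoc) == length(modules))
  tapply(gene_assoc, modules, mean, na.rm = TRUE)
}
```

For example, if the trait were a discrete gene attribute such as membership in a transcription factor family, gene_assoc could be a 0/1 indicator, and the module mean would then be the fraction of module members carrying that attribute.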

Thanks Peter!
You clarified many of my questions.
First, regarding traits: I have an idea to use gene functions (say, different groups of transcription factors) as a discrete trait input (without replicates etc.), but I am not yet sure whether that is adequate. In any case, this is not my priority; what I need more is a technique for getting the intramodular hub genes.

I have tried:

dynamicMods <- cutreeDynamic(dendro = geneTree, distM = rawData, method = "hybrid", deepSplit = 2, minClusterSize = minModuleSize)

to assign each gene to a cluster. However, this simple approach does not seem to identify modules the way the regular WGCNA workflow does.

1) How do I get the hub genes?

2) How do I match the resulting cluster groups (from cutreeDynamic) with the hub genes, if this is appropriate?
Thanks!

(reply written 13 months ago by yifangt20)

Hello,
I am still trying to get my distance matrix dataset (from a clustalw alignment and tree construction) to run.

adjMatrix <- adjacency.fromSimilarity(data.matrix(distMatrix), type = "distance", power = 1)
Error in checkAdjMat(similarity, min, max) :
some entries are not between -1 and 1

The error makes it clear that the entries of my distance matrix are not between -1 and 1. I have spent hours searching for how to convert a distance matrix into a similarity matrix, but found no clear answer, only more confusion. Unfortunately, I could not find any example similar to my case, where the input data is only a distance matrix, so I am not sure I am doing this the right way.
I would appreciate any input to help me get past this problem. Thanks a lot!
Yifang

(reply written 13 months ago by yifangt20)

The simplest way of turning a distance matrix into a similarity consists of two steps. First, scale the distances to lie between 0 and 1, e.g. by dividing the distances by their maximum or by taking tanh() of the distance matrix (perhaps scaled by some characteristic distance, e.g. the mean or 2*mean). Any number of other transformations are also possible. Second, take 1 - scaled distance as the similarity. The 1 - distance step is necessary because similarity is supposed to be near 1 for similar (close) objects, whereas the distance between such objects is close to 0.
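The two-step recipe above can be sketched in base R as follows. The helper name dist_to_similarity and its arguments are hypothetical; the default characteristic distance for the tanh option (the mean off-diagonal distance) is one of the choices mentioned above, and the input is assumed to be a square, symmetric, non-negative matrix.

```r
# Hypothetical helper (not part of WGCNA): distance matrix -> similarity.
# Assumes 'distMatrix' is square, symmetric and non-negative.
dist_to_similarity <- function(distMatrix, method = c("max", "tanh"),
                               char_dist = NULL) {
  method <- match.arg(method)
  d <- as.matrix(distMatrix)
  stopifnot(nrow(d) == ncol(d), all(d >= 0))
  # Step 1: scale distances into [0, 1]
  scaled <- switch(method,
    max = d / max(d),               # farthest pair -> 1
    tanh = {
      # characteristic distance: mean off-diagonal distance unless given
      if (is.null(char_dist)) char_dist <- mean(d[upper.tri(d)])
      tanh(d / char_dist)
    }
  )
  # Step 2: similarity = 1 - scaled distance (close objects -> near 1)
  sim <- 1 - scaled
  diag(sim) <- 1                    # each object is maximally similar to itself
  sim
}
```

Either option yields entries between 0 and 1, which should satisfy the range check that produced the error above. The result could then serve as the similarity argument of adjacency.fromSimilarity(); which type and power to use there is worth confirming against that function's help page.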

HTH,

Peter