Search
Question: WGCNA error : Error in hclust
1
gravatar for lwang.daniel
11 weeks ago by
lwang.daniel10
lwang.daniel10 wrote:

Hello, I am using WGCNA to construct co-expression networks.

Scripts:

#!/usr/bin/Rscript
getwd()
workingDir = "."
setwd(workingDir)
library(WGCNA)
options(stringsAsFactors = FALSE)
enableWGCNAThreads()
lnames = load(file ="dataInput.RData")
lnames = load(file ="networkConstruction.RData")
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
softpower = 6
dissTOM = 1 - TOMsimilarityFromExpr(datExpr, power = softpower)
nSelect = 1500
set.seed(10)
select = sample(nGenes, size = nSelect)
selectTOM = dissTOM[select, select]
selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = moduleColors[select]
plotDiss = selectTOM^7
diag(plotDiss) = NA
pdf(file = "Plots/networkHeatmap.pdf", width = 15, height = 15)
TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes")
dev.off()
save(dissTOM, file = "dissTOM.RData")
MEs = moduleEigengenes(datExpr, moduleColors)$eigengenes
MET = orderMEs(MEs)
pdf(file = "Plots/hubGeneHeatmap.pdf", width = 6, height = 6)
plotEigengeneNetworks(MET, "Eigengene adjacency heatmap", marHeatmap = c(3,4,2,2), plotDendrograms = FALSE, xLabelsAngle = 90)
dev.off()

And I got a error:

Error in hclust(as.dist(selectTOM), method = "average") : 
  NaN dissimilarity value.
Calls: hclust -> .Call
Execution halte

I double checked and I am sure there is non-numeric fpkm data in my input file. Does anyone meet this problem and solved it?

Thank you!

 

ADD COMMENTlink modified 10 weeks ago by Peter Langfelder1.2k • written 11 weeks ago by lwang.daniel10
0
gravatar for Peter Langfelder
11 weeks ago by
United States
Peter Langfelder1.2k wrote:

My guess is that your data contain some non-varying (constant) genes. Try running goodSamplesGenes on datExpr, then check the component allOK of the result. If that is FALSE, you will need to remove genes with zero variance as illustrated in section 1b of Tutorial I of the WGCNA tutorials.

ADD COMMENTlink written 11 weeks ago by Peter Langfelder1.2k

Dear Petter:

       Thank you for your patient explanation.

       I have filtered data already by using goodSamplesGenes, before co-expression network construction.

       Do you have other suggestion?

       Thank you!

       

ADD REPLYlink written 11 weeks ago by lwang.daniel10
0
gravatar for Peter Langfelder
11 weeks ago by
United States
Peter Langfelder1.2k wrote:

Well, I would run

adj = adjacency(datExpr)

and check whether there are any missing or NaN values. If there are any, get theirs rows and/or columns, and check the corresponding columns in datExpr (check their variance).

Also make sure you run the most recent version of WGCNA.

I don't have any other bright ideas, it is hard to debug a problem without the actual data.

ADD COMMENTlink written 11 weeks ago by Peter Langfelder1.2k

Dear Peter:

        Thank you for your suggestion.

        I did find NA from result of adjacency(datExpr).

                            ENSG00000184647.10_1 ENSG00000184650.10_2
ENSG00000000003.14_1         7.008855e-05                   NA
ENSG00000000005.5_1          1.045386e-08                   NA

        And the expression profile is as follow:

ENSG00000184650.10_2 0 0 0 0 0 0 0 0 0.018154 0 0 0 0 0 0 0 0 0 0.021175 0 0 0 0 0 0.016796
ENSG00000184647.10_1 0 0 0 0.001065 0 0.008215 0 0 0.006638 0 0 0.006115 0 0.004287 0 0 0 0 0 0 0 0 0 0 0.008637
ENSG00000000003.14_1 8.347438 6.54812 11.78981 4.094741 2.713539 9.611229 9.865166 2.525487 4.12737 6.10589 7.512624 6.973087 10.63914 6.62466 3.451262 7.860533 5.780477 6.401347 33.63262 9.634759 5.701372 3.655462 1.862599 8.341952 8.780938
ENSG00000000005.5_1 0.005077 0 0 0.005777 0 0 0 0 0.011768 0 0 0 0 0 0 0 0 0 0.089062 0 0 0 0 0 0

The variance between ENSG00000184650.10_2 and ENSG00000000003.14_1 are 32.9416. And the variance between ENSG00000184650.10_2 and ENSG00000000005.5_1 are 0.000173. They are so numeric data. Why did adjacency got NA for these pairs? I use adjacency(datExpr, power = 6) for my data.

How can I solve this problem?

Thank you!

ADD REPLYlink written 11 weeks ago by lwang.daniel10
0
gravatar for Peter Langfelder
10 weeks ago by
United States
Peter Langfelder1.2k wrote:

I assume you have transposed the expression data before you calculate adjacency (WGCNA requires samples in rows and genes in columns).

I took your data, loaded it into R and am getting no missing or NaN results, so your description or code are missing something. Here's the data I have, transposed into genes in columns format.

> dput(data0)
structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0.018154, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0.021175, 0, 0, 0, 0, 0, 0.016796, 0, 0, 0, 0.001065,
0, 0.008215, 0, 0, 0.006638, 0, 0, 0.006115, 0, 0.004287, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.008637, 8.347438, 6.54812, 11.78981,
4.094741, 2.713539, 9.611229, 9.865166, 2.525487, 4.12737, 6.10589,
7.512624, 6.973087, 10.63914, 6.62466, 3.451262, 7.860533, 5.780477,
6.401347, 33.63262, 9.634759, 5.701372, 3.655462, 1.862599, 8.341952,
8.780938, 0.005077, 0, 0, 0.005777, 0, 0, 0, 0, 0.011768, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.089062, 0, 0, 0, 0, 0, 0), .Dim = c(25L,
4L), .Dimnames = list(c("V2", "V3", "V4", "V5", "V6", "V7", "V8",
"V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", "V17",
"V18", "V19", "V20", "V21", "V22", "V23", "V24", "V25", "V26"
), c("ENSG00000184650.10_2 ", "ENSG00000184647.10_1 ", "ENSG00000000003.14_1 ",
"ENSG00000000005.5_1 ")))

> adjacency(data0, power = 6)
                      ENSG00000184650.10_2  ENSG00000184647.10_1
ENSG00000184650.10_2            1.000000000          7.189324e-03
ENSG00000184647.10_1            0.007189324          1.000000e+00
ENSG00000000003.14_1            0.029619245          5.234850e-10
ENSG00000000005.5_1             0.110284643          4.246166e-08
                      ENSG00000000003.14_1  ENSG00000000005.5_1
ENSG00000184650.10_2           2.961925e-02         1.102846e-01
ENSG00000184647.10_1           5.234850e-10         4.246166e-08
ENSG00000000003.14_1           1.000000e+00         4.387697e-01
ENSG00000000005.5_1            4.387697e-01         1.000000e+00

 

 

ADD COMMENTlink written 10 weeks ago by Peter Langfelder1.2k

Dear Peter:

Thank you for your hint. I do transfor the data by using log2(x+1). Is it not suitable for my data? Cause WGCNA suggests to do. Should I use the row fpkm to analyze the data? And Is there any differences between using raw fpkm and log2(x+1)?

Thank you!

ADD REPLYlink written 10 weeks ago by lwang.daniel10
0
gravatar for Peter Langfelder
10 weeks ago by
United States
Peter Langfelder1.2k wrote:

I don't see why log2(x+1) should cause any trouble. It certainly does not cause any trouble with the data I use. Inspect the transformed data - are all entries of the 4 genes you sent finite after the transformation?

ADD COMMENTlink written 10 weeks ago by Peter Langfelder1.2k

Dear Peter:

Thank you for your advice. I check the log(x+1) result. And the fpkm of four genes after this transform are all numeric data.

Do you mind check my script and data? I can give it to you, if you do not mind.

ADD REPLYlink written 10 weeks ago by lwang.daniel10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 137 users visited in the last hour