WGCNA error : Error in hclust
4
1
Entering edit mode
lwang.daniel ▴ 10
@lwangdaniel-12125
Last seen 7.2 years ago

Hello, I am using WGCNA to construct co-expression networks.

Scripts:

#!/usr/bin/Rscript
getwd()
workingDir = "."
setwd(workingDir)
library(WGCNA)
options(stringsAsFactors = FALSE)
enableWGCNAThreads()
lnames = load(file ="dataInput.RData")
lnames = load(file ="networkConstruction.RData")
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
softpower = 6
dissTOM = 1 - TOMsimilarityFromExpr(datExpr, power = softpower)
nSelect = 1500
set.seed(10)
select = sample(nGenes, size = nSelect)
selectTOM = dissTOM[select, select]
selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = moduleColors[select]
plotDiss = selectTOM^7
diag(plotDiss) = NA
pdf(file = "Plots/networkHeatmap.pdf", width = 15, height = 15)
TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes")
dev.off()
save(dissTOM, file = "dissTOM.RData")
MEs = moduleEigengenes(datExpr, moduleColors)$eigengenes
MET = orderMEs(MEs)
pdf(file = "Plots/hubGeneHeatmap.pdf", width = 6, height = 6)
plotEigengeneNetworks(MET, "Eigengene adjacency heatmap", marHeatmap = c(3,4,2,2), plotDendrograms = FALSE, xLabelsAngle = 90)
dev.off()

And I got a error:

Error in hclust(as.dist(selectTOM), method = "average") : 
  NaN dissimilarity value.
Calls: hclust -> .Call
Execution halte

I double checked and I am sure there is non-numeric fpkm data in my input file. Does anyone meet this problem and solved it?

Thank you!

 

WGCNA • 4.3k views
ADD COMMENT
0
Entering edit mode
@peter-langfelder-4469
Last seen 1 day ago
United States

My guess is that your data contain some non-varying (constant) genes. Try running goodSamplesGenes on datExpr, then check the component allOK of the result. If that is FALSE, you will need to remove genes with zero variance as illustrated in section 1b of Tutorial I of the WGCNA tutorials.

ADD COMMENT
0
Entering edit mode

Dear Petter:

       Thank you for your patient explanation.

       I have filtered data already by using goodSamplesGenes, before co-expression network construction.

       Do you have other suggestion?

       Thank you!

       

ADD REPLY
0
Entering edit mode
@peter-langfelder-4469
Last seen 1 day ago
United States

Well, I would run

adj = adjacency(datExpr)

and check whether there are any missing or NaN values. If there are any, get theirs rows and/or columns, and check the corresponding columns in datExpr (check their variance).

Also make sure you run the most recent version of WGCNA.

I don't have any other bright ideas, it is hard to debug a problem without the actual data.

ADD COMMENT
0
Entering edit mode

Dear Peter:

        Thank you for your suggestion.

        I did find NA from result of adjacency(datExpr).

                            ENSG00000184647.10_1 ENSG00000184650.10_2
ENSG00000000003.14_1         7.008855e-05                   NA
ENSG00000000005.5_1          1.045386e-08                   NA

        And the expression profile is as follow:

ENSG00000184650.10_2 0 0 0 0 0 0 0 0 0.018154 0 0 0 0 0 0 0 0 0 0.021175 0 0 0 0 0 0.016796
ENSG00000184647.10_1 0 0 0 0.001065 0 0.008215 0 0 0.006638 0 0 0.006115 0 0.004287 0 0 0 0 0 0 0 0 0 0 0.008637
ENSG00000000003.14_1 8.347438 6.54812 11.78981 4.094741 2.713539 9.611229 9.865166 2.525487 4.12737 6.10589 7.512624 6.973087 10.63914 6.62466 3.451262 7.860533 5.780477 6.401347 33.63262 9.634759 5.701372 3.655462 1.862599 8.341952 8.780938
ENSG00000000005.5_1 0.005077 0 0 0.005777 0 0 0 0 0.011768 0 0 0 0 0 0 0 0 0 0.089062 0 0 0 0 0 0

The variance between ENSG00000184650.10_2 and ENSG00000000003.14_1 are 32.9416. And the variance between ENSG00000184650.10_2 and ENSG00000000005.5_1 are 0.000173. They are so numeric data. Why did adjacency got NA for these pairs? I use adjacency(datExpr, power = 6) for my data.

How can I solve this problem?

Thank you!

ADD REPLY
0
Entering edit mode
@peter-langfelder-4469
Last seen 1 day ago
United States

I assume you have transposed the expression data before you calculate adjacency (WGCNA requires samples in rows and genes in columns).

I took your data, loaded it into R and am getting no missing or NaN results, so your description or code are missing something. Here's the data I have, transposed into genes in columns format.

> dput(data0)
structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0.018154, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0.021175, 0, 0, 0, 0, 0, 0.016796, 0, 0, 0, 0.001065,
0, 0.008215, 0, 0, 0.006638, 0, 0, 0.006115, 0, 0.004287, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.008637, 8.347438, 6.54812, 11.78981,
4.094741, 2.713539, 9.611229, 9.865166, 2.525487, 4.12737, 6.10589,
7.512624, 6.973087, 10.63914, 6.62466, 3.451262, 7.860533, 5.780477,
6.401347, 33.63262, 9.634759, 5.701372, 3.655462, 1.862599, 8.341952,
8.780938, 0.005077, 0, 0, 0.005777, 0, 0, 0, 0, 0.011768, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.089062, 0, 0, 0, 0, 0, 0), .Dim = c(25L,
4L), .Dimnames = list(c("V2", "V3", "V4", "V5", "V6", "V7", "V8",
"V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", "V17",
"V18", "V19", "V20", "V21", "V22", "V23", "V24", "V25", "V26"
), c("ENSG00000184650.10_2 ", "ENSG00000184647.10_1 ", "ENSG00000000003.14_1 ",
"ENSG00000000005.5_1 ")))

> adjacency(data0, power = 6)
                      ENSG00000184650.10_2  ENSG00000184647.10_1
ENSG00000184650.10_2            1.000000000          7.189324e-03
ENSG00000184647.10_1            0.007189324          1.000000e+00
ENSG00000000003.14_1            0.029619245          5.234850e-10
ENSG00000000005.5_1             0.110284643          4.246166e-08
                      ENSG00000000003.14_1  ENSG00000000005.5_1
ENSG00000184650.10_2           2.961925e-02         1.102846e-01
ENSG00000184647.10_1           5.234850e-10         4.246166e-08
ENSG00000000003.14_1           1.000000e+00         4.387697e-01
ENSG00000000005.5_1            4.387697e-01         1.000000e+00

 

 

ADD COMMENT
0
Entering edit mode

Dear Peter:

Thank you for your hint. I do transfor the data by using log2(x+1). Is it not suitable for my data? Cause WGCNA suggests to do. Should I use the row fpkm to analyze the data? And Is there any differences between using raw fpkm and log2(x+1)?

Thank you!

ADD REPLY
0
Entering edit mode
@peter-langfelder-4469
Last seen 1 day ago
United States

I don't see why log2(x+1) should cause any trouble. It certainly does not cause any trouble with the data I use. Inspect the transformed data - are all entries of the 4 genes you sent finite after the transformation?

ADD COMMENT
0
Entering edit mode

Dear Peter:

Thank you for your advice. I check the log(x+1) result. And the fpkm of four genes after this transform are all numeric data.

Do you mind check my script and data? I can give it to you, if you do not mind.

ADD REPLY

Login before adding your answer.

Traffic: 893 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6