Question: WGCNA: 1) low soft thresholding power, 2) large modules, 3) best correlation for different types of trait variables
stu111538 (Germany, Kiel, University Hospital Kiel) wrote, 4 weeks ago:

Hello, I performed WGCNA on RNA-Seq data of 55 samples and used the code exactly as provided at the WGCNA website for the network analysis of the female mice data. There are three issues I am not sure about:

1) According to the tutorial recommendations I would need to choose a soft-thresholding power of 3, since it already reaches an R^2 of 0.8 and that is also the maximum. However, the power recommendations in the table of the FAQs suggest a power of 6-12 for my sample size. What would you recommend?


2) I am using about 20,000 genes as input, and both the signed and the unsigned network analysis yield 4 or 5 modules containing thousands of genes (the largest module contains 9,000 genes), and about 15 modules containing hundreds of genes. Should I be concerned about the large modules?

3) I want to correlate the gene modules with continuous (BMI), categorical (e.g. smoking habits) and binary variables (e.g. mutation yes/no). Which correlation is best for all types of variables: bicor(x,y, robustY = FALSE, maxPOutliers = 0.05) or simple Pearson? Or is a separate correlation for each variable type the best option? I have NAs in every kind of variable.

Thank you in advance


1) According to the tutorial recommendations I would need to choose a soft-thresholding power of 3, since it already reaches an R^2 of 0.8 and that is also the maximum. However, the power recommendations in the table of the FAQs suggest a power of 6-12 for my sample size. What would you recommend?

A soft-thresholding power of 3 is really low. I would recommend looking at your data (a PCA is enough), because you may have a very strong driver of variation, which would explain why you end up with a module of 9,000 genes; perhaps it is smoking habits or another categorical variable that you did not take into account.
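
As a rough illustration of what the power does, here is a minimal Python sketch (WGCNA itself is an R package; this is not WGCNA code, only the standard soft-thresholding adjacency definitions written out by hand). The function name `adjacency` and the example correlation values are made up for illustration. At power 3, moderate correlations still keep a noticeable adjacency, which tends to connect many genes to each other.

```python
def adjacency(cor, power, network_type="unsigned"):
    """Soft-thresholded adjacency for a single correlation value.

    unsigned:      |cor| ** power
    signed:        ((1 + cor) / 2) ** power
    signed hybrid: cor ** power if cor > 0, else 0
    """
    if network_type == "unsigned":
        return abs(cor) ** power
    if network_type == "signed":
        return ((1 + cor) / 2) ** power
    if network_type == "signed hybrid":
        return cor ** power if cor > 0 else 0.0
    raise ValueError(f"unknown network type: {network_type}")


# Compare how strongly a low and a higher power suppress moderate correlations.
for cor in (0.3, 0.6, 0.9):
    low = adjacency(cor, 3)
    high = adjacency(cor, 6)
    print(f"cor={cor:.1f}  power 3: {low:.4f}  power 6: {high:.4f}")
```

Raising the power shrinks weak and moderate correlations much faster than strong ones, so fewer gene pairs count as connected.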

I want to correlate the gene modules with continuous (BMI), categorical (e.g. smoking habits) and binary variables (e.g. mutation yes/no). Which correlation is best for all types of variables: bicor(x,y, robustY = FALSE, maxPOutliers = 0.05) or simple Pearson? Or is a separate correlation for each variable type the best option? I have NAs in every kind of variable.

I would use Pearson correlation for both categorical and continuous variables. NAs should not be a problem.
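
To show why NAs need not be a problem, here is a small Python sketch of a pairwise-complete Pearson correlation: for each trait, only the samples where both the module eigengene and the trait are present are used. In R, this corresponds to `cor` with `use = "pairwise.complete.obs"`. The function name `pearson_pairwise` and all data values below are hypothetical, and missing values are represented as `None`.

```python
import math


def pearson_pairwise(x, y):
    """Pearson correlation over positions where both values are present.

    Returns None if fewer than 3 complete pairs remain or if either
    variable is constant on the complete pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical module eigengene and traits for six samples.
eigengene = [0.12, -0.30, 0.25, 0.05, -0.18, 0.31]
bmi = [24.1, None, 31.0, 26.5, 22.3, 33.2]   # continuous, one NA
mutation = [0, 0, 1, None, 0, 1]             # binary 0/1, one NA

print("BMI:", pearson_pairwise(eigengene, bmi))
print("mutation:", pearson_pairwise(eigengene, mutation))
```

With a 0/1 trait, the Pearson correlation is the point-biserial correlation, so binary variables can go through the same function as continuous ones. Note that each trait may end up using a different number of samples, which matters when computing p-values.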

written 4 weeks ago by andres.firrincieli
Answer: WGCNA: 1) low soft thresholding power, 2) large modules, 3) best correlation for
Peter Langfelder (United States) wrote, 4 weeks ago:

I'd go with 6 for unsigned or signed hybrid networks, and 12 for a signed network. Power 3 is really too low with 55 samples. As Andres mentioned, check the sample clustering tree for large drivers (strong branches); large modules are often the result of very strong global drivers of expression. For working with categorical variables with more than 2 levels, you may want to read https://peterlangfelder.com/2018/11/25/working-with-categorical-variables/.
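
One common way to handle a categorical variable with more than two levels is to turn it into 0/1 indicator variables, each of which can then be correlated with module eigengenes like any binary trait. Below is a minimal Python sketch of the "one level versus all others" variant. This is an assumption about one reasonable approach, not a reproduction of the linked post or of any WGCNA function; the function name `binarize_vs_all`, the column naming, and the smoking data are made up for illustration.

```python
def binarize_vs_all(values):
    """Build one 0/1 indicator per level (that level vs. all other levels).

    Missing values (None) stay None in every indicator column.
    """
    levels = sorted({v for v in values if v is not None})
    return {
        f"{level}.vs.all": [None if v is None else int(v == level) for v in values]
        for level in levels
    }


# Hypothetical three-level trait with one missing value.
smoking = ["never", "current", "former", None, "never", "current"]

indicators = binarize_vs_all(smoking)
for name, column in indicators.items():
    print(name, column)
```

Each indicator column asks a separate question (for example, "current smokers versus everyone else"), so a module can be associated with one level without being associated with the others.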
