Question: WGCNA: 1) low soft thresholding power, 2) large modules, 3) best correlation for different types of trait variables
Asked 12 weeks ago by stu111538 (Germany, Kiel, University Hospital Kiel):

Hello, I performed WGCNA on RNA-Seq data of 55 samples and used the code exactly as provided at the WGCNA website for the network analysis of the female mice data. There are three issues I am not sure about:

1) According to the tutorial recommendations, I would need to choose a soft thresholding power of 3, since it already reaches an R^2 of 0.8 and is also the maximum. However, the power recommendations in the FAQ table suggest a power of 6-12 for my sample size. What would you recommend?

2) I am using about 20,000 genes as input, and both the signed and the unsigned network analysis yield 4 or 5 modules containing thousands of genes (the largest module contains 9,000 genes), and about 15 modules containing hundreds of genes. Should I be concerned about the large modules?

3) I want to correlate the gene modules with continuous (BMI), categorical (e.g. smoking habits) and binary variables (e.g. mutation yes/no). Which correlation is best for all types of variables: bicor(x,y, robustY = FALSE, maxPOutliers = 0.05) or simple Pearson? Or is a separate correlation for each variable type best? I have NAs in every kind of variable.

Thank you in advance

1) According to the tutorial recommendations, I would need to choose a soft thresholding power of 3, since it already reaches an R^2 of 0.8 and is also the maximum. However, the power recommendations in the FAQ table suggest a power of 6-12 for my sample size. What would you recommend?

A soft thresholding power of 3 is really low. I would recommend looking at your data (for example with a PCA), because you might have a very strong driver of variation, which would explain why you end up with a module of 9,000 genes; perhaps it is smoking habits or another categorical variable that you did not take into account.
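
To illustrate why the power matters, here is a minimal Python sketch (not WGCNA code; the function names are made up for illustration). It assumes the standard WGCNA adjacency definitions: |cor|^power for unsigned networks and ((1 + cor) / 2)^power for signed networks.

```python
def unsigned_adjacency(cor, power):
    """Unsigned network: adjacency = |cor| ** power."""
    return abs(cor) ** power


def signed_adjacency(cor, power):
    """Signed network: adjacency = ((1 + cor) / 2) ** power."""
    return ((1.0 + cor) / 2.0) ** power


# Compare how strongly different powers suppress weak and moderate correlations.
for cor in (0.2, 0.5, 0.8):
    low = unsigned_adjacency(cor, 3)
    high = unsigned_adjacency(cor, 6)
    signed = signed_adjacency(cor, 12)
    print(f"cor={cor}: unsigned p=3 -> {low:.4f}, "
          f"unsigned p=6 -> {high:.4f}, signed p=12 -> {signed:.4f}")
```

At power 3 a moderate correlation of 0.5 keeps an adjacency of 0.125, while at power 6 it drops to about 0.016, so a low power leaves many weakly related genes connected. A signed network maps a correlation of 0 to 0.5 before the power is applied, which is why it typically needs a higher power than an unsigned one.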

I want to correlate the gene modules with continuous (BMI), categorical (e.g. smoking habits) and binary variables (e.g. mutation yes/no). Which correlation is best for all types of variables: bicor(x,y, robustY = FALSE, maxPOutliers = 0.05) or simple Pearson? Or is a separate correlation for each variable type best? I have NAs in every kind of variable.

I would use Pearson correlation for both categorical and continuous variables. NAs should not be a problem.
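
As a rough illustration of why NAs need not be a problem, here is a minimal Python sketch of a Pearson correlation that uses only the samples where both values are present. In base R this roughly corresponds to cor(x, y, use = "pairwise.complete.obs"). The function name, the data, and the use of None for missing values are assumptions of this sketch.

```python
import math


def pearson_pairwise(x, y):
    """Pearson correlation over positions where both values are present.

    Missing values are represented as None (an assumption of this sketch).
    Returns (r, n_used); r is None with fewer than 3 pairs or zero variance.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical module eigengene and a binary trait (mutation: 1 = yes, 0 = no).
eigengene = [0.21, -0.10, 0.35, None, -0.25, 0.05, 0.30, -0.30]
mutation = [1, 0, 1, 1, 0, None, 1, 0]

r, n_used = pearson_pairwise(eigengene, mutation)
print(f"r = {r:.3f} using {n_used} of {len(eigengene)} samples")
```

Because each trait can have a different number of usable samples, any p-value should be based on the number of pairs actually used, not on the full sample size.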

Comment written 12 weeks ago by andres.firrincieli

Answer, 12 weeks ago by Peter Langfelder (United States):

I'd go with 6 for unsigned or signed hybrid networks, and 12 for signed networks. Power 3 is really too low with 55 samples. As Andres mentioned, check the sample clustering tree for large drivers (strong branches); large modules are often the result of very strong global drivers of expression. For working with categorical variables with more than 2 levels, you may want to read https://peterlangfelder.com/2018/11/25/working-with-categorical-variables/
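
One common way to handle a categorical trait with more than two levels is to turn it into binary indicators that can each be correlated with the module eigengenes. The following is a minimal Python sketch of "level vs. all" indicators. It is not taken from the linked post, and the function name, naming scheme, and data are hypothetical.

```python
def level_vs_all_indicators(values):
    """Turn one categorical variable into 0/1 indicators, one per level.

    Missing values (None) stay missing in every indicator column.
    """
    levels = sorted({v for v in values if v is not None})
    return {
        f"{level}.vs.all": [None if v is None else int(v == level) for v in values]
        for level in levels
    }


# Hypothetical smoking-habit trait with three levels and one missing value.
smoking = ["never", "current", "former", None, "never", "current"]

for name, column in level_vs_all_indicators(smoking).items():
    print(name, column)
```

Each indicator is an ordinary binary variable, so it can be treated like the mutation yes/no trait in the question.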

Thank you for your comments and support! I will check whether there are global drivers of expression. It might just be that those drivers are exactly the variables I am interested in.

Comment written 7 weeks ago by stu111538

I have a similar situation. I am working with 40 samples (20 groupA + 20 groupB). For the first part of my analysis, I used DESeq2 to identify differentially expressed genes (DEGs) between groupA and groupB samples. I then used the VST-transformed values of ~16k genes (all protein-coding genes, filtered on low counts) for WGCNA. The dendrogram of samples and traits clearly showed two groups. However, the soft thresholding power I got was 4 at an R^2 of 0.8. Using this, I obtained 12 modules, of which a single module contained ~10k genes, and I also observed that it had a high negative correlation with my trait of interest. Is getting such a large module usual? For the module-trait relationship, as the samples were from 2 groups, I used 1 for groupA and 2 for groupB. Is this the correct approach? Also, is there a way to tell which modules are related to groupA and which are related to groupB?

Comment written 28 days ago by ag1805x
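
On the coding question above: Pearson correlation does not change when a variable is shifted or rescaled by a positive factor, so coding the groups as 1/2 gives the same correlation as 0/1. The sign then gives the direction. A positive correlation means the module eigengene is higher in the group with the larger code (groupB here), and a negative one means it is higher in groupA. A minimal Python sketch with made-up eigengene values:

```python
import math


def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists without missing values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical eigengene of one module: first 4 samples groupA, last 4 groupB.
eigengene = [-0.40, -0.30, -0.35, -0.25, 0.30, 0.40, 0.25, 0.35]
coded_1_2 = [1] * 4 + [2] * 4  # groupA = 1, groupB = 2
coded_0_1 = [0] * 4 + [1] * 4  # groupA = 0, groupB = 1

r_1_2 = pearson(eigengene, coded_1_2)
r_0_1 = pearson(eigengene, coded_0_1)
print(f"1/2 coding: r = {r_1_2:.3f}; 0/1 coding: r = {r_0_1:.3f}")
print("higher in groupB" if r_1_2 > 0 else "higher in groupA")
```

Swapping the codes (groupA = 2, groupB = 1) flips the sign but not the magnitude, so the direction is only interpretable once the coding is known.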