WGCNA: 1) low soft thresholding power, 2) large modules, 3) best correlation for different types of trait variables
stu111538 • 0
@stu111538-13994
Last seen 4.2 years ago
Germany, Kiel, University Hospital Kiel

Hello, I performed WGCNA on RNA-Seq data of 55 samples and used the code exactly as provided at the WGCNA website for the network analysis of the female mice data. There are three issues I am not sure about:

1) According to the tutorial recommendations, I would need to choose a soft thresholding power of 3, since it already reaches an R^2 of 0.8 and is also the maximum. However, the power recommendations in the table of the FAQs suggest a power of 6-12 for my sample size. What would you recommend?


2) I am using about 20,000 genes as input, and both the signed and the unsigned network analysis yield 4 or 5 modules containing thousands of genes (the largest module contains 9,000 genes), and about 15 modules containing hundreds of genes. Should I be concerned about the large modules?

3) I want to correlate the gene modules with continuous (BMI), categorical (e.g. smoking habits) and binary variables (e.g. mutation yes/no). What correlation is best for all types of variables: bicor(x,y, robustY = FALSE, maxPOutliers = 0.05) or simple Pearson? Or is a separate correlation for each variable type best? I have NAs in every kind of variable.

Thank you in advance


1) According to the tutorial recommendations, I would need to choose a soft thresholding power of 3, since it already reaches an R^2 of 0.8 and is also the maximum. However, the power recommendations in the table of the FAQs suggest a power of 6-12 for my sample size. What would you recommend?

A soft thresholding power of 3 is really low. I would recommend looking at your data (just do a PCA), because you might have a very strong driver of variation, which would explain why you end up with a module of 9,000 genes; perhaps it is smoking habits or another categorical variable that you did not take into account.
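To see why the power matters, here is a minimal sketch in plain Python (not WGCNA's R code) of how the standard WGCNA adjacency definitions turn a correlation into a connection strength. The correlation values are made up for illustration. A lower power suppresses weak and moderate correlations much less, so more gene pairs stay appreciably connected.

```python
def adjacency(r, power, network_type="unsigned"):
    """Soft-thresholded adjacency for one correlation value r in [-1, 1]."""
    if network_type == "unsigned":
        # Strong negative and strong positive correlations both count.
        return abs(r) ** power
    if network_type == "signed":
        # Correlation is shifted to [0, 1] first, so r = 0 maps to 0.5.
        return ((1.0 + r) / 2.0) ** power
    if network_type == "signed hybrid":
        # Negative correlations are treated as unconnected.
        return r ** power if r > 0 else 0.0
    raise ValueError(f"unknown network type: {network_type}")


if __name__ == "__main__":
    correlations = [0.2, 0.5, 0.8]  # illustrative values, not from the thread
    for power in (3, 6, 12):
        row = ", ".join(f"r={r}: {adjacency(r, power):.4f}" for r in correlations)
        print(f"unsigned, power {power:>2}: {row}")
```

Because the signed transformation maps uncorrelated genes to 0.5 rather than 0, a signed network generally needs a higher power than an unsigned one to push such pairs toward zero.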

I want to correlate the gene modules with continuous (BMI), categorical (e.g. smoking habits) and binary variables (e.g. mutation yes/no). What correlation is best for all types of variables: bicor(x,y, robustY = FALSE, maxPOutliers = 0.05) or simple Pearson? Or is a separate correlation for each variable type best? I have NAs in every kind of variable.

I would use Pearson correlation for both categorical and continuous variables. NAs should not be a problem.
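As a rough illustration of why NAs need not be a problem, here is a small plain-Python sketch (not the WGCNA implementation) of Pearson correlation computed on pairwise-complete observations, i.e. dropping only the samples where either value is missing. The function name and the sample values are hypothetical, and missing values are represented as `None`.

```python
import math


def pearson_pairwise(x, y):
    """Pearson correlation using only samples where both values are present.

    Returns (r, n_used) so the caller knows how many samples contributed.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("one variable is constant after dropping missing values")
    return sxy / math.sqrt(sxx * syy), n


if __name__ == "__main__":
    eigengene = [0.12, -0.05, 0.30, -0.22, 0.08]  # hypothetical module eigengene
    bmi = [24.1, None, 31.5, 20.3, 26.0]          # continuous trait with one NA
    mutation = [1, 0, 1, None, 0]                 # binary trait coded 0/1, one NA
    print(pearson_pairwise(eigengene, bmi))
    print(pearson_pairwise(eigengene, mutation))
```

The number of complete pairs is returned because it can differ from trait to trait, and any p-value should presumably be based on that count rather than on the total number of samples.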

@peter-langfelder-4469
Last seen 4 weeks ago
United States

I'd go with 6 for unsigned or signed hybrid networks, and 12 for a signed network. Power 3 is really too low with 55 samples. As Andres mentioned, check the sample clustering tree for large drivers (strong branches); large modules are often the result of very strong global drivers of expression. For working with categorical variables with more than 2 levels, you may want to read https://peterlangfelder.com/2018/11/25/working-with-categorical-variables/ .
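One common way to handle a categorical variable with more than 2 levels is to expand it into 0/1 indicator variables, one per level, each of which can then be correlated like a binary trait. The plain-Python sketch below shows that idea only; it is not taken from the linked post, and the function name, the `.vs.all` labels, and the smoking levels are assumptions for illustration.

```python
def level_vs_all(values):
    """Expand a categorical variable into one 0/1 indicator per level.

    Missing values (None) stay missing in every indicator.
    """
    levels = sorted({v for v in values if v is not None})
    return {
        f"{level}.vs.all": [None if v is None else int(v == level) for v in values]
        for level in levels
    }


if __name__ == "__main__":
    smoking = ["never", "current", None, "former", "never"]  # hypothetical trait
    for name, indicator in level_vs_all(smoking).items():
        print(name, indicator)
```

Assigning arbitrary numbers such as 1, 2, 3 to unordered levels would instead impose an ordering that the variable does not have.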


Thank you for your comments and support! I will check whether there are global drivers of expression. It might just be that those drivers are exactly the variables I am interested in.


I have a similar situation. I am working with 40 samples (20 groupA + 20 groupB). For the first part of my analysis, I used DESeq2 to identify DEGs between groupA and groupB samples. I then used the vst-transformed values of ~16k genes (all protein-coding genes, filtered on low counts) for WGCNA. The sample dendrogram with trait annotation clearly showed two groups. However, the soft thresholding power I got was 4 at an R^2 of 0.8. Using this, I obtained 12 modules, of which a single module contained ~10k genes, and I also observed that it had a high negative correlation with my trait of interest. Is getting such a large module usual? For the module-trait relationship, as the samples were from 2 groups, I used 1 for groupA and 2 for groupB. Is this the correct approach? Also, is there a way to tell which modules are related to groupA and which are related to groupB?
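On the coding question, a small plain-Python sketch may help, assuming the module-trait relationship is a plain Pearson correlation between a module eigengene and the group code. The eigengene values are invented. Under that assumption, coding the groups as 1/2 or as 0/1 gives the same correlation, and the sign only says in which group the eigengene is higher.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists without missing values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical eigengene of one module: lower in groupA, higher in groupB.
eigengene = [-0.30, -0.20, -0.25, 0.20, 0.30, 0.25]
group = ["A", "A", "A", "B", "B", "B"]

r_12 = pearson(eigengene, [1 if g == "A" else 2 for g in group])       # A=1, B=2
r_01 = pearson(eigengene, [0 if g == "A" else 1 for g in group])       # A=0, B=1
r_flipped = pearson(eigengene, [2 if g == "A" else 1 for g in group])  # A=2, B=1

print(r_12, r_01, r_flipped)
```

With groupA = 1 and groupB = 2, a positive correlation means the module eigengene tends to be higher in groupB, and a negative one means it tends to be higher in groupA; swapping the codes only flips the sign.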
