Question: Questions on Pigengene
0
20 months ago by
boting.m.ning0 wrote:

I am working on a project trying to extract features from RNAseq data from monkeys that were challenged with Ebola Virus, and to build a classifier that could predict the disease stages of Ebola infection. During the process of literature research, I came across your the Bioconductor package Pigengene, which almost fits my needs perfectly.

However, as I was trying to use the main function, one.step.pigengene(), directly, there showed an error message "power is NA!". After I studied the source code, I found the issues lied in the calculate.beta() function, were a soft-threshold power should be picked to fit a scale free network. I tried to run each step from the WGCNA package sequentially (following your package), while feeding in a random power term, and everything worked fine, even though the results were not satisfying (obviously the net work was not optimized). I started to suspect this might be caused by the data I tried to analyzed (the number of genes included vs. the number of data point).

For information, I have 15 monkeys, with 4 time points each (so total of around 60 datapoints from RNAseq), and I fed around 4000 genes that were significantly differentially expressed into the package.

Any suggestions or help would be really appreciate!!!

Thank you so much!!!

wgcna pigengene • 384 views
modified 19 months ago by Habil Zare170 • written 20 months ago by boting.m.ning0
2
20 months ago by
Habil Zare170
United States/Austin Area
Habil Zare170 wrote:

WGCNA authors recommend not to exclude any genes from the analysis. I think it is reasonable to filter out the genes with too low coverage and the genes with almost constant expression levels. Using 4,000 genes should be fine.

I have not used DESeq normalization before in this context. Please post your updates when you have results with log(RPKM+1).

Hi Prof. Zare, I converted all my read counts into log(RPKM+1), but the same issue still persisted... Do you know if there is any other parts I should modify? Thanks a ton!

$sft$fitIndices
Power     SFT.R.sq        slope truncated.R.sq   mean.k. median.k.   max.k.
1      1 6.849859e-01  4.364179088      0.6872780 198.34019 212.50254 216.3609
2      2 4.726176e-01  1.842884552      0.5106602 170.65618 192.17180 199.0158
3      3 3.342839e-01  1.071322555      0.4306177 151.94607 177.01233 186.9238
4      4 1.620569e-01  0.662319574      0.5480922 138.38870 164.78224 177.9557
5      5 9.447289e-02  0.444193524      0.5211303 128.00389 155.61834 170.8953
6      6 5.908899e-02  0.355537818      0.5286616 119.69407 147.34727 165.0949
7      7 2.914369e-02  0.219540486      0.5001622 112.81508 140.42515 160.1689
8      8 2.838860e-02  0.200584411      0.4553509 106.96918 133.65243 155.8773
9      9 1.868442e-02  0.151434225      0.3777199 101.89928 127.73575 152.0640
10    10 1.102951e-02  0.115880938      0.5349902  97.43194 122.07467 148.6239
11    11 1.650998e-04  0.014912252      0.4146315  93.44560 116.85869 145.4831
12    12 2.906552e-06  0.001913254      0.5459920  89.85209 111.99498 142.5883
13    13 1.137779e-04 -0.011870141      0.6282004  86.58545 107.88611 139.8996
14    14 1.581301e-04  0.014310166      0.7863718  83.59502 104.03663 137.3863
15    16 1.755617e-03 -0.051580054      0.6945797  78.29177  97.14252 132.7951
16    18 1.069838e-03 -0.036162626      0.8167061  73.70814  90.68792 128.6728
17    20 8.047458e-04 -0.033666916      0.8537297  69.68587  84.82697 124.9231

Here is the results from calculateBetaRes.

1
20 months ago by
Habil Zare170
United States/Austin Area
Habil Zare170 wrote:

Do you use RPKM values from RNA-Seq data? Did you transformed the expression values using, say, log(1+ <RPKM>)?

I used the DESeq normalization. I would now assume this would not be the perfect normalization methods, correct? (Especially I compared the sample datasets you put in the package). I will try normalization suing log(RPKM+1) and see if that would work.

In the meantime, is there any recommended "number of genes" that are used for the analysis based on the number of samples we got? I am trying to match the condition for our analysis to your publication...

Thanks a ton!

0
19 months ago by
Habil Zare170
United States/Austin Area
Habil Zare170 wrote:

Can you run the calculate.beta() function with different values for the RsquaredCut argument, say 0:4/20 +0.6? I recommend you choose the highest value that does not lead to an NA value for beta. In Pigengene version 1.5.9, I added RsquaredCut to one.step.pigengene(), arguments. So, when you find an appropriate value for RsquaredCut, you can pass it to the one.step.pigengene() function.

Problem source: WGCNA raises the adjacency matrix to a power (a.k.a beta) before the clustering step. It automatically determines a value for the power by examining several values, and choosing the minimum value that leads to an R^2 index below a predefined cutoff (RsquaredCut). If such a beta value is not found, NA will be returned. See ?WGCNA::pickSoftThreshold for more details.