Questions on Pigengene
3
0
Entering edit mode
@botingmning-15082
Last seen 6.8 years ago

I am working on a project trying to extract features from RNAseq data from monkeys that were challenged with Ebola Virus, and to build a classifier that could predict the disease stages of Ebola infection. During the process of literature research, I came across your the Bioconductor package Pigengene, which almost fits my needs perfectly.

However, as I was trying to use the main function, one.step.pigengene(), directly, there showed an error message "power is NA!". After I studied the source code, I found the issues lied in the calculate.beta() function, were a soft-threshold power should be picked to fit a scale free network. I tried to run each step from the WGCNA package sequentially (following your package), while feeding in a random power term, and everything worked fine, even though the results were not satisfying (obviously the net work was not optimized). I started to suspect this might be caused by the data I tried to analyzed (the number of genes included vs. the number of data point).

For information, I have 15 monkeys, with 4 time points each (so total of around 60 datapoints from RNAseq), and I fed around 4000 genes that were significantly differentially expressed into the package.

Any suggestions or help would be really appreciate!!!

Thank you so much!!!

Pigengene WGCNA • 1.6k views
ADD COMMENT
2
Entering edit mode
Habil Zare ▴ 200
@habil-zare-7836
Last seen 13 months ago
United States/Austin Area

WGCNA authors recommend not to exclude any genes from the analysis. I think it is reasonable to filter out the genes with too low coverage and the genes with almost constant expression levels. Using 4,000 genes should be fine.

I have not used DESeq normalization before in this context. Please post your updates when you have results with log(RPKM+1).

ADD COMMENT
0
Entering edit mode

Hi Prof. Zare, I converted all my read counts into log(RPKM+1), but the same issue still persisted... Do you know if there is any other parts I should modify? Thanks a ton!

 

ADD REPLY
0
Entering edit mode

$sft$fitIndices
   Power     SFT.R.sq        slope truncated.R.sq   mean.k. median.k.   max.k.
1      1 6.849859e-01  4.364179088      0.6872780 198.34019 212.50254 216.3609
2      2 4.726176e-01  1.842884552      0.5106602 170.65618 192.17180 199.0158
3      3 3.342839e-01  1.071322555      0.4306177 151.94607 177.01233 186.9238
4      4 1.620569e-01  0.662319574      0.5480922 138.38870 164.78224 177.9557
5      5 9.447289e-02  0.444193524      0.5211303 128.00389 155.61834 170.8953
6      6 5.908899e-02  0.355537818      0.5286616 119.69407 147.34727 165.0949
7      7 2.914369e-02  0.219540486      0.5001622 112.81508 140.42515 160.1689
8      8 2.838860e-02  0.200584411      0.4553509 106.96918 133.65243 155.8773
9      9 1.868442e-02  0.151434225      0.3777199 101.89928 127.73575 152.0640
10    10 1.102951e-02  0.115880938      0.5349902  97.43194 122.07467 148.6239
11    11 1.650998e-04  0.014912252      0.4146315  93.44560 116.85869 145.4831
12    12 2.906552e-06  0.001913254      0.5459920  89.85209 111.99498 142.5883
13    13 1.137779e-04 -0.011870141      0.6282004  86.58545 107.88611 139.8996
14    14 1.581301e-04  0.014310166      0.7863718  83.59502 104.03663 137.3863
15    16 1.755617e-03 -0.051580054      0.6945797  78.29177  97.14252 132.7951
16    18 1.069838e-03 -0.036162626      0.8167061  73.70814  90.68792 128.6728
17    20 8.047458e-04 -0.033666916      0.8537297  69.68587  84.82697 124.9231

 

Here is the results from calculateBetaRes.

 

 

ADD REPLY
1
Entering edit mode
Habil Zare ▴ 200
@habil-zare-7836
Last seen 13 months ago
United States/Austin Area

Do you use RPKM values from RNA-Seq data? Did you transformed the expression values using, say, log(1+ <RPKM>)?

ADD COMMENT
0
Entering edit mode

 I used the DESeq normalization. I would now assume this would not be the perfect normalization methods, correct? (Especially I compared the sample datasets you put in the package). I will try normalization suing log(RPKM+1) and see if that would work.

In the meantime, is there any recommended "number of genes" that are used for the analysis based on the number of samples we got? I am trying to match the condition for our analysis to your publication...

Thanks a ton!

ADD REPLY
0
Entering edit mode
Habil Zare ▴ 200
@habil-zare-7836
Last seen 13 months ago
United States/Austin Area

Can you run the calculate.beta() function with different values for the RsquaredCut argument, say 0:4/20 +0.6? I recommend you choose the highest value that does not lead to an NA value for beta. In Pigengene version 1.5.9, I added RsquaredCut to one.step.pigengene(), arguments. So, when you find an appropriate value for RsquaredCut, you can pass it to the one.step.pigengene() function.

Problem source: WGCNA raises the adjacency matrix to a power (a.k.a beta) before the clustering step. It automatically determines a value for the power by examining several values, and choosing the minimum value that leads to an R^2 index below a predefined cutoff (RsquaredCut). If such a beta value is not found, NA will be returned. See ?WGCNA::pickSoftThreshold for more details. 

ADD COMMENT

Login before adding your answer.

Traffic: 726 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6