Question

Questions on Pigengene

boting.m.ning • 0

@botingmning-15082

Last seen 6.2 years ago

I am working on a project to extract features from RNA-seq data from monkeys that were challenged with Ebola virus, and to build a classifier that can predict the disease stage of Ebola infection. During my literature research, I came across the Bioconductor package Pigengene, which fits my needs almost perfectly.

However, when I tried to use the main function, one.step.pigengene(), directly, I got the error message "power is NA!". After studying the source code, I found that the issue lay in the calculate.beta() function, where a soft-threshold power should be picked to fit a scale-free network. I tried running each step from the WGCNA package sequentially (following your package) while feeding in an arbitrary power term, and everything worked fine, even though the results were not satisfying (obviously the network was not optimized). I started to suspect this might be caused by the data I was trying to analyze (the number of genes included vs. the number of data points).

For reference, I have 15 monkeys with 4 time points each (so around 60 RNA-seq data points in total), and I fed around 4,000 significantly differentially expressed genes into the package.

Any suggestions or help would be really appreciated!!!

Thank you so much!!!

Pigengene WGCNA • 1.4k views

ADD COMMENT • link updated 6.1 years ago by Habil Zare ▴ 200 • written 6.2 years ago by boting.m.ning • 0

score 2 · Answer 1 · 2018-02-22

Habil Zare ▴ 200

@habil-zare-7836

Last seen 5 months ago

United States/Austin Area

The WGCNA authors recommend not excluding any genes from the analysis. That said, I think it is reasonable to filter out genes with very low coverage and genes with almost constant expression levels. Using 4,000 genes should be fine.

I have not used DESeq normalization before in this context. Please post your updates when you have results with log(RPKM+1).

ADD COMMENT • link 6.2 years ago Habil Zare ▴ 200

Hi Prof. Zare, I converted all my read counts into log(RPKM+1), but the same issue persists... Do you know if there are any other parts I should modify? Thanks a ton!

ADD REPLY • link 6.2 years ago boting.m.ning • 0

$sft$fitIndices
   Power     SFT.R.sq        slope truncated.R.sq   mean.k. median.k.   max.k.
1      1 6.849859e-01  4.364179088      0.6872780 198.34019 212.50254 216.3609
2      2 4.726176e-01  1.842884552      0.5106602 170.65618 192.17180 199.0158
3      3 3.342839e-01  1.071322555      0.4306177 151.94607 177.01233 186.9238
4      4 1.620569e-01  0.662319574      0.5480922 138.38870 164.78224 177.9557
5      5 9.447289e-02  0.444193524      0.5211303 128.00389 155.61834 170.8953
6      6 5.908899e-02  0.355537818      0.5286616 119.69407 147.34727 165.0949
7      7 2.914369e-02  0.219540486      0.5001622 112.81508 140.42515 160.1689
8      8 2.838860e-02  0.200584411      0.4553509 106.96918 133.65243 155.8773
9      9 1.868442e-02  0.151434225      0.3777199 101.89928 127.73575 152.0640
10    10 1.102951e-02  0.115880938      0.5349902  97.43194 122.07467 148.6239
11    11 1.650998e-04  0.014912252      0.4146315  93.44560 116.85869 145.4831
12    12 2.906552e-06  0.001913254      0.5459920  89.85209 111.99498 142.5883
13    13 1.137779e-04 -0.011870141      0.6282004  86.58545 107.88611 139.8996
14    14 1.581301e-04  0.014310166      0.7863718  83.59502 104.03663 137.3863
15    16 1.755617e-03 -0.051580054      0.6945797  78.29177  97.14252 132.7951
16    18 1.069838e-03 -0.036162626      0.8167061  73.70814  90.68792 128.6728
17    20 8.047458e-04 -0.033666916      0.8537297  69.68587  84.82697 124.9231

Here are the results from calculateBetaRes.

ADD REPLY • link 6.2 years ago boting.m.ning • 0

score 1 · Answer 2 · 2018-02-22

Habil Zare ▴ 200

Do you use RPKM values from RNA-Seq data? Did you transform the expression values using, say, log(1 + RPKM)?

ADD COMMENT • link 6.2 years ago Habil Zare ▴ 200

I used the DESeq normalization. I would now assume this is not the ideal normalization method, correct? (Especially after comparing my data with the sample datasets you included in the package.) I will try normalizing with log(RPKM+1) and see if that works.

In the meantime, is there a recommended number of genes to use for the analysis, given the number of samples we have? I am trying to match the conditions of our analysis to those in your publication...

Thanks a ton!

ADD REPLY • link 6.2 years ago boting.m.ning • 0

score 0 · Answer 3 · 2018-03-12

Can you run the calculate.beta() function with different values for the RsquaredCut argument, say 0:4/20 + 0.6 (i.e., 0.60, 0.65, 0.70, 0.75, 0.80)? I recommend choosing the highest value that does not lead to an NA value for beta. In Pigengene version 1.5.9, I added RsquaredCut to the arguments of one.step.pigengene(). So, once you find an appropriate value for RsquaredCut, you can pass it to the one.step.pigengene() function.
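The sweep suggested above could be sketched roughly as follows. This is only an illustration: highestValidCut() is a hypothetical helper (not part of Pigengene), and the commented-out call assumes that calculate.beta() accepts Data and RsquaredCut arguments and returns a list with a power element, and that exprData is your samples-by-genes expression matrix. Please check ?calculate.beta for your installed version.

```r
# Hypothetical helper (not part of Pigengene): try each candidate cutoff and
# keep the highest one for which a power (beta) could be estimated.
# `betaFor` is any function that maps a cutoff to the estimated power
# and returns NA when no power satisfies that cutoff.
highestValidCut <- function(cuts, betaFor) {
  powers <- vapply(cuts, function(rc) as.numeric(betaFor(rc)), numeric(1))
  ok <- !is.na(powers)
  if (!any(ok)) {
    return(list(RsquaredCut = NA_real_, power = NA_real_))
  }
  best <- which(ok)[which.max(cuts[ok])]
  list(RsquaredCut = cuts[best], power = powers[best])
}

# Candidate cutoffs from the suggestion above: 0.60, 0.65, 0.70, 0.75, 0.80
cuts <- 0:4 / 20 + 0.6

# Assumed usage with Pigengene installed (argument and element names are
# assumptions; `exprData` is a placeholder for your own data):
# res <- highestValidCut(cuts, function(rc)
#   Pigengene::calculate.beta(Data = exprData, RsquaredCut = rc)$power)
# res$RsquaredCut can then be passed as RsquaredCut to one.step.pigengene().
```

If every cutoff returns NA, the helper returns NA for both elements, which would suggest that lowering RsquaredCut within this range is not enough for the data at hand.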

Problem source: WGCNA raises the adjacency matrix to a power (a.k.a. beta) before the clustering step. It automatically determines a value for the power by examining several candidate values and choosing the minimum value that leads to a scale-free topology fit index (R^2) above a predefined cutoff (RsquaredCut). If no such beta value is found, NA is returned. See ?WGCNA::pickSoftThreshold for more details.
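To make the selection rule concrete, here is a small base-R sketch. It is a re-implementation for illustration, not the WGCNA source: pickPower() is a hypothetical name, and the default cutoff of 0.8 is only an example value. As far as I understand pickSoftThreshold, the fit index is signed by the slope of the fit, so a power only qualifies when its slope is negative. The rows below are copied from the fit indices posted earlier in this thread.

```r
# Illustrative re-implementation of the rule (not the WGCNA code itself):
# return the smallest power whose signed fit index exceeds the cutoff.
pickPower <- function(fit, RsquaredCut = 0.8) {
  # The fit index only counts when the fitted slope is negative.
  signedR2 <- -sign(fit$slope) * fit$SFT.R.sq
  ok <- signedR2 > RsquaredCut
  if (any(ok)) min(fit$Power[ok]) else NA
}

# A few rows copied from the $sft$fitIndices table posted in this thread.
fit <- data.frame(
  Power    = c(1, 2, 3, 13, 20),
  SFT.R.sq = c(6.849859e-01, 4.726176e-01, 3.342839e-01,
               1.137779e-04, 8.047458e-04),
  slope    = c(4.364179088, 1.842884552, 1.071322555,
               -0.011870141, -0.033666916)
)

pickPower(fit, RsquaredCut = 0.6)  # NA for these rows
```

For these rows, power 1 has an R^2 of about 0.68 but a positive slope, and the powers with a negative slope have an R^2 close to zero, so no power passes even a cutoff of 0.6.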