Question: SPIA::spia function output
1
gravatar for theobroma22
2.7 years ago by
theobroma2210
theobroma2210 wrote:

Dear SPIA package maintainer,

My name is Franklin Johnson. I think of myself as a data scientist, and I have no professional affiliation at this moment with regard to this post. 

I downloaded the 134 Malus domestica ('mdm') pathways using the KEGGREST package and successfully made the spia data using the makeSPIAdata() function. Then, I called the spia() function, and after it completed it only showed 10 pathways that were 'Done.'

   > length(dir(mydir)) #Directory containing 134 Malus kgml/xml pathway files.
   [1] 134

 

My spia input vectors 'de' and 'all' contained 243 unique probesets, and I took the mean logFC for each probeset. This dataset was a subset of the topTable limma result of adj.P.Value < 0.001. 

    res<-spia(de=DE_malus, all=entrez_only, organism="mdm", nB = 2000, pathids = NULL, data.dir="./", combine =         'fisher', plots = TRUE)

    Done pathway 1 : RNA transport..
    Done pathway 2 : RNA degradation..
    Done pathway 3 : MAPK signaling pathway - plant..
    Done pathway 4 : Plant hormone signal transduct..
    Done pathway 5 : Sulfur relay system..
    Done pathway 6 : SNARE interactions in vesicula..
    Done pathway 7 : Autophagy..
    Done pathway 8 : Protein processing in endoplas..
    Done pathway 9 : Plant-pathogen interaction..
    Done pathway 10 : Circadian rhythm - plant..> 

    res[ , -12]
                                         Name    ID pSize NDE pNDE          tA pPERT        pG pGFdr pGFWER    Status
1              MAPK signaling pathway - plant 04016     4   4    1 24.29737166 0.160 0.4532130     1      1 Activated
2           Plant hormone signal transduction 04075    14  14    1 11.93279398 0.292 0.6514524     1      1 Activated
3                    Circadian rhythm - plant 04712     3   3    1  9.17881852 0.440 0.8012314     1      1 Activated
4                  Plant-pathogen interaction 04626     2   2    1  0.02234003 0.987 0.9999151     1      1 Activated
5 Protein processing in endoplasmic reticulum 04141     1   1    1  0.00000000    NA 1.0000000     1      1 Inhibited

While these results are exciting, I have four questions to make sure this output is correct:

1) What is the printed 'Done' pathway list, and why does it print only 10 pathways rather than all 134 pathways?

2) For res[5, ] (Protein processing in endoplasmic reticulum), why is this pathway in the list although pPERT = 'NA.'   So, how come the other 'Done' pathways (e.g. RNA transport, RNA degradation, Autophagy....) do not show up in the res output? Is this Protein processing in endoplasmic reticulum pathway considered significant if tA = 0 and pG = 1?

3) I didn't expect pSize and NDE to be equal to each other, and so all the pNDEs are equal to 1...Why?

4)  The fold change values for the 243 probe dataset range from +3.25 to -4.38. Since I'm starting with gene probesets with an adj.P.Value < 0.001, I lowered the nB value to say 100, but the results are identical as if I used nB =2000. Why? 

I'm trying to understand this SPIA package in a bit more detail than explained in the vignette and ref manual.   Hope to hear from you soon.                                                                                                                                   Thanks,                                                                                                                                                                 Franklin

spia • 601 views
ADD COMMENTlink modified 2.7 years ago by Tarca, Adi570 • written 2.7 years ago by theobroma2210
Answer: SPIA::spia function output
4
gravatar for Tarca, Adi
2.7 years ago by
Tarca, Adi570
United States
Tarca, Adi570 wrote:

>Dear SPIA package maintainer,

My name is Franklin Johnson. I think of myself as a data scientist, and I have no professional affiliation at this moment with regard to this post. I downloaded the 134 Malus domestica ('mdm') pathways using the KEGGREST package and successfully made the spia data using the makeSPIAdata() function. Then, I called the spia() function, and after it completed it only showed 10 pathways that were 'Done.'

   > length(dir(mydir)) #Directory containing 134 Malus kgml/xml pathway files.
   [1] 134<

Hi Franklin,

There are only 10 pathways processed by the function makeSPIAdata()  since those have enough genes connected with the types of relations that SPIA uses for analysis (see vignette and paper for more details).

 

>My spia input vectors 'de' and 'all' contained 243 unique probesets, and I took the mean logFC for each probeset. This dataset was a subset of the topTable limma result of adj.P.Value < 0.001. <

This is the main issue you are having.  'de' and 'all' being the same defies the purpose of enrichment analysis. 'de' should be a subset of 'all' (usually a small percentage.

 

  >  res<-spia(de=DE_malus, all=entrez_only, organism="mdm", nB = 2000, pathids = NULL, data.dir="./", combine =         'fisher', plots = TRUE)

    Done pathway 1 : RNA transport..
    Done pathway 2 : RNA degradation..
    

    res[ , -12]
                                         Name    ID pSize NDE pNDE          tA pPERT        pG pGFdr pGFWER    Status
1              MAPK signaling pathway - plant 04016     4   4    1 24.29737166 0.160 0.4532130     1      1 Activated
2           Plant hormone signal transduction 04075    14  14    1 11.93279398 0.292 0.6514524     1      1 Activated
3                    Circadian rhythm - plant 04712     3   3    1  9.17881852 0.440 0.8012314     1      1 Activated
4                  Plant-pathogen interaction 04626     2   2    1  0.02234003 0.987 0.9999151     1      1 Activated
5 Protein processing in endoplasmic reticulum 04141     1   1    1  0.00000000    NA 1.0000000     1      1 Inhibited

While these results are exciting, I have four questions to make sure this output is correct:

1) What is the printed 'Done' pathway list, and why does it print only 10 pathways rather than all 134 pathways?

See answer above.

>2) For res[5, ] (Protein processing in endoplasmic reticulum), why is this pathway in the list although pPERT = 'NA.'  <

Again, is an issue of 'de' being same as 'all'. 

>So, how come the other 'Done' pathways (e.g. RNA transport, RNA degradation, Autophagy....) do not show up in the res output?<

Because it may be a valid pathway (to have been processed by makeSPIAdata) but has no hits in the current 'de' list.

>Is this Protein processing in endoplasmic reticulum pathway considered significant if tA = 0 and pG = 1?<

There is no point looking at the results obtained with de=all

 

>3) I didn't expect pSize and NDE to be equal to each other, and so all the pNDEs are equal to 1...Why?<

de=all

4)  The fold change values for the 243 probe dataset range from +3.25 to -4.38. Since I'm starting with gene probesets with an adj.P.Value < 0.001, I lowered the nB value to say 100, but the results are identical as if I used nB =2000. Why? 

de=all

Hope this helps,

Adi Tarca

ADD COMMENTlink written 2.7 years ago by Tarca, Adi570

Hi Adi,

Thanks for the reply. I made the adjustments accordingly. However, I'm still getting an 'NA' for the pPERT value for Protein processing in the endoplasmic reticulum, which is now ranked as the top pathway. Please see below:

    > head(cp_mean) #df of unique DEs with Q.Value < 0.05
    ENTREZ      logFC    Q.Value
    1  13630213  1.8062437 0.02197336
    2  29071792  1.0768734 0.03514726
    3 103400014 -0.9435437 0.01468803
    4 103400027 -0.7126018 0.03260799
    5 103400049 -1.0686249 0.01136842
    6 103400377  0.6382776 0.02896627
    > dim(cp_mean)
    [1] 1628    3

    > length(DE_CP)  #vector of unique DEs with Q.Value < 0.01                                                                                                 [1] 233

    > length(Entrez_CP) #vector of all Entrez IDs with Q.Value < 0.05 from 'cp_mean' 
    [1] 1628

    > res[,-12]
                                      Name    ID pSize NDE       pNDE        tA pPERT         pG     pGFdr    pGFWER    Status
1 Protein processing in ...ER 04141    11   4 0.05938292  0.000000    NA 0.05938292 0.2375317 0.2375317 Inhibited
2                  Plant-pathogen interaction 04626    15   3 0.36566850 -1.026967 0.721 0.61512645 0.9532347 1.0000000 Inhibited
3           Plant hormone signal transduction 04075    34   4 0.73974950 -1.475442 0.682 0.84967883 0.9532347 1.0000000 Inhibited
4                             RNA degradation 03018     8   1 0.71019445  0.000000 1.000 0.95323469 0.9532347 1.0000000 Inhibited

I adjusted the name to "Protein processing in ... ER" just now so that it fit this screen in one line.

What's going on with this pathway's pPERT value??  

Franklin

     

 

 

ADD REPLYlink written 2.7 years ago by theobroma2210
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 299 users visited in the last hour