Ian Sudbery
Hi All,
I've been investigating the SPIA package as a more
satisfactory way of assessing whether a pathway is perturbed in a set
of RNA-seq experiments than simple over-representation analysis on a set
of KEGG pathways. One thing I've noticed is that the choice of
background for the test makes an enormous difference. I first
noticed this when converting my Ensembl IDs to Entrez IDs
inadvertently led me to include non-coding RNAs in the background
list I passed to the 'all' parameter of spia. Removing these (about 5,000 IDs) from
the background reduced the number of significantly perturbed pathways from 51 to
6. Removing the genes that we didn't test for differential expression
because they were too lowly expressed reduced it even further (and,
annoyingly, removed all of the interesting pathways from the results).
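As a rough illustration of why the background matters so much: SPIA's over-representation component (pNDE) is based on a hypergeometric model, so padding the background with genes that are neither DE nor in the pathway lowers the expected overlap and makes the same observed overlap look more significant. The sketch below is written in Python with only the standard library, it is not SPIA's code, and the counts are hypothetical, chosen only to mimic adding about 5,000 extra IDs.

```python
from math import comb


def hypergeom_upper_tail(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric: N background genes, K of them in the
    pathway, n DE genes drawn; X is the number of DE genes in the pathway."""
    # Sum the probability mass from x up to the largest possible overlap.
    top = sum(comb(K, k) * comb(N - K, n - k) for k in range(x, min(K, n) + 1))
    return top / comb(N, n)


# Hypothetical counts (not from the post): 100 pathway genes, 500 DE genes,
# 8 of the DE genes fall in the pathway.
K, n, x = 100, 500, 8

# Same observed overlap, background without vs. with ~5,000 extra IDs that
# are neither DE nor in the pathway. The larger background gives the
# smaller p-value.
for N in (15000, 20000):
    print(N, hypergeom_upper_tail(x, N, K, n))
```

Only N changes between the two calls, so any difference in the p-value comes purely from the choice of background.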
I further realised that the background set for the
over-representation part of the test should really only include
genes with a KEGG annotation (only about 5,000 in the case of
humans). I'd have thought that SPIA would do this automatically, since it
has access to the list of genes that are in any pathway, but from poking
around in the code, it doesn't seem to. I could restrict the
background passed to the 'all' parameter to genes with KEGG
annotations, but would this be the correct thing to do for the IF
calculation?
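A minimal sketch of the restriction described above, assuming the pathway gene sets are available as a mapping from pathway name to gene IDs (in practice they would come from SPIA's KEGG data, which this toy example does not load). The gene IDs, pathways and DE calls are hypothetical, and the sketch also assumes the DE list should be cut down to the same annotated genes so that it stays a subset of the background.

```python
# Hypothetical toy data: gene IDs, pathway memberships and DE calls are made up.
pathways = {
    "pathway_A": {"1", "2", "3", "4"},
    "pathway_B": {"3", "5", "6"},
}
# Tested genes plus two non-coding IDs picked up during ID conversion.
background = {str(i) for i in range(1, 11)} | {"nc1", "nc2"}
de_genes = {"2", "3", "8", "nc1"}


def restrict_to_annotated(background, de_genes, pathways):
    """Keep only genes that appear in at least one pathway."""
    annotated = set().union(*pathways.values())
    bg = background & annotated
    de = de_genes & bg  # keep the DE list a subset of the new background
    return bg, de


bg, de = restrict_to_annotated(background, de_genes, pathways)
print(len(background), "->", len(bg))  # 12 -> 6
print(len(de_genes), "->", len(de))    # 4 -> 2
```

This only covers the over-representation side; it says nothing about whether the same restriction is appropriate for the perturbation (IF) part.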
Ian