Hey community members,
I have a question about using GSVA for Nanostring data (789 targeted genes per sample). I know there was a previous discussion here about using GSVA for Nanostring, and that the main issue is choosing an appropriate value for the kcdf argument (https://support.bioconductor.org/p/111096/).
My questions are: 1) In the literature, I have seen people use both GSVA and ssGSEA for Nanostring data, and I am not sure why. Is there a preferred method for targeted gene panels in general? 2) I used DESeq2 in my pipeline. Interestingly, I got very different GSVA results when using normalized counts as input (with kcdf = "Poisson") versus rlog-transformed values as input (with kcdf = "Gaussian"). How can this discrepancy be explained, and which approach is better?
Many thanks for your kind help.
Thank you for the kind reply, Robert. This is tremendously helpful.
As you astutely pointed out, I think the issue may be my small sample size (n=10). In that case, I will look into ssGSEA, which may give more robust results.
Thanks for sharing the previous post. I will try to calculate the r score myself first before I bother you.
Thanks for pointing out the latest preprint on Nanostring. I actually used the method described in that article for my analysis, and have communicated closely with the author. Their approach worked beautifully for the gene-level DE analysis; I am now wondering how to proceed with pathway analysis, which is why I looked into GSVA.
May I ask a separate question? In gsva(), the argument min.size is documented as "Minimum size of the resulting gene sets". However, from my testing, it appears that min.size defines the minimum number of genes in a given gene set that overlap with the input data. So, for example, if I set min.size = 3, only those gene sets with at least 3 genes overlapping with my input data will be included. Is this understanding correct?
I deeply appreciate your kind assistance.
Yes, your understanding is correct. First, genes in gene sets are mapped to genes in the expression data, which implies that some gene sets may lose genes for which there are no expression profiles; some of those gene sets might even become empty. Second, gene sets are filtered by the given minimum and maximum sizes, which by default are set to 1 and infinity, respectively. If you are satisfied with the answer, it is good practice to upvote it and accept it. This not only shows appreciation for the work of developers giving support to their software, but also helps others to more easily identify questions that have already been answered. Thanks!
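For readers who want to see the two steps spelled out, here is a minimal, language-agnostic sketch in Python. It is not GSVA's actual code: the function name filter_gene_sets, its parameters, and the toy gene identifiers are all hypothetical and only illustrate the mapping-then-filtering logic described above.

```python
# Illustrative sketch only; NOT the GSVA implementation.
# All names here (filter_gene_sets, min_size, max_size, G1..G9) are hypothetical.

def filter_gene_sets(gene_sets, expressed_genes, min_size=1, max_size=float("inf")):
    """Map each gene set onto the measured genes, then filter by resulting size."""
    expressed = set(expressed_genes)
    kept = {}
    for name, genes in gene_sets.items():
        # Step 1: keep only genes that have an expression profile.
        mapped = [g for g in genes if g in expressed]
        # Step 2: apply the size limits to the mapped (overlapping) set,
        # not to the original gene set.
        if min_size <= len(mapped) <= max_size:
            kept[name] = mapped
    return kept


gene_sets = {
    "SET_A": ["G1", "G2", "G3", "G4"],  # 3 of 4 genes are on the panel
    "SET_B": ["G1", "G9"],              # 1 of 2 genes is on the panel
    "SET_C": ["G8", "G9"],              # no genes on the panel: becomes empty
}
panel = ["G1", "G2", "G3", "G5"]

result_min3 = filter_gene_sets(gene_sets, panel, min_size=3)
result_default = filter_gene_sets(gene_sets, panel)

print(result_min3)
print(sorted(result_default))
```

With a minimum size of 3, only SET_A survives, because the size is counted after mapping. With the defaults (1 and infinity), SET_B is also kept, while the empty SET_C is still dropped.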
Thank you so much, Robert! This is my first question on the forum and I deeply appreciate your guidance. I just learnt how to upvote and accept! Thanks for all the answers!