Question: loadGSC() Piano package
2.5 years ago by
Biorunner88 wrote:


I've used limma and I want to use the output of topTable with the piano package.

I'm having a problem with the function loadGSC() because I'm not really sure what the input file (a gene set) should look like.

My data comes from Saccharomyces cerevisiae, and the genes are given by systematic name; I've checked in here

So, my problem is that I don't know which file I have to use in this case.



modified 2.5 years ago by Leif Väremo • written 2.5 years ago by Biorunner88
2.5 years ago by
Leif Väremo wrote:

Dear Biorunner88,

The loadGSC function is used to parse a file describing all gene to gene-set associations into the format used by downstream piano functions (like runGSA). In its simplest form it can be a two column text file with gene-sets in one column and genes in the other, and accordingly one gene to gene-set association per line. As you mention it is also possible to use the gmt-format provided by the MSigDB.
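As a minimal sketch of the simplest input described above: the table below holds one gene to gene-set association per row. The set names and memberships are made up for illustration. It assumes, as I understand the piano documentation, that loadGSC accepts a two-column data.frame with genes in the first column and gene-sets in the second when called with type = "data.frame"; check ?loadGSC for your installed version.

```r
# Hypothetical gene to gene-set associations, one per row.
# Gene IDs are yeast systematic names; the set names are invented.
gsc_table <- data.frame(
  gene = c("YAL001C", "YAL002W", "YAL001C", "YBR001C"),
  set  = c("setA", "setA", "setB", "setB"),
  stringsAsFactors = FALSE
)

# Only call piano if it is installed, so the sketch runs either way.
if (requireNamespace("piano", quietly = TRUE)) {
  # Assumption: genes in column 1, gene-sets in column 2.
  gsc <- piano::loadGSC(gsc_table, type = "data.frame")
  print(gsc)
}
```

The same table could be written to a tab-separated text file and read back with read.delim before being passed on.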

It is essential that the gene IDs of your data are the same as the ones used in the gene-set collection (i.e. the file you use as input to loadGSC). If this is not the case, you need to translate one of your set of IDs to match the other. For this you can e.g. use the biomaRt package or the different Bioconductor Annotation Packages (I am not sure where your data comes from but as an example both the yeast2.db package and the org.Sc.sgd.db package provide mappings between various gene IDs).
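A quick way to see whether the two sets of IDs match is to count how many of the data's gene IDs occur in the gene-set collection. The sketch below uses plain base R and made-up ID vectors; in practice data_ids would come from the rownames of the topTable output and gsc_ids from the gene column of the gene-set table.

```r
# Hypothetical IDs: genes in the expression results vs. genes in the gene-set table.
data_ids <- c("YAL001C", "YAL002W", "YBR001C", "YCR001W")
gsc_ids  <- c("YAL001C", "YAL002W", "TFC3")  # "TFC3" is a common name, not a systematic one

# IDs present in both; a low fraction suggests the ID types differ.
shared   <- intersect(data_ids, gsc_ids)
coverage <- length(shared) / length(data_ids)

cat(sprintf("%d of %d data IDs found in the gene-set collection (%.0f%%)\n",
            length(shared), length(data_ids), 100 * coverage))
```

If the overlap is near zero, one of the two ID sets most likely needs translating before running loadGSC and runGSA.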

You don't mention what kind of gene-sets you are interested in analyzing, but note that you can use the packages mentioned above to construct a GO-term gene-set collection, without using the ones from MSigDB (which mainly use, if I am not misinformed, human gene IDs).

I hope this helps!

Best wishes

Leif Väremo

written 2.5 years ago by Leif Väremo


I've already read the manual. I believe I'm just going to need the Gene Ontology gene set.

I'll try to use the org.Sc.sgd.db package to get the GO in order to create the data.frame needed for this function.
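A rough sketch of that step, assuming org.Sc.sgd.db and AnnotationDbi are installed and that the "ORF" and "GO" column names are available in your version of the annotation package (check columns(org.Sc.sgd.db)). The helper name build_go_table is hypothetical.

```r
# Sketch: build a two-column gene-to-GO table from org.Sc.sgd.db.
# Returns NULL when the annotation packages are not installed.
build_go_table <- function() {
  if (!requireNamespace("org.Sc.sgd.db", quietly = TRUE) ||
      !requireNamespace("AnnotationDbi", quietly = TRUE)) {
    message("org.Sc.sgd.db / AnnotationDbi not installed; skipping")
    return(NULL)
  }
  db   <- org.Sc.sgd.db::org.Sc.sgd.db
  # Assumption: systematic names are stored under the "ORF" key type.
  orfs <- AnnotationDbi::keys(db, keytype = "ORF")
  go   <- AnnotationDbi::select(db, keys = orfs, columns = "GO", keytype = "ORF")
  # Keep genes in the first column and GO IDs in the second; drop unannotated genes.
  go[!is.na(go$GO), c("ORF", "GO")]
}

go_table <- build_go_table()
```

The resulting data.frame has genes in the first column and GO IDs in the second, which is the layout a two-column gene-set table needs.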


Thanks again.

written 2.5 years ago by Biorunner88

Is there any problem if there are two or more rows with the same gene and GO id in the dataframe used for the loadGSC function? Or should I delete duplicated rows?


I'm guessing this happens because in the org.Sc.sgd.db package the same gene can be annotated to the same GO term with different evidence codes, and so it appears duplicated.
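A toy illustration of that situation in base R, with made-up rows (the GO assignments and evidence codes below are not real annotations): the same gene and GO ID appear twice because the evidence differs, and keeping only the two ID columns before calling unique() collapses them.

```r
# Made-up annotation rows: YAL001C is linked to the same GO ID via two evidence codes.
go_rows <- data.frame(
  ORF      = c("YAL001C", "YAL001C", "YAL002W"),
  GO       = c("GO:0000001", "GO:0000001", "GO:0000002"),
  EVIDENCE = c("IDA", "IMP", "IEA"),
  stringsAsFactors = FALSE
)

# Drop the evidence column, then remove rows that are now identical.
go_pairs <- unique(go_rows[, c("ORF", "GO")])
print(go_pairs)
```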



written 2.5 years ago by Biorunner88

Sorry for missing your follow-up questions, for some reason I was not notified by the system. Regarding duplicate rows, i.e. same gene AND same gene-set, these will be removed automatically by loadGSC since this information is redundant. If your gene-level data has duplicate gene names (perhaps due to mapping from probesetIDs to gene names), runGSA will use the values from all instances of a given gene, if it is present in the gene-set. If you prefer to instead summarize multiple values for the same gene (e.g. the max or mean) you need to do this prior to feeding your gene-level data to runGSA.
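For the summarizing step, a minimal base R sketch with made-up values: duplicate gene names are collapsed to their mean before the statistics go to runGSA. The named-vector shape is an assumption about how the gene-level data is stored; swap mean for max or another summary as preferred.

```r
# Made-up gene-level statistics; YAL001C appears twice (e.g. from two probesets).
stats <- c(YAL001C = 2.1, YAL001C = 1.5, YAL002W = -0.7)

# Collapse values that share a gene name into one value per gene.
gene_means <- tapply(stats, names(stats), mean)

# Convert back to a plain named numeric vector.
gene_stats <- setNames(as.numeric(gene_means), names(gene_means))
print(gene_stats)
```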

Hope this helps, and sorry again for the delayed answer.

written 2.3 years ago by Leif Väremo

Thanks for that. I've already solved it.

I have another question. I'm reviewing some of what I've done, and I saw that after running the function consensusHeatmap with the mean, maxmean, gsea, page, median, sum and wilcoxon outputs, and with the mean method, I get p-values of 0 (0.0000000) for some gene sets. Is this normal for low values? Are values under 10^-7 shown as 0?


Thanks again.

written 2.3 years ago by Biorunner88