Question

loadGSC() Piano package

nonCodingGene ▴ 10

Hi.

I've used limma and I want to use the output of topTable with the piano package.

I'm having a problem with the function loadGSC() because I'm not sure what the input file (a gene-set collection) should be.

My data comes from Saccharomyces cerevisiae and the genes are given by systematic name. I've checked here: http://www.broadinstitute.org/gsea/msigdb/index.jsp

So, my problem is that I don't know which file I have to use in this case.

Thanks.

piano • 2.3k views

ADD COMMENT • link updated 9.1 years ago by Leif Väremo ▴ 70 • written 9.1 years ago by nonCodingGene ▴ 10

score 0 · Answer 1 · 2015-03-16

Leif Väremo ▴ 70

Dear Biorunner88,

The loadGSC function parses a file describing all gene to gene-set associations into the format used by downstream piano functions (like runGSA). In its simplest form the input can be a two-column text file with gene-sets in one column and genes in the other, i.e. one gene to gene-set association per line. As you mention, it is also possible to use the gmt-format provided by the MSigDB.
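To make the two layouts concrete, here is a minimal sketch in Python (not R, and not part of piano) that expands gmt-style lines into one association per line. It assumes the usual gmt layout of set name, description, then genes, all tab-separated. The set names and memberships are made up for illustration, and the function names are hypothetical.

```python
import csv
import io


def gmt_to_pairs(gmt_text):
    """Expand gmt lines (name, description, gene1, gene2, ...) into
    (gene_set, gene) pairs, one association per pair."""
    pairs = []
    for line in gmt_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        pairs.extend((name, gene) for gene in genes if gene)
    return pairs


def pairs_to_two_column(pairs):
    """Render the pairs as tab-separated text, one association per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buf.getvalue()


# Hypothetical gene sets; the memberships are invented for illustration.
example_gmt = "SET_A\tna\tYAL001C\tYAL002W\nSET_B\tna\tYAL002W\tYBR001C\n"
example_pairs = gmt_to_pairs(example_gmt)
two_column_text = pairs_to_two_column(example_pairs)
```

The sketch puts the gene-set first and the gene second purely as an arbitrary choice; check the loadGSC documentation for the column order it expects for each input type.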

It is essential that the gene IDs of your data are the same as the ones used in the gene-set collection (i.e. the file you use as input to loadGSC). If this is not the case, you need to translate one of your sets of IDs to match the other. For this you can use e.g. the biomaRt package or one of the Bioconductor annotation packages (I am not sure where your data comes from, but as an example both the yeast2.db package and the org.Sc.sgd.db package provide mappings between various gene IDs).
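A quick way to see whether the IDs match is to measure the overlap before and after translation. The following is only a sketch in Python with hypothetical helper names; in practice the mapping table would come from biomaRt or an annotation package rather than a hand-written dictionary, and the p-values shown are invented.

```python
def translate_ids(gene_values, id_map):
    """Rename genes via id_map. Genes without a mapping are returned
    separately so they can be inspected rather than silently dropped.
    Note: if two source IDs map to the same target, the later one wins."""
    translated, unmapped = {}, []
    for gene, value in gene_values.items():
        if gene in id_map:
            translated[id_map[gene]] = value
        else:
            unmapped.append(gene)
    return translated, unmapped


def overlap_fraction(data_ids, gene_set_ids):
    """Fraction of the data's gene IDs that also occur in the gene-set collection."""
    data_ids = set(data_ids)
    if not data_ids:
        return 0.0
    return len(data_ids & set(gene_set_ids)) / len(data_ids)


# Illustrative inputs: gene-level p-values keyed by common name (values invented),
# and a small common-name to systematic-name table.
gene_values = {"TFC3": 0.01, "VPS8": 0.20, "XYZ1": 0.50}  # XYZ1 is a made-up ID
id_map = {"TFC3": "YAL001C", "VPS8": "YAL002W"}
gene_set_ids = {"YAL001C", "YAL002W", "YBR001C"}

before = overlap_fraction(gene_values, gene_set_ids)
translated, unmapped = translate_ids(gene_values, id_map)
after = overlap_fraction(translated, gene_set_ids)
```

An overlap near zero before translation is a strong hint that the two ID systems differ.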

You don't mention what kind of gene-sets you are interested in analyzing, but note that you can use the packages mentioned above to construct a GO-term gene-set collection, without using the ones from MSigDB (which mainly use, if I am not misinformed, human gene IDs).

I hope this helps!

Best wishes

Leif Väremo

ADD COMMENT • link 9.1 years ago Leif Väremo ▴ 70

Thanks.

I've already read the manual. I believe I'm just going to need the Gene Ontology gene sets.

I'll try to use the org.Sc.sgd.db package to get the GO in order to create the data.frame needed for this function.

Thanks again.

ADD REPLY • link 9.1 years ago nonCodingGene ▴ 10

Is there any problem if there are two or more rows with the same gene and GO ID in the data.frame used for the loadGSC function? Or should I delete the duplicated rows?

I'm guessing this happens because in the org.Sc.sgd.db package the same gene can be annotated to a term with different evidence codes, so it appears duplicated.

Thanks

ADD REPLY • link 9.1 years ago nonCodingGene ▴ 10

Sorry for missing your follow-up questions, for some reason I was not notified by the system. Regarding duplicate rows, i.e. same gene AND same gene-set, these will be removed automatically by loadGSC since this information is redundant. If your gene-level data has duplicate gene names (perhaps due to mapping from probesetIDs to gene names), runGSA will use the values from all instances of a given gene, if it is present in the gene-set. If you prefer to instead summarize multiple values for the same gene (e.g. the max or mean) you need to do this prior to feeding your gene-level data to runGSA.
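For illustration, the two steps described above can be sketched as follows. This is Python rather than R and says nothing about how piano implements either step internally. The function names, GO annotations and statistics are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def drop_duplicate_pairs(pairs):
    """Keep the first occurrence of each (gene, gene_set) pair, preserving order."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


def summarize_by_gene(records, summary=mean):
    """Collapse repeated genes to one value each, using `summary` (e.g. mean or max)."""
    grouped = defaultdict(list)
    for gene, value in records:
        grouped[gene].append(value)
    return {gene: summary(values) for gene, values in grouped.items()}


# Hypothetical annotations: the same gene/term pair listed twice,
# as can happen when it is reported once per evidence code.
pairs = [
    ("YAL001C", "GO:0006351"),
    ("YAL001C", "GO:0006351"),
    ("YAL002W", "GO:0006810"),
]
unique_pairs = drop_duplicate_pairs(pairs)

# Hypothetical gene-level statistics with one gene measured twice.
records = [("YAL001C", 1.0), ("YAL001C", 3.0), ("YAL002W", -0.5)]
by_mean = summarize_by_gene(records)
by_max = summarize_by_gene(records, summary=max)
```

Whether mean, max or something else is the right summary depends on the statistic; for p-values, for example, a mean is rarely what you want.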

Hope this helps, and sorry again for the delayed answer.

ADD REPLY • link 8.9 years ago Leif Väremo ▴ 70

Thanks for that. I've already solved it.

I have another question. I'm reviewing some of what I've done, and I saw that after running the function consensusHeatmap with the mean, maxmean, gsea, page, median, sum and wilcoxon outputs, the mean method gives p-values of 0 (0.0000000) for some gene sets. Is this normal for very low values? Are values below 10^-7 shown as 0?

Thanks again.

ADD REPLY • link 8.9 years ago nonCodingGene ▴ 10