Cleaning GeneSets from GSEA
@0693d951
Italy

Hi everyone,

I'm running a GSEA analysis in R, specifically with the GSVA package. I downloaded the gene lists that make up the gene sets from MSigDB using the msigdbr package.

However, when I extract the gene symbols for a gene set from the downloaded object, they are not unique. For example, some genes with the same symbol and Entrez ID have different Ensembl IDs, so they are listed as separate entries.

How should I deal with these when using the gsva function? My expression matrix has gene symbols as row names, so I can't match on the different Ensembl IDs. If I keep the duplicates, the affected genes would effectively get extra weight, leading to a slightly skewed distribution of enrichment scores. Is it safe to delete the duplicates, or would I lose relevant information?

msigdb GSEABase GSVA
Robert Castelo
@rcastelo
Barcelona/Universitat Pompeu Fabra

This kind of question has been answered previously in at least two posts (this one and this other one). In essence, you need to stick to the annotations that you used, or that were used, to produce your gene expression data matrix, and do the ID mapping at the level of the gene set. As you rightly suspect, if you duplicate expression profiles in your gene expression data matrix, you're going to introduce collinearities at the gene level that, at the very least, will distort your inferences and, more likely, trigger some error during downstream calculations. Doing the ID mapping at the level of the gene set has the advantage that it doesn't matter much whether a gene set has 10 genes in terms of gene symbols, 8 in terms of Entrez and 15 in terms of Ensembl: in all three cases it remains the same gene set. Its definition just changes slightly depending on the gene nomenclature, most likely with little impact on the actual calculations.

cheers,

robert.
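
For readers who want to see the gene-set-level approach in code, here is a minimal base R sketch. It is not taken from the posts above. The table `gs_table` is a made-up stand-in for an msigdbr-style result, and its column names (`gs_name`, `gene_symbol`, `ensembl_gene`), the gene names and the IDs are all illustrative assumptions, so check the columns your msigdbr version actually returns. The idea is to leave the expression matrix untouched, collapse each gene set to its unique symbols, and keep only the symbols that appear as row names of the matrix.

```r
# Hypothetical msigdbr-style table: one row per (gene set, Ensembl ID).
# GENE2 appears twice in SET_A because it maps to two (made-up) Ensembl IDs.
gs_table <- data.frame(
  gs_name      = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gene_symbol  = c("GENE1", "GENE2", "GENE2", "GENE3", "GENE1", "GENE9"),
  ensembl_gene = c("ENS_X1", "ENS_X2", "ENS_X3", "ENS_X4", "ENS_X1", "ENS_X9"),
  stringsAsFactors = FALSE
)

# Toy expression matrix with unique gene symbols as row names.
set.seed(1)
expr <- matrix(
  rnorm(4 * 3), nrow = 4,
  dimnames = list(c("GENE1", "GENE2", "GENE3", "GENE4"),
                  paste0("sample", 1:3))
)

# Do the ID handling at the gene set level: one character vector of
# unique symbols per gene set. The expression matrix is not modified.
gene_sets <- lapply(split(gs_table$gene_symbol, gs_table$gs_name), unique)

# Optionally drop symbols that are absent from the expression matrix
# (GENE9 in this toy example).
gene_sets <- lapply(gene_sets, intersect, rownames(expr))

# Number of genes left in each set.
print(lengths(gene_sets))
```

The result is a named list of character vectors, which is the kind of gene set input GSVA works with. How that list is passed to gsva differs between GSVA releases, so look at ?gsva for the version you have installed.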