Question

How to use compareCluster in ClusterProfiler?

b8177 • 0

Hello!

I am new to using compareCluster and just want to double check: the input for compareCluster can be DESeq2 results, using both the log2fc and the Entrez ID. Is that correct?

And can one simply sort the log2fc values and Entrez IDs in descending order (from greatest to lowest log2fc) and extract the entire gene list? Or only the significant genes (e.g., padj < 0.05)? Or does it not matter?

So, say I have 4 DESeq results/comparisons:

Treatment A vs. control, Treatment B vs. control, Treatment C vs. control, and Treatment D vs. control.

For compareCluster, I would:

1. Take the DESeq2 results for each comparison and convert the gene ID/symbol to Entrez ID;
2. Extract the Entrez IDs and log2fc values and order them in descending order; and
3. Pull just the Entrez IDs and save them as a list (I would then have, for example: GeneListA, GeneListB, GeneListC and GeneListD).
4. Then, I would save these 4 gene lists into a single list (e.g., myGeneList) and use it as input for compareCluster?

My ultimate goal is to run gene set enrichment analysis and over-representation analysis and to compare and visualize the results "all together" in a dotplot or barplot, instead of running one analysis per DESeq2 comparison, so that I can investigate which GO terms/KEGG pathways the treatments share and which are unique to each treatment.

Clarification is much appreciated! I've been searching online for details as to what the "gene list" should look like but haven't found anything that can answer my question.

Thank you.

RNASeqData GSEA clusterProfiler • 1.3k views

Written 4 months ago by b8177

score 1 · Answer 1 · 2024-09-04

First of all, check the help page of compareCluster and here for some example code.

Whether to use all genes or a selection depends on the type of analysis you would like to perform. The choice is basically between a functional class scoring (FCS) method (GSEA) and an over-representation analysis (ORA). See e.g. here for more info on this. For GSEA you will indeed need to rank all genes based on a metric; for ORA you will need to select a subset of your genes based on a metric.

Your questions:

1): yes, the central / preferred ID type is the Entrez ID.

2): for GSEA 'yes', for ORA 'no'.

3): for ORA 'yes', for GSEA 'no'. For GSEA you will need to include the ranking metric. (The input for ORA is a set of IDs (a character vector), but GSEA requires a named numeric vector that is already ranked (sorted) on the metric from high to low.)

4): yes, then combine these 'lists of genes' into an R list object.

Example input for GSEA:

> data(geneList, package="DOSE")
> head(geneList)
    4312     8318    10874    55143    55388      991 
4.572613 4.514594 4.418218 4.144075 3.876258 3.677857 
>
> class(geneList)
[1] "numeric"
>
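To connect this to DESeq2 output, here is a minimal base-R sketch of how such a named, sorted numeric vector could be built. The data frame `res` is a made-up stand-in for `as.data.frame(results(dds))` with a hypothetical `entrez` column added after ID conversion; the way duplicated IDs are resolved (keeping the largest absolute log2 fold change) is only one possible choice, not something prescribed by clusterProfiler.

```r
# Made-up stand-in for a DESeq2 results table after ID conversion.
# The column name 'entrez' is an assumption for this sketch.
res <- data.frame(
  entrez         = c("4312", "8318", "10874", "55143", NA, "991", "8318"),
  log2FoldChange = c(1.2, -0.4, 2.5, NA, 0.9, -1.8, 0.7),
  stringsAsFactors = FALSE
)

# Drop rows without an Entrez ID or without a fold change.
res <- res[!is.na(res$entrez) & !is.na(res$log2FoldChange), ]

# Keep one row per Entrez ID (here: the row with the largest |log2FC|).
res <- res[order(abs(res$log2FoldChange), decreasing = TRUE), ]
res <- res[!duplicated(res$entrez), ]

# Named numeric vector (names = Entrez IDs), sorted from high to low.
ranked <- sort(setNames(res$log2FoldChange, res$entrez), decreasing = TRUE)
print(ranked)
```

Other ranking metrics (for example the Wald statistic in the `stat` column) can be swapped in for `log2FoldChange` in the same way.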

Example input for ORA:

> genes <- names(geneList)[abs(geneList) > 2]
> head(genes)
[1] "4312"  "8318"  "10874" "55143" "55388" "991"  
> class(genes)
[1] "character"
>
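For DESeq2 results, the equivalent selection would typically be made on the adjusted p-value rather than on the fold change alone. A minimal sketch, again using a made-up table with a hypothetical `entrez` column and the padj < 0.05 cutoff mentioned in the question:

```r
# Made-up stand-in for a DESeq2 results table after ID conversion.
res <- data.frame(
  entrez         = c("4312", "8318", "10874", "55143", "991"),
  log2FoldChange = c(2.1, -0.3, 1.5, -2.4, 0.2),
  padj           = c(0.001, 0.40, 0.03, 0.0004, NA),
  stringsAsFactors = FALSE
)

# DESeq2 can report NA for padj (e.g. after independent filtering),
# so NA values are excluded explicitly.
sig <- !is.na(res$padj) & res$padj < 0.05

# Character vector of Entrez IDs; no ranking metric is kept for ORA.
genes <- unique(res$entrez[sig])
print(genes)
```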

Example input for ORA-based compareCluster:

> data(gcSample, package="clusterProfiler")
> str(gcSample)
List of 8
 $ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
 $ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
 $ X3: chr [1:392] "894" "7057" "22906" "3339" ...
 $ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
 $ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
 $ X6: chr [1:585] "5337" "9295" "4035" "811" ...
 $ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
 $ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
> class(gcSample)
[1] "list"
>
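Putting this together for the four comparisons in the question, the sketch below builds one list of character vectors (for ORA) and one list of ranked named vectors (for GSEA). All values and the helper `mock_res()` are made up for illustration. The `compareCluster` calls are wrapped in a function that is not executed here, since they need clusterProfiler, an annotation package such as org.Hs.eg.db, and real data; using `fun = "gseGO"` inside `compareCluster` assumes a reasonably recent clusterProfiler version, so please check the help page of your installed version.

```r
# Hypothetical helper: a made-up results table for one comparison.
mock_res <- function(lfc, padj) {
  data.frame(entrez = c("4312", "8318", "10874", "55143"),
             log2FoldChange = lfc, padj = padj,
             stringsAsFactors = FALSE)
}

res_list <- list(
  TreatmentA = mock_res(c( 2.0, -1.0, 0.5, -2.2), c(0.01, 0.20, 0.60, 0.001)),
  TreatmentB = mock_res(c(-0.3,  1.7, 2.4,  0.1), c(0.70, 0.02, 0.003, 0.90)),
  TreatmentC = mock_res(c( 1.1,  0.9, -1.5, 0.4), c(0.04, 0.30, 0.01, 0.80)),
  TreatmentD = mock_res(c(-2.0,  0.2, 0.8,  1.9), c(0.002, 0.95, 0.50, 0.03))
)

# ORA input: named list of character vectors (significant Entrez IDs).
ora_input <- lapply(res_list, function(r) {
  r$entrez[!is.na(r$padj) & r$padj < 0.05]
})

# GSEA input: named list of named numeric vectors, sorted high to low.
gsea_input <- lapply(res_list, function(r) {
  sort(setNames(r$log2FoldChange, r$entrez), decreasing = TRUE)
})

str(ora_input)

# Not run here: requires clusterProfiler and org.Hs.eg.db.
run_compare <- function(ora_input, gsea_input) {
  library(clusterProfiler)
  library(org.Hs.eg.db)
  list(
    ora  = compareCluster(geneClusters = ora_input, fun = "enrichGO",
                          OrgDb = org.Hs.eg.db, ont = "BP"),
    gsea = compareCluster(geneClusters = gsea_input, fun = "gseGO",
                          OrgDb = org.Hs.eg.db, ont = "BP")
  )
}
# out <- run_compare(ora_input, gsea_input)
# dotplot(out$ora)
```

The list names (TreatmentA to TreatmentD) become the cluster labels in the combined result, so it helps to choose names that read well on a plot axis.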