fgsea/clusterProfiler Packages for nCounter data enrichment analysis
0
0
Entering edit mode
pg45863 ▴ 10
@e44d727a
Last seen 7 months ago
Portugal

Hi there guys.

I have gene expression data collected from nCounter with a panel of around 700 genes (Pancancer immune profiling panel). I performed differential expression analysis and now I want to see which immune-related pathways are over or under expressed in my samples. In my nCounter samples I do not have normal tissue so I can´t perform a differential expression analysis between tumor and non-tumor samples (instead I can make between tumor subgroups, but that is not of interest for the discussion).

So, in order to overcome this I downloaded a public dataset that consists of several tumor and non-tumor datasets that were joined through batch-correction. The data is on Log2 RUV-normalized format.

So in order to perform GSEA analysis between a subgroup of the tumor samples and the non-tumor samples I tried to use both the fgsea and clusterProfiler packages in R, but both packages either provide me with p-significant values but no significant adjusted p-values or with only one pathway being significant. I am using the C7 ImmuneSigDB genesets for both packages. Below I send you my data and script for the gsea analysis with both packages.

I am wondering if the fact that I am using a small and already "enriched" (for immune pathways) panel can have an impact on my results. Also if the data format can have something to do with it.

Should I define the background genes of the fgsea() function to the genes present in my panel? If so how can I do that?

Hope you can help me, Thank you in advance

> head(data_limma, c(6L, 6L))

       Patient0 Patient1 Patient2 Patient3 Patient4 Patient5
A2M      0.2895   1.2463  -0.1309  -0.8332  -0.5526   0.0017
ABCB1    0.5023   1.2426  -0.5671  -0.5815  -0.1716   1.1230
ABCF1    0.0848   0.5559   0.1182   0.0412  -0.0118   0.1299
ACKR2   -0.0767   0.2744   0.3907  -0.3495   0.0176   0.2393
ADA     -0.0226  -0.0375   0.7206  -0.0117  -0.3979  -0.1945
ADGRE2   0.1158   0.1684  -0.2642   0.1162  -0.3081   0.0663

fgsea script

genes <- rownames(de_genes_SHH_NORMAL)
de_genes_SHH_NORMAL$symbol <- genes

library(tidyverse)
res2 <- de_genes_SHH_NORMAL %>% 
  dplyr::select(symbol, logFC) %>% 
  na.omit() %>% 
  distinct() %>% 
  group_by(symbol) %>%
  arrange(desc(logFC))
library(msigdbr)
hs_immune_df <- msigdbr(species = "Homo sapiens",
                          category = "C7",
                          subcategory = "IMMUNESIGDB")
ranks <- deframe(res2)
head(ranks, 20)
fgsea_SHH <- fgsea(pathways = hs_immune_df, stats=ranks, nperm = 1000)

fgsea_SHH2 <- fgseaMultilevel(pathways = hs_immune_df, stats = ranks, maxSize = 500)

fgseaResTidy <- fgsea_SHH %>%
  as_tibble() %>%
  arrange(desc(NES))

# Show in a nice table:
fgseaResTidy %>% 
  dplyr::select(-leadingEdge, -ES) %>% 
  arrange(padj) %>% 
  DT::datatable()

ClusterProfiler script

library(clusterProfiler)
hs_immune_df <- msigdbr(species = "Homo sapiens",
                          category = "C7",
                          subcategory = "IMMUNESIGDB")

Pre-Ranked gene list

The GSEA() function takes a pre-ranked (sorted) named vector of statistics, where the names in the vector are gene identifiers. This is step 1 – gene-level statistics.

lfc_vector <- de_genes_SHH_NORMAL %>%
  # Extract a vector of `log2FoldChange` named by `gene_symbol`
  dplyr::pull(logFC, name = symbol)
lfc_vector <- sort(lfc_vector, decreasing = TRUE)

# Look at first entries of the log2 fold change vector
head(lfc_vector)

# Look at the last entries of the log2 fold change vector
tail(lfc_vector)

Run GSEA

Now for the analysis!

gsea_results <- GSEA(geneList = lfc_vector,  # ordered ranked gene list
                     minGSSize = 25,  # minimum gene set size
                     maxGSSize = 500,  # maximum gene set set
                     pvalueCutoff = 0.05,
                     pAdjustMethod = "fdr",  # correction for multiple hypothesis testing
                     TERM2GENE = dplyr::select(hs_immune_df,
                                               gs_name,
                                               gene_symbol))
#see the results
View(gsea_results@result %>%
       dplyr::arrange(dplyr::desc(NES))
)
NanoStringDiff clusterProfiler fgsea • 848 views
ADD COMMENT

Login before adding your answer.

Traffic: 556 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6