Hi there guys.
I have gene expression data collected from nCounter with a panel of around 700 genes (Pancancer immune profiling panel). I performed differential expression analysis and now I want to see which immune-related pathways are over or under expressed in my samples. In my nCounter samples I do not have normal tissue so I can´t perform a differential expression analysis between tumor and non-tumor samples (instead I can make between tumor subgroups, but that is not of interest for the discussion).
So, in order to overcome this I downloaded a public dataset that consists of several tumor and non-tumor datasets that were joined through batch-correction. The data is on Log2 RUV-normalized format.
So in order to perform GSEA analysis between a subgroup of the tumor samples and the non-tumor samples I tried to use both the fgsea and clusterProfiler packages in R, but both packages either provide me with p-significant values but no significant adjusted p-values or with only one pathway being significant. I am using the C7 ImmuneSigDB genesets for both packages. Below I send you my data and script for the gsea analysis with both packages.
I am wondering if the fact that I am using a small and already "enriched" (for immune pathways) panel can have an impact on my results. Also if the data format can have something to do with it.
Should I define the background genes of the fgsea() function to the genes present in my panel? If so how can I do that?
Hope you can help me, Thank you in advance
> head(data_limma, c(6L, 6L))
       Patient0 Patient1 Patient2 Patient3 Patient4 Patient5
A2M      0.2895   1.2463  -0.1309  -0.8332  -0.5526   0.0017
ABCB1    0.5023   1.2426  -0.5671  -0.5815  -0.1716   1.1230
ABCF1    0.0848   0.5559   0.1182   0.0412  -0.0118   0.1299
ACKR2   -0.0767   0.2744   0.3907  -0.3495   0.0176   0.2393
ADA     -0.0226  -0.0375   0.7206  -0.0117  -0.3979  -0.1945
ADGRE2   0.1158   0.1684  -0.2642   0.1162  -0.3081   0.0663
fgsea script
genes <- rownames(de_genes_SHH_NORMAL)
de_genes_SHH_NORMAL$symbol <- genes
library(tidyverse)
res2 <- de_genes_SHH_NORMAL %>% 
  dplyr::select(symbol, logFC) %>% 
  na.omit() %>% 
  distinct() %>% 
  group_by(symbol) %>%
  arrange(desc(logFC))
library(msigdbr)
hs_immune_df <- msigdbr(species = "Homo sapiens",
                          category = "C7",
                          subcategory = "IMMUNESIGDB")
ranks <- deframe(res2)
head(ranks, 20)
fgsea_SHH <- fgsea(pathways = hs_immune_df, stats=ranks, nperm = 1000)
fgsea_SHH2 <- fgseaMultilevel(pathways = hs_immune_df, stats = ranks, maxSize = 500)
fgseaResTidy <- fgsea_SHH %>%
  as_tibble() %>%
  arrange(desc(NES))
# Show in a nice table:
fgseaResTidy %>% 
  dplyr::select(-leadingEdge, -ES) %>% 
  arrange(padj) %>% 
  DT::datatable()
ClusterProfiler script
library(clusterProfiler)
hs_immune_df <- msigdbr(species = "Homo sapiens",
                          category = "C7",
                          subcategory = "IMMUNESIGDB")
Pre-Ranked gene list
The GSEA() function takes a pre-ranked (sorted) named vector of statistics, where the names in the vector are gene identifiers. This is step 1 â gene-level statistics.
lfc_vector <- de_genes_SHH_NORMAL %>%
  # Extract a vector of `log2FoldChange` named by `gene_symbol`
  dplyr::pull(logFC, name = symbol)
lfc_vector <- sort(lfc_vector, decreasing = TRUE)
# Look at first entries of the log2 fold change vector
head(lfc_vector)
# Look at the last entries of the log2 fold change vector
tail(lfc_vector)
Run GSEA
Now for the analysis!
gsea_results <- GSEA(geneList = lfc_vector,  # ordered ranked gene list
                     minGSSize = 25,  # minimum gene set size
                     maxGSSize = 500,  # maximum gene set set
                     pvalueCutoff = 0.05,
                     pAdjustMethod = "fdr",  # correction for multiple hypothesis testing
                     TERM2GENE = dplyr::select(hs_immune_df,
                                               gs_name,
                                               gene_symbol))
#see the results
View(gsea_results@result %>%
       dplyr::arrange(dplyr::desc(NES))
)
                    
                
                