Search
Question: GSVA & Limma for DE of Gene Sets using NanoString Data
0
gravatar for Sarah B.
8 weeks ago by
Sarah B.0
United States
Sarah B.0 wrote:

Dear Bioconductors,  

I recently used GSVA to get enrichment scores for 3 pathways (i.e. gene sets) using an expression matrix that was generated with NanoString nCounter. Prior to using the expression matrix with GSVA, the data was normalized according to the NanoString manual and log2 transformed. The expression matrix simply consists of 48 genes by 24 samples (i.e. 12 unique samples with two replicates each). More specifically, the expression matrix is composed of 3 donors each with 4 different phenotypes and two replicates of each phenotype. Since the input expression values were continuous, I used kcdf="Gaussian" as an input. Also, I used mx.diff = TRUE to acquire a standard Gaussian distribution of enrichment scores so that the scores could be analyzed with limma. 

From the enrichment scores, I am interested in determining if they are differentially expressed across the different phenotypes for each donor. Specifically, I am interested in determining if the scores are differentially expressed when comparing a certain phenotype across the three donors as well as when comparing the different phenotypes across a given donor. Shown below is the code I have used to determine the differentially expressed gene sets for one group of comparisons. I have never used Limma nor microarray before and thus, I am unsure if I have adapted the code properly (if possible) for my NanoString data. While I believe I have set up the design matrix and contrast matrix correctly for my intended question, I have shown them below the code for your review. As shown, I intend to do many sets of comparisons and thus I was planning to create separate contrast matrices for each set of comparisons. However, I am wondering if it is statistically flawed to break up these comparisons into separate matrices and if I instead should have combined all comparisons for the study into one contrast matrix. Further, though I have used the Benjamini-Hochberg procedure for multiple comparisons, what I really want to do are planned comparisons (i.e. only a few sensible comparisons which were decided before looking at the data). As such, is it incorrect to use the BH method? If so, would you recommend using Bonferroni corretion (if possible with limma)? 

I appreciate any advice you can offer.

Thank you for your time,

SB

———

R code:

gsva_es <- gsva(eset, geneSets, mx.diff = TRUE)

 

samples <- colnames(gsva_es) # extract sample names

s <- gsub('.{2}$', '', samples) # samples with replicate format for factoring

f <- factor(s,levels=unique(s)) # factor samples (groups replicates)

 

adjPvalueCutoff <- 0.05

design <- model.matrix(~0+f) 

fit1 <- lmFit(gsva_es, design)

 

# Contrast matrix comparing M1 phenotype across donors

contrast.matrix <- makeContrasts(D1.M1-D2.M1, D3.M1-D2.M1, D3.M1-D1.M1, levels = design) 

fit2 <- contrasts.fit(fit1, contrast.matrix) 

fit2 <- eBayes(fit2)

DEGeneSets <-topTable(fit2, p.value=adjPvalueCutoff, adjust="BH")

results <- decideTests(fit2, p.value=adjPvalueCutoff)

pvals <- fit2$p.value # p values associated with the comparisons of interest

 

# Contrast matrix comparing M2a phenotype across donors

# Contrast matrix comparing M2c phenotype across donors

# Contrast matrix comparing Donor 1 phenotypes

# Contrast matrix comparing Donor 2 phenotypes

# Contrast matrix comparing Donor 3 phenotypes

———

Enrichment Scores shown as gene sets by samples:

> gsva_es
        D1.M0.1    D1.M0.2    D1.M1.1     D1.M1.2   D1.M2a.1   D1.M2a.2   D1.M2c.1   D1.M2c.2     D2.M0.1    D2.M0.2
M1  -0.09316770 -0.2771739  0.8695652  0.78260870 -0.8260870 -0.8043478 -0.7713366 -0.6897233  0.02173913  0.1521739
M2a -0.07474747  0.3972222 -0.7729730 -0.66523297  0.7966102  0.6444444 -0.1333333 -0.1311828  0.11111111  0.2750958
M2c  0.91001011  0.8267442 -0.3874078 -0.06976744 -0.8139535 -0.7328724  0.8028101  0.8417594 -0.10003186 -0.5507956
        D2.M1.1    D2.M1.2   D2.M2a.1   D2.M2a.2  D2.M2c.1   D2.M2c.2       D3.M0.1    D3.M0.2    D3.M1.1    D3.M1.2
M1   0.71739130  0.6739130 -0.3777174 -0.8260870 0.6086957 -0.1304348  2.775558e-16 -0.3339921  0.8478261  0.8913043
M2a -0.57992832 -0.3333333  0.5777778  0.5111111 0.0000000 -0.1000000 -8.722222e-01 -0.1111111 -0.3535354 -0.3871795
M2c -0.09838998 -0.2564103 -0.7845157 -0.7906977 0.4532978  0.6526878  6.744186e-01  0.7229236  0.1627907 -0.3814769
       D3.M2a.1   D3.M2a.2   D3.M2c.1   D3.M2c.2
M1  -0.73913043 -0.7207358 -0.3777174 -0.6521739
M2a  0.54329502  0.5515152 -0.4437500 -0.4315245
M2c -0.05923651 -0.2831497  0.8604651  0.8715258

Design matrix shown as 24 rows (number of samples) by 12 unique samples (i.e. samples without replicates) :

> design

   D1.M0 D1.M1 D1.M2a D1.M2c D2.M0 D2.M1 D2.M2a D2.M2c D3.M0 D3.M1 D3.M2a D3.M2c
1      1     0      0      0     0     0      0      0     0     0      0      0
2      1     0      0      0     0     0      0      0     0     0      0      0
3      0     1      0      0     0     0      0      0     0     0      0      0
4      0     1      0      0     0     0      0      0     0     0      0      0
5      0     0      1      0     0     0      0      0     0     0      0      0
6      0     0      1      0     0     0      0      0     0     0      0      0
7      0     0      0      1     0     0      0      0     0     0      0      0
8      0     0      0      1     0     0      0      0     0     0      0      0
9      0     0      0      0     1     0      0      0     0     0      0      0
10     0     0      0      0     1     0      0      0     0     0      0      0
11     0     0      0      0     0     1      0      0     0     0      0      0
12     0     0      0      0     0     1      0      0     0     0      0      0
13     0     0      0      0     0     0      1      0     0     0      0      0
14     0     0      0      0     0     0      1      0     0     0      0      0
15     0     0      0      0     0     0      0      1     0     0      0      0
16     0     0      0      0     0     0      0      1     0     0      0      0
17     0     0      0      0     0     0      0      0     1     0      0      0
18     0     0      0      0     0     0      0      0     1     0      0      0
19     0     0      0      0     0     0      0      0     0     1      0      0
20     0     0      0      0     0     0      0      0     0     1      0      0
21     0     0      0      0     0     0      0      0     0     0      1      0
22     0     0      0      0     0     0      0      0     0     0      1      0
23     0     0      0      0     0     0      0      0     0     0      0      1
24     0     0      0      0     0     0      0      0     0     0      0      1

 

The coefficients from the lmFit show the averaged replicate enrichment scores for each sample. 

> cfs
         D1.M0      D1.M1     D1.M2a     D1.M2c       D2.M0      D2.M1     D2.M2a     D2.M2c      D3.M0
M1  -0.1851708  0.8260870 -0.8152174 -0.7305299  0.08695652  0.6956522 -0.6019022  0.2391304 -0.1669960
M2a  0.1612374 -0.7191030  0.7205273 -0.1322581  0.19310345 -0.4566308  0.5444444 -0.0500000 -0.4916667
M2c  0.8683771 -0.2285876 -0.7734129  0.8222847 -0.32541373 -0.1774001 -0.7876067  0.5529928  0.6986711
         D3.M1     D3.M2a     D3.M2c
M1   0.8695652 -0.7299331 -0.5149457
M2a -0.3703574  0.5474051 -0.4376373
M2c -0.1093431 -0.1711931  0.8659955

> contrast.matrix

        Contrasts
Levels   D3.M2a - D3.M1 D3.M2c - D3.M2a D3.M2c - D3.M1
  D1.M0               0               0              0
  D1.M1               0               0              0
  D1.M2a              0               0              0
  D1.M2c              0               0              0
  D2.M0               0               0              0
  D2.M1               0               0              0
  D2.M2a              0               0              0
  D2.M2c              0               0              0
  D3.M0               0               0              0
  D3.M1              -1               0             -1
  D3.M2a              1              -1              0
  D3.M2c              0               1              1

ADD COMMENTlink modified 8 weeks ago by Robert Castelo2.2k • written 8 weeks ago by Sarah B.0

Does the gsva_es matrix that you enter to limma have only 3 rows?

ADD REPLYlink written 8 weeks ago by Gordon Smyth35k

Yes, three rows since I am only interested in examining enrichment of three gene sets (M1, M2a, and M2c) in the above samples.​

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Sarah B.0

With only three rows, there's hardly any empirical Bayes moderation for limma to do. So the results from limma will be nearly the same as if you did a linear model or anova analysis for each row separately. There's no possible harm in using limma however if you find it convenient.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Gordon Smyth35k
0
gravatar for Gordon Smyth
8 weeks ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

I have no experience with entering GSVA scores into limma, nor with Nanostring in general. I can make a couple of comments though.

It's fine to break up the contrasts into groups that are tested separately.

In limma, p-value adjustments are usually used to adjust for multiple testing over genes (or pathways in your case) rather than over contrasts. If you want to adjust for multiple contrasts as well as multiple pathways, then use method="global" in decideTests().

Using Bonferroni is never a good idea as the "holm" method does the same thing and is less conservative, see help("p.adjust"). I'm not clear why you don't want to use the BH method but, if you want to use family-wise type I error rate control instead of FDRs, then specify adjust.method="holm" to decideTests().

If you really only make a few pre-planned tests, then you might not need multiple testing adjustment at all -- in that case you might simply use the p-values.

 

ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by Gordon Smyth35k
0
gravatar for Robert Castelo
8 weeks ago by
Robert Castelo2.2k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.2k wrote:

hi, regarding the GSVA transformation, the parameters you have used seem fine to me. The rest of the question seems to be more related to general experimental design. I would try though to have both analysis, at feature level and gene-set level and try to have a sense whether you are using the appropriate gene set definitions. The fact that you are only using 3 gene sets might also require attention since the limma pipeline informs the differential expression statistics with the variability of all the features, which in this case are only three.

cheers,

robert.

ADD COMMENTlink written 8 weeks ago by Robert Castelo2.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 232 users visited in the last hour