Hello,
I've carried out a gene-level differential expression analysis in limma on a human clinical RNA-Seq dataset, using voomWithQualityWeights to account for variation in sample quality and for heteroscedasticity between groups.
I'd now like to fit the same limma model to GSVA scores derived from the MSigDB C2 collection and from some previously defined gene sets that are of particular interest in our study.
Exploratory analysis of the GSVA scores (PCA and MDS) suggests that the sample-quality variation and group heteroscedasticity we observed at the gene level are also present at the gene-set level, albeit to a lesser extent.
Would it therefore be problematic to re-use the sample weights calculated for the gene-level analysis in the gene set analysis (i.e. the weights stored in v$targets$sample.weights) or to use arrayWeights to generate sample-level weights from the GSVA scores?
I have in fact already tried both approaches and found that, particularly when re-using the gene-level weights, the gene-set results are more similar to the gene-level results than when no weights are used.
By similarity I mean that the numbers of significant hits for each contrast are more aligned between the gene-level analysis and the gene set analysis when weights are used, i.e. contrasts with fewer significant DEGs have fewer DE gene sets and contrasts with more DEGs have more DE gene sets. When no weights are used in the gene set analysis, there is less alignment with the gene-level analysis.
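For anyone reading along, here is a minimal sketch of the two weighting options described above, using simulated data only. The counts, the score matrix, the group labels and the choice of samples 2 and 7 as "noisy" are all hypothetical stand-ins, and the simulated counts and scores are unrelated to each other, so only the mechanics (not the resulting weights) are meaningful. Option (a) takes the per-sample weights from voomWithQualityWeights (v$targets$sample.weights); option (b) re-estimates them from the gene-set scores with arrayWeights. Either vector (one weight per sample) can be passed to lmFit.

```r
library(limma)

set.seed(42)
# Hypothetical stand-ins: 1000 genes x 8 samples of counts,
# and 200 gene-set scores for the same 8 samples
group  <- factor(rep(c("A", "B"), each = 4))
design <- model.matrix(~ group)
counts <- matrix(rpois(1000 * 8, lambda = 50), nrow = 1000)
scores <- matrix(rnorm(200 * 8, sd = 0.3), nrow = 200)
# make samples 2 and 7 noisier, mimicking lower-quality samples
scores[, c(2, 7)] <- scores[, c(2, 7)] + rnorm(200 * 2, sd = 0.6)

# (a) re-use the sample weights estimated at the gene level
v <- voomWithQualityWeights(counts, design)
w_gene <- v$targets$sample.weights

# (b) re-estimate sample weights directly from the gene-set scores
w_set <- arrayWeights(scores, design = design)

# fit the gene-set model with either weight vector (one weight per sample)
fit_a <- eBayes(lmFit(scores, design, weights = w_gene))
fit_b <- eBayes(lmFit(scores, design, weights = w_set))
summary(decideTests(fit_b))
```

With real data, comparing w_gene and w_set side by side is a quick way to see whether the same samples are being down-weighted at both levels.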
Thanks very much again for your help.

Hi Gordon, thanks for your advice on how to use sample weights with GSVA scores. So far I have not used sample weights with GSVA scores either, but my intuition is also that, since sample weights are data-driven, it is probably better to re-calculate them from the GSVA scores using arrayWeights(). I also notice now that in this section of the GSVA vignette I was giving the wrong advice on how to set the trend parameter in the limma-trend pipeline (using ngenes instead of sqrt(ngenes)); I'll correct that today. Thanks both for this useful information.
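A minimal sketch of how the two suggestions above might be combined, again on simulated data: sample weights are re-estimated from the gene-set scores with arrayWeights(), and the limma-trend step passes the square root of each set's size as the covariate for the prior variance. This assumes a limma version whose eBayes() accepts a numeric vector for trend, and it assumes ngenes means the number of genes in each set; the set sizes, set names and score matrix are hypothetical.

```r
library(limma)

set.seed(7)
group  <- factor(rep(c("A", "B"), each = 4))
design <- model.matrix(~ group)

# Hypothetical gene-set sizes; scores of smaller sets are simulated as noisier
set_sizes <- sample(15:500, 200, replace = TRUE)
scores <- matrix(rnorm(200 * 8, sd = rep(1 / sqrt(set_sizes), 8)), nrow = 200)
rownames(scores) <- paste0("set", seq_len(200))
# add a group B shift to the first 10 sets so there is something to detect
scores[1:10, 5:8] <- scores[1:10, 5:8] + 0.5

# sample weights re-estimated from the gene-set scores themselves
w <- arrayWeights(scores, design = design)
fit <- lmFit(scores, design, weights = w)

# limma-trend: sqrt(set size) as the covariate for the prior variance
fit <- eBayes(fit, trend = sqrt(set_sizes))
tt <- topTable(fit, coef = 2, number = Inf)
head(tt)
```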