Bug in GSVA with methods other than default?

0

Entering edit mode

Jon Manning ▴ 90

@jon-manning-5708

Last seen 7.7 years ago

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

• 2.2k views

ADD COMMENT • link 10.2 years ago Jon Manning ▴ 90

0

Entering edit mode

Jon Manning ▴ 90

@jon-manning-5708

Last seen 7.7 years ago

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ADD COMMENT • link 10.2 years ago Jon Manning ▴ 90

0

Entering edit mode

On Fri, Feb 21, 2014 at 1:55 PM, Jonathan Manning <jonathan.manning at="" ed.ac.uk=""> wrote: > gsva_es <- gsva(y, geneSets, mx.diff=1, method='plage')$es.obs For plage you simply leave the $es.obs out. You only need to put it when you use method = "gsva" Like this: gsva_es <- gsva(y, geneSets, mx.diff=1, method='plage') On Fri, Feb 21, 2014 at 1:55 PM, Jonathan Manning <jonathan.manning at="" ed.ac.uk=""> wrote: > Hi Sonja, > > Thanks. > > The error occurs in ALL situations where an alternate method is provided. > Tweaking your own example does the trick, try this modification to > reproduce: > > gsva_es <- gsva(y, geneSets, mx.diff=1, method='plage')$es.obs > > Jon > > > On 21/02/2014 12:34, Sonja Haenzelmann wrote: > > Hi Jonathan. > > Justin, the co author of the paper added the bootstrapping. As far as > I know it does not work. There was an earlier post on this in the > mailng list. > > How does your input look like? gsva can handle expression set or > matrix, nothing else. > > Did I answer your question? If not, please provide me a *reproducible* > example, otherwise I cannot help you :/ > > Cheers > Sonja > > > On Fri, Feb 21, 2014 at 1:19 PM, Jonathan Manning > <jonathan.manning at="" ed.ac.uk=""> wrote: > > Hi Sonja, > > Thanks for the quick reply, but I don't think I've made myself clear. I'm > aware that gsva() returns that list, from which you need to extract es.obs, > but the error lies internal to the method when that list is being > constructed. > > For example, when I run: > > gsva_es <- gsva(my.expressionset, c2BroadSets, verbose=TRUE, > method='plage')$es.obs > > I get the error I described: > > Error in eSco$es.obs : $ operator is invalid for atomic vectors > > From what I can determine, GSVA:::.gsva() (called from gsva()) returns the > results of PLAGE et al directly, as matrices, rather than doing > bootstrapping etc and making a list. But the top-level gsva() seems to > expect a list with the results of bootstrapping etc, as happens in the > default case. The lines from gsva() in question are: > > eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list, > > method, rnaseq, abs.ranking, no.bootstraps, > bootstrap.percent, > > parallel.sz, parallel.type, mx.diff, tau, kernel, > > verbose) > > eScoEset <- expr > > Biobase::exprs(eScoEset) <- eSco$es.obs > > > > Biobase::annotation(eScoEset) <- "" > > return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap, > > p.vals.sign = eSco$p.vals.sign)) > > If I specify method='plage', then 'eSco' is a matrix and 'eSco$es.obs' won't > work. > > Is that clearer? It seems that bootstrapping doesn't happen right now where > method is PLAGE etc- should that be the case? > > Jon > > > > On 21/02/2014 12:01, Sonja Haenzelmann wrote: > > gsva_es <- gsva(y, geneSets, mx.diff=1)$es.obs > > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. >

ADD REPLY • link 10.2 years ago Sonja Haenzelmann ▴ 20

0

Entering edit mode

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ADD REPLY • link 10.2 years ago Jon Manning ▴ 90

0

Entering edit mode

I think the problem lies in your expression set. The gene names have to be in ENTREZ gene Id format, as the c2BroadSets are in this format. gsva maps the genes in your expression set with the genes in the gene set list. If they do not have the same identifiers it cannot map the genes, and hence not calculate any statistics. library(GSVAdata) data(c2BroadSets) library(GSVA) example <- as.matrix(read.table("example.txt")) eset_example <- ExpressionSet(example, annotation='illuminaHumanv4') pdata_example <- data.frame(class=c(rep('class1', 5), c(rep('class2', 5)))) rownames(pdata_example) <- letters[1:10] pData(eset_example) <- pdata_example > featureNames(eset_example) [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" "ILMN_1806310" [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025" "ILMN_1814316" you will have to map your ILMN ids to entrez gene ids, then gsva should be working. For example in the toy example that shows up, when you do ?gsva, you will find that the gene names are the same in the gene set list as well as in the gene expression matrix. p <- 10 ## number of genes n <- 30 ## number of samples nGrp1 <- 15 ## number of samples in group 1 nGrp2 <- n - nGrp1 ## number of samples in group 2 ## consider three disjoint gene sets geneSets <- list(set1=paste("g", 1:3, sep=""), set2=paste("g", 4:6, sep=""), set3=paste("g", 7:10, sep="")) > geneSets $set1 [1] "g1" "g2" "g3" $set2 [1] "g4" "g5" "g6" $set3 [1] "g7" "g8" "g9" "g10" ## sample data from a normal distribution with mean 0 and st.dev. 1 y <- matrix(rnorm(n*p), nrow=p, ncol=n, dimnames=list(paste("g", 1:p, sep="") , paste("s", 1:n, sep=""))) rownames(y) [1] "g1" "g2" "g3" "g4" "g5" "g6" "g7" "g8" "g9" "g10" Best Sonja On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning <jonathan.manning at="" ed.ac.uk=""> wrote: > example <- as.matrix(read.table("example.txt")) > eset_example <- ExpressionSet(example, annotation='illuminaHumanv4') > pdata_example <- data.frame(class=c(rep('class1', 5), c(rep('class2', 5)))) > rownames(pdata_example) <- letters[1:10] > pData(eset_example) <- pdata_example > gsva_es <- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage')

ADD REPLY • link 10.2 years ago Sonja Haenzelmann ▴ 20

0

Entering edit mode

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ADD REPLY • link 10.2 years ago Jon Manning ▴ 90

0

Entering edit mode

hi Jonathan cc Sonja, you're right, this is a bug occuring when the input is an 'ExpressionSet' object and the argument 'method' as a value different than 'gsva'. i've just submitted a fix to both, the devel and the release versions of GSVA, they should become available via biocLite() in the following 24/48 hrs as versions 1.11.4 and 1.10.3, respectively. in the meantime, you can always check out directly from the corresponding svn repository (current release or devel) the updated source and build it yourself; see http://master.bioconductor.org/developers/how-to/source-control sorry for the inconvenience, robert. On 02/21/2014 03:42 PM, Jonathan Manning wrote: > Hi Sonja, > > No, that's not it. > > The probe IDs map perfectly fine through the annotation package and > GSEABase::mapIdentifiers() etc, and everything works fine if I hack > around the bug I described previously. Again, what you have is some code > (e.g. /'Biobase::exprs(eScoEset) <- eSco$es.obs) /), which assumes that > a list is returned by /GSVA:::.gsva()/. In the case of " > /method='plage'/ " et al, it is not. If you tweak things with that in > mind, there's no problem. > > Here is the code for gsva() for the ExpressionSet signature (retrieved > with /getMethod(gsva, signature=c(expr="ExpressionSet", > gset.idx.list="GeneSetCollection", annotation="missing"))/ ), with my > hacks included in bold: > > > function (expr, gset.idx.list, annotation = NULL, ...) > { > .local <- function (expr, gset.idx.list, annotation = NULL, > method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, > abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, > bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", > mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, > NA), kernel = TRUE, verbose = TRUE) > { > sdGenes <- Biobase::esApply(expr, 1, sd) > if (any(sdGenes == 0)) { > if (verbose) > cat("Filtering out", sum(sdGenes == 0), "genes with constant expression > values throuhgout the samples\n") > expr <- expr[sdGenes > 0, ] > } > if (nrow(expr) < 2) > stop("Less than two genes in the input ExpressionSet object\n") > if (verbose) > cat("Mapping identifiers between gene sets and feature names\n") > mapped.gset.idx.list <- GSEABase::mapIdentifiers(gset.idx.list, > GSEABase::AnnoOrEntrezIdentifier(Biobase::annotation(expr))) > tmp <- lapply(geneIds(mapped.gset.idx.list), function(x, > y) na.omit(match(x, y)), featureNames(expr)) > names(tmp) <- names(mapped.gset.idx.list) > mapped.gset.idx.list <- filterGeneSets(tmp, min.sz = max(1, min.sz), > max.sz = max.sz) > eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list, > method, rnaseq, abs.ranking, no.bootstraps, bootstrap.percent, > parallel.sz, parallel.type, mx.diff, tau, kernel, > verbose) > eScoEset <- expr > #Biobase::exprs(eScoEset) <- eSco$es.obs > * if (method=='gsva'){ > Biobase::exprs(eScoEset) <- eSco$es.obs > }else{ > Biobase::exprs(eScoEset) <- eSco > }* > > Biobase::annotation(eScoEset) <- "" > #return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap, > # p.vals.sign = eSco$p.vals.sign)) > *return (eScoEset)* > } > .local(expr, gset.idx.list, annotation, ...) > } > > Since you say the bootstrapping isn't much use anyway, I've just set it > to return the matrix in all cases. > > Jon > > > > > On 21/02/2014 14:05, Sonja Haenzelmann wrote: >> I think the problem lies in your expression set. >> The gene names have to be in ENTREZ gene Id format, as the c2BroadSets >> are in this format. gsva maps the genes in your expression set with >> the genes in the gene set list. If they do not have the same >> identifiers it cannot map the genes, and hence not calculate any >> statistics. >> >> >> library(GSVAdata) >> data(c2BroadSets) >> library(GSVA) >> >> >> example<- as.matrix(read.table("example.txt")) >> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4') >> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', >> 5)))) >> rownames(pdata_example)<- letters[1:10] >> pData(eset_example)<- pdata_example >> >> >>> featureNames(eset_example) >> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" >> "ILMN_1806310" >> [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025" >> "ILMN_1814316" >> >> you will have to map your ILMN ids to entrez gene ids, then gsva >> should be working. >> >> >> For example in the toy example that shows up, when you do ?gsva, you >> will find that the gene names are the same in the gene set list as >> well as in the gene expression matrix. >> >> p<- 10 ## number of genes >> n<- 30 ## number of samples >> nGrp1<- 15 ## number of samples in group 1 >> nGrp2<- n - nGrp1 ## number of samples in group 2 >> >> ## consider three disjoint gene sets >> geneSets<- list(set1=paste("g", 1:3, sep=""), >> set2=paste("g", 4:6, sep=""), >> set3=paste("g", 7:10, sep="")) >> >> >> >>> geneSets >> $set1 >> [1] "g1" "g2" "g3" >> >> $set2 >> [1] "g4" "g5" "g6" >> >> $set3 >> [1] "g7" "g8" "g9" "g10" >> >> >> ## sample data from a normal distribution with mean 0 and st.dev. 1 >> y<- matrix(rnorm(n*p), nrow=p, ncol=n, >> dimnames=list(paste("g", 1:p, sep="") , paste("s", >> 1:n, sep=""))) >> >> rownames(y) >> [1] "g1" "g2" "g3" "g4" "g5" "g6" "g7" "g8" "g9" "g10" >> >> Best >> Sonja >> >> >> >> >> On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning >> <jonathan.manning at="" ed.ac.uk=""> wrote: >>> example<- as.matrix(read.table("example.txt")) >>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4') >>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', >>> 5)))) >>> rownames(pdata_example)<- letters[1:10] >>> pData(eset_example)<- pdata_example >>> gsva_es<- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage') > > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Robert Castelo, PhD Associate Professor Dept. of Experimental and Health Sciences Universitat Pompeu Fabra (UPF) Barcelona Biomedical Research Park (PRBB) Dr Aiguader 88 E-08003 Barcelona, Spain telf: +34.933.160.514 fax: +34.933.160.550

ADD REPLY • link 10.2 years ago Robert Castelo ★ 3.3k

0

Entering edit mode

Hi Robert, I confirm the fix worked (though I don't think the new version is in the repo yet so I installed from source). Could there maybe be some more documentation in the gsva() help page on bootstrapping and how it doesn't apply to the other methods? Thanks, Jon On 21/02/2014 15:08, Robert Castelo wrote: > hi Jonathan cc Sonja, > > you're right, this is a bug occuring when the input is an > 'ExpressionSet' object and the argument 'method' as a value different > than 'gsva'. > > i've just submitted a fix to both, the devel and the release versions > of GSVA, they should become available via biocLite() in the following > 24/48 hrs as versions 1.11.4 and 1.10.3, respectively. > > in the meantime, you can always check out directly from the > corresponding svn repository (current release or devel) the updated > source and build it yourself; see > > http://master.bioconductor.org/developers/how-to/source-control > > > > sorry for the inconvenience, > > robert. > > On 02/21/2014 03:42 PM, Jonathan Manning wrote: >> Hi Sonja, >> >> No, that's not it. >> >> The probe IDs map perfectly fine through the annotation package and >> GSEABase::mapIdentifiers() etc, and everything works fine if I hack >> around the bug I described previously. Again, what you have is some code >> (e.g. /'Biobase::exprs(eScoEset) <- eSco$es.obs) /), which assumes that >> a list is returned by /GSVA:::.gsva()/. In the case of " >> /method='plage'/ " et al, it is not. If you tweak things with that in >> mind, there's no problem. >> >> Here is the code for gsva() for the ExpressionSet signature (retrieved >> with /getMethod(gsva, signature=c(expr="ExpressionSet", >> gset.idx.list="GeneSetCollection", annotation="missing"))/ ), with my >> hacks included in bold: >> >> >> function (expr, gset.idx.list, annotation = NULL, ...) >> { >> .local <- function (expr, gset.idx.list, annotation = NULL, >> method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, >> abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, >> bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", >> mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, >> NA), kernel = TRUE, verbose = TRUE) >> { >> sdGenes <- Biobase::esApply(expr, 1, sd) >> if (any(sdGenes == 0)) { >> if (verbose) >> cat("Filtering out", sum(sdGenes == 0), "genes with constant expression >> values throuhgout the samples\n") >> expr <- expr[sdGenes > 0, ] >> } >> if (nrow(expr) < 2) >> stop("Less than two genes in the input ExpressionSet object\n") >> if (verbose) >> cat("Mapping identifiers between gene sets and feature names\n") >> mapped.gset.idx.list <- GSEABase::mapIdentifiers(gset.idx.list, >> GSEABase::AnnoOrEntrezIdentifier(Biobase::annotation(expr))) >> tmp <- lapply(geneIds(mapped.gset.idx.list), function(x, >> y) na.omit(match(x, y)), featureNames(expr)) >> names(tmp) <- names(mapped.gset.idx.list) >> mapped.gset.idx.list <- filterGeneSets(tmp, min.sz = max(1, min.sz), >> max.sz = max.sz) >> eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list, >> method, rnaseq, abs.ranking, no.bootstraps, bootstrap.percent, >> parallel.sz, parallel.type, mx.diff, tau, kernel, >> verbose) >> eScoEset <- expr >> #Biobase::exprs(eScoEset) <- eSco$es.obs >> * if (method=='gsva'){ >> Biobase::exprs(eScoEset) <- eSco$es.obs >> }else{ >> Biobase::exprs(eScoEset) <- eSco >> }* >> >> Biobase::annotation(eScoEset) <- "" >> #return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap, >> # p.vals.sign = eSco$p.vals.sign)) >> *return (eScoEset)* >> } >> .local(expr, gset.idx.list, annotation, ...) >> } >> >> Since you say the bootstrapping isn't much use anyway, I've just set it >> to return the matrix in all cases. >> >> Jon >> >> >> >> >> On 21/02/2014 14:05, Sonja Haenzelmann wrote: >>> I think the problem lies in your expression set. >>> The gene names have to be in ENTREZ gene Id format, as the c2BroadSets >>> are in this format. gsva maps the genes in your expression set with >>> the genes in the gene set list. If they do not have the same >>> identifiers it cannot map the genes, and hence not calculate any >>> statistics. >>> >>> >>> library(GSVAdata) >>> data(c2BroadSets) >>> library(GSVA) >>> >>> >>> example<- as.matrix(read.table("example.txt")) >>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4') >>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', >>> 5)))) >>> rownames(pdata_example)<- letters[1:10] >>> pData(eset_example)<- pdata_example >>> >>> >>>> featureNames(eset_example) >>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" >>> "ILMN_1806310" >>> [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025" >>> "ILMN_1814316" >>> >>> you will have to map your ILMN ids to entrez gene ids, then gsva >>> should be working. >>> >>> >>> For example in the toy example that shows up, when you do ?gsva, you >>> will find that the gene names are the same in the gene set list as >>> well as in the gene expression matrix. >>> >>> p<- 10 ## number of genes >>> n<- 30 ## number of samples >>> nGrp1<- 15 ## number of samples in group 1 >>> nGrp2<- n - nGrp1 ## number of samples in group 2 >>> >>> ## consider three disjoint gene sets >>> geneSets<- list(set1=paste("g", 1:3, sep=""), >>> set2=paste("g", 4:6, sep=""), >>> set3=paste("g", 7:10, sep="")) >>> >>> >>> >>>> geneSets >>> $set1 >>> [1] "g1" "g2" "g3" >>> >>> $set2 >>> [1] "g4" "g5" "g6" >>> >>> $set3 >>> [1] "g7" "g8" "g9" "g10" >>> >>> >>> ## sample data from a normal distribution with mean 0 and st.dev. 1 >>> y<- matrix(rnorm(n*p), nrow=p, ncol=n, >>> dimnames=list(paste("g", 1:p, sep="") , paste("s", >>> 1:n, sep=""))) >>> >>> rownames(y) >>> [1] "g1" "g2" "g3" "g4" "g5" "g6" "g7" "g8" "g9" "g10" >>> >>> Best >>> Sonja >>> >>> >>> >>> >>> On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning >>> <jonathan.manning at="" ed.ac.uk=""> wrote: >>>> example<- as.matrix(read.table("example.txt")) >>>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4') >>>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', >>>> 5)))) >>>> rownames(pdata_example)<- letters[1:10] >>>> pData(eset_example)<- pdata_example >>>> gsva_es<- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage') >> >> >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336. >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ADD REPLY • link 10.2 years ago Jon Manning ▴ 90

0

Entering edit mode

hi Jonathan, i'm cc'ing your email to Justin Guinney, who's the actual maintainer of GSVA and who developed that bootstrapping part, he should be able to clarify this. cheers, robert. On 02/24/2014 11:36 AM, Jonathan Manning wrote: > Hi Robert, > > I confirm the fix worked (though I don't think the new version is in the > repo yet so I installed from source). > > Could there maybe be some more documentation in the gsva() help page on > bootstrapping and how it doesn't apply to the other methods? > > Thanks, > > Jon > > > On 21/02/2014 15:08, Robert Castelo wrote: >> hi Jonathan cc Sonja, >> >> you're right, this is a bug occuring when the input is an >> 'ExpressionSet' object and the argument 'method' as a value different >> than 'gsva'. >> >> i've just submitted a fix to both, the devel and the release versions >> of GSVA, they should become available via biocLite() in the following >> 24/48 hrs as versions 1.11.4 and 1.10.3, respectively. >> >> in the meantime, you can always check out directly from the >> corresponding svn repository (current release or devel) the updated >> source and build it yourself; see >> >> http://master.bioconductor.org/developers/how-to/source-control >> >> >> >> sorry for the inconvenience, >> >> robert. >> >> On 02/21/2014 03:42 PM, Jonathan Manning wrote: >>> Hi Sonja, >>> >>> No, that's not it. >>> >>> The probe IDs map perfectly fine through the annotation package and >>> GSEABase::mapIdentifiers() etc, and everything works fine if I hack >>> around the bug I described previously. Again, what you have is some code >>> (e.g. /'Biobase::exprs(eScoEset) <- eSco$es.obs) /), which assumes that >>> a list is returned by /GSVA:::.gsva()/. In the case of " >>> /method='plage'/ " et al, it is not. If you tweak things with that in >>> mind, there's no problem. >>> >>> Here is the code for gsva() for the ExpressionSet signature (retrieved >>> with /getMethod(gsva, signature=c(expr="ExpressionSet", >>> gset.idx.list="GeneSetCollection", annotation="missing"))/ ), with my >>> hacks included in bold: >>> >>> >>> function (expr, gset.idx.list, annotation = NULL, ...) >>> { >>> .local <- function (expr, gset.idx.list, annotation = NULL, >>> method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, >>> abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, >>> bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", >>> mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, >>> NA), kernel = TRUE, verbose = TRUE) >>> { >>> sdGenes <- Biobase::esApply(expr, 1, sd) >>> if (any(sdGenes == 0)) { >>> if (verbose) >>> cat("Filtering out", sum(sdGenes == 0), "genes with constant expression >>> values throuhgout the samples\n") >>> expr <- expr[sdGenes > 0, ] >>> } >>> if (nrow(expr) < 2) >>> stop("Less than two genes in the input ExpressionSet object\n") >>> if (verbose) >>> cat("Mapping identifiers between gene sets and feature names\n") >>> mapped.gset.idx.list <- GSEABase::mapIdentifiers(gset.idx.list, >>> GSEABase::AnnoOrEntrezIdentifier(Biobase::annotation(expr))) >>> tmp <- lapply(geneIds(mapped.gset.idx.list), function(x, >>> y) na.omit(match(x, y)), featureNames(expr)) >>> names(tmp) <- names(mapped.gset.idx.list) >>> mapped.gset.idx.list <- filterGeneSets(tmp, min.sz = max(1, min.sz), >>> max.sz = max.sz) >>> eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list, >>> method, rnaseq, abs.ranking, no.bootstraps, bootstrap.percent, >>> parallel.sz, parallel.type, mx.diff, tau, kernel, >>> verbose) >>> eScoEset <- expr >>> #Biobase::exprs(eScoEset) <- eSco$es.obs >>> * if (method=='gsva'){ >>> Biobase::exprs(eScoEset) <- eSco$es.obs >>> }else{ >>> Biobase::exprs(eScoEset) <- eSco >>> }* >>> >>> Biobase::annotation(eScoEset) <- "" >>> #return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap, >>> # p.vals.sign = eSco$p.vals.sign)) >>> *return (eScoEset)* >>> } >>> .local(expr, gset.idx.list, annotation, ...) >>> } >>> >>> Since you say the bootstrapping isn't much use anyway, I've just set it >>> to return the matrix in all cases. >>> >>> Jon >>> >>> >>> >>> >>> On 21/02/2014 14:05, Sonja Haenzelmann wrote: >>>> I think the problem lies in your expression set. >>>> The gene names have to be in ENTREZ gene Id format, as the c2BroadSets >>>> are in this format. gsva maps the genes in your expression set with >>>> the genes in the gene set list. If they do not have the same >>>> identifiers it cannot map the genes, and hence not calculate any >>>> statistics. >>>> >>>> >>>> library(GSVAdata) >>>> data(c2BroadSets) >>>> library(GSVA) >>>> >>>> >>>> example<- as.matrix(read.table("example.txt")) >>>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4') >>>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', >>>> 5)))) >>>> rownames(pdata_example)<- letters[1:10] >>>> pData(eset_example)<- pdata_example >>>> >>>> >>>>> featureNames(eset_example) >>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" >>>> "ILMN_1806310" >>>> [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025" >>>> "ILMN_1814316" >>>> >>>> you will have to map your ILMN ids to entrez gene ids, then gsva >>>> should be working. >>>> >>>> >>>> For example in the toy example that shows up, when you do ?gsva, you >>>> will find that the gene names are the same in the gene set list as >>>> well as in the gene expression matrix. >>>> >>>> p<- 10 ## number of genes >>>> n<- 30 ## number of samples >>>> nGrp1<- 15 ## number of samples in group 1 >>>> nGrp2<- n - nGrp1 ## number of samples in group 2 >>>> >>>> ## consider three disjoint gene sets >>>> geneSets<- list(set1=paste("g", 1:3, sep=""), >>>> set2=paste("g", 4:6, sep=""), >>>> set3=paste("g", 7:10, sep="")) >>>> >>>> >>>> >>>>> geneSets >>>> $set1 >>>> [1] "g1" "g2" "g3" >>>> >>>> $set2 >>>> [1] "g4" "g5" "g6" >>>> >>>> $set3 >>>> [1] "g7" "g8" "g9" "g10" >>>> >>>> >>>> ## sample data from a normal distribution with mean 0 and st.dev. 1 >>>> y<- matrix(rnorm(n*p), nrow=p, ncol=n, >>>> dimnames=list(paste("g", 1:p, sep="") , paste("s", >>>> 1:n, sep=""))) >>>> >>>> rownames(y) >>>> [1] "g1" "g2" "g3" "g4" "g5" "g6" "g7" "g8" "g9" "g10" >>>> >>>> Best >>>> Sonja >>>> >>>> >>>> >>>> >>>> On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning >>>> <jonathan.manning at="" ed.ac.uk=""> wrote: >>>>> example<- as.matrix(read.table("example.txt")) >>>>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4') >>>>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', >>>>> 5)))) >>>>> rownames(pdata_example)<- letters[1:10] >>>>> pData(eset_example)<- pdata_example >>>>> gsva_es<- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage') >>> >>> >>> The University of Edinburgh is a charitable body, registered in >>> Scotland, with registration number SC005336. >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- Robert Castelo, PhD Associate Professor Dept. of Experimental and Health Sciences Universitat Pompeu Fabra (UPF) Barcelona Biomedical Research Park (PRBB) Dr Aiguader 88 E-08003 Barcelona, Spain telf: +34.933.160.514 fax: +34.933.160.550

ADD REPLY • link 10.2 years ago Robert Castelo ★ 3.3k

Login before adding your answer.