nsFilter error in genefilter
steven wink ▴ 90
@steven-wink-5440
Last seen 4.9 years ago
Dear Jim,

I am facing the same problem, and your idea would be great for me, but I ran into a problem: I cannot change the featureNames of an AffyBatch.

> my.data
AffyBatch object
size of arrays=744x744 features (23 kb)
cdf=HT_HG-U133_Plus_PM (54715 affyids)
number of samples=16
number of genes=54715
annotation=hthgu133pluspm
notes=

> featureNames(my.data) <- gsub("_PM","", featureNames(my.data))
Error in `featureNames<-`(`*tmp*`, value = c("1007_s_at", "1053_at", "117_at", :
  Cannot change featureNames of AffyBatch

I tried running R as super user but got the same result. I also want to replace the default cdf with a brainarray cdf after this step.

P.S. I can perform vsnrma(), but e.g. nsFilter apparently needs the annotation file, so I have to switch to the plus2 or make the "hthgu133pluspm.db" package (which I have never tried before).

Do you have any suggestions on the "Cannot change featureNames of AffyBatch" error?

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] hthgu133pluspmcdf_2.12.0   genefilter_1.42.0    vsn_3.28.0
 [4] arrayQualityMetrics_3.16.0 hgu133plus2.db_2.9.0 org.Hs.eg.db_2.9.0
 [7] RSQLite_0.11.4             DBI_0.2-7            hthgu133pluspmhsentrezgcdf_17.1.0
[10] AnnotationDbi_1.22.5       limma_3.16.5         affy_1.38.1
[13] Biobase_2.20.0             BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] affyio_1.28.0         affyPLM_1.36.0     annotate_1.38.0      beadarray_2.10.0 BeadDataPackR_1.12.0 BiocInstaller_1.10.2
 [7] Biostrings_2.28.0     Cairo_1.5-2        cluster_1.14.4       colorspace_1.2-2 gcrma_2.32.0         grid_3.0.1
[13] Hmisc_3.10-1.1        hwriter_1.3        IRanges_1.18.1       lattice_0.20-15  latticeExtra_0.6-24  plyr_1.8
[19] preprocessCore_1.22.0 RColorBrewer_1.0-5 reshape2_1.2.2       setRNG_2011.11-2 splines_3.0.1        stats4_3.0.1
[25] stringr_0.6.2         survival_2.37-4    SVGAnnotation_0.93-1 tools_3.0.1      XML_3.98-1.1         xtable_1.7-1
[31] zlibbioc_1.6.0

2013/4/17 James W. MacDonald <jmacdon@uw.edu>

> Hi Zhenya,
>
> On 4/17/2013 12:02 PM, Zhenya [guest] wrote:
>
>> Hi All,
>>
>> I am trying to run the code for GSVA (library with the same name). The
>> code is below, but the main error is around annotation:
>>
>>> source("http://bioconductor.org/biocLite.R")
>> Bioconductor version 2.12 (BiocInstaller 1.10.0), ?biocLite for help
>>> biocLite("hthgu133pluspm.db")
>
> There is no such package. You could easily create one yourself using the
> AnnotationForge package (see the vignette). Or you could note that the
> hthgu133pluspm array has identical content as the hgu133plus2 array, except
> for a few extra control probesets, and the fact that they insisted on
> adding an extra _PM to all the probesets.
>
> > sum(ls(hgu133plus2cdf) %in% gsub("_PM","", ls(hthgu133pluspmcdf)))
> [1] 54675
> > length(ls(hgu133plus2cdf))
> [1] 54675
> > length(ls(hthgu133pluspmcdf))
> [1] 54715
> > ls(hthgu133pluspmcdf)[!gsub("_PM","", ls(hthgu133pluspmcdf)) %in% ls(hgu133plus2cdf)]
>  [1] "AFFX-NonspecificGC10_at" "AFFX-NonspecificGC11_at"
>  [3] "AFFX-NonspecificGC12_at" "AFFX-NonspecificGC13_at"
>  [5] "AFFX-NonspecificGC14_at" "AFFX-NonspecificGC15_at"
>  [7] "AFFX-NonspecificGC16_at" "AFFX-NonspecificGC17_at"
>  [9] "AFFX-NonspecificGC18_at" "AFFX-NonspecificGC19_at"
> [11] "AFFX-NonspecificGC20_at" "AFFX-NonspecificGC21_at"
> [13] "AFFX-NonspecificGC22_at" "AFFX-NonspecificGC23_at"
> [15] "AFFX-NonspecificGC24_at" "AFFX-NonspecificGC25_at"
> [17] "AFFX-NonspecificGC3_at"  "AFFX-NonspecificGC4_at"
> [19] "AFFX-NonspecificGC5_at"  "AFFX-NonspecificGC6_at"
> [21] "AFFX-NonspecificGC7_at"  "AFFX-NonspecificGC8_at"
> [23] "AFFX-NonspecificGC9_at"  "AFFX-r2-TagA_at"
> [25] "AFFX-r2-TagB_at"         "AFFX-r2-TagC_at"
> [27] "AFFX-r2-TagD_at"         "AFFX-r2-TagE_at"
> [29] "AFFX-r2-TagF_at"         "AFFX-r2-TagG_at"
> [31] "AFFX-r2-TagH_at"         "AFFX-r2-TagIN-3_at"
> [33] "AFFX-r2-TagIN-5_at"      "AFFX-r2-TagIN-M_at"
> [35] "AFFX-r2-TagJ-3_at"       "AFFX-r2-TagJ-5_at"
> [37] "AFFX-r2-TagO-3_at"       "AFFX-r2-TagO-5_at"
> [39] "AFFX-r2-TagQ-3_at"       "AFFX-r2-TagQ-5_at"
>
> So you could either go to the trouble of building and installing a .db
> package for this array, or you could do something like
>
> featureNames(EsetData) <- gsub("_PM","", featureNames(EsetData))
> annotation(EsetData) <- "hgu133plus2.db"
>
> and carry on as before.
>
> Best,
>
> Jim
>
>> BioC_mirror: http://bioconductor.org
>> Using Bioconductor version 2.12 (BiocInstaller 1.10.0), R version 3.0.0.
>> Installing package(s) 'hthgu133pluspm.db'
>> Warning message:
>> package ‘hthgu133pluspm.db’ is not available (for R version 3.0.0)
>>
>> Code:
>>
>> # CREATE GeneSetCollection
>> library(GSEABase)
>> x<- scan("GeneSets.gmt", what="", sep="\n")
>> GeneSets.gmt<- strsplit(x, "[[:space:]]+")
>> names(GeneSets.gmt)<- sapply(GeneSets.gmt, `[[`, 1)
>> GeneSets.gmt<- lapply(GeneSets.gmt, `[`, -1)
>> n<- names(GeneSets.gmt)
>> uniqueList<- lapply(GeneSets.gmt, unique)
>> makeSet<- function(geneIds, n) {GeneSet(geneIds, geneIdType=SymbolIdentifier(), setName=n)}
>> gsList<- gsc<- mapply(makeSet, uniqueList[], n)
>> gsc<- GeneSetCollection(gsList)
>>
>> # DATASET
>> # CREATE ExpressionSet
>> exprs<- as.matrix(read.table("ExprData.txt", header=TRUE, sep="\t", row.names=1, as.is=TRUE))
>> pData<- read.table("DesignFile.txt",row.names=1, header=T,sep="\t")
>> phenoData<- new("AnnotatedDataFrame",data=pData)
>> annotation<- "hthgu133pluspm.db"
>> EsetData<- ExpressionSet(assayData=exprs,phenoData=phenoData,annotation="hthgu133pluspm")
>> head(ExprData)
>>
>> #Gene Filtering
>> library(genefilter)
>> library("hthgu133pluspm")
>> filtered_eset<- nsFilter(EsetData, require.entrez=TRUE, remove.dupEntrez=TRUE, var.func=IQR,
>>     var.filter=FALSE, var.cutoff=0.25, filterByQuantile=TRUE, feature.exclude="^AFFX")
>> # get stats for numbers of probesets removed
>> filtered_eset
>> EsetData_f<- filtered_eset$eset
>>
>> # GSVA
>> library(GSVA)
>> gsva_es<- gsva(EsetData_f,gsc,abs.ranking=FALSE,min.sz=1,max.sz=1000,mx.diff=TRUE)$es.obs
>>
>> I downloaded hthgu133pluspm from http://nmg-r.bioinformatics.nl/NuGO_R.html
>> and R still complains. The available on Bioconductor:
>> hthgu133pluspmprobe
>> and
>> hthgu133pluspmcdf
>> are not correct and give error for nsFilter and gsva:
>>
>> Error in (function (classes, fdef, mtable) :
>>   unable to find an inherited method for function ‘cols’ for signature ‘"environment"’
>>
>> Mapping identifiers between gene sets and feature names
>> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., verbose = verbose)) :
>>   error in evaluating the argument 'object' in selecting a method for
>>   function 'GeneSetCollection': Error in (function (classes, fdef, mtable) :
>>   unable to find an inherited method for function ‘cols’ for signature ‘"environment"’
>>
>> Thank you,
>> Zhenya
>>
>> -- output of sessionInfo():
>>
>> R version 3.0.0 (2013-04-03)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>>  [1] GSVA_1.8.0               BiocInstaller_1.10.0 hthgu133pluspmprobe_2.12.0
>>  [4] hthgu133pluspmcdf_2.12.0 genefilter_1.42.0    GSEABase_1.22.0
>>  [7] graph_1.38.0             annotate_1.38.0      AnnotationDbi_1.22.1
>> [10] Biobase_2.20.0           BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] DBI_0.2-5    IRanges_1.18.0  RSQLite_0.11.2 splines_3.0.0 stats4_3.0.0
>> [6] survival_2.37-4 tools_3.0.0  XML_3.96-1.1   xtable_1.7-1
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
GO hgu133plus2 cdf GSVA AnnotationForge • 1.7k views
@james-w-macdonald-5106
Last seen 7 hours ago
United States
Hi Steven,

On 7/5/2013 11:11 AM, steven wink wrote:
> Dear Jim,
>
> I am facing the same problem, and your idea would be great for me but
> I ran into a problem: cannot change featureNames of Affybatch.
>
> > featureNames(my.data) <- gsub("_PM","", featureNames(my.data))
>
> Error in `featureNames<-`(`*tmp*`, value = c("1007_s_at", "1053_at",
> "117_at", :
> *Cannot change featureNames of AffyBatch*
>
> Do you have any suggestions on the *Cannot change featureNames of
> AffyBatch?*

You could use my original suggestion, which was to change the featureNames of the ExpressionSet object that you get after summarizing. You are trying to change the featureNames on the AffyBatch object, prior to summarizing, which is not what I suggested.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
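
For readers landing on this thread later, here is a minimal sketch of the approach described in the reply: rename the features on the summarized ExpressionSet, not on the AffyBatch. The helpers `strip_pm` and `relabel_as_plus2` are hypothetical names made up for illustration, and the toy ExpressionSet only stands in for the output of a summarization step such as `rma()` or `vsnrma()`. The `"hgu133plus2.db"` annotation string is taken from the earlier post, not verified here.

```r
# Hypothetical helper (not part of any package): drop the "_PM" marker that
# the HT_HG-U133_Plus_PM array adds to probeset IDs, e.g. "1007_PM_s_at".
strip_pm <- function(ids) gsub("_PM", "", ids, fixed = TRUE)

# Hypothetical wrapper: rename the features of an already summarized
# ExpressionSet and point its annotation at the hgu133plus2 package.
relabel_as_plus2 <- function(eset) {
  stopifnot(methods::is(eset, "ExpressionSet"))
  Biobase::featureNames(eset) <- strip_pm(Biobase::featureNames(eset))
  Biobase::annotation(eset) <- "hgu133plus2.db"
  eset
}

# The renaming itself is plain string manipulation.
pm_ids <- c("1007_PM_s_at", "1053_PM_at", "117_PM_at")
plain_ids <- strip_pm(pm_ids)
print(plain_ids)

# Toy ExpressionSet standing in for the result of rma() or vsnrma();
# this part runs only if Biobase is installed.
if (requireNamespace("Biobase", quietly = TRUE)) {
  toy_exprs <- matrix(as.numeric(1:6), nrow = 3,
                      dimnames = list(pm_ids, c("sample1", "sample2")))
  toy <- Biobase::ExpressionSet(assayData = toy_exprs,
                                annotation = "hthgu133pluspm")
  toy <- relabel_as_plus2(toy)
  print(Biobase::featureNames(toy))
  print(Biobase::annotation(toy))
}
```

On real data the same call would come after summarization, for example `eset <- rma(my.data)` followed by `eset <- relabel_as_plus2(eset)`, and only then `nsFilter(eset, ...)`.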