Non-specific filtering of Affymetrix Microarray data
Guest User ★ 12k
@guest-user-4897
Last seen 7.4 years ago
During non-specific filtering, I am using parameters for filtering probes (require.entrez=TRUE, remove.dupEntrez=TRUE, feature.exclude="^AFFX") in addition to the intensity and variance filters. Independently, both filters work fine, but when I try to use them together, I get the error below:

Error in apply(expr, 1, flist) : dim(X) must have a positive length

Please help me with this. I have pasted the code below.

#1. Getting the data
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
biocLite("affycoretools")
library(GEOquery)
setwd("/home/vinay/R/R-3.0.2")
getGEOSuppFiles("GSE6631")
setwd("/home/vinay/R/R-3.0.2/GSE6631")

system("tar -xvf GSE6631_RAW.tar")
cels <- list.files( pattern = "[gz]")
sapply(cels, gunzip)

#2. Loading and normalising the data using GC-RMA
# You may need to copy your phenodata.txt file into the GSE6631 folder
library(affy)
library(affycoretools)
data <- ReadAffy()
pData(data)<-read.table("phenodata.txt", header=T,row.names=1, sep="\t")
pData(data)
eset <- gcrma(data)
eset
dim(eset)
pData(eset)
write.exprs(eset, file="Expression_values_GCRMA_normalize.xls")
eset2<-eset[,pData(eset)[,"Condition"]%in%c("Normal","Cancer")]

#3. Non-specific Filtering data
library(genefilter)
celfiles_filtered <- nsFilter(eset2, require.entrez=TRUE, remove.dupEntrez=TRUE,feature.exclude="^AFFX")
f1<-pOverA(0.10,log2(100)) #intensity filter: the intensity of a gene should be above log2(100) in at least 10 percent of the samples
f2<-function(x)(IQR(x)>0.5) #variance filter: the interquartile range of log2 intensities should be at least 0.5
ff<-filterfun(f1,f2)
selected<-genefilter(celfiles_filtered,ff)

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] hgu95av2.db_2.10.1 org.Hs.eg.db_2.10.1
[3] arrayQualityMetrics_3.18.0 affyPLM_1.38.0
[5] preprocessCore_1.24.0 RColorBrewer_1.0-5
[7] hgu95av2probe_2.13.0 affycoretools_1.34.0
[9] KEGG.db_2.10.1 GO.db_2.10.1
[11] RSQLite_0.11.4 DBI_0.2-7
[13] limma_3.18.12 hgu95av2cdf_2.13.0
[15] AnnotationDbi_1.24.0 simpleaffy_2.38.0
[17] genefilter_1.44.0 gcrma_2.34.0
[19] affy_1.40.0 GEOquery_2.28.0
[21] Biobase_2.22.0 BiocGenerics_0.8.0
[23] BiocInstaller_1.12.0

loaded via a namespace (and not attached):
[1] affyio_1.30.0 annaffy_1.34.0 annotate_1.40.0
[4] AnnotationForge_1.4.4 beadarray_2.12.0 BeadDataPackR_1.14.0
[7] biomaRt_2.18.0 Biostrings_2.30.1 biovizBase_1.10.7
[10] bit_1.1-11 bitops_1.0-6 BSgenome_1.30.0
[13] Cairo_1.5-5 Category_2.28.0 caTools_1.16
[16] cluster_1.14.4 codetools_0.2-8 colorspace_1.2-4
[19] DESeq2_1.2.10 dichromat_2.0-0 digest_0.6.4
[22] edgeR_3.4.2 ff_2.2-12 foreach_1.4.1
[25] Formula_1.1-1 gdata_2.13.2 GenomicFeatures_1.14.2
[28] GenomicRanges_1.14.4 ggbio_1.10.11 ggplot2_0.9.3.1
[31] GOstats_2.28.0 gplots_2.12.1 graph_1.40.1
[34] grid_3.0.2 gridExtra_0.9.1 GSEABase_1.24.0
[37] gtable_0.1.2 gtools_3.3.0 Hmisc_3.14-0
[40] hwriter_1.3 IRanges_1.20.6 iterators_1.0.6
[43] KernSmooth_2.23-10 labeling_0.2 lattice_0.20-24
[46] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-29
[49] Matrix_1.1-2 munsell_0.4.2 oligoClasses_1.24.0
[52] PFAM.db_2.10.1 plyr_1.8 proto_0.3-10
[55] R2HTML_2.2.1 RBGL_1.38.0 Rcpp_0.11.0
[58] RcppArmadillo_0.4.000.2 RCurl_1.95-4.1 ReportingTools_2.2.0
[61] reshape2_1.2.2 R.methodsS3_1.6.1 R.oo_1.17.0
[64] Rsamtools_1.14.3 rtracklayer_1.22.3 R.utils_1.29.8
[67] scales_0.2.3 setRNG_2011.11-2 splines_3.0.2
[70] stats4_3.0.2 stringr_0.6.2 survival_2.37-7
[73] SVGAnnotation_0.93-1 tcltk_3.0.2 tools_3.0.2
[76] VariantAnnotation_1.8.12 vsn_3.30.0 XML_3.98-1.1
[79] xtable_1.7-1 XVector_0.2.0 zlibbioc_1.8.0

-- Sent via the guest posting facility at bioconductor.org.
GO hgu95av2 • 1.2k views
@wolfgang-huber-3550
Last seen 10 days ago
EMBL European Molecular Biology Laborat…
Hi Vinay,

A look at the man page of 'nsFilter' indicates that its output is a list, one of whose elements is 'eset', the filtered ExpressionSet. You could try (I haven't checked):

selected <- genefilter(celfiles_filtered$eset, ff)

But I also wonder why you would want to do this. Did you explore the 'var.cutoff' and 'filterByQuantile' arguments of 'nsFilter'?

Wolfgang

On 18 Feb 2014, at 05:07, Vinay Randhawa [guest] <guest at bioconductor.org> wrote:
> During non-specific filtering, I am using parameters for filtering probes (require.entrez=TRUE, remove.dupEntrez=TRUE, feature.exclude="^AFFX") in addition to the filters of intensity and variance. Independently, both filters work fine, but when I try to use them together, I am getting an error written below:
> Error in apply(expr, 1, flist) : dim(X) must have a positive length
> [...]
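For later readers, here is a minimal sketch of the suggestion in this answer. It does not use the GSE6631 data: the toy matrix, the probe and sample names, and the `ns_result` list (a hand-built stand-in with the same `eset` and `filter.log` element names that `nsFilter` returns) are assumptions for illustration only. It assumes the Biobase and genefilter packages are installed.

```r
library(Biobase)
library(genefilter)

# Toy log2 expression matrix: 4 hypothetical probes x 10 samples
mat <- rbind(
  probeA = seq(6, 10.5, by = 0.5),  # bright and variable
  probeB = seq(1, 5.5, by = 0.5),   # variable but below log2(100)
  probeC = rep(9, 10),              # bright but flat (IQR = 0)
  probeD = rep(2, 10)               # dim and flat
)
colnames(mat) <- paste0("sample", 1:10)
eset <- ExpressionSet(assayData = mat)

# Stand-in for the list returned by nsFilter(); hand-built here because
# nsFilter() itself needs an annotation package for the chip
ns_result <- list(eset = eset, filter.log = list())

# Same filters as in the question
f1 <- pOverA(0.10, log2(100))        # above log2(100) in at least 10% of samples
f2 <- function(x) IQR(x) > 0.5       # interquartile range above 0.5
ff <- filterfun(f1, f2)

# Pass the ExpressionSet element, not the whole list, to genefilter()
selected <- genefilter(ns_result$eset, ff)

# Keep only the probes that pass both filters
filtered <- ns_result$eset[selected, ]
featureNames(filtered)
```

With the real data, the equivalent call would be `genefilter(celfiles_filtered$eset, ff)`. Also note that `nsFilter` has its own variance filter (the `var.filter`, `var.func`, `var.cutoff` and `filterByQuantile` arguments), so calling it with `filterByQuantile=FALSE` and `var.cutoff=0.5` may make the separate `f2` step unnecessary; check `?nsFilter` for the genefilter version you have installed.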