Hello,
I am trying to filter an expression set (GSE85127) for input into GSVA (following the vignette). I'm having problems when I attempt to filter the eset that I think are related to my lack of understanding regarding how to work with the custom CDF files provided by MBNI. Any help is greatly appreciated.
My code:
> library(GEOquery) > library(genefilter) > install.packages("http://mbni.org/customcdf/22.0.0/ensg.download/hugene11sthsensgcdf_22.0.0.tar.gz", repos = NULL) > library(hugene11sthsensgcdf) > cavalliMB <- getGEO("GSE85217") > cavalliMB <- cavalliMB[[1]] > filtered_eset <- nsFilter(cavalliMB, require.entrez=FALSE, remove.dupEntrez=TRUE, var.func=IQR, var.filter=TRUE, var.cutoff=0.5, filterByQuantile=TRUE, feature.exclude="^AFFX") Error: getAnnMap: package GPL22286 not available
When I set remove.dupEntrez = FALSE
the filtering seems to work fine but I get the same Error: getAnnMap: package GPL22286 not available
when running gsva
.
Alternatively, when I try to read in and normalize with oligo
first, I'm getting a different error:
> library(oligo) > library(hugene11sthsensgcdf) > CEL.files <- list.files(path = "Data/Raw", pattern = ".CEL", full.names = TRUE) > raw.data <- read.celfiles(CEL.files, pkgname = "hugene11sthsensgcdf") Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘kind’ for signature ‘"environment"’
I tried to run install.packages("http://mbni.org/customcdf/21.0.0/entrezg.download/pd.hugene11st.hs.entrezg_21.0.0.tar.gz")
but got back a warning that this isn't available for R 3.5. I also tried to use read.celfiles(list.celfiles(), pkgname = "pd.hugene11st.hs.entrezg")
but got back:
Loading required package: pd.hugene11st.hs.ensg Attempting to obtain 'pd.hugene11st.hs.ensg' from BioConductor website. Checking to see if your internet connection works... Package 'pd.hugene11st.hs.ensg' was not found in the BioConductor repository. The 'pdInfoBuilder' package can often be used in situations like this. Error in read.celfiles(CEL.files, pkgname = "pd.hugene11st.hs.ensg") : The annotation package, pd.hugene11st.hs.ensg, could not be loaded. In addition: Warning message: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘pd.hugene11st.hs.ensg’
Not sure if this is helpful/relevant but in the original publication the methods section states that "hugene11sthsensgcdf (v19.0.0)" CDF was used for annotation.
> sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] GSVA_1.28.0 RColorBrewer_1.1-2 limma_3.36.1 GSVAdata_1.16.0 hgu95a.db_3.2.3 [6] org.Hs.eg.db_3.6.0 GSEABase_1.42.0 graph_1.58.0 annotate_1.58.0 XML_3.98-1.11 [11] hugene11sthsensgcdf_22.0.0 AnnotationDbi_1.42.1 IRanges_2.14.10 S4Vectors_0.18.2 bindrcpp_0.2.2 [16] affycoretools_1.52.1 genefilter_1.62.0 GEOquery_2.48.0 Biobase_2.40.0 BiocGenerics_0.26.0 loaded via a namespace (and not attached): [1] backports_1.1.2 GOstats_2.46.0 Hmisc_4.1-1 plyr_1.8.4 [5] lazyeval_0.2.1 splines_3.5.0 BiocParallel_1.14.1 GenomeInfoDb_1.16.0 [9] ggplot2_2.2.1 digest_0.6.15 foreach_1.4.4 BiocInstaller_1.30.0 [13] ensembldb_2.4.1 htmltools_0.3.6 GO.db_3.6.0 gdata_2.18.0 [17] magrittr_1.5 checkmate_1.8.5 memoise_1.1.0 BSgenome_1.48.0 [21] cluster_2.0.7-1 gcrma_2.52.0 Biostrings_2.48.0 readr_1.1.1 [25] matrixStats_0.53.1 R.utils_2.6.0 ggbio_1.28.0 prettyunits_1.0.2 [29] colorspace_1.3-2 blob_1.1.1 dplyr_0.7.5 RCurl_1.95-4.10 [33] bindr_0.1.1 survival_2.41-3 VariantAnnotation_1.26.0 iterators_1.0.9 [37] glue_1.2.0 gtable_0.2.0 zlibbioc_1.26.0 XVector_0.20.0 [41] DelayedArray_0.6.0 Rgraphviz_2.24.0 scales_0.5.0 DBI_1.0.0 [45] GGally_1.4.0 edgeR_3.22.2 Rcpp_0.12.17 xtable_1.8-2 [49] progress_1.1.2 htmlTable_1.12 foreign_0.8-70 bit_1.1-14 [53] OrganismDbi_1.22.0 preprocessCore_1.42.0 Formula_1.2-3 AnnotationForge_1.22.0 [57] htmlwidgets_1.2 httr_1.3.1 gplots_3.0.1 acepack_1.4.1 [61] ff_2.2-14 pkgconfig_2.0.1 reshape_0.8.7 R.methodsS3_1.7.1 [65] nnet_7.3-12 locfit_1.5-9.1 later_0.7.2 tidyselect_0.2.4 [69] rlang_0.2.1 reshape2_1.4.3 munsell_0.4.3 tools_3.5.0 [73] RSQLite_2.1.1 stringr_1.3.1 knitr_1.20 bit64_0.9-7 [77] oligoClasses_1.42.0 caTools_1.17.1 purrr_0.2.5 AnnotationFilter_1.4.0 [81] RBGL_1.56.0 mime_0.5 R.oo_1.22.0 xml2_1.2.0 [85] biomaRt_2.36.1 shinythemes_1.1.1 compiler_3.5.0 rstudioapi_0.7 [89] curl_3.2 affyio_1.50.0 PFAM.db_3.6.0 tibble_1.4.2 [93] geneplotter_1.58.0 stringi_1.1.7 GenomicFeatures_1.32.0 lattice_0.20-35 [97] ProtGenerics_1.12.0 Matrix_1.2-14 pillar_1.2.3 data.table_1.11.4 [101] bitops_1.0-6 httpuv_1.4.3 rtracklayer_1.40.3 GenomicRanges_1.32.3 [105] R6_2.2.2 latticeExtra_0.6-28 affy_1.58.0 hwriter_1.3.2 [109] promises_1.0.1 KernSmooth_2.23-15 gridExtra_2.3 codetools_0.2-15 [113] dichromat_2.0-0 gtools_3.5.0 assertthat_0.2.0 SummarizedExperiment_1.10.1 [117] DESeq2_1.20.0 Category_2.46.0 ReportingTools_2.20.0 GenomicAlignments_1.16.0 [121] Rsamtools_1.32.0 GenomeInfoDbData_1.1.0 hms_0.4.2 grid_3.5.0 [125] rpart_4.1-13 tidyr_0.8.1 biovizBase_1.28.0 shiny_1.1.0 [129] base64enc_0.1-3
hi, this is not a GSVA-specific issue. The filtering of genes one does for GSVA is the same you would do for a differential expression analysis. The problem you show has to do the particular microarray chip to which the data you are analyzing belongs to and how to analyze it using the oligo package.
Thank you, Robert. I've modified the post to remove the GSVA tag and added oligo.