Hello,
I have finished the RNA seq analysis and I am trying to perform some pathway analysis. I have used the gage package and I was looking online about another package called goseq that takes into account length bias. However, when I run the code I get an error. How to solve that as I am not sure what the error means?
library(goseq)
supportedOrganisms() %>% filter(str_detect(Genome, "hg"))
#Step 1
#prepare the results for goseq
de.genes<-as.data.frame(resSig)
de.genes<- de.genes %>%
mutate(DE_status=1)
res_non_de_genes<-res[!row.names(res) %in% row.names(resSig),]
res_non_de_genes<-as.data.frame(res_non_de_genes) %>%
mutate(DE_status=0)
total.de.genes<-bind_rows(de.genes,res_non_de_genes)
nrow(total.de.genes)
total.de.genes$geneid<-row.names(total.de.genes)
total.de.genes$geneid<- substr(rownames(total.de.genes), 1, 15)
status<-total.de.genes$DE_status
names(status)<-total.de.genes$geneid
#to extract gene length
# Read the data into R
genelength<- readDGE(rawfiles, columns=c(1,6), comment.char = "#")
# (as data frame to strip names of lists containing dimnames):
genelength <- data.frame(genelength$counts)
#to remove the extra header name
genelength<-genelength
for ( i in 1:ncol(genelength)){
colnames(genelength)[i] <- sub("_H.*", "", colnames(genelength)[i])
}
# Convert count data to a matrix of appropriate form that DEseq2 can read
sampleIndex_genelength <- grepl("IGF\\d+", colnames(genelength))
genelength<- as.matrix(genelength[,sampleIndex_genelength])
check<-rowAlls(genelength)
all(check)
#we have check the all our samples have the same geneids and the same gene lenght
#we need a vector of gene ids and gene length only
genelength<-genelength[,1]
str(genelength)
genelength<-as.data.frame(genelength)
genelength$geneid<-row.names(genelength)
genelength<-genelength[row.names(total.de.genes),]
genelength$geneid<- substr(rownames(genelength), 1, 15)
all(rownames(genelength) %in% rownames(total.de.genes))
all(rownames(genelength) == rownames(total.de.genes))
gene_length<-genelength$genelength
names(gene_length)<-genelength$geneid
#fit the probability function
pwf <- nullp(status, "hg38", "ensGene", bias.data = gene_length)
#GO enrichment analysis
goResults <- goseq(pwf, "hg38","ensGene", test.cats=c("GO:BP","GO:CC","GO:MF"))
Fetching GO annotations... Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'toTable': disk I/O error
What does the error mean? I check and my data are in the right format. Any suggestions?
Thanks, Maria
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.54.0 goseq_1.46.0 geneLenDataBase_1.30.0
[4] BiasedUrn_1.07 pathview_1.32.0 org.Hs.eg.db_3.13.0
[7] GO.db_3.13.0 AnnotationDbi_1.56.2 gage_2.42.0
[10] ggbeeswarm_0.6.0 RColorBrewer_1.1-2 vsn_3.60.0
[13] pcaExplorer_2.18.0 UpSetR_1.4.0 pheatmap_1.0.12
[16] magrittr_2.0.2 DEFormats_1.20.0 reshape2_1.4.4
[19] ggpubr_0.4.0 DESeq2_1.32.0 SummarizedExperiment_1.22.0
[22] Biobase_2.52.0 MatrixGenerics_1.4.3 matrixStats_0.61.0
[25] GenomicRanges_1.44.0 GenomeInfoDb_1.28.4 IRanges_2.26.0
[28] S4Vectors_0.30.0 BiocGenerics_0.38.0 forcats_0.5.1
[31] stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
[34] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6
[37] ggplot2_3.3.5 tidyverse_1.3.1 edgeR_3.34.1
[40] limma_3.48.3
loaded via a namespace (and not attached):
[1] utf8_1.2.2 shinydashboard_0.7.2 tidyselect_1.1.2
[4] heatmaply_1.3.0 RSQLite_2.2.10 htmlwidgets_1.5.4
[7] grid_4.1.2 TSP_1.1-11 BiocParallel_1.26.2
[10] munsell_0.5.0 preprocessCore_1.54.0 codetools_0.2-18
[13] DT_0.21 withr_2.4.3 colorspace_2.0-2
[16] Category_2.60.0 filelock_1.0.2 knitr_1.37
[19] rstudioapi_0.13 ggsignif_0.6.3 NMF_0.23.0
[22] KEGGgraph_1.54.0 GenomeInfoDbData_1.2.7 topGO_2.46.0
[25] bit64_4.0.5 vctrs_0.3.8 generics_0.1.2
[28] xfun_0.29 BiocFileCache_2.2.1 R6_2.5.1
[31] doParallel_1.0.17 seriation_1.3.2 locfit_1.5-9.4
[34] bitops_1.0-7 cachem_1.0.6 shinyAce_0.4.1
[37] DelayedArray_0.18.0 assertthat_0.2.1 BiocIO_1.4.0
[40] promises_1.2.0.1 scales_1.1.1 beeswarm_0.4.0
[43] gtable_0.3.0 affy_1.70.0 rlang_1.0.1
[46] genefilter_1.74.0 splines_4.1.2 rstatix_0.7.0
[49] lazyeval_0.2.2 shinyBS_0.61 broom_0.7.12
[52] checkmate_2.0.0 yaml_2.3.5 BiocManager_1.30.16
[55] abind_1.4-5 modelr_0.1.8 GenomicFeatures_1.46.5
[58] threejs_0.3.3 crosstalk_1.2.0 backports_1.4.1
[61] httpuv_1.6.5 RBGL_1.68.0 tools_4.1.2
[64] gridBase_0.4-7 affyio_1.62.0 ellipsis_0.3.2
[67] Rcpp_1.0.8 plyr_1.8.6 base64enc_0.1-3
[70] progress_1.2.2 zlibbioc_1.38.0 RCurl_1.98-1.6
[73] prettyunits_1.1.1 viridis_0.6.2 haven_2.4.3
[76] ggrepel_0.9.1 cluster_2.1.2 fs_1.5.2
[79] data.table_1.14.2 SparseM_1.81 reprex_2.0.1
[82] hms_1.1.1 mime_0.12 evaluate_0.15
[85] xtable_1.8-4 XML_3.99-0.8 readxl_1.3.1
[88] gridExtra_2.3 compiler_4.1.2 biomaRt_2.50.3
[91] crayon_1.5.0 htmltools_0.5.2 GOstats_2.60.0
[94] mgcv_1.8-39 later_1.3.0 tzdb_0.2.0
[97] geneplotter_1.72.0 lubridate_1.8.0 DBI_1.1.2
[100] dbplyr_2.1.1 MASS_7.3-55 rappdirs_0.3.3
[103] Matrix_1.4-0 car_3.0-12 cli_3.2.0
[106] glmpca_0.2.0 igraph_1.2.11 pkgconfig_2.0.3
[109] GenomicAlignments_1.30.0 registry_0.5-1 plotly_4.10.0
[112] xml2_1.3.3 foreach_1.5.2 annotate_1.72.0
[115] vipor_0.4.5 rngtools_1.5.2 pkgmaker_0.32.2
[118] webshot_0.5.2 XVector_0.32.0 AnnotationForge_1.36.0
[121] rvest_1.0.2 digest_0.6.29 graph_1.70.0
[124] Biostrings_2.60.2 rmarkdown_2.11 cellranger_1.1.0
[127] dendextend_1.15.2 GSEABase_1.56.0 restfulr_0.0.13
[130] curl_4.3.2 Rsamtools_2.10.0 shiny_1.7.1
[133] rjson_0.2.21 nlme_3.1-155 lifecycle_1.0.1
[136] jsonlite_1.7.3 carData_3.0-5 viridisLite_0.4.0
[139] fansi_1.0.2 pillar_1.7.0 lattice_0.20-45
[142] KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.2
[145] survival_3.2-13 glue_1.6.1 png_0.1-7
[148] iterators_1.0.14 bit_4.0.4 Rgraphviz_2.36.0
[151] stringi_1.7.6 blob_1.2.2 memoise_2.0.1
>
Hi James,
thanks for your suggestion. I tried it and it worked. I then run my script and it is working fine. Maybe it was a bug in R that got fixed.
Kind regards, Maria