Pathway analysis of RNAseq data using goseq package
1
0
Entering edit mode
@998340ab
Last seen 21 months ago
United Kingdom

Hello,

I have finished the RNA seq analysis and I am trying to perform some pathway analysis. I have used the gage package and I was looking online about another package called goseq that takes into account length bias. However, when I run the code I get an error. How to solve that as I am not sure what the error means?



library(goseq)
supportedOrganisms() %>% filter(str_detect(Genome, "hg"))
#Step 1
#prepare the results for goseq
de.genes<-as.data.frame(resSig)
de.genes<- de.genes %>%
  mutate(DE_status=1)

res_non_de_genes<-res[!row.names(res) %in% row.names(resSig),]
res_non_de_genes<-as.data.frame(res_non_de_genes) %>%
  mutate(DE_status=0)

total.de.genes<-bind_rows(de.genes,res_non_de_genes)
nrow(total.de.genes)
total.de.genes$geneid<-row.names(total.de.genes)
total.de.genes$geneid<- substr(rownames(total.de.genes), 1, 15)
status<-total.de.genes$DE_status
names(status)<-total.de.genes$geneid


#to extract gene length
# Read the data into R 
genelength<- readDGE(rawfiles, columns=c(1,6), comment.char = "#")
# (as data frame to strip names of lists containing dimnames):
genelength <- data.frame(genelength$counts)
#to remove the extra header name
genelength<-genelength
for ( i in 1:ncol(genelength)){
  colnames(genelength)[i] <-  sub("_H.*", "", colnames(genelength)[i])
} 
# Convert count data to a matrix of appropriate form that DEseq2 can read
sampleIndex_genelength <- grepl("IGF\\d+", colnames(genelength))
genelength<- as.matrix(genelength[,sampleIndex_genelength])
check<-rowAlls(genelength)
all(check)
#we have check the all our samples have the same geneids and the same gene lenght
#we need a vector of gene ids and gene length only
genelength<-genelength[,1]
str(genelength)
genelength<-as.data.frame(genelength)
genelength$geneid<-row.names(genelength)
genelength<-genelength[row.names(total.de.genes),]
genelength$geneid<- substr(rownames(genelength), 1, 15)


all(rownames(genelength) %in% rownames(total.de.genes)) 
all(rownames(genelength) == rownames(total.de.genes))
gene_length<-genelength$genelength
names(gene_length)<-genelength$geneid


#fit the probability function
pwf <- nullp(status, "hg38", "ensGene", bias.data = gene_length)

#GO enrichment analysis
goResults <- goseq(pwf, "hg38","ensGene", test.cats=c("GO:BP","GO:CC","GO:MF"))

Fetching GO annotations... Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'toTable': disk I/O error

What does the error mean? I check and my data are in the right format. Any suggestions?

Thanks, Maria

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.54.0          goseq_1.46.0                geneLenDataBase_1.30.0     
 [4] BiasedUrn_1.07              pathview_1.32.0             org.Hs.eg.db_3.13.0        
 [7] GO.db_3.13.0                AnnotationDbi_1.56.2        gage_2.42.0                
[10] ggbeeswarm_0.6.0            RColorBrewer_1.1-2          vsn_3.60.0                 
[13] pcaExplorer_2.18.0          UpSetR_1.4.0                pheatmap_1.0.12            
[16] magrittr_2.0.2              DEFormats_1.20.0            reshape2_1.4.4             
[19] ggpubr_0.4.0                DESeq2_1.32.0               SummarizedExperiment_1.22.0
[22] Biobase_2.52.0              MatrixGenerics_1.4.3        matrixStats_0.61.0         
[25] GenomicRanges_1.44.0        GenomeInfoDb_1.28.4         IRanges_2.26.0             
[28] S4Vectors_0.30.0            BiocGenerics_0.38.0         forcats_0.5.1              
[31] stringr_1.4.0               dplyr_1.0.8                 purrr_0.3.4                
[34] readr_2.1.2                 tidyr_1.2.0                 tibble_3.1.6               
[37] ggplot2_3.3.5               tidyverse_1.3.1             edgeR_3.34.1               
[40] limma_3.48.3               

loaded via a namespace (and not attached):
  [1] utf8_1.2.2               shinydashboard_0.7.2     tidyselect_1.1.2        
  [4] heatmaply_1.3.0          RSQLite_2.2.10           htmlwidgets_1.5.4       
  [7] grid_4.1.2               TSP_1.1-11               BiocParallel_1.26.2     
 [10] munsell_0.5.0            preprocessCore_1.54.0    codetools_0.2-18        
 [13] DT_0.21                  withr_2.4.3              colorspace_2.0-2        
 [16] Category_2.60.0          filelock_1.0.2           knitr_1.37              
 [19] rstudioapi_0.13          ggsignif_0.6.3           NMF_0.23.0              
 [22] KEGGgraph_1.54.0         GenomeInfoDbData_1.2.7   topGO_2.46.0            
 [25] bit64_4.0.5              vctrs_0.3.8              generics_0.1.2          
 [28] xfun_0.29                BiocFileCache_2.2.1      R6_2.5.1                
 [31] doParallel_1.0.17        seriation_1.3.2          locfit_1.5-9.4          
 [34] bitops_1.0-7             cachem_1.0.6             shinyAce_0.4.1          
 [37] DelayedArray_0.18.0      assertthat_0.2.1         BiocIO_1.4.0            
 [40] promises_1.2.0.1         scales_1.1.1             beeswarm_0.4.0          
 [43] gtable_0.3.0             affy_1.70.0              rlang_1.0.1             
 [46] genefilter_1.74.0        splines_4.1.2            rstatix_0.7.0           
 [49] lazyeval_0.2.2           shinyBS_0.61             broom_0.7.12            
 [52] checkmate_2.0.0          yaml_2.3.5               BiocManager_1.30.16     
 [55] abind_1.4-5              modelr_0.1.8             GenomicFeatures_1.46.5  
 [58] threejs_0.3.3            crosstalk_1.2.0          backports_1.4.1         
 [61] httpuv_1.6.5             RBGL_1.68.0              tools_4.1.2             
 [64] gridBase_0.4-7           affyio_1.62.0            ellipsis_0.3.2          
 [67] Rcpp_1.0.8               plyr_1.8.6               base64enc_0.1-3         
 [70] progress_1.2.2           zlibbioc_1.38.0          RCurl_1.98-1.6          
 [73] prettyunits_1.1.1        viridis_0.6.2            haven_2.4.3             
 [76] ggrepel_0.9.1            cluster_2.1.2            fs_1.5.2                
 [79] data.table_1.14.2        SparseM_1.81             reprex_2.0.1            
 [82] hms_1.1.1                mime_0.12                evaluate_0.15           
 [85] xtable_1.8-4             XML_3.99-0.8             readxl_1.3.1            
 [88] gridExtra_2.3            compiler_4.1.2           biomaRt_2.50.3          
 [91] crayon_1.5.0             htmltools_0.5.2          GOstats_2.60.0          
 [94] mgcv_1.8-39              later_1.3.0              tzdb_0.2.0              
 [97] geneplotter_1.72.0       lubridate_1.8.0          DBI_1.1.2               
[100] dbplyr_2.1.1             MASS_7.3-55              rappdirs_0.3.3          
[103] Matrix_1.4-0             car_3.0-12               cli_3.2.0               
[106] glmpca_0.2.0             igraph_1.2.11            pkgconfig_2.0.3         
[109] GenomicAlignments_1.30.0 registry_0.5-1           plotly_4.10.0           
[112] xml2_1.3.3               foreach_1.5.2            annotate_1.72.0         
[115] vipor_0.4.5              rngtools_1.5.2           pkgmaker_0.32.2         
[118] webshot_0.5.2            XVector_0.32.0           AnnotationForge_1.36.0  
[121] rvest_1.0.2              digest_0.6.29            graph_1.70.0            
[124] Biostrings_2.60.2        rmarkdown_2.11           cellranger_1.1.0        
[127] dendextend_1.15.2        GSEABase_1.56.0          restfulr_0.0.13         
[130] curl_4.3.2               Rsamtools_2.10.0         shiny_1.7.1             
[133] rjson_0.2.21             nlme_3.1-155             lifecycle_1.0.1         
[136] jsonlite_1.7.3           carData_3.0-5            viridisLite_0.4.0       
[139] fansi_1.0.2              pillar_1.7.0             lattice_0.20-45         
[142] KEGGREST_1.34.0          fastmap_1.1.0            httr_1.4.2              
[145] survival_3.2-13          glue_1.6.1               png_0.1-7               
[148] iterators_1.0.14         bit_4.0.4                Rgraphviz_2.36.0        
[151] stringi_1.7.6            blob_1.2.2               memoise_2.0.1           
>
Pathways goseq RNASeqData • 812 views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 5 hours ago
United States

The disk I/O error makes me think that your hard drive is full, although it is not clear to me at which point toTable should be writing anything to disk.

But the call to toTable is

x <- toTable(org.Hs.egGO2ALLEGS)

Which you could try running yourself to see if you can replicate the error. But I would check your disk space first.

ADD COMMENT
0
Entering edit mode

Hi James,

thanks for your suggestion. I tried it and it worked. I then run my script and it is working fine. Maybe it was a bug in R that got fixed.

Kind regards, Maria

ADD REPLY

Login before adding your answer.

Traffic: 859 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6