Hi,
I am trying to prepare my data for DEXSeq with the Python script provided by the package.
I downloaded the annotation from FlyBase (see the wget command below). It is a GFF file of considerable size (over 2 GB when unzipped).
wget ftp://flybase.net/genomes/Drosophila_melanogaster/current/gff/dmel-all-r6.03.gff.gz
gunzip dmel-all-r6.03.gff.gz
mv dmel-all-r6.03.gff Dmel_r6.03.gff
python ~/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_prepare_annotation.py Dmel_r6.03.gff gffFiles/Dmel.DEXSeq.r6.03.gtf
When I run the Python script to convert the file to a DEXSeq-friendly format, I get this error message:
Traceback (most recent call last):
  File "/home/yeroslaviz/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_prepare_annotation.py", line 54, in <module>
    f.attr['gene_id'] = f.attr['gene_id'].replace( ":", "_" )
KeyError: 'gene_id'
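The traceback shows the script reading a `gene_id` attribute from each feature. A GFF3 file typically carries `ID=` and `Parent=` keys in its last column rather than GTF-style `gene_id "..."` pairs, which would explain the KeyError. Below is a minimal sketch for checking which attribute keys the exon lines of a file actually carry. It uses only the standard library and does not reproduce what HTSeq or the DEXSeq script does internally; the function names and the sample lines are hypothetical and written for illustration only.

```python
from collections import Counter


def attribute_keys(attr_column):
    """Return the attribute keys found in column 9 of a GFF3 or GTF line."""
    keys = []
    for field in attr_column.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field and '"' not in field.split("=", 1)[0]:
            keys.append(field.split("=", 1)[0])   # GFF3 style: key=value
        else:
            keys.append(field.split(None, 1)[0])  # GTF style: key "value"
    return keys


def count_exon_attribute_keys(lines, feature="exon"):
    """Count attribute keys over all lines whose feature type matches."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break                      # sequence section, no more features
        if line.startswith("#"):
            continue                   # header or comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        counts.update(attribute_keys(cols[8]))
    return counts


# Hypothetical sample lines, made up for illustration (not copied from the file).
sample_gff3 = [
    "##gff-version 3",
    "2L\tFlyBase\texon\t7529\t8116\t.\t+\t.\tID=ex1;Name=ex1;Parent=tr1",
]
sample_gtf = [
    '2L\tFlyBase\texon\t7529\t8116\t.\t+\t.\tgene_id "g1"; transcript_id "tr1";',
]

if __name__ == "__main__":
    print(count_exon_attribute_keys(sample_gff3))
    print(count_exon_attribute_keys(sample_gtf))
    # To inspect the real file, pass an open file handle instead:
    # with open("Dmel_r6.03.gff") as fh:
    #     print(count_exon_attribute_keys(fh))
```

If `gene_id` does not show up among the keys counted for the real file, the input is probably not in the GTF layout that the conversion script reads.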
Is there a way to make DEXSeq work with this file?
Assa
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] DEXSeq_1.12.1           BiocParallel_1.0.0      DESeq2_1.6.1
 [4] RcppArmadillo_0.4.500.0 Rcpp_0.11.3             GenomicRanges_1.18.1
 [7] GenomeInfoDb_1.2.3      IRanges_2.0.0           S4Vectors_0.4.0
[10] Biobase_2.26.0          BiocGenerics_0.12.1

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1
 [4] base64enc_0.1-2      BatchJobs_1.5        BBmisc_1.8
 [7] biomaRt_2.22.0       Biostrings_2.34.0    bitops_1.0-6
[10] brew_1.0-6           checkmate_1.5.0      cluster_1.15.3
[13] codetools_0.2-9      colorspace_1.2-4     DBI_0.3.1
[16] digest_0.6.4         fail_1.2             foreach_1.4.2
[19] foreign_0.8-61       Formula_1.1-2        genefilter_1.48.1
[22] geneplotter_1.44.0   ggplot2_1.0.0        grid_3.1.0
[25] gtable_0.1.2         Hmisc_3.14-5         hwriter_1.3.2
[28] iterators_1.0.7      lattice_0.20-29      latticeExtra_0.6-26
[31] locfit_1.5-9.1       MASS_7.3-35          munsell_0.4.2
[34] nnet_7.3-8           plyr_1.8.1           proto_0.3-10
[37] RColorBrewer_1.0-5   RCurl_1.95-4.3       reshape2_1.4
[40] rpart_4.1-8          Rsamtools_1.18.2     RSQLite_1.0.0
[43] scales_0.2.4         sendmailR_1.2-1      splines_3.1.0
[46] statmod_1.4.20       stringr_0.6.2        survival_2.37-7
[49] tools_3.1.0          XML_3.98-1.1         xtable_1.7-4
[52] XVector_0.6.0        zlibbioc_1.12.0