EasyRNASeq - gff file is not recognized
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hello,

I am trying to generate a table of gene counts to use later with DESeq. However, I get an error message saying that the maize gff file I am using is wrong. I downloaded this file directly from the Ensembl Plants website.

I should mention that I tried both a .gff file and a .gff3 file, and I get the same error with both. Any hints on how to solve my problem?

Many thanks in advance for your help,

Gabriela

-- output of sessionInfo():

    > genes_FGS1 <- easyRNASeq(filesDirectory="/projects/EASYRNASeq/",
    + gapped=F,
    + validity.check=TRUE,
    + chr.map=chr.map,
    + organism="custom",
    + annotationMethod="gff",
    + annotationFile="/projects/ZmB73_5b_FGS.gff3",
    + count="genes",
    + filenames=files,
    + summarization="geneModels",
    + outputFormat="RNAseq")
    Checking arguments...
    Fetching annotations...
    Error in .readGffGtf(filename = filename, ignoreWarnings = ignoreWarnings, :
      Your file: /projects/ZmB73_5b_FGS.gff3 does not contain a gff header: '##gff-version 3' as first line. Is that really a gff3 file?

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid      parallel  stats     graphics  grDevices utils     datasets
    [8] methods   base

    other attached packages:
     [1] VennDiagram_1.5.1      easyRNASeq_1.4.2       ShortRead_1.16.1
     [4] latticeExtra_0.6-24    RColorBrewer_1.0-5     BSgenome_1.26.1
     [7] biomaRt_2.14.0         genomeIntervals_1.14.0 intervals_0.13.3
    [10] Rsamtools_1.10.1       Biostrings_2.26.2      GenomicRanges_1.10.4
    [13] IRanges_1.16.4         edgeR_3.0.2            limma_3.14.1
    [16] pasilla_0.2.13         DESeq_1.10.1           lattice_0.20-10
    [19] locfit_1.5-8           DEXSeq_1.2.1           Biobase_2.18.0
    [22] BiocGenerics_0.4.0     pasillaBamSubset_0.0.2

    loaded via a namespace (and not attached):
     [1] annotate_1.34.1      AnnotationDbi_1.18.1 bitops_1.0-4.2
     [4] DBI_0.2-5            genefilter_1.38.0    geneplotter_1.34.0
     [7] hwriter_1.3          plyr_1.7.1           RCurl_1.91-1
    [10] RSQLite_0.11.1       splines_2.15.2       statmod_1.4.15
    [13] stats4_2.15.2        stringr_0.6.1        survival_2.36-14
    [16] tools_2.15.2         XML_3.9-4            xtable_1.7-0
    [19] zlibbioc_1.4.0

--
Sent via the guest posting facility at bioconductor.org.
• 1.5k views
@delhommeemblde-3232
Last seen 9.6 years ago
Dear Gabriela,

Given that error:

> Your file: /projects/ZmB73_5b_FGS.gff3 does not contain a gff header: '##gff-version 3' as first line. Is that really a gff3 file?

your gff3 file appears to be missing its header. Add the following line:

##gff-version 3

as the very first line of your gff3 file, and that should solve the problem.

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

On 6 Mar 2013, at 10:57, Gabriela [guest] wrote:

> Hello,
>
> I am trying to generate a table of gene counts to use later with DESeq. However, I get an error message saying that the maize gff file I am using is wrong. I downloaded this file directly from the Ensembl Plants website.
>
> I should mention that I tried both a .gff file and a .gff3 file, and I get the same error with both. Any hints on how to solve my problem?
>
> [...]
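The fix suggested in this answer, adding the header as the first line, can also be done from within R. The following is only a minimal sketch: `add_gff3_header` is a hypothetical helper and not part of easyRNASeq, and it assumes the file is small enough to be read into memory with `readLines`. It writes a copy of the file rather than editing the original in place.

```r
# Hypothetical helper (not an easyRNASeq function): write a copy of a GFF3
# file that is guaranteed to start with the '##gff-version 3' header line.
add_gff3_header <- function(infile, outfile) {
  lines <- readLines(infile)
  # TRUE if the first line already is the GFF3 version pragma
  has_header <- length(lines) > 0 &&
    grepl("^##gff-version[[:space:]]+3", lines[1])
  if (!has_header) {
    lines <- c("##gff-version 3", lines)
  }
  writeLines(lines, outfile)
  # Return (invisibly) whether the input already had the header
  invisible(has_header)
}

# Example call; the output path is an assumption chosen for illustration:
# add_gff3_header("/projects/ZmB73_5b_FGS.gff3",
#                 "/projects/ZmB73_5b_FGS.with_header.gff3")
```

The new file can then be passed to the `annotationFile` argument of `easyRNASeq()` in place of the original.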
@delhommeemblde-3232
Last seen 9.6 years ago
Dear Gabriela,

If you look at the vignette of the package:

    vignette("easyRNASeq")

you'll see a short description of the format in section 4.4. More precisely, read the format description in the "genomeIntervals" section on page 16, which describes what your gff3 file should look like.

Given the error message you get, either your gff file does not contain the ID key among the attributes (the ninth column), or the ID key is incorrectly formatted.

HTH,

Nico

On Mar 6, 2013, at 11:39 AM, Maria Gabriela RL wrote:

> Dear Nico,
>
> Many thanks for your response. The gff3 file that I provided could now be read. However, a new error came up. It seems to me that there is something wrong with my gff file. Could you recommend something?
>
> Again, many thanks for your help,
>
> Gabriela
>
> > genes_FGS1 <- easyRNASeq(filesDirectory="/projects/irg/grp_stich/personal_folders/Gabby/NGS_R2/cluster/write/EASYRNASeq/",
> + gapped=F,
> + validity.check=TRUE,
> + chr.map=chr.map,
> + organism="custom",
> + annotationMethod="gff",
> + annotationFile="/projects/irg/grp_stich/personal_folders/Gabby/NGS_R2/cluster/write/ZmB73_5b_FGS.gff",
> + count="genes",
> + filenames=files,
> + summarization="geneModels",
> + outputFormat="RNAseq")
> Checking arguments...
> Fetching annotations...
> Read 994386 records
> Error in .getGffRange(organismName(obj), filename = filename, ignoreWarnings = ignoreWarnings, :
>   You gff file misses the ID key defining the exon ID in the gff attributes. The format should be 'gene:exon-number'.
>
> On Wed, Mar 6, 2013 at 11:09 AM, Nicolas Delhomme <delhomme at embl.de> wrote:
> > [...]
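To see whether the exon rows actually carry an ID key in the ninth column, a quick diagnostic along the following lines may help. This is a sketch under assumptions, not the check that easyRNASeq itself performs: `check_exon_ids` is a hypothetical helper, it assumes a tab-delimited, nine-column file without an embedded FASTA section, and it only tests for the presence of an `ID=` key, not for the 'gene:exon-number' format mentioned in the error message.

```r
# Diagnostic sketch (an assumption, not easyRNASeq's own validation):
# count how many exon rows have an ID key in the attributes column.
check_exon_ids <- function(gff_file) {
  lines <- readLines(gff_file)
  # Drop header, pragma, and comment lines, which all start with '#'
  lines <- lines[!grepl("^#", lines)]
  gff <- read.delim(text = lines, header = FALSE, quote = "",
                    comment.char = "", stringsAsFactors = FALSE,
                    col.names = c("seqid", "source", "type", "start", "end",
                                  "score", "strand", "phase", "attributes"))
  exons <- gff[gff$type == "exon", ]
  # An ID key is 'ID=' at the start of the attributes or right after a ';'
  has_id <- grepl("(^|;)[[:space:]]*ID=", exons$attributes)
  list(n_exons = nrow(exons),
       n_with_id = sum(has_id),
       examples_without_id = head(exons$attributes[!has_id], 3))
}

# Example call, using the file name from the thread with its path shortened:
# str(check_exon_ids("ZmB73_5b_FGS.gff"))
```

If `n_with_id` is zero or well below `n_exons`, the attributes shown in `examples_without_id` indicate which keys the file uses instead, which is a starting point for reformatting the ninth column as described in the vignette.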