easyRNAseq annotation file
kylvalda • 0
@kylvalda-8798
Last seen 8.0 years ago
United States

Hi,

Apparently I now need to use a GTF or GFF3 file for annotation with easyRNAseq when processing hg19 BAM files, as the biomaRt method uses GRCh38 and I get an error. So far so bad.

However, I can't seem to find a GTF or GFF3 file for hg19 that easyRNAseq will accept:

> DGE<-simpleRNASeq(bamFiles=getBamFileList(bamfiles), param=param, nnodes=4, verbose=T, override=T)


...

==========================
Processing the annotation
==========================
Validating the annotation source
Read 993 records
Error in .validate(obj, verbose = verbose) :
The provided gff3 contains no annotation of type 'mRNA' and/or 'exon' in the first 1000 lines.


I've tried the following annotation files, but none seems to work:

ref_GRCh37.p13_top_level.gff3.gz
ref_GRCh37.p13_scaffolds.gff3.gz
gencode.v19.annotation.gtf
gencode.v19.annotation.gff3
Homo_sapiens.GRCh37.70.gtf

I will try the Illumina iGenome annotation next. What am I missing here?
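
The error message refers to the feature types found in the first 1000 lines, so one way to see what a validator would see is to tabulate the third column of that part of the file. The sketch below uses base R only; `feature_types` is a hypothetical helper (not part of easyRNASeq), and the small temporary file stands in for a real annotation.

```r
# Hypothetical helper: count feature types (column 3) in the first `n` lines
# of a GTF/GFF3 file. gzfile() also reads uncompressed files.
feature_types <- function(path, n = 1000) {
  con <- gzfile(path, "rt")
  on.exit(close(con))
  lines <- readLines(con, n = n)
  # Drop empty lines and header/comment lines.
  lines <- lines[nzchar(lines) & !grepl("^#", lines)]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  types <- vapply(fields,
                  function(f) if (length(f) >= 3) f[3] else NA_character_,
                  character(1))
  table(types)
}

# Tiny stand-in annotation so the sketch runs without a download.
demo <- tempfile(fileext = ".gtf")
writeLines(c(
  "#!genome-build GRCh37",
  "1\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id \"G1\";",
  "1\tdemo\texon\t100\t300\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";",
  "1\tdemo\texon\t500\t900\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";"
), demo)

counts <- feature_types(demo)
print(counts)
```

If the table for a real file does list `exon` rows and the error still appears, the file content is probably not what is being rejected, and the way the file is handed to the package is worth checking instead.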

easyrnaseq annotation • 2.3k views
@nicolas-delhomme-6252
Last seen 5.5 years ago
Sweden
Hej Kylvalda!

Are all the annotation files giving the same error? And can you tell me 1) where you got the files from and 2) which versions of R/Bioc and easyRNASeq you are using, so that I can try to reproduce the error?

Cheers,
Nico
kylvalda • 0

Hey Nico,

Sorry for the late reply (I am currently battling kallisto/sleuth):

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] DESeq2_1.8.1                      RcppArmadillo_0.5.500.2.0
 [3] Rcpp_0.12.1                       easyRNASeq_2.4.7
 [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.36.3
 [7] rtracklayer_1.28.10               Biostrings_2.36.4
 [9] XVector_0.8.0                     GenomicRanges_1.20.6
[11] GenomeInfoDb_1.4.2                IRanges_2.2.7
[13] S4Vectors_0.6.5                   BiocGenerics_0.14.0

loaded via a namespace (and not attached):
 [1] genefilter_1.50.0       locfit_1.5-9.1          reshape2_1.4.1
 [4] splines_3.2.1           lattice_0.20-33         colorspace_1.2-6
 [7] survival_2.38-3         XML_3.98-1.3            foreign_0.8-66
[10] DBI_0.3.1               BiocParallel_1.2.21     RColorBrewer_1.1-2
[13] lambda.r_1.1.7          plyr_1.8.3              stringr_1.0.0
[16] zlibbioc_1.14.0         munsell_0.4.2           gtable_0.1.2
[19] futile.logger_1.4.1     hwriter_1.3.2           latticeExtra_0.6-26
[22] Biobase_2.28.0          geneplotter_1.46.0      biomaRt_2.24.0
[25] AnnotationDbi_1.30.1    proto_0.3-10            acepack_1.3-3.3
[28] xtable_1.7-4            edgeR_3.10.2            scales_0.3.0
[31] limma_3.24.15           Hmisc_3.16-0            annotate_1.46.1
[34] ShortRead_1.26.0        gridExtra_2.0.0         Rsamtools_1.20.4
[37] ggplot2_1.0.1           digest_0.6.8            stringi_0.5-5
[40] DESeq_1.20.0            grid_3.2.1              tools_3.2.1
[43] bitops_1.0-6            magrittr_1.5            RCurl_1.95-4.7
[46] LSD_3.0                 RSQLite_1.0.0           cluster_2.0.3
[49] Formula_1.2-1           futile.options_1.0.0    MASS_7.3-44
[52] rpart_4.1-10            nnet_7.3-11             GenomicAlignments_1.4.1
[55] genomeIntervals_1.24.1  intervals_0.15.1


I've downloaded the mentioned annotations from:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/

http://www.gencodegenes.org/releases/19.html

http://ftp.ensembl.org/pub/release-70/gtf/homo_sapiens/

Thanks a bunch

kylvalda • 0

Same problem with iGenome:

Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2014-05-23-16-03-55/Genes/genes.gtf

==========================
Processing the annotation
==========================
Validating the annotation source
Read 993 records
Error in .validate(obj, verbose = verbose) :
  The provided gff3 contains no annotation of type 'mRNA' and/or 'exon' in the first 1000 lines.


...

kylvalda • 0

Might this be the problem?

> annotParam<-AnnotParam(datasource=system.file('extdata','~/Homo_sapiens/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2014-05-23-16-03-55/Genes/genes.gtf'))


> annotParam
AnnotParamCharacter object set to retrieve 'gff3' formatted annotation from:

>

I just realised that my answer has been held up on our mail server. :-\ Here it is:

----

Yes, it likely is. Can you set type="gtf" and try again?

annotParam<-AnnotParam(datasource=system.file('extdata','~/Homo_sapiens/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2014-05-23-16-03-55/Genes/genes.gtf'),type="gtf")

I will try to add some more file checking asap to prevent that from happening.

Cheers,
Nico

---

I will make sure to add a test to ensure that this does not happen.

Nico
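
A possible second factor, not confirmed in this thread: the printed object above shows an empty line after "from:", which would fit `system.file()` returning an empty string. `system.file()` looks for files inside an installed package (the base package by default), so a file under the home directory is not found that way. The sketch below is a minimal illustration under that assumption; it passes the expanded path directly together with type="gtf" as suggested above, and only calls `AnnotParam()` if the file and easyRNASeq are actually available.

```r
gtf <- "~/Homo_sapiens/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2014-05-23-16-03-55/Genes/genes.gtf"

# system.file() searches inside an installed package, not the home directory,
# so this lookup finds nothing and returns "".
looked_up <- system.file("extdata", gtf)
print(nzchar(looked_up))  # FALSE: nothing was found

# Assumption: the file lives at this location on the local machine.
gtf_path <- path.expand(gtf)

if (file.exists(gtf_path) && requireNamespace("easyRNASeq", quietly = TRUE)) {
  # Pass the path itself and state the format explicitly.
  annotParam <- easyRNASeq::AnnotParam(datasource = gtf_path, type = "gtf")
  print(annotParam)
} else {
  message("Skipping AnnotParam(): file or easyRNASeq not available here.")
}
```

Printing the resulting object should then show the file path after "from:" rather than an empty line.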