News: Gencode GFF3 and FASTA files now available via AnnotationHub
Posted 2.2 years ago by Sonali Arora (United States):

GFF3 and FASTA files from the latest Gencode release are now available via AnnotationHub (Bioconductor version 3.2 only).

The GFF3 and FASTA files for the latest Homo sapiens release (release 23) can be found with query():

> library(AnnotationHub)
> ah = AnnotationHub()
snapshotDate(): 2015-08-26
> Human_gff = query(ah, c("Gencode", "gff", "human"))
> Human_gff
AnnotationHub with 9 records
# snapshotDate(): 2015-08-26
# $dataprovider: Gencode
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH49554"]]'

            title
  AH49554 | gencode.v23.2wayconspseudos.gff3.gz
  AH49555 | gencode.v23.annotation.gff3.gz
  AH49556 | gencode.v23.basic.annotation.gff3.gz
  AH49557 | gencode.v23.chr_patch_hapl_scaff.annotation.gff3.gz
  AH49558 | gencode.v23.chr_patch_hapl_scaff.basic.annotation.gff3.gz
  AH49559 | gencode.v23.long_noncoding_RNAs.gff3.gz
  AH49560 | gencode.v23.polyAs.gff3.gz
  AH49561 | gencode.v23.primary_assembly.annotation.gff3.gz
  AH49562 | gencode.v23.tRNAs.gff3.gz

> Human_fasta = query(ah, c("Gencode", "fasta", "human"))
> Human_fasta
AnnotationHub with 5 records
# snapshotDate(): 2015-08-26
# $dataprovider: Gencode
# $species: Homo sapiens
# $rdataclass: FaFile
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH49563"]]'

            title
  AH49563 | gencode.v23.chr_patch_hapl_scaff.transcripts.fa.gz
  AH49564 | gencode.v23.lncRNA_transcripts.fa.gz
  AH49565 | gencode.v23.pc_transcripts.fa.gz
  AH49566 | gencode.v23.pc_translations.fa.gz
  AH49567 | gencode.v23.transcripts.fa.gz

Use the '[' operator to view a record's metadata, and the '[[' operator to download the file and load it into R:

> ah["AH49562"]
AnnotationHub with 1 record
# snapshotDate(): 2015-08-26
# names(): AH49562
# $dataprovider: Gencode
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: gencode.v23.tRNAs.gff3.gz
# $description: tRNA structures predicted by tRNA-Scan on reference chromosomes
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: GFF
# $sourceurl: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_23/ge...
# $sourcelastmodifieddate: 2015-07-16
# $sourcesize: 17419
# $tags: gencode, v23, tRNAs, gff3
# retrieve record with 'object[["AH49562"]]'
> gff = ah[["AH49562"]]
require("rtracklayer")
retrieving 1 resource
  |======================================================================| 100%
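
Because '[' only returns metadata, it can also be used to pick a record programmatically before anything is downloaded. A minimal sketch, assuming the hub snapshot still contains the Gencode v23 records listed above and that mcols() on a query result uses the AnnotationHub IDs as row names:

```r
library(AnnotationHub)

ah <- AnnotationHub()
human_gff <- query(ah, c("Gencode", "gff", "human"))

# One metadata row per record; row names are the AnnotationHub IDs
md <- mcols(human_gff)
md[, c("title", "genome", "sourcetype")]

# Look the ID up by file title instead of hard-coding it
trna_id <- rownames(md)[md$title == "gencode.v23.tRNAs.gff3.gz"]

# Only '[[' downloads the file and imports it into R
gff <- ah[[trna_id]]
```

Matching on the title keeps scripts readable, since the file name says what is being loaded while the AH identifier does not.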

GFF3 files are downloaded and imported into R as a GRanges object (GenomicRanges package). FASTA files are downloaded together with their index, and the two are returned as an 'FaFile' object (Rsamtools package).

> class(gff)
[1] "GRanges"
attr(,"package")
[1] "GenomicRanges"
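
The GRanges comes from rtracklayer's GFF3 import (the hub requests rtracklayer, as the require() message above shows): seqid, start, end and strand become the ranges, while source, type, score, phase and each attribute tag become metadata columns. A minimal, self-contained sketch on a hypothetical three-line GFF3 file (not real Gencode records), assuming rtracklayer is installed:

```r
library(rtracklayer)  # provides import(); attaches GenomicRanges

# Hypothetical GFF3 rows written to a temporary file
gff_path <- file.path(tempdir(), "toy.gff3")
writeLines(c(
  "##gff-version 3",
  "chr1\ttoy\tgene\t11869\t14409\t.\t+\t.\tID=g1;gene_type=lincRNA",
  "chr1\ttoy\ttranscript\t11869\t14409\t.\t+\t.\tID=t1;Parent=g1",
  "chr1\ttoy\ttRNA\t20000\t20072\t.\t-\t.\tID=r1"
), gff_path)

gr <- import(gff_path, format = "gff3")
class(gr)              # a GRanges, like the object returned by the hub
table(gr$type)         # feature counts by GFF column 3
gr[gr$type == "gene"]  # subset like any other GRanges
```

The same idioms (for example table(gff$type)) apply unchanged to the object downloaded from the hub.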

> fas = ah[["AH49567"]]
retrieving 2 resources
  |======================================================================| 100%
  |======================================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)
> class(fas)
[1] "FaFile"
attr(,"package")
[1] "Rsamtools"
> fas
class: FaFile
path: /home/sarora/.AnnotationHub/56291
index: /home/sarora/.AnnotationHub/56292
isOpen: FALSE
yieldSize: NA

Similarly, the Gencode GFF3 and FASTA files for the current mouse release (M6) can be found with:

> Mouse_gff = query(ah, c("Gencode", "gff", "mouse"))
> Mouse_fasta = query(ah, c("Gencode", "fasta", "mouse"))

> packageVersion('AnnotationHub')
[1] ‘2.1.40’

Sonali. 
