genFeatures() function in
systemPipeR allows you to compute intergenic regions. The vignette for the Ribo-Seq workflow of
systemPipeR package gives some examples (and/or consult
?genFeatures). The sub-classification of the intergenic regions you mention could be obtained downstream by using the plus/minus strand orientation of the genes defining the intergenic regions. The naming scheme of the intergenics by their neighboring genes (e.g. geneID1__geneID2) could help here but this would require some additional coding for you. If this is a common used case then I am happy to add this functionality to the to-do list for next update. Alternatively, one could discuss whether the utility to extract intergenic regions from TxDbs could become part of the
GenomicFeatures package. In the past there were some discussions about this I believe.
GenomicFeatures fails to produce a
TxDb. Something in your GFF is not meeting the expected format. This is usually fixable by debugging the GFF. However, would you mind using Biomart as source of your annotations (GFF) instead? If this is fine then please try the following which works just fine:
> library(GenomicFeatures); library("biomaRt"); library(systemPipeR)
> txdb <- makeTxDbFromBiomart(biomart = "ensembl", dataset = "scerevisiae_gene_ensembl")
> myfeatures <- c("tx_type", "promoter", "intron", "exon", "cds", "intergenic")
> feat <- genFeatures(txdb, featuretype=myfeatures, reduce_ranges=FALSE, upstream=1000, downstream=0)
Created feature ranges: protein_coding, ncRNA, tRNA, snoRNA, pseudogene, snRNA, rRNA
Created feature ranges: promoter
Created feature ranges: intron
Created feature ranges: exon
Created feature ranges: cds
Created feature ranges: intergenic
Now the intergenic ranges can be extracted with
feat$intergenic. Note: using
feature="all" will give you an error for
fiveUTR/threeUTR since those return empty objects. I will change this to a warning in the next update of systemPipeR.
If your intergenic regions are annotated in your GFF file, then just import the file with the
import() function from the rtracklayer package. This will return you a GRanges object with various metadata columns. One of them will be named
type and it will tell you the type of feature for each range in the GRanges object. Your 3 types of intergenic regions should show up there.