I have a list of locations on the genome, which I have converted to a GRanges object and now wish to annotate.
Sample data:
head(CpG_Ranges)
GRanges object with 6 ranges and 10 metadata columns:
seqnames ranges strand | n_samples pvalue
<Rle> <IRanges> <Rle> | <integer> <numeric>
[1] chr1 110821406-110821412 * | 2 0.000253685
[2] chr1 110820767-110820776 * | 2 0.000887079
[3] chr14 1862428-1862429 * | 2 0.001082515
[4] chr24 40883339-40883354 * | 2 0.001207324
[5] chr14 49361102-49361103 * | 2 0.001705802
[6] chr14 56132052-56132053 * | 2 0.001876952
I have created a TxDb object from a GTF file, and found and fetched the appropriate annotation package using AnnotationHub, as below:
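library(GenomicFeatures)  # provides makeTxDbFromGFF() (moved to txdbmaker in recent Bioconductor releases)
library(AnnotationHub)
gffFile <- "G:\\oviAri4.ncbiRefSeq.gtf"
txdb <- makeTxDbFromGFF(file=gffFile, format="gtf")
ah <- AnnotationHub()
query(ah, c("Ovis aries", "OrgDb"))
org.Oa.eg.db <- ah[["AH80684"]]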
I then call annotatePeak as follows:
peakAnno <- annotatePeak(CpG_Ranges, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Oa.eg.db")
This runs fine for binning the data into genomic regions (exon, intron, TSS, etc.), but fails at the gene annotation step:
>> preparing features information... 2020-08-05 12:09:06
>> identifying nearest features... 2020-08-05 12:09:06
>> calculating distance from peak to TSS... 2020-08-05 12:09:07
>> assigning genomic annotation... 2020-08-05 12:09:07
>> adding gene annotation... 2020-08-05 12:09:09
>> assigning chromosome lengths 2020-08-05 12:09:09
>> done... 2020-08-05 12:09:09
Warning messages:
1: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_NW_014639147v1, chrUn_NW_014639227v1, chrUn_NW_014639425v1, chrUn_NW_014639454v1, chrUn_NW_014639514v1, chrUn_NW_014639584v1, chrUn_NW_014639757v1, chrUn_NW_014639941v1, chrUn_NW_014640025v1, chrUn_NW_014640441v1, chrUn_NW_014640601v1, chrUn_NW_014640620v1, chrUn_NW_014640625v1, chrUn_NW_014640781v1, chrUn_NW_014640783v1, chrUn_NW_014640831v1, chrUn_NW_014640847v1, chrUn_NW_014641029v1, chrUn_NW_014641066v1, chrUn_NW_014641250v1, chrUn_NW_014641439v1, chrUn_NW_014641483v1, chrUn_NW_014641578v1, chrUn_NW_014641740v1, chrUn_NW_014641910v1, chrUn_NW_014642173v1, chrUn_NW_014642238v1, chrUn_NW_014642275v1
- in 'y': chrM, chrUn_NW_014639041v1, chrUn_NW_014639042v1, chrUn_NW_014639057v1, chrUn_NW_014639070v1, chrUn_NW_014639071v1, chrUn_NW_014639073v1, chrUn_NW_014639074v1, chrUn_NW_014639084v1, chrUn_NW_014639092v1, chrUn_NW_014639127v1, chrUn_NW_014639132v1, chrUn_NW_014639140v1, chrUn_NW_014639151v1, [... truncated]
2: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_NW_014639147v1, chrUn_NW_014639227v1, chrUn_NW_014639425v1, chrUn_NW_014639454v1, chrUn_NW_014639514v1, chrUn_NW_014639584v1, chrUn_NW_014639757v1, chrUn_NW_014639941v1, chrUn_NW_014640025v1, chrUn_NW_014640441v1, chrUn_NW_014640601v1, chrUn_NW_014640620v1, chrUn_NW_014640625v1, chrUn_NW_014640781v1, chrUn_NW_014640783v1, chrUn_NW_014640831v1, chrUn_NW_014640847v1, chrUn_NW_014641029v1, chrUn_NW_014641066v1, chrUn_NW_014641250v1, chrUn_NW_014641439v1, chrUn_NW_014641483v1, chrUn_NW_014641578v1, chrUn_NW_014641740v1, chrUn_NW_014641910v1, chrUn_NW_014642173v1, chrUn_NW_014642238v1, chrUn_NW_014642275v1
- in 'y': chrM, chrUn_NW_014639041v1, chrUn_NW_014639042v1, chrUn_NW_014639057v1, chrUn_NW_014639070v1, chrUn_NW_014639071v1, chrUn_NW_014639073v1, chrUn_NW_014639074v1, chrUn_NW_014639084v1, chrUn_NW_014639092v1, chrUn_NW_014639127v1, chrUn_NW_014639132v1, chrUn_NW_014639140v1, chrUn_NW_014639151v1, [... truncated]
3: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_NW_014639147v1, chrUn_NW_014639227v1, chrUn_NW_014639425v1, chrUn_NW_014639454v1, chrUn_NW_014639514v1, chrUn_NW_014639584v1, chrUn_NW_014639757v1, chrUn_NW_014639941v1, chrUn_NW_014640025v1, chrUn_NW_014640441v1, chrUn_NW_014640601v1, chrUn_NW_014640620v1, chrUn_NW_014640625v1, chrUn_NW_014640781v1, chrUn_NW_014640783v1, chrUn_NW_014640831v1, chrUn_NW_014640847v1, chrUn_NW_014641029v1, chrUn_NW_014641066v1, chrUn_NW_014641250v1, chrUn_NW_014641439v1, chrUn_NW_014641483v1, chrUn_NW_014641578v1, chrUn_NW_014641740v1, chrUn_NW_014641910v1, chrUn_NW_014642173v1, chrUn_NW_014642238v1, chrUn_NW_014642275v1
- in 'y': chrM, chrUn_NW_014639041v1, chrUn_NW_014639042v1, chrUn_NW_014639057v1, chrUn_NW_014639070v1, chrUn_NW_014639071v1, chrUn_NW_014639073v1, chrUn_NW_014639074v1, chrUn_NW_014639084v1, chrUn_NW_014639092v1, chrUn_NW_014639127v1, chrUn_NW_014639132v1, chrUn_NW_014639140v1, chrUn_NW_014639151v1, [... truncated]
4: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_NW_014639147v1, chrUn_NW_014639227v1, chrUn_NW_014639425v1, chrUn_NW_014639454v1, chrUn_NW_014639514v1, chrUn_NW_014639584v1, chrUn_NW_014639757v1, chrUn_NW_014639941v1, chrUn_NW_014640025v1, chrUn_NW_014640441v1, chrUn_NW_014640601v1, chrUn_NW_014640620v1, chrUn_NW_014640625v1, chrUn_NW_014640781v1, chrUn_NW_014640783v1, chrUn_NW_014640831v1, chrUn_NW_014640847v1, chrUn_NW_014641029v1, chrUn_NW_014641066v1, chrUn_NW_014641250v1, chrUn_NW_014641439v1, chrUn_NW_014641483v1, chrUn_NW_014641578v1, chrUn_NW_014641740v1, chrUn_NW_014641910v1, chrUn_NW_014642173v1, chrUn_NW_014642238v1, chrUn_NW_014642275v1
- in 'y': chrM, chrUn_NW_014639041v1, chrUn_NW_014639042v1, chrUn_NW_014639057v1, chrUn_NW_014639070v1, chrUn_NW_014639071v1, chrUn_NW_014639073v1, chrUn_NW_014639074v1, chrUn_NW_014639084v1, chrUn_NW_014639092v1, chrUn_NW_014639127v1, chrUn_NW_014639132v1, chrUn_NW_014639140v1, chrUn_NW_014639151v1, [... truncated]
5: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_NW_014639147v1, chrUn_NW_014639227v1, chrUn_NW_014639425v1, chrUn_NW_014639454v1, chrUn_NW_014639514v1, chrUn_NW_014639584v1, chrUn_NW_014639757v1, chrUn_NW_014639941v1, chrUn_NW_014640025v1, chrUn_NW_014640441v1, chrUn_NW_014640601v1, chrUn_NW_014640620v1, chrUn_NW_014640625v1, chrUn_NW_014640781v1, chrUn_NW_014640783v1, chrUn_NW_014640831v1, chrUn_NW_014640847v1, chrUn_NW_014641029v1, chrUn_NW_014641066v1, chrUn_NW_014641250v1, chrUn_NW_014641439v1, chrUn_NW_014641483v1, chrUn_NW_014641578v1, chrUn_NW_014641740v1, chrUn_NW_014641910v1, chrUn_NW_014642173v1, chrUn_NW_014642238v1, chrUn_NW_014642275v1
- in 'y': chrM, chrUn_NW_014639041v1, chrUn_NW_014639042v1, chrUn_NW_014639057v1, chrUn_NW_014639070v1, chrUn_NW_014639071v1, chrUn_NW_014639073v1, chrUn_NW_014639074v1, chrUn_NW_014639084v1, chrUn_NW_014639092v1, chrUn_NW_014639127v1, chrUn_NW_014639132v1, chrUn_NW_014639140v1, chrUn_NW_014639151v1, [... truncated]
6: In annotatePeak(CpG_Ranges, tssRegion = c(-3000, 3000), TxDb = txdb, :
Unknown ID type, gene annotation will not be added...
I could manually find which genes these are located in/near, but really need to automate this.
Any help would be greatly appreciated!
I forgot, you need a library(rtracklayer) before the import step.
Amazing! Thank you, I would never have noticed that gene_id did not contain Gene IDs!
This worked perfectly, and I easily converted the csAnno object to a data frame and exported it! Also, this has highlighted how much I still have to learn - your use of the phrase "relatively trivial" before casually hooking into and querying the UCSC database 😂!
Thank you again! Chris