Tom Oates
Hi,
I am using distanceToNearest on a dataset of CpG dinucleotides and the rat transcripts from the latest Ensembl build.
The datasets are shown below:
CpGs
GRanges with 6 ranges and 4 metadata columns:
      seqnames                 ranges strand |
         <Rle>              <IRanges>  <Rle> |
  [1]       10 [ 96723746,  96723747]      - |
  [2]        7 [ 13641170,  13641171]      + |
  [3]       16 [ 17772801,  17772802]      - |
  [4]        3 [ 88173502,  88173503]      - |
  [5]       13 [106979682, 106979683]      + |
  [6]        9 [104393139, 104393140]      + |
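library(GenomicFeatures)

# Build the rat transcript database from Ensembl via BioMart
rat <- makeTranscriptDbFromBiomart(
    biomart="ENSEMBL_MART_ENSEMBL",
    dataset="rnorvegicus_gene_ensembl",
    host="ensembl.org")
rat.transcripts <- transcripts(rat)

# Strand-aware search for the nearest transcript to each CpG
distances <- distanceToNearest(diff.cpgs.gr, rat.transcripts,
    ignore.strand=FALSE)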
distances
DataFrame with 1133 rows and 3 columns
  queryHits subjectHits  distance
  <integer>   <integer> <integer>
1         1        5962    479744
2         2       23710     65549
3         3       11077    199011
4         4       18109    101821
5         5        8159    664239
6         6       27327    457961
7         7       25795         0
8         8       25108     26868
9         9       14471    202908
When I look through the "distances" object manually, I find that some negative-strand CpGs have been assigned a nearest transcript that is not actually the nearest.
For example,
===========B==============B==CG========A=======A===
The "distances" object contains a subjectHits reference to transcript A even though the CG is nearer to transcript B (and the transcript is on the negative strand, so it would make more sense to go to transcript B anyway).
The problem is not solved by:
distanceToNearest(diff.cpgs.gr, rat.transcripts, ignore.strand=FALSE,
    select="all")
Any help would be appreciated.
Thanks