ties & strandedness in distanceToNearest GRanges
1
0
Entering edit mode
Tom Oates ▴ 60
@tom-oates-5703
Last seen 7.4 years ago
Hi I am using distanceToNearest on a datset of CpG dinucleotides and the rat transcripts from the latest ensembl build. Datasets as below: CpGs GRanges with 6 ranges and 4 metadata columns: seqnames ranges strand | <rle> <iranges> <rle> | [1] 10 [ 96723746, 96723747] - | [2] 7 [ 13641170, 13641171] + | [3] 16 [ 17772801, 17772802] - | [4] 3 [ 88173502, 88173503] - | [5] 13 [106979682, 106979683] + | [6] 9 [104393139, 104393140] + | rat <- makeTranscriptDbFromBiomart( biomart="ENSEMBL_MART_ENSEMBL", dataset='rnorvegicus_gene_ensembl', host="ensembl.org") rat_tx<-transcripts(rat) distances<-distanceToNearestdiff.cpgs.gr, rat.transcripts, ignore.strand=F) distances DataFrame with 1133 rows and 3 columns queryHits subjectHits distance <integer> <integer> <integer> 1 1 5962 479744 2 2 23710 65549 3 3 11077 199011 4 4 18109 101821 5 5 8159 664239 6 6 27327 457961 7 7 25795 0 8 8 25108 26868 9 9 14471 202908 When I manually look through the object "distances" I have found that some negative strand CpGs have been assigned nearest transcripts which aren't the nearest. For example, ===========B==============B==CG========A=======A=== The object distances contains a subjectHit reference to transcript A even though the CG is nearer to transcript B (and the transcript is on the negative strand so it would make more sense anyway to go to transcript B). The problem is not solved by: distanceToNearestdiff.cpgs.gr, rat.transcripts, ignore.strand=F, select=all) Any help would be appreciated Thanks [[alternative HTML version deleted]]
GO GO • 1.0k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi Tom, Thanks for reporting the bug. A fix was checked into GenomicRanges 1.11.32 (devel) and 1.10.7 (release). Both versions will be available through biocLite() on Thursday. The problem was in nearest(). The distance computation for potential nearest ranges was correct for "+" but not for "-". Using the first negative range in your example below as the query, query <- GRanges("10", IRanges(96723746, width=2), strand="-") and a small collection of transcripts in 'rat' that surround this range as the subject. subject <- GRanges("10", IRanges(c(95919265, 97203491), c(96311060, 97204143)), strand=c("-", "-")) Previously nearest was giving the correct answer for "+" but not "-". > nearest(query, subject, ignore.strand=TRUE) [1] 1 > nearest(query, subject, ignore.strand=FALSE) [1] 2 Now strand is handled correctly. > nearest(query, subject, ignore.strand=TRUE) [1] 1 > nearest(query, subject, ignore.strand=FALSE) [1] 1 Valerie On 02/23/13 11:11, Tom Oates wrote: > Hi > I am using distanceToNearest on a datset of CpG dinucleotides and the rat > transcripts from the latest ensembl build. > Datasets as below: > > CpGs > GRanges with 6 ranges and 4 metadata columns: > seqnames ranges strand | > <rle> <iranges> <rle> | > [1] 10 [ 96723746, 96723747] - | > [2] 7 [ 13641170, 13641171] + | > [3] 16 [ 17772801, 17772802] - | > [4] 3 [ 88173502, 88173503] - | > [5] 13 [106979682, 106979683] + | > [6] 9 [104393139, 104393140] + | > > rat <- makeTranscriptDbFromBiomart( > biomart="ENSEMBL_MART_ENSEMBL", > dataset='rnorvegicus_gene_ensembl', > host="ensembl.org") > rat_tx<-transcripts(rat) > > distances<-distanceToNearestdiff.cpgs.gr, rat.transcripts, ignore.strand=F) > > distances > DataFrame with 1133 rows and 3 columns > queryHits subjectHits distance > <integer> <integer> <integer> > 1 1 5962 479744 > 2 2 23710 65549 > 3 3 11077 199011 > 4 4 18109 101821 > 5 5 8159 664239 > 6 6 27327 457961 > 7 7 25795 0 > 8 8 25108 26868 > 9 9 14471 202908 > > When I manually look through the object "distances" I have found that some > negative strand CpGs have been assigned nearest transcripts which aren't > the nearest. > For example, > > ===========B==============B==CG========A=======A=== > > The object distances contains a subjectHit reference to transcript A even > though the CG is nearer to transcript B (and the transcript is on the > negative strand so it would make more sense anyway to go to transcript B). > The problem is not solved by: > distanceToNearestdiff.cpgs.gr, rat.transcripts, ignore.strand=F, > select=all) > > Any help would be appreciated > Thanks > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 531 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6