Hi
I want to remove rows from out of chromosome length ranges from granges. Please suggest
Hi
I want to remove rows from out of chromosome length ranges from granges. Please suggest
There might be an easy way to make a GRanges object for the chromosomes, but it wasn't readily apparent to me, so I did it the hard way. But the idea is to have a GRanges that encompasses the extent of the chromosomes, and then use subsetByOverlaps()
.
> library(TxDb.Hsapiens.UCSC.hg19.knownGene) ## get chrlengths - there must actually be an accessor I don't know abou > con <- dbConnect(SQLite(), paste0(path.package("TxDb.Hsapiens.UCSC.hg19.knownGene"), "/extdata/TxDb.Hsapiens.UCSC.hg19.knownGene.sqlite")) > chrs <- dbGetQuery(con, "select chrom, length from chrominfo limit 24;") > chrGR <- GRanges(chrs[,1], IRanges(rep(1, 24), chrs[,2])) ## so chrGR is a GRanges that has start and end for all 24 chromosomes > badGR <- GRanges(paste0("chr", c(1,2,3,4,5)), IRanges(c(249250600, 3,4,5,6), c(249250645, 6,7,8,9))) ## a 'bad' GRanges, where the first range extends past the end of chr1 > badGR GRanges object with 5 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr1 [249250600, 249250645] * [2] chr2 [ 3, 6] * [3] chr3 [ 4, 7] * [4] chr4 [ 5, 8] * [5] chr5 [ 6, 9] * ------- seqinfo: 5 sequences from an unspecified genome; no seqlengths ## now remove all ranges that are not 'within' the chromosome boundaries > subsetByOverlaps(badGR, chrGR, type="within") GRanges object with 4 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chr2 [3, 6] * [2] chr3 [4, 7] * [3] chr4 [5, 8] * [4] chr5 [6, 9] * ------- seqinfo: 5 sequences from an unspecified genome; no seqlengths
Hi Jim,
FWIW, getting the sequence lengths out of a TxDb object is a one-liner:
seqlengths(TxDb.Hsapiens.UCSC.hg19.knownGene)
If you want the full Seqinfo:
seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)
If you want it as a GRanges object (your chrGR
object):
as(seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene), "GRanges")
Too easy I know ;-)
H.
If your genome build is hg19 or some other key known to UCSC, you can do something like:
seqinfo(badGR) <- Seqinfo(genome="hg19")
goodGR <- trim(badGR)
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hi i have ext_grn my badGr which ahve 253 ranges, m trying to remove location 6 and 10 that are out of bound. when i applied the above process on it it is still having 253 ranges.
I see. I'm not sure there's a super-easy way to drop ranges that fall outside of the bounds. trim() is expected to return an object of the same length as its input, so we might need a new function. Maybe Herve has some ideas?
Hi,
trim()
uses internal helperGenomicRanges:::get_out_of_bound_index()
to find the out-of-bound ranges. So you could use this on your GRanges object:idx <- GenomicRanges:::get_out_of_bound_index(ext_grn) if (length(idx) != 0L) ext_grn <- ext_grn[-idx]
We should probably expose and document
get_out_of_bound_index()
, and rename it in the process.H.