I would like to extract the coordinates for a chromosome. I made a chunk of code, but as I am new to these packages, I am not sure if I follow the right logic here:
library(IRanges) library(GenomicFeatures) library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb = transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene",use.names=TRUE) #For example, only I am interested in intergenic coordinates in chromosome 1 seqlevels(txdb, force=TRUE) = c("chr1") #Creates IRanges ir = IRanges(start=unlist(start(txdb)), end=unlist(end(txdb)), names=names(txdb)) # Reduce overlapping gene positions to continuous range ir.red = reduce(ir) # Collapses ranges of overlapping genes to continuous range. #Identify overlaps among the initial and reduced range data sets overlap = findOverlaps(ir, ir.red)
So, the overlap should contain the intergenic coordinate of these genes. Is this right?
What does the Txdb object really contain? By this I am mean: I am confused what grouping really means. For
example, if I group by genes how is that different if I group by exons or introns? Are not introns and exons part of
the gene, so is not that the same? Maybe the problem is just fundamental and revolves around the definition of a gene, but I am confused : if we have transcripts, isnt that the same as genes?