If there are multiple genes that overlap, I only want the first gene. Is there a way to do this in GenomicFeatures already, or do I need to preprocess the GTF file before calling makeTxDbFromGFF?
Thanks.
No need to preprocess any files. Just import the file as a GRanges, subset it, and pass it to makeTxDbFromGRanges.
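A minimal sketch of the subsetting step, assuming the imported object carries the usual GTF columns type and gene_id. The toy GRanges below stands in for the result of rtracklayer::import(), and the selection rule (keep a gene only if no earlier gene overlaps it) is just one simple possibility, not the only one:

```r
library(GenomicRanges)

# Stand-in for: gtf <- rtracklayer::import("path/to/GTF/file")
gtf <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(100, 100, 150, 150, 500, 500),
                   end   = c(200, 200, 300, 300, 600, 600)),
  strand = "+",
  type = c("gene", "exon", "gene", "exon", "gene", "exon"),
  gene_id = c("g1", "g1", "g2", "g2", "g3", "g3")
)

# Work on the gene-level rows only (assumes the file has "gene" rows)
gene_rows <- gtf[gtf$type == "gene"]

# For each gene, the index of the first gene it overlaps (itself included)
first_hit <- findOverlaps(gene_rows, gene_rows, select = "first",
                          ignore.strand = TRUE)

# Keep a gene only when the first gene it overlaps is itself
keep_ids <- gene_rows$gene_id[first_hit == seq_along(gene_rows)]

# Keep every row (gene, transcript, exon, ...) that belongs to a kept gene
gtf_kept <- gtf[gtf$gene_id %in% keep_ids]

# The subset can then be handed on, e.g.:
# txdb <- GenomicFeatures::makeTxDbFromGRanges(gtf_kept)
```

Here g2 overlaps g1 and is dropped together with its exon row, while g1 and g3 are kept.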
Hi,
makeTxDbFromGFF() has nothing to do with finding overlaps. It just imports the genes, transcripts, exons, and CDS from a GFF file into a TxDb object. Once you've done this, you can extract the genomic coordinates of the genes, transcripts, exons, or CDS from the TxDb object with genes(), transcripts(), exons(), or cds():
txdb <- makeTxDbFromGFF("path/to/GFF/file")
gn <- genes(txdb) # extract the genes (see ?genes for more information)
Then you can use findOverlaps() to find the overlaps between a set of genomic ranges (e.g. some aligned reads) and gn:
findOverlaps(reads, gn)
If you only want the 1st overlapping gene for each range in reads, call findOverlaps() with select="first":
findOverlaps(reads, gn, select="first")
Please see ?findOverlaps for more information.
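As a small illustration of what select="first" returns, here is a sketch on toy data (the coordinates and the gene names are made up for the example):

```r
library(GenomicRanges)

# Three toy genes; geneA and geneB overlap each other
gn <- GRanges("chr1", IRanges(start = c(100, 150, 500),
                              end   = c(200, 300, 600)))
names(gn) <- c("geneA", "geneB", "geneC")

# Three toy reads: one hits geneA and geneB, one hits geneC, one hits nothing
reads <- GRanges("chr1", IRanges(start = c(160, 550, 900), width = 20))

# One integer per read: the index in gn of the first overlapping gene,
# or NA when the read overlaps no gene
first_hit <- findOverlaps(reads, gn, select = "first")

# Translate the indices into gene names (NA stays NA)
first_gene <- names(gn)[first_hit]
```

With select="first" the result is a plain integer vector parallel to reads rather than a Hits object, so it can be used directly as an index into gn.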
Cheers,
H.
How do I remove overlapping genes by subsetting a GRanges?
It would be difficult, because I think it would need to be iterative. For example, you could have this situation:
----
   ----
      ----
         ----
You do not want to simply remove ranges 2, 3, and 4. Rather, I think you want to keep 1 and 3. Thus, the algorithm would need to remove range 2, decide to keep range 3 since 2 is gone, and remove range 4.
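To make the problem concrete, here is a sketch with four toy ranges chained so that each one overlaps only its neighbour (the coordinates are invented). It shows that a one-pass rule, "drop anything that overlaps an earlier range", keeps only range 1:

```r
library(GenomicRanges)

# Four toy ranges: 1-10, 8-17, 15-24, 22-31 (each overlaps only its neighbours)
gr <- GRanges("chr1", IRanges(start = c(1, 8, 15, 22), width = 10))

# Index of the first range each range overlaps (a range always overlaps itself)
first_hit <- findOverlaps(gr, gr, select = "first")

# One-pass rule: keep a range only if nothing earlier overlaps it
naive_keep <- which(first_hit == seq_along(gr))
```

Range 3 is thrown away because it overlaps range 2, even though range 2 is itself removed, which is why a single pass is not enough.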
Maybe something like this (untested) will help you get started:
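m <- as.matrix(findOverlaps(gr))
m <- m[m[, 1] != m[, 2], , drop=FALSE]  # discard each range's hit against itself
drop <- integer(0L)
while (nrow(m) > 0L) {
  first <- min(m)                # earliest range still involved in an overlap: keep it
  hits <- m[m[, 1] == first, 2]  # the ranges that overlap it: drop them
  drop <- c(drop, hits)
  gone <- c(first, hits)
  m <- m[!(m[, 1] %in% gone) & !(m[, 2] %in% gone), , drop=FALSE]
}
if (length(drop) > 0L) gr <- gr[-drop]  # gr[-integer(0)] would return an empty object
Please consider whether findOverlaps should have ignore.strand=TRUE or not.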
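A quick sketch of why the ignore.strand choice matters, using two toy ranges that share coordinates but sit on opposite strands (the data are invented for the example):

```r
library(GenomicRanges)

# Two toy ranges covering overlapping positions on opposite strands
gr <- GRanges("chr1", IRanges(start = c(1, 5), width = 10),
              strand = c("+", "-"))

# Default (ignore.strand=FALSE): each range only hits itself
n_stranded <- countOverlaps(gr, gr)

# ignore.strand=TRUE: the two ranges also hit each other
n_unstranded <- countOverlaps(gr, gr, ignore.strand = TRUE)
```

So with the default, genes on opposite strands are never treated as overlapping, and nothing would be removed here. With ignore.strand=TRUE they count as overlapping and one of them would be dropped.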