If there are multiple genes that overlap, I only want the first gene. Is there a way to do this in GenomicFeatures already, or do I need to preprocess the GTF file before calling makeTxDbFromGFF?
Thanks.
No need to preprocess any files. Just import the file as a GRanges, subset it, and pass it to makeTxDbFromGRanges.
Hi,
makeTxDbFromGFF() has nothing to do with finding overlaps. It just imports the genes, transcripts, exons, and CDS from a GFF file into a TxDb object. Once you've done this, you can extract the genomic coordinates of the genes, transcripts, exons, or CDS from the TxDb object with genes(), transcripts(), exons(), or cds():
txdb <- makeTxDbFromGFF("path/to/GFF/file")
gn <- genes(txdb)  # extract the genes (see ?genes for more information)
Then you can use findOverlaps() to find the overlaps between a set of genomic ranges (e.g. some aligned reads) and gn:
findOverlaps(reads, gn)
If you only want the 1st overlapping gene for each range in reads, call findOverlaps() with select="first":
findOverlaps(reads, gn, select="first")
Please see ?findOverlaps for more information.
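To make the difference between the two calls concrete, here is a minimal sketch on toy data. The coordinates, the gene IDs, and the objects `gn` and `reads` are invented for illustration and do not come from a real annotation. Note that "first" refers to the lowest index in `gn`, so the result depends on how `gn` is ordered.

```r
library(GenomicRanges)

# Toy genes (made up): geneA and geneB overlap each other, geneC stands alone.
gn <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 150, 400), end = c(200, 300, 500)),
  strand   = "+",
  gene_id  = c("geneA", "geneB", "geneC")
)

# Toy reads (made up): the 1st hits geneA and geneB, the 2nd hits geneC,
# the 3rd hits nothing.
reads <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(160, 420, 600), end = c(180, 440, 620)),
  strand   = "+"
)

# Default: a Hits object with one entry per overlapping (read, gene) pair.
all_hits <- findOverlaps(reads, gn)

# select = "first": an integer vector parallel to 'reads' holding the index
# of the first overlapping gene in 'gn', or NA when a read overlaps no gene.
first_hit <- findOverlaps(reads, gn, select = "first")

# Map the indices back to gene IDs (NA is carried through).
first_gene <- gn$gene_id[first_hit]
```

Because the result of select="first" is parallel to reads, it can be stored directly as a metadata column, e.g. reads$gene_id <- first_gene.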
Cheers,
H.
How do I remove overlapping genes by subsetting a GRanges?
It would be difficult, because I think it would need to be iterative. For example, you could have a chain of four ranges in which range 1 overlaps range 2, range 2 overlaps range 3, and range 3 overlaps range 4.
You do not want to simply remove ranges 2, 3, and 4. Rather, I think you want to keep 1 and 3. Thus, the algorithm would need to remove range 2, decide to keep range 3 since 2 is gone, and remove range 4.
Maybe something like this (untested) will help you get started:
Please consider whether findOverlaps should have ignore.strand=TRUE or not.
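The iterative idea described above (keep a range only if it does not overlap a range that has already been kept) could be sketched roughly as follows. This is an illustration, not the code originally posted in this thread: the helper name `keep_first_nonoverlapping` is hypothetical, and the toy coordinates are invented to reproduce the 1-2-3-4 chain. It assumes "first" means first in genomic order, and it calls overlapsAny() once per range, so it may be slow on a full annotation.

```r
library(GenomicRanges)

# Toy chain (made up): 1 overlaps 2, 2 overlaps 3, 3 overlaps 4.
gn <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(1, 8, 18, 28), end = c(10, 20, 30, 40)),
  strand   = "+",
  gene_id  = paste0("gene", 1:4)
)

# Hypothetical helper: greedy pass that keeps a range only if it does not
# overlap any range kept so far.
keep_first_nonoverlapping <- function(gr, ignore.strand = FALSE) {
  # Visit ranges in genomic order so that "first" means leftmost.
  gr <- sort(gr, ignore.strand = TRUE)
  keep <- logical(length(gr))
  for (i in seq_along(gr)) {
    already_kept <- gr[keep]
    # Compare only against kept ranges; dropped ranges no longer block anything.
    if (!overlapsAny(gr[i], already_kept, ignore.strand = ignore.strand)) {
      keep[i] <- TRUE
    }
  }
  gr[keep]
}

kept <- keep_first_nonoverlapping(gn)
kept$gene_id
```

On this toy input the pass keeps gene1, drops gene2, keeps gene3 because gene2 is gone, and drops gene4. Set ignore.strand = TRUE if genes on opposite strands should also count as overlapping.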