Question: gff files: how to tell if right-open interval convention used?
Julien Gagneur wrote, 8.6 years ago:
Hi Simon, Karl and Herve, As Simon points out, the GFF3 format is quite ill-defined. When implementing readGff3() in genomeIntervals, we had to interpret the specifications. GFF3 allows zero-length features but there is no column to flag intervals as zero-length features. Have we missed something?
genomeintervals • 507 views
Answer: gff files: how to tell if right-open interval convention used?
Hervé Pagès (United States) wrote, 8.6 years ago:
Hi Julien, Simon,

It's true that just by looking at the start/end of a feature, if start == end you cannot tell whether it is a zero-length feature (e.g. an insertion point) or a feature of length 1 (e.g. a SNP). But my understanding of the GFF3 specs is that you know it's a zero-length feature by looking at the feature type (there is a column where this is specified). So maybe this is why the GFF3 people considered that there was no need to flag intervals as zero-length features? Not necessarily a very good design, though, because it means GFF3 parsers need to get the information about which feature types are zero-length from somewhere (from the user?).

That's only my guess. Of course, instead of trying to guess those things I should ask the GFF3 people (hey, they have mailing lists!).

Anyway, maybe setting the default value for isRightOpen to FALSE in readGff3() would be more accurate and avoid a lot of confusion, especially since most users don't have or don't care about zero-length features.

Thanks!
H.

----- Original Message -----
From: "Julien Gagneur" <julien.gagneur@embl.de>
To: "Simon Anders" <anders at embl.de>
Cc: bioconductor at r-project.org, karlerhard at berkeley.edu, "Hervé Pagès" <hpages at fhcrc.org>
Sent: Friday, April 1, 2011 3:37:27 AM
Subject: Re: [BioC] gff files: how to tell if right-open interval convention used?

Hi Simon, Karl and Herve,

As Simon points out, the GFF3 format is quite ill-defined. When implementing readGff3() in genomeIntervals, we had to interpret the specifications. GFF3 allows zero-length features but there is no column to flag intervals as zero-length features. Have we missed something?

From the two sentences: "Start is always less than or equal to end." and "For zero-length features, such as insertion sites, start equals end ..." we understood that the only way to distinguish zero-length features from other features was to adopt a right-open convention. We thus interpreted this as an equivalence: "a feature is zero-length if and only if start equals end". It is not exactly what is said, but what else could we do?

Of course, we have noticed that most files actually use the right-closed convention and rarely have zero-length features. We therefore added the parameter isRightOpen.

Although not frequently provided, I believe zero-length features can be useful. genomeIntervals provides consistent support for these (including interval_overlap, etc).

Hope this clarifies the question.

Julien
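
To make the two readings discussed in this thread concrete, here is a minimal Python sketch. It is not genomeIntervals code and not part of the GFF3 specification: the function names and the `ZERO_LENGTH_TYPES` set are hypothetical, chosen only to show how the same start/end pair yields a different length under the closed and right-open conventions, and how a parser might instead rely on the feature type column.

```python
# Hypothetical set of feature types treated as zero-length; in practice this
# would have to be supplied from somewhere (e.g. by the user).
ZERO_LENGTH_TYPES = {"insertion_site"}


def feature_length(start, end, right_open=False):
    """Length in bases implied by the start/end columns of one feature.

    right_open=False: closed interval [start, end], so start == end is 1 base.
    right_open=True:  half-open interval [start, end), so start == end is 0 bases.
    """
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return end - start if right_open else end - start + 1


def feature_length_by_type(start, end, feature_type):
    """Closed-interval length, except that start == end means zero length
    when the feature type is listed in ZERO_LENGTH_TYPES."""
    if start > end:
        raise ValueError("start must be less than or equal to end")
    if start == end and feature_type in ZERO_LENGTH_TYPES:
        return 0
    return end - start + 1


if __name__ == "__main__":
    # The same coordinates read under each convention.
    print(feature_length(100, 100))                   # closed reading
    print(feature_length(100, 100, right_open=True))  # right-open reading
    print(feature_length_by_type(100, 100, "insertion_site"))
```

With the closed reading, a record with start == end covers one base; with the right-open reading it covers none, which is why the choice of default matters for files that contain no zero-length features at all.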