Question

Feature Request: improve output when printing GenomicRanges instances

Keith Hughitt ▴ 180

@keith-hughitt-6740

United States

Currently, when displaying a GenomicRanges instance, there are often many more columns to be displayed than can fit on a screen at once.

The result is that when the instance is printed to the screen, single entries appear over multiple rows so that all columns can be displayed, e.g.:

print(gr)

GRanges object with 6 ranges and 23 metadata columns:
      seqnames         ranges strand |    source     type     score     phase
         <Rle>      <IRanges>  <Rle> |  <factor> <factor> <numeric> <integer>
  [1]  LmjF.01 [ 3704,  4702]      - | TriTrypDB     gene      <NA>      <NA>
  [2]  LmjF.01 [ 5790,  7439]      - | TriTrypDB     gene      <NA>      <NA>
  [3]  LmjF.01 [ 9061, 11067]      - | TriTrypDB     gene      <NA>      <NA>
  [4]  LmjF.01 [12073, 12642]      - | TriTrypDB     gene      <NA>      <NA>
  [5]  LmjF.01 [15025, 17022]      - | TriTrypDB     gene      <NA>      <NA>
  [6]  LmjF.01 [18137, 18886]      - | TriTrypDB     gene      <NA>      <NA>
                ID         Name                            description        size
       <character>  <character>                            <character> <character>
  [1] LmjF.01.0010 LmjF.01.0010 hypothetical+protein,+unknown+function         999
  [2] LmjF.01.0020 LmjF.01.0020        hypothetical+protein,+conserved        1650
  [3] LmjF.01.0030 LmjF.01.0030       Kinesin-13+1,+putative+(KIN13-1)        2007
  [4] LmjF.01.0040 LmjF.01.0040 hypothetical+protein,+unknown+function         570
  [5] LmjF.01.0050 LmjF.01.0050                  carboxylase,+putative        1998
  [6] LmjF.01.0060 LmjF.01.0060        hypothetical+protein,+conserved         750
            web_id molecule_type organism_name translation_table    topology
       <character>   <character>   <character>       <character> <character>
  [1] LmjF.01.0010          <NA>          <NA>              <NA>        <NA>
  [2] LmjF.01.0020          <NA>          <NA>              <NA>        <NA>
  [3] LmjF.01.0030          <NA>          <NA>              <NA>        <NA>
  [4] LmjF.01.0040          <NA>          <NA>              <NA>        <NA>
  [5] LmjF.01.0050          <NA>          <NA>              <NA>        <NA>
  [6] LmjF.01.0060          <NA>          <NA>              <NA>        <NA>
      localization          Dbxref    locus_tag                              Alias
       <character> <CharacterList>  <character>                    <CharacterList>
  [1]         <NA>                 LmjF.01.0010 321438052,389592307,LmjF1.0010,...
  [2]         <NA>                 LmjF.01.0020 321438053,389592309,LmjF1.0020,...
  [3]         <NA>                 LmjF.01.0030     KIN13-1,Kif-13-1,321438054,...
  [4]         <NA>                 LmjF.01.0040 321438055,389592313,LmjF1.0040,...
  [5]         <NA>                 LmjF.01.0050 321438056,389592315,LmjF1.0050,...
  [6]         <NA>                 LmjF.01.0060 321438057,389592317,LmjF1.0060,...
               Parent   Ontology_term       Frame         Comment   identical
      <CharacterList> <CharacterList> <character> <CharacterList> <character>
  [1]                                        <NA>                        <NA>
  [2]                                        <NA>                        <NA>
  [3]                                        <NA>                        <NA>
  [4]                                        <NA>                        <NA>
  [5]                                        <NA>                        <NA>
  [6]                                        <NA>                        <NA>
          overlap
      <character>
  [1]        <NA>
  [2]        <NA>
  [3]        <NA>
  [4]        <NA>
  [5]        <NA>
  [6]        <NA>
  -------
  seqinfo: 36 sequences from an unspecified genome; no seqlengths

A much nicer approach might be to adopt dplyr-like adaptive printing, which by default shows only as many columns as fit on the screen and lists the remaining ones by name:

> tbl_df(as.data.frame(gff1))
Source: local data frame [6 x 28]

  seqnames start   end width strand    source   type score phase
    (fctr) (int) (int) (int) (fctr)    (fctr) (fctr) (dbl) (int)
1  LmjF.01  3704  4702   999      - TriTrypDB   gene    NA    NA
2  LmjF.01  5790  7439  1650      - TriTrypDB   gene    NA    NA
3  LmjF.01  9061 11067  2007      - TriTrypDB   gene    NA    NA
4  LmjF.01 12073 12642   570      - TriTrypDB   gene    NA    NA
5  LmjF.01 15025 17022  1998      - TriTrypDB   gene    NA    NA
6  LmjF.01 18137 18886   750      - TriTrypDB   gene    NA    NA
Variables not shown: ID (chr), Name (chr), description (chr), size
  (chr), web_id (chr), molecule_type (chr), organism_name (chr),
  translation_table (chr), topology (chr), localization (chr),
  Dbxref (chr), locus_tag (chr), Alias (chr), Parent (chr),
  Ontology_term (chr), Frame (chr), Comment (chr), identical (chr),
  overlap (chr)

Above is what the output would look like on a fairly small console (~800px wide). For a larger screen, more of the columns would be displayed in a similar manner.
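To make the idea concrete, here is a rough base R sketch of width-aware column selection. It is not how GenomicRanges or dplyr implement printing; `fit_columns` and `show_fitted` are hypothetical names, the input is assumed to be a plain data frame with atomic columns (e.g. the result of `as.data.frame()` on the ranges), and the toy values are copied from the output above.

```r
# Hypothetical helper: keep the leading columns whose printed width fits in
# `width` characters and report the names of the columns that do not fit.
fit_columns <- function(df, width = getOption("width")) {
  # Printed width of a column: the wider of its header and its formatted cells.
  col_width <- vapply(names(df), function(nm) {
    max(nchar(nm), nchar(format(df[[nm]])))
  }, integer(1))
  # Row names take up a gutter on the left; each column adds one separator.
  gutter <- max(nchar(rownames(df)), 0L)
  used <- gutter + cumsum(col_width + 1L)
  keep <- used <= width
  if (length(keep) > 0) keep[1] <- TRUE  # always show at least one column
  list(shown = df[, keep, drop = FALSE], hidden = names(df)[!keep])
}

# Hypothetical wrapper: print the columns that fit, then list the rest.
show_fitted <- function(df, width = getOption("width")) {
  res <- fit_columns(df, width)
  print(res$shown)
  if (length(res$hidden) > 0) {
    msg <- paste("Columns not shown:", paste(res$hidden, collapse = ", "))
    cat(strwrap(msg, width = width, exdent = 2), sep = "\n")
  }
  invisible(res)
}

# Toy data with a few of the columns from the example above.
toy <- data.frame(
  seqnames = "LmjF.01",
  start = c(3704L, 5790L),
  end = c(4702L, 7439L),
  description = c("carboxylase,+putative", "hypothetical+protein,+conserved"),
  stringsAsFactors = FALSE
)
show_fitted(toy, width = 40)
```

With a narrow `width` the long `description` column is dropped from the table and named underneath; with a wider console it would be printed along with the others.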

Although I am sure people will have different preferences, I personally find this much cleaner and more pleasant to work with.

Is there any chance of changing the default behavior of GenomicRanges in the future to behave more like this?

GenomicRanges featurerequest • 1.1k views

8.6 years ago by Keith Hughitt ▴ 180

While I'm not enlightened enough to be a dplyr user, it seems at least as desirable to show the last columns, since in a typical workflow, one is adding columns and wants to review them easily. Perhaps we could extend the row-wise head...tail approach to the columns. Personally, I've never really been bothered by showing all the columns, so we should support configuration like we do for the row-wise strategy.
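A column-wise head...tail view could look roughly like the sketch below. This is only an illustration on a plain data frame, not an existing GenomicRanges option; `head_tail_columns`, `n_head` and `n_tail` are hypothetical names, and the toy values are taken from the example in the question.

```r
# Hypothetical helper: keep the first `n_head` and last `n_tail` columns and
# mark the omitted middle with a "..." placeholder column.
head_tail_columns <- function(df, n_head = 3L, n_tail = 3L) {
  stopifnot(n_head >= 1L, n_tail >= 1L)
  p <- ncol(df)
  # Nothing to hide if every column is already covered by head and tail.
  if (p <= n_head + n_tail) return(df)
  left <- df[, seq_len(n_head), drop = FALSE]
  right <- df[, p - n_tail + seq_len(n_tail), drop = FALSE]
  gap <- data.frame(gap = rep("...", nrow(df)), stringsAsFactors = FALSE)
  out <- cbind(left, gap, right)
  names(out)[n_head + 1L] <- "..."  # label the placeholder column
  out
}

# Toy data using column names and values from the example in the question.
toy <- data.frame(
  seqnames = "LmjF.01",
  start = c(3704L, 5790L),
  end = c(4702L, 7439L),
  width = c(999L, 1650L),
  strand = "-",
  source = "TriTrypDB",
  type = "gene",
  ID = c("LmjF.01.0010", "LmjF.01.0020"),
  stringsAsFactors = FALSE
)
print(head_tail_columns(toy, n_head = 2L, n_tail = 2L))
```

Making `n_head` and `n_tail` arguments (or global options) is one way to get the configurability mentioned above, so that users who prefer to see every column can keep the current behavior.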

8.6 years ago by Michael Lawrence ★ 11k