Question: Get rs numbers using IntronVariants() in VariantAnnotation, locateVariants
goldberg.jm10 wrote (7 weeks ago):

Hi All,

Could you please let me know how to get rs numbers using IntronVariants()?

intron_regions     <- IntronVariants()

head(locateVariants(vcf_mod, enstxdb, intron_regions))
'select()' returned many:1 mapping between keys and columns
GRanges object with 6 ranges and 9 metadata columns:
     seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID        TXID         CDSID      GENEID       PRECEDEID
         <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character> <IntegerList> <character> <CharacterList>
[1]    chr19    572764      + |   intron        63        63        82           1          <NA>         BSG            <NA>
[2]    chr19    572770      + |   intron        69        69        83           1          <NA>         BSG            <NA>
[3]    chr19    572786      + |   intron        85        85        84           1          <NA>         BSG            <NA>

I would expect the rs numbers to appear as rownames in this output. I can provide details of my input to locateVariants() if that would help. The rs numbers ARE in the VCF.

Best,


Valerie Obenchain ♦♦ 6.6k (United States) wrote (6 weeks ago):


If the rs numbers are in the ID column of the VCF file, they should be the rownames of the readVcf() output. This (as you may already know) is described on the ?readVcf man page:

     rowRanges: The CHROM, POS, ID and REF fields are used to create a
          ‘GRanges’ object. Ranges are created using POS as the start
          value and width of the reference allele (REF). By default,
          the IDs become the rownames ('row.names = FALSE' to turn this
          off). If IDs are missing (i.e., ‘.’) a string of
          CHROM:POS_REF/ALT is used instead.  The ‘genome’ argument is
          stored in the seqinfo of the ‘GRanges’ and can be accessed
          with ‘genome(<VCF>)’.

Are the rs numbers in the 'vcf_mod' object? Or is the question why the rownames from 'vcf_mod' didn't propagate to the output of locateVariants()?


I edited my answer after re-reading the question.

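The ?readVcf behavior quoted above can be checked directly. A minimal sketch, assuming the chr22.vcf.gz example file that ships with VariantAnnotation stands in for the poster's own VCF:

```r
library(VariantAnnotation)

# Example VCF bundled with VariantAnnotation (stand-in for the poster's file)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# Default: the ID column (rs numbers where present) becomes the rownames
vcf <- readVcf(fl, "hg19")
head(rownames(vcf))

# row.names = FALSE turns this off, so no rownames are attached
vcf_noid <- readVcf(fl, "hg19", row.names = FALSE)
is.null(rownames(vcf_noid))
```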

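If the rs numbers are in 'vcf_mod', you can map back to them with the QUERYID column in the output of locateVariants(). QUERYID gives the row of the original query that each result came from, so it indexes directly into rownames(vcf_mod).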

(reply by Valerie Obenchain, 6 weeks ago)
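A minimal sketch of the QUERYID mapping, assuming the bundled chr22.vcf.gz example and the TxDb.Hsapiens.UCSC.hg19.knownGene annotation package in place of the poster's 'vcf_mod' and 'enstxdb' objects; the RSID column name is hypothetical:

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# The example file names the chromosome "22"; the UCSC TxDb uses "chr22"
vcf <- renameSeqlevels(vcf, c("22" = "chr22"))

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
loc  <- locateVariants(vcf, txdb, IntronVariants())

# QUERYID is a row index into the query, so it indexes rownames(vcf);
# "RSID" is a hypothetical column name chosen for this example
loc$RSID <- rownames(vcf)[loc$QUERYID]
head(mcols(loc)[, c("LOCATION", "QUERYID", "RSID")])
```

Because QUERYID is a row index rather than a coordinate, variants that share a start position still map back to their own IDs.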

I ended up doing an ugly kludge: I read the small, selected VCF in as a data.frame and mapped its rs numbers to the names of the small GRanges object via the genomic start coordinate of each variant. I realize this could lead to errors if more than one rs number has the same start. When I get a chance, I will fix my script to use the correct method you have provided! Thank you, Jon

(reply by goldberg.jm10, 4 weeks ago)
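The risk of matching on start coordinates can be seen with toy data. A minimal base-R sketch; the IDs, positions, and the data.frame standing in for locateVariants() output are all hypothetical:

```r
# Hypothetical toy VCF rows: the first two variants share a start position
vcf_ids    <- c("rs100", "rs200", "rs300")
vcf_starts <- c(1000L, 1000L, 2000L)

# Hypothetical stand-in for locateVariants() output:
# QUERYID is the row index of each hit in the query
hits <- data.frame(start   = c(1000L, 1000L, 2000L),
                   QUERYID = c(1L, 2L, 3L))

# Start-coordinate matching: match() returns the FIRST hit, so rs200 is lost
by_start <- vcf_ids[match(hits$start, vcf_starts)]

# QUERYID indexing: each hit maps back to exactly the row it came from
by_query <- vcf_ids[hits$QUERYID]

by_start  # "rs100" "rs100" "rs300"
by_query  # "rs100" "rs200" "rs300"
```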
Powered by Biostar version 2.2.0