Get rs numbers using IntronVariants() in VariantAnnotation, locateVariants

Hi All,

Could you please let me know how to get rsnumbers using IntronVariants()?

intron_regions     <- IntronVariants()

head(locateVariants(vcf_mod, enstxdb, intron_regions))
'select()' returned many:1 mapping between keys and columns
GRanges object with 6 ranges and 9 metadata columns:
     seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID        TXID         CDSID      GENEID       PRECEDEID
         <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character> <IntegerList> <character> <CharacterList>
[1]    chr19    572764      + |   intron        63        63        82           1          <NA>         BSG            <NA>
[2]    chr19    572770      + |   intron        69        69        83           1          <NA>         BSG            <NA>
[3]    chr19    572786      + |   intron        85        85        84           1          <NA>         BSG            <NA>

I would expect the rsnumbers to appear as rownames in this output. I can provide details of my input to locateVariants if it would help. The rsnumbers ARE in the vcf.




If the rsnumbers are in the ID column of the VCF file, they should be the rownames in the readVcf() output. This (which you may already know) is described in the ?readVcf man page:

     rowRanges: The CHROM, POS, ID and REF fields are used to create a
          ‘GRanges’ object. Ranges are created using POS as the start
          value and width of the reference allele (REF). By default,
          the IDs become the rownames ('row.names = FALSE' to turn this
          off). If IDs are missing (i.e., ‘.’) a string of
          CHROM:POS_REF/ALT is used instead.  The ‘genome’ argument is
          stored in the seqinfo of the ‘GRanges’ and can be accessed
          with ‘genome(<VCF>)’.

Are the rsnumbers in the 'vcf_mod' object? Is the question why didn't the rownames from 'vcf_mod' propagate to the output of locateVariants()?
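One quick way to check is to look at the rownames directly. The sketch below is only an illustration: it uses the sample VCF that ships with VariantAnnotation (chr22.vcf.gz) as a stand-in for your own file, and the object name 'vcf' is a placeholder for 'vcf_mod'.

```r
library(VariantAnnotation)

# Stand-in data: the sample VCF bundled with VariantAnnotation.
# Replace 'fl' with the path to your own file.
fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# The ID column of the VCF becomes the rownames of the VCF object ...
head(rownames(vcf))

# ... and the same values are the names of the rowRanges() GRanges.
head(names(rowRanges(vcf)))
```

If the same check on 'vcf_mod' returns NULL or CHROM:POS_REF/ALT strings, the rsnumbers were lost or missing before locateVariants() was called.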


I edited my answer after re-reading the question.


If the rsnumbers are in 'vcf_mod' you can map back to them with the QUERYID column in the output of locateVariants().
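A minimal sketch of that mapping, assuming the rsnumbers are the rownames of the VCF object. It uses the sample VCF from VariantAnnotation and the UCSC hg19 knownGene TxDb as stand-ins for 'vcf_mod' and 'enstxdb'; the column name RSID is made up for this example.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Stand-in data (substitute your own 'vcf_mod' and 'enstxdb').
fl   <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf  <- readVcf(fl, "hg19")
seqlevels(vcf) <- "chr22"   # match the chromosome naming used by the TxDb
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

loc <- locateVariants(vcf, txdb, IntronVariants())

# QUERYID is the row index of each hit in the query object, so it can be
# used to look up the corresponding rowname (the rsnumber) in 'vcf'.
mcols(loc)$RSID <- rownames(vcf)[loc$QUERYID]

head(loc)
```

Because a variant can fall in the introns of several transcripts, the same QUERYID (and therefore the same rsnumber) may appear on more than one row of the result.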


I ended up doing an ugly kludge... I read in the small, selected vcf as a data.frame and mapped its rs numbers to the names of the small GRanges object via the genomic start coordinate of the variant. I realize that this could lead to errors if more than one rs number has the same start. When I get a chance, I will fix my script to use the correct method you have provided! Thank you, Jon

