how to get genomic coordinates from an MIndex object
1
0
Entering edit mode
Ann Loraine ▴ 110
@ann-loraine-3863
Last seen 11 weeks ago
United States

Hello,

I am trying to extract the genomic coordinates (chromosome name, start position, end position) from an MIndex object but can't figure out how to do it.

Can someone help?

Here is the code - first, I load the library, read a genome sequence into memory (it's tiny), and create DNAString objects out of a column of short sequences stored in my data frame data:

library(Biostrings)
query_seqs = sapply(data$Sequence,DNAString)  A bit more info: The object data is a data frame with one column (Sequence) that contains short oligonucleotide sequences - character data. These are guide RNA sequences I got from a collaborator who needs to map them onto the genomic sequence in rice. We only care about exact matches at this point. I then locate them in the genomic sequence with: matches = lapply(query_seqs,vmatchPattern,genome) rc_matches = lapply(rc_query_seqs,vmatchPattern,genome)  The above two commands return lists. Interrogating one item from one list: m = matches[[1]] > m MIndex object of length 16$Chr1
IRanges object with 0 ranges and 0 metadata columns:
start       end     width
<integer> <integer> <integer>

$Chr2 IRanges object with 1 range and 0 metadata columns: start end width <integer> <integer> <integer> [1] 4496975 4496994 20$Chr3
IRanges object with 0 ranges and 0 metadata columns:
start       end     width
<integer> <integer> <integer>

...
<13 more elements>


As you can see from the above code, object m is an MIndex. That sounds fine, and now I need to get some data out of it, so that I can then write out a BED format file and look at the data in a genome browser. (For example, I need to interactively check that the data I got from my collaborator fits his assumption about it.)

For each location in the genome that matched, I want to retrieve the name of the sequence (chromosome) that matched, the start position of the match, and the end position of the match, so that I can fill in the proper fields of the BED file.

How do I get that data from this MIndex thing and the IRanges objects it appears to contain?

I've been looking through the documentation and can't figure out what methods to use or how to do it.

-Ann

IRanges MIndex • 170 views
2
Entering edit mode
@michael-lawrence-3846
Last seen 17 days ago
United States

You can convert to a GRanges with GRanges(m), and then use its accessors.

0
Entering edit mode

Thanks! Got it now :-)

0
Entering edit mode

Btw, it might be more convenient to use the vmatchPDict() function in BSgenome, i.e., something like vmatchPDict(PDict(query_seqs)), BSgenome.Osativa.MSU.MSU7). That gives you a GRanges directly, with a column indicating the query sequence for each hit.