I am trying to extract the genomic coordinates (chromosome name, start position, end position) from an MIndex object but can't figure out how to do it.
Can someone help?
Here is the code - first, I load the library, read a genome sequence into memory (it's tiny), and create DNAString objects out of a column of short sequences stored in my data frame
library(Biostrings) genome=readDNAStringSet("O_sativa_japonica_Oct_2011.fa") query_seqs = sapply(data$Sequence,DNAString)
A bit more info:
data is a data frame with one column (
Sequence) that contains short oligonucleotide sequences - character data. These are guide RNA sequences I got from a collaborator who needs to map them onto the genomic sequence in rice. We only care about exact matches at this point.
I then locate them in the genomic sequence with:
matches = lapply(query_seqs,vmatchPattern,genome) rc_matches = lapply(rc_query_seqs,vmatchPattern,genome)
The above two commands return lists. Interrogating one item from one list:
m = matches[] > m MIndex object of length 16 $Chr1 IRanges object with 0 ranges and 0 metadata columns: start end width <integer> <integer> <integer> $Chr2 IRanges object with 1 range and 0 metadata columns: start end width <integer> <integer> <integer>  4496975 4496994 20 $Chr3 IRanges object with 0 ranges and 0 metadata columns: start end width <integer> <integer> <integer> ... <13 more elements>
As you can see from the above code, object
m is an
MIndex. That sounds fine, and now I need to get some data out of it, so that I can then write out a BED format file and look at the data in a genome browser. (For example, I need to interactively check that the data I got from my collaborator fits his assumption about it.)
For each location in the genome that matched, I want to retrieve the name of the sequence (chromosome) that matched, the start position of the match, and the end position of the match, so that I can fill in the proper fields of the BED file.
How do I get that data from this
MIndex thing and the
IRanges objects it appears to contain?
I've been looking through the documentation and can't figure out what methods to use or how to do it.
Thanks in advance for any help you can provide!