Entering edit mode
I imported a fasta file using readDNAStringSet :
fasta <- readDNAStringSet("~/piRbase.fa")
fasta@ranges
group start end width names
1 1 28720989 28721013 25 piR-hsa-1000547
2 1 42817439 42817467 29 piR-hsa-1483697
3 1 43210296 43210328 33 piR-hsa-1497078
4 1 47285542 47285569 28 piR-hsa-1635975
5 1 49067139 49067170 32 piR-hsa-1696655
6 1 50110086 50110115 30 piR-hsa-1732223
fasta
A DNAStringSet instance of length 29
width seq names
[1] 25 TAACCAGGGACAGCAGCAAAACATG piR-hsa-1000547
[2] 29 CAGAAATTCCTAAAGAAGAGAAGGACCCT piR-hsa-1483697
[3] 33 TCTGAACTTTTGAAACCTTGTGTGGGATTGATG piR-hsa-1497078
[4] 28 GTTACGGTTACTGTGGGCAAGTTTGAGA piR-hsa-1635975
[5] 32 TGAAAGGGCTGTTGTAATAGTGGATGATCGTG piR-hsa-1696655
- I'm trying to understand what are the
fasta@rangesand how are created because these are not the "correct" coordinates in the genome. - I have another bed file with the "correct" coordinates that I import using "rtracklayer" package to a Granges object. How is it possible to "merge" the information between the two files?
example bed:
GRanges object with 53 ranges and 2 metadata columns:
seqnames ranges strand | name score
<Rle> <IRanges> <Rle> | <character> <numeric>
piR-hsa-222502 chr2 182983305-182983332 - | piR-hsa-222502 0
piR-hsa-470448 chr18 77016562-77016586 - | piR-hsa-470448 0
piR-hsa-504532 chr11 65211877-65211907 + | piR-hsa-504532 0
piR-hsa-612712 chr15 68290091-68290122 + | piR-hsa-612712 0
piR-hsa-1000547 chr17 35820399-35820423 + | piR-hsa-1000547 1

The method is actually in BSgenome; sorry for not being clear.
Thank you! Probably naive problem but I get :
any suggestions on where to identify the problem?
A GRanges has
seqnames()(e.g., chromsomes) as well asnames()(e.g., exon IDs). You want to ensure thatSo names in the "fasta" file should have chr names and not something like smallRNA IDs e.g.:
Backing up, what is it that you want to do? I had not read your question carefully enough, and though that you had sequences, e.g., of chromosomes, and you wanted to extract sub-sequences, e.g., of small RNAs, whose genomic coordinates you knew.
You could for instance add a column to your GRanges object
or similar
Thank you for the time you spent.
Well, I try to make an object from 3 different files (the
fastawith the sequences, thegr_objwith coordinates in the genome and another table with read sequences from smallRNAseq) in order to plot something like coverage but for each different sequence of individual small RNA. I found this example and I'll recreate the "Plotting genomic ranges". Thus showing differences in read length and 3' or 5' ends of identified smallRNAs. Currently, I'm using a combination of stringr, dplyr and granges.