Entering edit mode
I imported a fasta file using readDNAStringSet
:
fasta <- readDNAStringSet("~/piRbase.fa")
fasta@ranges
group start end width names
1 1 28720989 28721013 25 piR-hsa-1000547
2 1 42817439 42817467 29 piR-hsa-1483697
3 1 43210296 43210328 33 piR-hsa-1497078
4 1 47285542 47285569 28 piR-hsa-1635975
5 1 49067139 49067170 32 piR-hsa-1696655
6 1 50110086 50110115 30 piR-hsa-1732223
fasta
A DNAStringSet instance of length 29
width seq names
[1] 25 TAACCAGGGACAGCAGCAAAACATG piR-hsa-1000547
[2] 29 CAGAAATTCCTAAAGAAGAGAAGGACCCT piR-hsa-1483697
[3] 33 TCTGAACTTTTGAAACCTTGTGTGGGATTGATG piR-hsa-1497078
[4] 28 GTTACGGTTACTGTGGGCAAGTTTGAGA piR-hsa-1635975
[5] 32 TGAAAGGGCTGTTGTAATAGTGGATGATCGTG piR-hsa-1696655
- I'm trying to understand what are the
fasta@ranges
and how are created because these are not the "correct" coordinates in the genome. - I have another bed file with the "correct" coordinates that I import using "rtracklayer" package to a Granges object. How is it possible to "merge" the information between the two files?
example bed:
GRanges object with 53 ranges and 2 metadata columns:
seqnames ranges strand | name score
<Rle> <IRanges> <Rle> | <character> <numeric>
piR-hsa-222502 chr2 182983305-182983332 - | piR-hsa-222502 0
piR-hsa-470448 chr18 77016562-77016586 - | piR-hsa-470448 0
piR-hsa-504532 chr11 65211877-65211907 + | piR-hsa-504532 0
piR-hsa-612712 chr15 68290091-68290122 + | piR-hsa-612712 0
piR-hsa-1000547 chr17 35820399-35820423 + | piR-hsa-1000547 1
The method is actually in BSgenome; sorry for not being clear.
Thank you! Probably naive problem but I get :
any suggestions on where to identify the problem?
A GRanges has
seqnames()
(e.g., chromsomes) as well asnames()
(e.g., exon IDs). You want to ensure thatSo names in the "fasta" file should have chr names and not something like smallRNA IDs e.g.:
Backing up, what is it that you want to do? I had not read your question carefully enough, and though that you had sequences, e.g., of chromosomes, and you wanted to extract sub-sequences, e.g., of small RNAs, whose genomic coordinates you knew.
You could for instance add a column to your GRanges object
or similar
Thank you for the time you spent.
Well, I try to make an object from 3 different files (the
fasta
with the sequences, thegr_obj
with coordinates in the genome and another table with read sequences from smallRNAseq) in order to plot something like coverage but for each different sequence of individual small RNA. I found this example and I'll recreate the "Plotting genomic ranges". Thus showing differences in read length and 3' or 5' ends of identified smallRNAs. Currently, I'm using a combination of stringr, dplyr and granges.