Rsamtools ScanFA error
2
0
Entering edit mode
@andreag1987-14482
Last seen 6.2 years ago

 

I am trying to fetch promoter sequences from a bacterial genome, retrieving only those sequences that don't fall into cds. I think the problem is that some sequences fall out of the FASTA sequences (2 chromosomes, 2 plasmids) that are not recognized as circular giving me an error from scanFa.

Here there's my code

phy<-makeTxDbFromGFF(file.choose(),format = "auto")


GR <- cdsBy(phy,"tx",use.names=TRUE)
indexFa(file.choose())
phy.fasta<-FaFile(file.choose(), index=sprintf("%s.fai", file.choose()))

p<-promoters(phy,200,0)
promoter_seqs<-subsetByOverlaps(p,GR,invert = TRUE)
promoter_seqs <- scanFa(phy.fasta,promoter_seqs,use.names=TRUE)

this is the error I get from scanFa:

Error in value[[3L]](cond) : 
record 1470 (1:3479165-3479364) was truncated

this is the seqinfo of phy.fasta

seqinfo(phy.fasta)
Seqinfo object with 4 sequences from an unspecified genome:
  seqnames seqlengths isCircular genome
  pBPHY01     1904893         NA   <NA>
  pBPHY02      595108         NA   <NA>
  1           3479187         NA   <NA>
  2           2697374         NA   <NA>

I have tried also with isCircular but this method doesn't work with FAfile

I hope that I have been clear, and thanks in advance for your time and your precious advises!

rsamtools • 1.1k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 2 days ago
United States

I think the diagnosis is correct -- you're asking for regions outside the bounds of the chromosome. I think the solution is to manually (e.g., using narrow()) trim promoter_seqs to be within seqinfo(), and to paste together any sequences that wrap around.

ADD COMMENT
0
Entering edit mode
@andreag1987-14482
Last seen 6.2 years ago

Dear Prof.Morgan thank you very much for your help! I solved the problem, here the solution I found:
promoter_seqs<-GRangesList(
  restrict(promoter_seqs[seqnames(promoter_seqs)=="1"],start = 1,end = 3479187),
  restrict(promoter_seqs[seqnames(promoter_seqs)=="2"],start = 1,end = 2697374),
  restrict(promoter_seqs[seqnames(promoter_seqs)=="pBPHY01"],start = 1,end = 1904893),
  promoter_seqs[seqnames(promoter_seqs)=="pBPHY02"])
promoter_seqs<-(unlist(promoter_seqs))
promoter_seqs_fa <- scanFa(phy.fasta,promoter_seqs,use.names=TRUE)

I hope it could be helpful to someone else.
 

ADD COMMENT

Login before adding your answer.

Traffic: 434 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6