getSeq failing with DNAStringSet and GRanges
Entering edit mode
Mog • 0
Last seen 6 weeks ago
United Kingdom

Hi all, I'm trying to run ORFik on a folder of .fastas. These fastas are outputted by an earlier python script, they're HTS contigs marked as viral in origin. Everything runs fine until I need to extract just the ORFs from the fastas. Following the manual, I ryn findORFsFasta. The for loop is just to find each file in turn. It finds the ORFs fine, and the BioStringSet generates fine, I can print them.


all_fastas <- list.files("rma_fastas")

for (fasta in all_fastas)
    fasta_dss <- readDNAStringSet(fasta, "fasta")
    ORFs <- findORFsFasta(fasta, startCodon = startDefinition(1), stopCodon = stopDefinition(2), longestORF = TRUE, minimumLength = 50, is.circular = FALSE)
    extracted_ORFs <- getSeq(fasta_dss, names = ORFs)

But extracted_ORFs won't generate. It creates a massive error log that I just can't understand. I double checked the manual and it says for GRanges, names = (GRanges) is just fine. No clue what the problem is. I updated my R, my Bioconductor manager, and the packages, to no success. I'm running in Jupyter Lab because I don't have admin privs on this machine and Anaconda doesn't have the latest R Studio versions (or base R versions) for me to get around that problem.

Error in h(simpleError(msg, call)): error in evaluating the argument 'value' in selecting a method for function 'unsplit': error in evaluating the argument 'x' in selecting a method for function 'extractAt': subscript contains invalid names

1. getSeq(fasta_dss, names = ORFs)
2. getSeq(fasta_dss, names = ORFs)
3. .local(x, ...)
4. unsplit(extractAt(x[names(rl)], unname(rl)), seqnames(gr))
5. extractAt(x[names(rl)], unname(rl))
6. x[names(rl)]
7. x[names(rl)]
8. subset_along_ROWS(x, i, drop = drop)
9. extractROWS(x, i)
10. extractROWS(x, i)
11. .dropUnusedPoolElts(callNextMethod())
12. callNextMethod()
13. .nextMethod(x = x, i = i)
14. normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
15. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
16. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
17. .subscript_error("subscript contains invalid ", what)
18. stop(wmsg(...), call. = FALSE)
19. .handleSimpleError(function (cond) 
  . .Internal(C_tryCatchHelper(addr, 1L, cond)), "subscript contains invalid names", 
  .     base::quote(NULL))
20. h(simpleError(msg, call))
21. .handleSimpleError(function (cond) 
  . .Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'extractAt': subscript contains invalid names", 
  .     base::quote(h(simpleError(msg, call))))
22. h(simpleError(msg, call))

Here's the specs, but I get the same problem running an anaconda build with a more outdated set.

R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale: [1] LC_COLLATE=English_United Kingdom.utf8 [2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 [4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

attached base packages: [1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached): [1] fansi_1.0.3 crayon_1.5.2 digest_0.6.29 utf8_1.2.2
[5] IRdisplay_1.1 repr_1.1.4 lifecycle_1.0.2 jsonlite_1.8.0 [9] evaluate_0.16 pillar_1.8.1 rlang_1.0.6 cli_3.4.1
[13] uuid_1.1-0 vctrs_0.4.1 IRkernel_1.3 tools_4.2.1
[17] glue_1.6.2 fastmap_1.1.0 compiler_4.2.1 base64enc_0.1-3 [21] pbdZMQ_0.3-7 htmltools_0.5.3

Any help would be appreciated. I've been fighting with this all day and I do really need ORFik for what I want to do; no other package really does what I require.

EDIT: I have access to a Linux server but as far as actually writing and testing goes, it's a method of last resort! I prefer to get things working on Windows then stitch them in since the server has no graphics. EDIT 2: This also fails. What is my error here? It's not a version problem I suppose?

ORFik Bioconductor GenomicRanges Biostrings BSgenome • 211 views
Entering edit mode
Last seen 1 hour ago
United States

The example for findORFsFasta has this


     # location of the example fasta file
     example_genome <- system.file("extdata/Danio_rerio_sample", "genome_dummy.fasta",
      package = "ORFik")
     orfs <- findORFsFasta(example_genome)
     # To store ORF sequences (you need indexed genome .fai file):
     fa <- FaFile(example_genome)
     names(orfs) <- paste0("ORF_",, "_", seqnames(orfs))
     orf_seqs <- getSeq(fa, orfs)
     # You sequences (fa), needs to have isCircular(fa) == TRUE for it to work
     # on circular wrapping ranges!

     # writeXStringSet(DNAStringSet(orf_seqs), "orfs.fasta")

And note that fa in this example is a FaFile, not a DNAStringSet. If I run the example

> getSeq(fa, orfs)
DNAStringSet object of length 1235:
       width seq                                            names               
   [1]    33 CTGCAACGGGCAATATGTCTCTGTGTGGATTAA              Chromosome
   [5]    30 CTGGGAGCGTTTTTGCCAGGAACTGGGTAA                 Chromosome
   ...   ... ...
[1231]    12 TTGGCGATGTGA                                   Chromosome
[1232]    24 TTGTCCATCCCAGTGCCGTCTTGA                       Chromosome
[1233]    12 TTGAGGGGTTAA                                   Chromosome

But if I do what you have done

> z <- readDNAStringSet(path(fa))
> getSeq(z, orfs)
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'value' in selecting a method for function 'unsplit': error in evaluating the argument 'x' in selecting a method for function 'extractAt': subscript contains invalid names

Which shows why perusing the help page, including examples, is a good idea.

Entering edit mode

Ah thanks for that. I was just using the manual .pdf, I misread what it said about DNAStringSets as being an input for some reason. I haven't used R in a while. Thanks so much for getting back to me so quickly and pointing me in the right direction despite my rookie error!


Login before adding your answer.

Traffic: 537 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6