Hi all, I'm trying to run ORFik on a folder of .fastas. These fastas are outputted by an earlier python script, they're HTS contigs marked as viral in origin. Everything runs fine until I need to extract just the ORFs from the fastas. Following the manual, I ryn findORFsFasta. The for loop is just to find each file in turn. It finds the ORFs fine, and the BioStringSet generates fine, I can print them.
library(ORFik)
library(GenomicRanges)
library(BSgenome)
library(Biostrings)
all_fastas <- list.files("rma_fastas")
setwd("rma_fastas")
for (fasta in all_fastas)
{
fasta_dss <- readDNAStringSet(fasta, "fasta")
ORFs <- findORFsFasta(fasta, startCodon = startDefinition(1), stopCodon = stopDefinition(2), longestORF = TRUE, minimumLength = 50, is.circular = FALSE)
extracted_ORFs <- getSeq(fasta_dss, names = ORFs)
}
But extracted_ORFs won't generate. It creates a massive error log that I just can't understand. I double checked the manual and it says for GRanges, names = (GRanges) is just fine. No clue what the problem is. I updated my R, my Bioconductor manager, and the packages, to no success. I'm running in Jupyter Lab because I don't have admin privs on this machine and Anaconda doesn't have the latest R Studio versions (or base R versions) for me to get around that problem.
Error in h(simpleError(msg, call)): error in evaluating the argument 'value' in selecting a method for function 'unsplit': error in evaluating the argument 'x' in selecting a method for function 'extractAt': subscript contains invalid names
Traceback:
1. getSeq(fasta_dss, names = ORFs)
2. getSeq(fasta_dss, names = ORFs)
3. .local(x, ...)
4. unsplit(extractAt(x[names(rl)], unname(rl)), seqnames(gr))
5. extractAt(x[names(rl)], unname(rl))
6. x[names(rl)]
7. x[names(rl)]
8. subset_along_ROWS(x, i, drop = drop)
9. extractROWS(x, i)
10. extractROWS(x, i)
11. .dropUnusedPoolElts(callNextMethod())
12. callNextMethod()
13. .nextMethod(x = x, i = i)
14. normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
15. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append,
. allow.NAs = allow.NAs)
16. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append,
. allow.NAs = allow.NAs)
17. .subscript_error("subscript contains invalid ", what)
18. stop(wmsg(...), call. = FALSE)
19. .handleSimpleError(function (cond)
. .Internal(C_tryCatchHelper(addr, 1L, cond)), "subscript contains invalid names",
. base::quote(NULL))
20. h(simpleError(msg, call))
21. .handleSimpleError(function (cond)
. .Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'extractAt': subscript contains invalid names",
. base::quote(h(simpleError(msg, call))))
22. h(simpleError(msg, call))
Here's the specs, but I get the same problem running an anaconda build with a more outdated set.
R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
attached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] fansi_1.0.3 crayon_1.5.2 digest_0.6.29 utf8_1.2.2
[5] IRdisplay_1.1 repr_1.1.4 lifecycle_1.0.2 jsonlite_1.8.0
[9] evaluate_0.16 pillar_1.8.1 rlang_1.0.6 cli_3.4.1
[13] uuid_1.1-0 vctrs_0.4.1 IRkernel_1.3 tools_4.2.1
[17] glue_1.6.2 fastmap_1.1.0 compiler_4.2.1 base64enc_0.1-3
[21] pbdZMQ_0.3-7 htmltools_0.5.3
Any help would be appreciated. I've been fighting with this all day and I do really need ORFik for what I want to do; no other package really does what I require.
EDIT: I have access to a Linux server but as far as actually writing and testing goes, it's a method of last resort! I prefer to get things working on Windows then stitch them in since the server has no graphics. EDIT 2: This also fails. What is my error here? It's not a version problem I suppose?
Ah thanks for that. I was just using the manual .pdf, I misread what it said about DNAStringSets as being an input for some reason. I haven't used R in a while. Thanks so much for getting back to me so quickly and pointing me in the right direction despite my rookie error!