Tximeta Error: lexical error: invalid char in json text.
atariw ▴ 10
@atariw-7670
Italy

I am trying to use tximeta R package.

R version 4.1.2, tximeta version 1.12.3.

I ran the following code:

makeLinkedTxome(
  indexDir = "/Users/atari/biodata/genomes/ensembl_hg38/indexes/SALMON",
  source = "ensembl",
  organism = "Homo Sapiens",
  release = 98,
  genome = "GRCh38",
  fasta = "/Users/atari/biodata/genomes/ensembl_hg38/transcripts/ensembl_hg38_transcripts.fa.gz",
  gtf = "/Users/atari/biodata/genomes/ensembl_hg38/GTF/original.gtf.gz",
  write = TRUE,
  jsonFile = "/Users/atari/biodata/provaLinkedTxome.json"
)

json = "/Users/atari/biodata/provaLinkedTxome.json"
metafile =  "/Users/atari/biodata/prova_salmon/metadata.txt"
salmondir = "/Users/atari/biodata/prova_salmon"

## Load json linkedTxome
loadLinkedTxome(json)

## Read metadata
metadata <- read.delim(metafile, header = TRUE, as.is = TRUE, sep = "\t")

## List Salmon directories
salmonfiles <- paste0(salmondir, "/", metadata$names, "/quant.sf")
names(salmonfiles) <- metadata$names

## Add file column to metadata and import annotated abundances
## In transcript level
coldata <- cbind(metadata, files = salmonfiles, stringsAsFactors = FALSE)
st <- tximeta::tximeta(coldata)

# When I run the last line I get this error:
Error: lexical error: invalid char in json text.
/Users/atari/biodata/prova_salmo
(right here) ------^


I would be grateful for help solving this issue. All the quant.sf Salmon files seem to be correctly populated, and each one is saved in a directory corresponding to the name in the metadata file.

Thx a lot, Andrea

tximeta • 469 views
@mikelove
United States

Can you try importing these files one by one with tximport()?

for (i in seq_along(salmonfiles)) {
  cat(i)
  txi <- tximport(salmonfiles[i], type="salmon")
}


This may help identify which JSON file is causing the issue.


for (i in seq_along(salmonfiles)) {
  cat(i)
  txi <- tximport(salmonfiles[i], type="salmon")
}
1reading in files with read.delim (install 'readr' package for speed up)
Error in summarizeFail() :
tximport failed at summarizing to the gene-level. Please see 'Solutions' in the Details section of the man page: ?tximport

Could it be because the identifiers in the FASTA transcript headers include a version suffix (e.g. ENSG00000237235.2), while those in the GTF do not (e.g. ENSG00000237235)?

Thx a lot
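
One way to test the version-suffix hypothesis is to compare the two sets of identifiers before and after stripping the suffix. The following is only a minimal sketch in base R: the identifiers are made up for illustration, and `strip_version` is a hypothetical helper, not part of tximport or tximeta. (tximport() also has an `ignoreTxVersion` argument intended for this kind of mismatch; see ?tximport for its exact behaviour.)

```r
# Made-up identifiers for illustration only; real ones would come from
# the FASTA headers and the GTF.
fasta_ids <- c("ENST00000000001.2", "ENST00000000002.5", "ENST00000000003.1")
gtf_ids   <- c("ENST00000000001", "ENST00000000002", "ENST00000000004")

# Hypothetical helper: drop a trailing ".<digits>" version suffix.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Count how many FASTA identifiers are found in the GTF, with and
# without the version suffix.
matched_raw      <- sum(fasta_ids %in% gtf_ids)
matched_stripped <- sum(strip_version(fasta_ids) %in% gtf_ids)

cat("matches before stripping:", matched_raw, "\n")
cat("matches after stripping:", matched_stripped, "\n")
```

If the number of matches rises sharply after stripping, the version suffix is a plausible cause of a mismatch; if it does not change, the problem probably lies elsewhere.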


I tried. Same error. So it doesn't seem to be related to the gene/transcript version.


Sorry, I forgot to add txOut=TRUE to the tximport() command. This will help us find the JSON issue.

Also, unrelated, but I'd recommend:

install.packages("readr")


With txOut=TRUE it works.

for (i in seq_along(salmonfiles)) {
  cat(i)
  txi <- tximport(salmonfiles[i], type="salmon", txOut=TRUE)
}
1reading in files with read_tsv
1
2reading in files with read_tsv
1
3reading in files with read_tsv
1
4reading in files with read_tsv
1


Ah, it's the linkedTxome json, not the quantification json files. I should have seen this.

Can you examine this file to see if it looks complete:

/Users/atari/biodata/provaLinkedTxome.json

If it looks complete, try deleting it and re-running makeLinkedTxome().

Side question: if this is just Ensembl human transcriptome, why do you need a linkedTxome?

If this is a modified version of the Ensembl human transcriptome, I'd recommend changing these lines to:

source = "LocalEnsembl",
organism = "Homo sapiens",


(the second line just to match with how organisms are typically capitalized)


The json file seems ok. I tried deleting and regenerating the file. I still get the same error.

cat /Users/atari/biodata/provaLinkedTxome.json
[
  {
    "index": "SALMON2",
    "source": "LocalEnsembl",
    "organism": "Homo Sapiens",
    "release": 98,
    "genome": "GRCh38",
    "fasta": ["/Users/atari/biodata/genomes/ensembl_hg38/transcripts/ensembl_hg38_transcripts2.fa.gz"],
    "gtf": "/Users/atari/biodata/genomes/ensembl_hg38/GTF/original.gtf.gz",
    "sha256": "afae900a8fe578ee066e69d6991302dc980f80150d0e77285eb46dca77c8fee8"
  }
]

I also tried prepending the full path to the SALMON2 index name, but I still get the same error.

This time I am using a standard Ensembl transcriptome, but that may change in the future.


Can you try to read these files in with the function that is throwing the error?

library(jsonlite)
dat1 <- jsonlite::fromJSON("/Users/atari/biodata/provaLinkedTxome.json")
dat2 <- jsonlite::fromJSON("/path/to/sample/cmd_info.json")
dat3 <- jsonlite::fromJSON("/path/to/sample/aux_info/meta_info.json")


One of these is throwing an error, but I'm not sure which one anymore.


Should I also provide cmd_info.json and meta_info.json? Where in the code should I specify the path to those files?

P.S. I think I now understand the issue. That library has a severe bug: it throws an incomprehensible error when it simply cannot find the file:

dat3 <- jsonlite::fromJSON("/not_exist_file.json")
Error: lexical error: invalid char in json text.
/not_exist_file.json
(right here) ------^
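
For context, jsonlite::fromJSON() accepts a JSON string, a URL, or a file path as its first argument; as far as I understand, a string that does not point to an existing file is parsed as JSON text, which is why a missing file produces a "lexical error" rather than a "file not found" message. A simple guard is to check that the expected files exist before anything tries to parse them. The sketch below uses base R only. `find_missing_salmon_json` is a hypothetical helper, and the file locations it checks (cmd_info.json and aux_info/meta_info.json next to each quant.sf) are taken from the paths mentioned in this thread.

```r
# Hypothetical helper: given paths to quant.sf files, return the paths of
# the accompanying Salmon JSON files that do not exist on disk.
find_missing_salmon_json <- function(quant_files) {
  sample_dirs <- dirname(quant_files)
  expected <- c(
    file.path(sample_dirs, "cmd_info.json"),
    file.path(sample_dirs, "aux_info", "meta_info.json")
  )
  expected[!file.exists(expected)]
}

# Demonstration on a temporary directory standing in for one Salmon
# output folder; only cmd_info.json is created here.
demo_dir <- file.path(tempdir(), "sample01")
dir.create(file.path(demo_dir, "aux_info"), recursive = TRUE, showWarnings = FALSE)
writeLines("{}", file.path(demo_dir, "cmd_info.json"))

missing_json <- find_missing_salmon_json(file.path(demo_dir, "quant.sf"))
print(missing_json)  # lists the aux_info/meta_info.json path, which was not created
```

With real data, the same helper could be called on the `salmonfiles` vector from the original post; an empty result would mean that all expected JSON files are present.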


Above in my code examples, I was suggesting to try to read some of the json files that are within each Salmon output directory, as a test to see why the error is coming up.

Have you found out which file was missing? Can you show the directory structure containing the files that may be missing?


I hadn't thought about providing those files. I can put them in any directory. Where in the code should I specify the directory that contains them? I can only guess that the directory is derived from the path to each quant.sf file.

If I put the contents of the Salmon output directory in that path, I then get this new error:

st <- tximeta::tximeta(coldata)
importing quantifications
reading in files with read_tsv
1 2 3 4
found matching linked transcriptome:
[ Ensembl - Homo Sapiens - release 98 ]
useHub=TRUE: checking for EnsDb via 'AnnotationHub'
Error in AnnotationHub() :
DEFUNCT: As of AnnotationHub (>2.23.2), default caching location has changed.
Problematic cache: /Users/atari/Library/Caches/AnnotationHub
See https://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/TroubleshootingTheCache.html#default-caching-location-update

Thx a lot.


1) Sorry, I think there is some confusion here: I don't mean for you to provide me with files. I'm asking whether you can, on your machine, try to read some of these files with jsonlite to see why the error is occurring:

dat1 <- jsonlite::fromJSON("/Users/atari/biodata/provaLinkedTxome.json")
dat2 <- jsonlite::fromJSON(".../sample01/cmd_info.json")
dat3 <- jsonlite::fromJSON(".../sample01/aux_info/meta_info.json")


So in the above code, you should replace .../sample01 with a path to a Salmon output directory.

Do these three commands work or give an error? If any give an error, which ones?

2) The second error can be solved by following the instructions at that link. That is, copying the code from that link and restarting R.


OK, thanks. Now it works. It only gives a warning about a few transcripts missing from the GTF (I am looking into it).

dat1 <- jsonlite::fromJSON("/Users/atari/biodata/provaLinkedTxome.json")
dat2 <- jsonlite::fromJSON(".../sample01/cmd_info.json")
dat3 <- jsonlite::fromJSON(".../sample01/aux_info/meta_info.json")

All these files are read and parsed correctly by the jsonlite library.

Thx a million.


Ok great.

The warning is common with Ensembl: they apparently include more transcripts in the FASTA than in the GTF. I believe the warning text even mentions this about Ensembl?


Yes, it does. Thx.
