Question: Cannot find header.json in salmon index directory
mico0 wrote (4 weeks ago):

I am using tximeta to import Salmon quantification data. When I link the transcriptome using the following code:

# Salmon index plus the Ensembl release 98 FASTA and GTF files
indexDir <- file.path(dir, "transcriptome", "GRCh38.index")
fastaPath <- c(file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.cdna.all.fa.gz"),
               file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.ncrna.fa.gz"))
gtfPath <- file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.98.gtf.gz")

makeLinkedTxome(indexDir=indexDir,
                source="Ensembl",
                organism="Homo sapiens",
                release="98",
                genome="GRCh38",
                fasta=fastaPath,
                gtf=gtfPath,
                write=FALSE)


an error occurs:

Error: lexical error: invalid char in json text.
~/utr/transcriptom
(right here) ------^


GRCh38.index was built with salmon:

salmon index -t Homo_sapiens.GRCh38.cdna.all.fa.gz Homo_sapiens.GRCh38.ncrna.fa.gz -i GRCh38.index


I found only two JSON files in the index directory, info.json and versionInfo.json. There is no header.json, which makeLinkedTxome requires. As a workaround, I created a header.json file from the information in info.json, and it seems to work.

Did I miss a step in salmon?

Tags: rna-seq, salmon, tximeta (written 4 weeks ago by mico0; modified 4 days ago)
Answer: Cannot find header.json in salmon index directory
Michael Love (United States) wrote (4 weeks ago):

What version of Salmon are you using?

The latest one, v1.0.0

Thanks for the bug report. I'll push a fix to the release branch today. As you said, the information has moved from header.json to info.json in the index, so I just need to add a line that checks both places.
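The "check both places" fix can be sketched as a small path lookup. This is a minimal sketch, not tximeta's actual code: findIndexJson() is a hypothetical helper name, it only returns the file path (parsing is left to a JSON reader such as jsonlite::fromJSON), and it assumes info.json is written by newer Salmon indexes such as v1.0.0 while header.json comes from older ones. Stopping with a clear message also presumably avoids the confusing "lexical error" above, which appears when a path to a missing file is handed to fromJSON and parsed as if it were JSON text.

```r
# Hypothetical helper (not part of tximeta): return the path of the JSON file
# describing a Salmon index, preferring info.json (newer Salmon, e.g. v1.0.0)
# and falling back to header.json (older Salmon indexes).
findIndexJson <- function(indexDir) {
  candidates <- file.path(indexDir, c("info.json", "header.json"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    # fail with a readable message instead of passing a bad path to a JSON parser
    stop("neither info.json nor header.json found in ", indexDir)
  }
  found[1]  # info.json wins when both are present
}

# Usage, assuming jsonlite is installed:
# indexList <- jsonlite::fromJSON(findIndexJson(indexDir))
```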

BTW, if I push a fix today, it should show up in the release branch by noon tomorrow (or the next day if something goes wrong).

Great, thanks so much Michael!

We had to make one more change (Charlotte spotted the issue), so it will be fixed in v1.4.2, which should be available tomorrow. Relevant PR:

https://github.com/mikelove/tximeta/pull/22

You can also test the solution with:

devtools::install_github("mikelove/tximeta")


Hi Michael, today I tried Salmon v1.0.0 with a decoy-augmented transcriptome, and when I imported the results with tximeta v1.5.6 the same error occurred...

All files listed in the Salmon index directory:

complete_ref_lens.bin
ctable.bin
ctg_offsets.bin
duplicate_clusters.tsv
eqtable.bin
info.json
mphf.bin
pos.bin
pre_indexing.log
rank.bin
refAccumLengths.bin
ref_indexing.log
reflengths.bin
refseq.bin
seq.bin
versionInfo.json

(reply written 4 days ago by mico0)

Can you type makeLinkedTxome in the console, press Return, and check what the code says?

The first line should be:

indexJson <- file.path(indexDir, "info.json")


So I'm confused about why it wouldn't pick it up.

You can also run this line of code to check:

file.exists(file.path(indexDir, "info.json"))


Hi Michael,

I am also using Salmon 1.0.0, and I am having the same issue after installing the version of tximeta from your GitHub repository.

other attached packages: [1] tximeta_1.5.6

The first line of the function still looks for header.json:

> makeLinkedTxome
function (indexDir, source, organism, release, genome, fasta,
    gtf, write = TRUE, jsonFile)
{
    indexList <- fromJSON(file.path(indexDir, "header.json"))
    indexSeqHash <- indexList$value0$SeqHash
    index <- basename(indexDir)
    std.sources <- c("Gencode", "Ensembl")
    for (src in std.sources) {
        if (tolower(source) == tolower(src)) {
            source <- src
        }

I actually only came across tximeta hoping it wouldn't crash unexpectedly, as tximport currently does when summarizing lengths across ~80 samples... :\

(reply written 4 days ago by rbenel0)

So that first line doesn't match the code that's on GitHub or Bioconductor, meaning you don't yet have the fixed version of the code. I wonder what could be going on. Maybe try restarting R?

Re: 80 samples: tximeta calls tximport, so no, it won't change performance. Make sure you've allocated sufficient memory; 200,000 x 80 x 3 is not a trivial amount of data.
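To make the "200,000 x 80 x 3" remark concrete, here is the arithmetic as a back-of-the-envelope R calculation. It assumes the three matrices are the abundance, counts and length matrices that tximport returns, each stored densely as doubles (8 bytes per value). Row/column names and temporary copies made during import are ignored, so real peak memory use will be higher.

```r
# Rough memory estimate, assuming three dense numeric (double) matrices of
# 200,000 transcripts x 80 samples, at 8 bytes per value.
nTx <- 200000
nSamples <- 80
nMatrices <- 3
bytesPerDouble <- 8

totalBytes <- nTx * nSamples * nMatrices * bytesPerDouble
totalBytes / 1e6   # 384 (MB), before names and intermediate copies
```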

> makeLinkedTxome
function (indexDir, source, organism, release, genome, fasta,
    gtf, write = TRUE, jsonFile)
{
    indexJson <- file.path(indexDir, "info.json")
    if (!file.exists(indexJson)) {
        indexJson <- file.path(indexDir, "header.json")
    }
    indexList <- fromJSON(indexJson)
    ...


Yes, the first line now looks for info.json, and you are right: R needs to be restarted after installing the latest tximeta, and then everything is OK... Many thanks!
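A small check for the situation resolved here, offered as a sketch: isStaleNamespace() is a hypothetical helper, not part of tximeta or devtools. It compares the version of a package loaded in the current session (getNamespaceVersion()) with the version installed on disk (utils::packageVersion()), assuming the new build was installed into the same library as the loaded copy. It can only detect a change in the version number; if the installed code changed without a version bump, restarting R remains the safe option.

```r
# Hypothetical helper: TRUE when the copy of `pkg` loaded in this R session has
# a different version number from the copy now installed on disk, which can
# happen after reinstalling a package without restarting R.
isStaleNamespace <- function(pkg) {
  if (!isNamespaceLoaded(pkg)) {
    return(FALSE)  # not loaded yet, so the next library() call uses the installed copy
  }
  loaded <- package_version(unname(getNamespaceVersion(pkg)))
  installed <- utils::packageVersion(pkg)
  isTRUE(loaded != installed)
}

# Usage after devtools::install_github("mikelove/tximeta"):
# if (isStaleNamespace("tximeta")) message("Restart R to load the new tximeta")
```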