Question: Cannot find header.json in salmon index directory
mico wrote:

I am running tximeta to import the salmon quantification data. When I link to the transcriptome using the following code:

indexDir <- file.path(dir, "transcriptome", "GRCh38.index")
fastaPath <- c(file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.cdna.all.fa.gz"),
               file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.ncrna.fa.gz"))
gtfPath <- file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.98.gtf.gz")

makeLinkedTxome(indexDir=indexDir,
                source="Ensembl",
                organism="Homo sapiens",
                release="98",
                genome="GRCh38",
                fasta=fastaPath,
                gtf=gtfPath,
                write=FALSE)

an error occurs:

Error: lexical error: invalid char in json text.
                                      ~/utr/transcriptom
                    (right here) ------^

The GRCh38.index is from salmon:

salmon index -t Homo_sapiens.GRCh38.cdna.all.fa.gz Homo_sapiens.GRCh38.ncrna.fa.gz -i GRCh38.index

I found that there are only two JSON files in the index directory, info.json and versionInfo.json, and no file called header.json, which the makeLinkedTxome function requires. I then created a header.json file using the information from info.json, and that seems to work.

Did I miss some steps in salmon?

Tags: rna-seq, salmon, tximeta
modified 4 days ago • written 4 weeks ago by mico
Answer: Cannot find header.json in salmon index directory
Michael Love wrote:

What version of Salmon are you using?

written 4 weeks ago by Michael Love

The latest one, v1.0.0

written 4 weeks ago by mico

Thanks for the bug report. I'll push a fix to the release branch today. As you said, the information has moved from header.json to info.json in the index, so I'll just need a line to check both places.

written 4 weeks ago by Michael Love
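
The "check both places" idea can be illustrated with a minimal sketch in base R. This is not the tximeta source: the helper name find_index_json and the mock index directories are hypothetical and exist only so the example runs without a real Salmon index. It assumes only what the thread states, namely that a Salmon v1.0.0 index contains info.json while earlier indices contain header.json.

```r
# Hypothetical helper (not part of tximeta): return the path of the JSON file
# that describes a Salmon index, preferring info.json over header.json.
find_index_json <- function(indexDir) {
  candidates <- file.path(indexDir, c("info.json", "header.json"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop("no info.json or header.json found in ", indexDir)
  }
  found[1]
}

# Mock index directories (assumed names), one per layout.
new_index <- file.path(tempdir(), "new.index")
old_index <- file.path(tempdir(), "old.index")
dir.create(new_index, showWarnings = FALSE)
dir.create(old_index, showWarnings = FALSE)
file.create(file.path(new_index, "info.json"))
file.create(file.path(old_index, "header.json"))

basename(find_index_json(new_index))  # "info.json"
basename(find_index_json(old_index))  # "header.json"
```

Trying the newer file name first and falling back to the older one keeps indices built by either Salmon version usable.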

BTW, if I push a fix today, it should show up in the release branch by noon tomorrow (or the next day if something goes wrong).

written 4 weeks ago by Michael Love

Great, thanks so much Michael!

written 4 weeks ago by mico

We had to make one more change (Charlotte spotted the issue), so it will be fixed in v1.4.2, which should be available tomorrow. Relevant PR:

https://github.com/mikelove/tximeta/pull/22

You can also test the solution with:

devtools::install_github("mikelove/tximeta")
written 28 days ago by Michael Love

Hi Michael, I tried Salmon v1.0.0 with decoy-augmented transcriptomes today and when I imported using tximeta v1.5.6 the same error occurred...

All files listed in the Salmon index directory:

complete_ref_lens.bin
ctable.bin
ctg_offsets.bin
duplicate_clusters.tsv
eqtable.bin
info.json
mphf.bin
pos.bin
pre_indexing.log
rank.bin
refAccumLengths.bin
ref_indexing.log
reflengths.bin
refseq.bin
seq.bin
versionInfo.json
modified 4 days ago • written 4 days ago by mico

Can you type makeLinkedTxome in the console, press Return, and see what the code says?

The first line should be

indexJson <- file.path(indexDir, "info.json")

So I'm confused about why it wouldn't be picking it up.

You can also run this line of code to check:

file.exists(file.path(indexDir, "info.json"))
written 4 days ago by Michael Love

Hi Michael,

I am also using Salmon 1.0.0 and I am having the same issue after installing the version of tximeta from your github repository.

other attached packages: [1] tximeta_1.5.6

The first line of the function still looks for header.json

> makeLinkedTxome
function (indexDir, source, organism, release, genome, fasta, 
    gtf, write = TRUE, jsonFile) 
{
    indexList <- fromJSON(file.path(indexDir, "header.json"))
    indexSeqHash <- indexList$value0$SeqHash
    index <- basename(indexDir)
    std.sources <- c("Gencode", "Ensembl")
    for (src in std.sources) {
        if (tolower(source) == tolower(src)) {
            source <- src
        }

I actually only came across tximeta in the hopes that it won't crash unexpectedly as tximport is currently doing when trying to summarize lengths of ~80 samples.... :\

modified 4 days ago • written 4 days ago by rbenel0

So that first line doesn't match the code that's in GitHub or Bioconductor, meaning you don't yet have the fixed version of the code. I wonder what could be going on. Maybe try restarting R?

Re: 80 samples, tximeta calls tximport, so no, it won't change performance. Make sure you've allocated sufficient memory: 200,000 x 80 x 3 is not a trivial amount of data.
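
As a rough worked example of that figure, the arithmetic can be sketched as below. It rests on assumptions not stated in the thread: that the factor of 3 refers to the abundance, counts, and length matrices that tximport returns, and that each value is stored as an 8-byte double. The variable names are illustrative only.

```r
n_transcripts   <- 200000  # approximate transcript count quoted above
n_samples       <- 80
n_matrices      <- 3       # assumed: abundance, counts, length
bytes_per_value <- 8       # R stores numeric values as 8-byte doubles

total_bytes <- n_transcripts * n_samples * n_matrices * bytes_per_value
total_mib   <- total_bytes / 1024^2

total_bytes       # 384000000
round(total_mib)  # 366
```

This covers only the final matrices. Peak usage during import is likely higher, because intermediate copies are made while the per-sample files are read and combined.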

written 4 days ago by Michael Love
> makeLinkedTxome
function (indexDir, source, organism, release, genome, fasta, 
    gtf, write = TRUE, jsonFile) 
{
    indexJson <- file.path(indexDir, "info.json")
    if (!file.exists(indexJson)) {
        indexJson <- file.path(indexDir, "header.json")
    }
    indexList <- fromJSON(indexJson)
...

Yes, the first line now looks for info.json. You are right: R needs to be restarted after installing the latest tximeta, and then everything works. Many thanks!

written 4 days ago by mico
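
One way to tell whether the running R session still holds the old function definition is to search its deparsed source for "info.json". The sketch below is only an illustration: the helper mentions and the two stand-in functions are hypothetical, written to mirror the old and fixed first lines quoted in this thread, so that the example runs without tximeta installed.

```r
# Hypothetical helper: does the source of a function contain a given string?
mentions <- function(fun, text) {
  any(grepl(text, deparse(fun), fixed = TRUE))
}

# Stand-ins for the old and the fixed definitions shown in this thread.
old_version <- function(indexDir) {
  file.path(indexDir, "header.json")
}
fixed_version <- function(indexDir) {
  indexJson <- file.path(indexDir, "info.json")
  if (!file.exists(indexJson)) {
    indexJson <- file.path(indexDir, "header.json")
  }
  indexJson
}

mentions(old_version, "info.json")    # FALSE
mentions(fixed_version, "info.json")  # TRUE
```

In a real session the same check would be mentions(tximeta::makeLinkedTxome, "info.json"). If it returns FALSE right after reinstalling, the old code is probably still loaded in memory, and restarting R should pick up the new version.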