Cannot find header.json in salmon index directory
mico (@mico-15362)

I am using tximeta to import Salmon quantification data. When I link the transcriptome with the following code:

indexDir <- file.path(dir, "transcriptome", "GRCh38.index")
fastaPath <- c(file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.cdna.all.fa.gz"),
               file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.ncrna.fa.gz"))
gtfPath <- file.path(dir, "transcriptome", "Homo_sapiens.GRCh38.98.gtf.gz")

makeLinkedTxome(indexDir=indexDir,
                source="Ensembl",
                organism="Homo sapiens",
                release="98",
                genome="GRCh38",
                fasta=fastaPath,
                gtf=gtfPath,
                write=FALSE)

I get the following error:

Error: lexical error: invalid char in json text.
                                      ~/utr/transcriptom
                    (right here) ------^

GRCh38.index was built with Salmon:

salmon index -t Homo_sapiens.GRCh38.cdna.all.fa.gz Homo_sapiens.GRCh38.ncrna.fa.gz -i GRCh38.index

I found only two JSON files in the index directory, info.json and versionInfo.json. There is no header.json, which makeLinkedTxome requires. I then created a header.json file using the information from info.json, and that seems to work.

Did I miss a step in Salmon?

@mikelove

What version of Salmon are you using?

The latest one, v1.0.0

Thanks for the bug report. I'll push a fix to the release branch today. As you said, the information in the index has moved from header.json to info.json, so I just need a line that checks both places.
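
A minimal base-R sketch of the "check both places" idea, assuming only that the index metadata lives in either info.json (as in the Salmon v1.0.0 index described above) or header.json (older indices). The helper name `findIndexJson` is hypothetical and not part of tximeta; the actual fix is the one in the tximeta PR linked in this thread.

```r
# Hypothetical helper (not part of tximeta): return the path of whichever
# index metadata file exists, preferring info.json over header.json.
findIndexJson <- function(indexDir) {
  candidates <- file.path(indexDir, c("info.json", "header.json"))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop("neither info.json nor header.json found in ", indexDir)
  }
  found[1]
}

# Usage with a throwaway directory standing in for a Salmon index
idx <- file.path(tempdir(), "GRCh38.index")
dir.create(idx, showWarnings = FALSE)
writeLines("{}", file.path(idx, "info.json"))
basename(findIndexJson(idx))  # "info.json"
```

Checking info.json first means a newer index is picked up even if a stale header.json was created by hand as a workaround.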

BTW, if I push a fix today, it should show up in the release branch by noon tomorrow (or the next day if something goes wrong).

Great, thanks so much Michael!

We had to make one more change (Charlotte spotted the issue), so it will be fixed in v1.4.2, which should be available tomorrow. Relevant PR:

https://github.com/mikelove/tximeta/pull/22

You can also test the solution with:

devtools::install_github("mikelove/tximeta")

Hi Michael, I tried Salmon v1.0.0 with decoy-augmented transcriptomes today, and when I imported the results with tximeta v1.5.6, the same error occurred.

All files in the Salmon index directory:

complete_ref_lens.bin
ctable.bin
ctg_offsets.bin
duplicate_clusters.tsv
eqtable.bin
info.json
mphf.bin
pos.bin
pre_indexing.log
rank.bin
refAccumLengths.bin
ref_indexing.log
reflengths.bin
refseq.bin
seq.bin
versionInfo.json

Can you type makeLinkedTxome in the console, press Return, and check what the code says?

The first line should be

indexJson <- file.path(indexDir, "info.json")

So I'm confused about why it wouldn't pick it up.

You can also run this line of code to check:

file.exists(file.path(indexDir, "info.json"))
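
Printing the function can also be done programmatically. The sketch below uses a hypothetical helper, `hasInfoJsonCheck` (not a tximeta function), that deparses a function and looks for the string "info.json". Applied to `tximeta::makeLinkedTxome` in a live session, a FALSE result would suggest the session is still running the old code. The two stand-in functions only mimic the first line of the old and fixed versions.

```r
# Hypothetical diagnostic: does a function's code mention "info.json"?
hasInfoJsonCheck <- function(fun) {
  src <- deparse(fun)
  any(grepl("info.json", src, fixed = TRUE))
}

# Stand-ins mimicking the first line of the old and fixed code
oldVersion <- function(indexDir) {
  readLines(file.path(indexDir, "header.json"))
}
newVersion <- function(indexDir) {
  indexJson <- file.path(indexDir, "info.json")
}
hasInfoJsonCheck(oldVersion)  # FALSE
hasInfoJsonCheck(newVersion)  # TRUE

# In a real session (requires tximeta to be installed):
# hasInfoJsonCheck(tximeta::makeLinkedTxome)
```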

Hi Michael,

I am also using Salmon 1.0.0, and I am having the same issue after installing tximeta from your GitHub repository.

other attached packages: [1] tximeta_1.5.6

The first line of the function still looks for header.json:

> makeLinkedTxome
function (indexDir, source, organism, release, genome, fasta, 
    gtf, write = TRUE, jsonFile) 
{
    indexList <- fromJSON(file.path(indexDir, "header.json"))
    indexSeqHash <- indexList$value0$SeqHash
    index <- basename(indexDir)
    std.sources <- c("Gencode", "Ensembl")
    for (src in std.sources) {
        if (tolower(source) == tolower(src)) {
            source <- src
        }

I actually only came across tximeta in the hope that it wouldn't crash unexpectedly, as tximport currently does when summarizing lengths across ~80 samples. :\

So that first line doesn't match the code that's on GitHub or Bioconductor, meaning you don't yet have the fixed version of the code. I wonder what could be going on. Maybe try restarting R?

Re: 80 samples, tximeta calls tximport, so no, it won't change performance. Make sure you've allocated sufficient memory: 200,000 x 80 x 3 is not a trivial amount of data.
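
A back-of-the-envelope check of that figure, assuming about 200,000 transcripts, 80 samples, and three transcript-by-sample matrices (abundance, counts, and length, as returned by tximport) stored as 8-byte doubles. This counts only the final matrices; any intermediate copies made during import would add to it.

```r
# Assumed dimensions: transcripts x samples x matrices
nTx <- 200000
nSamples <- 80
nMatrices <- 3  # abundance, counts, length

bytesPerDouble <- 8
totalBytes <- nTx * nSamples * nMatrices * bytesPerDouble
totalBytes / 1e6   # 384 (megabytes)
totalBytes / 2^30  # about 0.36 (gibibytes)
```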

> makeLinkedTxome
function (indexDir, source, organism, release, genome, fasta, 
    gtf, write = TRUE, jsonFile) 
{
    indexJson <- file.path(indexDir, "info.json")
    if (!file.exists(indexJson)) {
        indexJson <- file.path(indexDir, "header.json")
    }
    indexList <- fromJSON(indexJson)
...

Yes, the first line now looks for info.json, and you are right: R needs to be restarted after installing the latest tximeta, and then everything works. Many thanks!
