Error loading kallisto output in tximport
Asked by cta (Denmark)

Hi All,

I am struggling to import kallisto output with tximport so that I can then use DESeq2 for gene-level differential expression. I have just reinstalled R, Bioconductor, tximport and tximportData, since some earlier posts suggested this solved similar problems. I can't figure out why the code below works for salmon but not for kallisto. For my own data I only have kallisto output, and there too file.exists returns FALSE, as if the files were not there. Please see the code and output below.

sessionInfo()

library(tximportData)
library(tximport)

dir <- system.file("extdata", package = "tximportData")
list.files(dir)
 [1] "alevin"                  "cufflinks"               "kallisto"
 [4] "kallisto_boot"           "refseq"                  "rsem"
 [7] "sailfish"                "salmon"                  "salmon_dm"
[10] "salmon_ec"               "salmon_gibbs"            "samples.txt"
[13] "samples_extended.txt"    "tx2gene.csv"             "tx2gene.ensembl.v87.csv"
[16] "tx2gene.gencode.v27.csv" "tx2gene_alevin.tsv"

samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
samples
  pop center                assay    sample experiment       run
1 TSI  UNIGE NA20503.1.M_111124_5 ERS185497  ERX163094 ERR188297
2 TSI  UNIGE NA20504.1.M_111124_7 ERS185242  ERX162972 ERR188088
3 TSI  UNIGE NA20505.1.M_111124_6 ERS185048  ERX163009 ERR188329
4 TSI  UNIGE NA20507.1.M_111124_7 ERS185412  ERX163158 ERR188288
5 TSI  UNIGE NA20508.1.M_111124_2 ERS185362  ERX163159 ERR188021
6 TSI  UNIGE NA20514.1.M_111124_4 ERS185217  ERX163062 ERR188356

files <- file.path(dir, "salmon", samples$run, "quant.sf.gz")
names(files) <- paste0("sample", 1:6)
all(file.exists(files))
[1] TRUE

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
names(files) <- paste0("sample", 1:6)
all(file.exists(files))
[1] FALSE

However, list.files(dir, recursive = TRUE) shows that all the kallisto files are present.

Any help is highly appreciated. Thank you.

Answer by James W. MacDonald (United States)

Presumably you aren't putting your own files in the extdata subdirectory of the tximportData package; if you are, don't do that. Also, if you are using kallisto data, use the HDF5 file (abundance.h5) instead of the TSV. Here's how to get the example data from tximportData.

> library(tximportData)
> thedir <- system.file("extdata", package = "tximportData")
> samples <- read.table(file.path(thedir, "samples.txt"), header = TRUE)
## one kallisto output directory per run; list the TSV file in each
> dirs <- paste0(thedir, "/kallisto/", samples$run)
> thefiles <- sapply(dirs, dir, pattern = "tsv", full.names = TRUE)
> length(thefiles)
[1] 6
> all(file.exists(thefiles))
[1] TRUE

## but use the h5 files
> thebetterfiles <- sapply(dirs, dir, pattern = "h5$", full.names = TRUE)
> all(file.exists(thebetterfiles))
[1] TRUE
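The path in the question most likely fails because of the file name rather than the location. Assuming the tximportData kallisto outputs are stored gzipped as abundance.tsv.gz (the name used in the tximport vignette; the salmon paths in the question also end in .gz), file.path(dir, "kallisto", samples$run, "abundance.tsv") points at files that do not exist, while the regex "tsv" used in the answer also matches abundance.tsv.gz. The self-contained sketch below reproduces this with empty placeholder files in a temporary directory, so it needs neither tximport nor tximportData; the directory name kallisto_demo is hypothetical.

```r
# Self-contained reproduction in a temporary directory (no tximportData needed).
# ASSUMPTION: the example kallisto outputs are gzipped as "abundance.tsv.gz",
# the file name used in the tximport vignette.
runs <- c("ERR188297", "ERR188088")          # two run IDs from samples.txt
root <- file.path(tempdir(), "kallisto_demo") # hypothetical demo directory
for (r in runs) {
  dir.create(file.path(root, r), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, r, "abundance.tsv.gz"))  # empty placeholder
}

# The path built in the question omits the ".gz" suffix, so it does not exist
plain <- file.path(root, runs, "abundance.tsv")
all(file.exists(plain))    # FALSE

# list.files() lists what is actually on disk, so the files look "present"
list.files(root, recursive = TRUE)

# Using the real file name fixes file.exists()
gz <- file.path(root, runs, "abundance.tsv.gz")
all(file.exists(gz))       # TRUE

# Pattern matching, as in the answer, is robust to the suffix:
# the regex "tsv" also matches "abundance.tsv.gz"
found <- sapply(file.path(root, runs), list.files,
                pattern = "tsv", full.names = TRUE)
all(file.exists(found))    # TRUE
```

The sketch calls list.files() directly; base R's dir() is an alias of it. The answer also stores the extdata path in thedir rather than dir, presumably because sapply(dirs, dir, ...) fails once dir holds a path string: sapply() then tries to look up a function named by that string.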

Hi James,

Thanks a million!!

I very much appreciate your time and help. The issue was indeed the "don't do that" part. Your code fixed it; it works perfectly, and I have now adapted it to process my own data.

Thank you,

Anna
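When adapting the answer's sapply() call to one's own kallisto output, as Anna did, one caveat is that sapply() silently returns a list (or a matrix) instead of a character vector if some sample directory has no match or more than one. The sketch below wraps the same idea in a small hypothetical helper, find_quant_files() (not part of tximport), which returns one file per sample, named after its directory, and stops with a message otherwise. The layout <root>/<sample>/abundance.h5 and the sample names are assumptions for illustration; a temporary directory with empty placeholder files stands in for real output.

```r
# Hypothetical helper (NOT part of tximport): one quantification file per
# sample directory, named after the directory; stops if any directory has
# zero or several matches instead of silently returning a list.
find_quant_files <- function(sample_dirs, pattern = "abundance\\.h5$") {
  hits <- lapply(sample_dirs, list.files, pattern = pattern, full.names = TRUE)
  n <- lengths(hits)
  if (any(n != 1L)) {
    stop("expected exactly one file matching '", pattern, "' in: ",
         paste(sample_dirs[n != 1L], collapse = ", "))
  }
  files <- unlist(hits, use.names = FALSE)
  names(files) <- basename(sample_dirs)
  files
}

# Stand-in for real kallisto output, ASSUMED to be laid out as
# <root>/<sample>/abundance.h5 (empty placeholder files, hypothetical names).
root <- file.path(tempdir(), "my_kallisto")
sample_ids <- c("ctrl_1", "ctrl_2", "trt_1")
for (s in sample_ids) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, s, c("abundance.h5", "abundance.tsv")))
}

files <- find_quant_files(file.path(root, sample_ids))
files   # named character vector: ctrl_1, ctrl_2, trt_1
```

A named vector like this is the kind of input tximport() takes as its files argument (with type = "kallisto"), and the names become the sample names; reading abundance.h5 files additionally requires the rhdf5 package.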
