Tximport and pseudo alignment with Kallisto
Mozart ▴ 20

Hi there, I am using kallisto to quantify transcript abundances in my RNA-seq experiments. Since I prefer DESeq2 for the downstream analysis, I use tximport to import the transcript abundances for differential expression analysis. I am closely following the code in the tximport package documentation and, so far, I have never had any problems with it.

However, I noticed that when doing the pseudoalignment in kallisto I can generate counts either by running all of the paired samples at once

kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq pairB_1.fastq pairB_2.fastq

or by running one pair at a time

kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq

This is the crucial point for tximport: at the end of the kallisto runs I usually end up with as many sample folders as there are samples in the experiment (i.e. 6 folders for 6 samples). This allows me to use the following:

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv") 
names(files) <- paste0("sample", 1:6) 
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)
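Since the tximport call above expects one abundance.tsv per sample folder, it can help to confirm that every folder is in place before importing. The following is a minimal sketch, not part of the original post: the function name check_kallisto_outputs is hypothetical, and it assumes the same layout as the R code above (<dir>/kallisto/<run>/abundance.tsv).

```shell
#!/bin/sh
# Hypothetical helper: report which sample folders lack an abundance.tsv.
# Assumes the layout <base>/kallisto/<run>/abundance.tsv, as in the R code above.
check_kallisto_outputs() {
    base=$1
    shift
    status=0
    for run in "$@"; do
        if [ -f "$base/kallisto/$run/abundance.tsv" ]; then
            printf 'ok %s\n' "$run"
        else
            printf 'missing %s\n' "$run"
            status=1
        fi
    done
    # Non-zero exit status if at least one sample folder is incomplete.
    return "$status"
}
```

For example, `check_kallisto_outputs . runA runB` (with runA and runB standing in for the run names in samples$run) prints one line per run and returns a non-zero status if any abundance.tsv is missing.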

But I cannot use tximport this way if I run all of my samples at once, since that generates just one abundance.tsv file. Assuming that the pseudoalignment is identical either way (and probably because this would be much easier for Sleuth users), I would stick with the one-pair-at-a-time method, just to make my life easier and simplify the use of tximport.

I am just looking for confirmation of this.

PS: sorry for posting such a borderline topic. I am finding it very difficult to get started with this technique, and any relevant opinions are more than welcome.

PPS: the kallisto documentation also includes an important note that one should 'only supply one sample at a time to kallisto. The multiple FASTQ (pair) option is for users who have samples that span multiple FASTQ files.'

kallisto tximport deseq2 • 786 views

Your first kallisto command above mistakenly collapses multiple biological replicates into a single sample. As you say in your question, the option to give multiple input files is for use with technical replicates only.
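A minimal sketch of the one-sample-per-run approach might look like the following. It is only an illustration under stated assumptions: the function name quant_each_sample and the DRY_RUN switch are hypothetical, the FASTQ files are assumed to be named <sample>_1.fastq and <sample>_2.fastq as in the question, and each sample is written to its own folder under an output directory so that tximport can find one abundance.tsv per sample. Only the -i and -o options shown in the question are used.

```shell
#!/bin/sh
# Sketch: quantify each biological sample in its own kallisto run.
# Assumptions (not from the thread): FASTQ files are named
# <sample>_1.fastq / <sample>_2.fastq and results go to <outdir>/<sample>.
# By default the commands are only printed; set DRY_RUN=0 to run kallisto.
quant_each_sample() {
    index=$1
    outdir=$2
    shift 2
    for sample in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Print the command that would be executed for this sample.
            printf '%s\n' "kallisto quant -i $index -o $outdir/$sample ${sample}_1.fastq ${sample}_2.fastq"
        else
            kallisto quant -i "$index" -o "$outdir/$sample" \
                "${sample}_1.fastq" "${sample}_2.fastq"
        fi
    done
}
```

Calling `quant_each_sample index kallisto pairA pairB` prints one kallisto quant command per sample, each with its own output folder, instead of a single command that merges pairA and pairB.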

