Question

Tximport and pseudo alignment with Kallisto

Mozart ▴ 30

@mozart-20625

Last seen 3.6 years ago

Hi there, I am using kallisto to generate counts in my RNA-seq experiments. Since I prefer DESeq2 for the downstream analysis, I have to use tximport to import the transcript abundances before performing differential expression analysis. I am closely following the code in the tximport package documentation and, so far, I have never had any problems with it.

By the way, I noticed that when doing the pseudoalignment in kallisto I can generate counts either by running all of the paired samples at once:

kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq pairB_1.fastq pairB_2.fastq

or by running one pair at a time:

kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq

This is the crucial point for tximport: at the end of a kallisto run I usually end up with as many sample folders as there are samples in the experiment (i.e. 6 folders for 6 samples). This allows me to use the following:

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv") 
names(files) <- paste0("sample", 1:6) 
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)

But I am not able to use tximport this way if I run all of my samples at once (thus generating just one abundance.tsv file). Since I presume that the pseudoalignment is identical either way (and probably because this would be much easier for Sleuth users), I would stick with running each pair separately, just to make my life easier and ease the usage of tximport.

But I am just seeking confirmation of this, guys.

PS: sorry for posting such a borderline topic. It is very difficult for me to get off the ground with this technique, and any relevant opinions are more than welcome.

PPS: in the kallisto documentation they also mention as an important note that one should 'only supply one sample at a time to kallisto. The multiple FASTQ (pair) option is for users who have samples that span multiple FASTQ files.'

kallisto tximport deseq2 • 1.7k views

ADD COMMENT • link updated 4.9 years ago by Michael Love 41k • written 4.9 years ago by Mozart ▴ 30

score 3 · Accepted Answer · 2019-05-15

Michael Love 41k

@mikelove

Last seen 7 hours ago

United States

Your first kallisto command above mistakenly collapses multiple biological replicates into a single sample. As you say in your question, the option to give multiple input files is for use with technical replicates only.
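A minimal sketch of the one-run-per-sample approach, under some assumptions: the sample names (pairA, pairB), the `<sample>_1.fastq` / `<sample>_2.fastq` naming, and the `kallisto/<sample>` output layout are illustrative, with the layout chosen to match the `file.path(dir, "kallisto", samples$run, "abundance.tsv")` call in the question. `DRY_RUN` is a hypothetical variable of this script, not a kallisto option; by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical sample names; replace with your own run identifiers.
samples="pairA pairB"

# Build one `kallisto quant` call for a single sample, so that each
# biological replicate gets its own output folder (and abundance.tsv).
# DRY_RUN defaults to 1: the command is printed, not executed.
run_quant() {
    sample=$1
    set -- kallisto quant -i index -o "kallisto/$sample" \
        "${sample}_1.fastq" "${sample}_2.fastq"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One invocation per sample, never all samples in a single call.
for s in $samples; do
    run_quant "$s"
done
```

Setting `DRY_RUN=0` would execute the commands, assuming kallisto and the index are available. If a single sample spans several FASTQ files (technical replicates), those files would all be passed to that sample's one call.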
