Kallisto output with DEXSeq
j2h2k2

Hello,

I am trying to use kallisto to run a differential exon usage analysis on my dataset with DEXSeq, but I am running into an error. Kallisto's GenomeBam output does not include an NH tag, so DEXSeq's dexseq_count.py script refuses to run. When I try kallisto's PseudoBam output instead, the script reports the results as empty, likely because the coordinates do not match: PseudoBam does not project alignments to genomic coordinates. Is there a workaround, or a way to change dexseq_count.py so that it does not require the NH tag?

Thank you.

Alejandro Reyes
Novartis Institutes for BioMedical Rese…

Hi! I have not personally tried something like that, but I know that others have used kallisto output as input to DEXSeq. See, for example, these two papers from Mark Robinson's lab:

1. https://f1000research.com/articles/5-1356/v1
2. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0862-3

However, these papers feed transcript-level counts estimated by kallisto directly into DEXSeq. This is an important distinction from what you are trying to do. The script dexseq_count.py is designed to assign reads mapped to the reference genome to exonic regions. I doubt that it will work for counting reads per exonic region starting from pseudo-alignments to a reference transcriptome. I am actually not sure whether it is possible to extract exon-level counts from kallisto output at all.

Best,
Alejandro


Thank you for your reply. That makes sense to me; I was thinking that kallisto essentially does the job of dexseq_count.py. The papers you linked do mention using DEXSeq for transcript-level analysis, but I have not been able to get DEXSeqDataSet to work. Do you know which of the DEXSeqDataSet constructor functions can take kallisto output (and in what form) and turn it into an object that I can run through DEXSeq?

Thank you

One option is to import the kallisto output into R with tximport (set txOut=TRUE to get the transcript-level count matrix). Then you can use DEXSeqDataSet() to create the DEXSeqDataSet object. You'll need to provide the featureID (which would be the rownames of the count matrix) and the groupID (which would be the corresponding gene for each transcript).
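
The steps above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not code from the thread: the helper prepare_dexseq_inputs(), the tx2gene table (columns TXNAME and GENEID), and the toy counts are all hypothetical. In real use the counts would come from tximport(..., type = "kallisto", txOut = TRUE) and the transcript-to-gene map from the same annotation used to build the kallisto index. Each transcript becomes a DEXSeq "feature" and each gene a "group"; the design shown is DEXSeq's default, assuming a condition column in the sample table.

```r
# Sketch: build DEXSeqDataSet inputs from a transcript-level count matrix.
# In practice (assumption), tx_counts would come from:
#   txi <- tximport::tximport(files, type = "kallisto", txOut = TRUE)
#   tx_counts <- txi$counts
# tx2gene is a hypothetical annotation table with columns TXNAME, GENEID.

prepare_dexseq_inputs <- function(tx_counts, tx2gene) {
  # Keep only transcripts that have a gene assignment
  keep <- rownames(tx_counts) %in% tx2gene$TXNAME
  tx_counts <- tx_counts[keep, , drop = FALSE]
  # groupID: the gene each transcript belongs to, in row order
  gene_ids <- as.character(
    tx2gene$GENEID[match(rownames(tx_counts), tx2gene$TXNAME)])
  # kallisto estimated counts are fractional; DEXSeq expects whole numbers
  count_mat <- round(tx_counts)
  storage.mode(count_mat) <- "integer"  # keeps dim and dimnames
  list(countData = count_mat,
       featureID = rownames(count_mat),  # transcript IDs
       groupID = gene_ids)               # gene IDs
}

# Made-up stand-in for txi$counts (4 transcripts x 4 samples)
tx_counts <- matrix(c(10.4, 3.6, 0.2, 7.2,
                      12.1, 2.9, 1.2, 6.8,
                      5.3, 8.2, 0.4, 9.9,
                      4.7, 9.1, 0.6, 10.3),
                    nrow = 4,
                    dimnames = list(c("tx1", "tx2", "tx3", "tx4"),
                                    paste0("s", 1:4)))
tx2gene <- data.frame(TXNAME = c("tx1", "tx2", "tx3", "tx4"),
                      GENEID = c("geneA", "geneA", "geneB", "geneB"),
                      stringsAsFactors = FALSE)
sample_data <- data.frame(condition = factor(c("ctrl", "ctrl", "trt", "trt")),
                          row.names = colnames(tx_counts))

inputs <- prepare_dexseq_inputs(tx_counts, tx2gene)

# Build the DEXSeqDataSet only if DEXSeq is installed
if (requireNamespace("DEXSeq", quietly = TRUE)) {
  dxd <- DEXSeq::DEXSeqDataSet(countData = inputs$countData,
                               sampleData = sample_data,
                               design = ~ sample + exon + condition:exon,
                               featureID = inputs$featureID,
                               groupID = inputs$groupID)
}
```

Dropping transcripts without a gene assignment before building groupID avoids NA group IDs; whether that is the right choice depends on how complete your annotation is.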

I have the tximport object from my earlier DESeq2 analysis and tried to pass it to DEXSeqDataSet. The function complained that it was a list, not a data frame or matrix, so I extracted just the counts matrix. Now I am getting an error that not all values in the assay are integers. Is there a way around this? All my attempts to convert the values to integers turn the matrix into a plain integer vector, which is useless and which I cannot convert back into a matrix.

Thank you.


You can use round(counts) to get an integer-valued matrix. Also note that if you previously did a differential gene expression analysis with DESeq2, your count matrix is probably at the gene level. For DEXSeq you need a transcript-level count matrix (i.e., obtained by setting txOut=TRUE in the tximport call).
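
The earlier difficulty, where converting the counts to integers left an unusable object, is most likely caused by as.integer(), which drops the dim attribute (and truncates rather than rounds). That is an assumption about what was tried. The small base-R sketch below, with made-up values, contrasts it with round() followed by storage.mode<-, which keeps the matrix shape and row/column names.

```r
# Made-up stand-in for txi$counts: estimated counts are fractional
est <- matrix(c(10.4, 3.6, 0.2, 7.8), nrow = 2,
              dimnames = list(c("tx1", "tx2"), c("s1", "s2")))

# as.integer() truncates and drops the dim attribute, returning a plain vector
flat <- as.integer(est)
is.matrix(flat)  # FALSE

# round() keeps the matrix shape and row/column names
cnt <- round(est)
# Switch the storage type to integer without losing dimensions
storage.mode(cnt) <- "integer"
is.matrix(cnt)   # TRUE
is.integer(cnt)  # TRUE
```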


I was able to get the command to work and got DEXSeq to run.

Thank you very much.