Question: tximport of kallisto files
Clara wrote, 6 days ago:

Hello,

I am trying to import transcript abundances (from kallisto .tsv files) to analyze with DESeq2.

My goal is to run both a gene-level and a transcript-level differential expression analysis. However, my kallisto files contain several transcripts with no assigned gene, so I cannot directly compare the two analyses in DESeq2: the gene-level approach only uses the transcripts with gene IDs, whereas the transcript-level approach uses all transcripts.

To address this, I manually edited the kallisto files to remove the transcripts with no gene ID and restrict my analysis to the rest. I did this by opening the files in Excel as text, deleting the rows, and saving in the original format.

I was able to import these successfully using tximport and run the analysis to completion multiple times, about 6 and 4 weeks ago.

However, now I am running into an error and cannot repeat the analyses.

I am getting the following message:

> txi.kallisto.tsv<-tximport(files, type="kallisto", tx2gene=tx2gene)
reading in files
1 2 Error in Ops.factor(txId, raw[[txIdCol]]) :
  level sets of factors are different
In addition: Warning message:
In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length

I have checked that the .tsv files to be imported all have the same number of rows and columns and that the first column is the same in all.

The tx2gene file also has the same number of rows as the .tsv files, and its transcript ID column matches theirs.

Has anything changed in tximport recently to account for what I am seeing?

Is there another approach that I could use to achieve the same goal: analyze the data in DESeq2 using the same transcripts with and without gene-level summarization?

Thanks in advance!

Clara
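The consistency check described in the question (same transcript IDs in every file) can also be done programmatically, which catches differences that are hard to see by eye after a spreadsheet round trip. The sketch below is only an illustration: `check_target_ids` is a hypothetical helper, not part of tximport, and it assumes each file is a tab-separated kallisto table with a `target_id` column. The error above was raised while the second file was being read, so comparing each file against the first is a reasonable place to start.

```r
# Hypothetical helper (not a tximport function): compare the transcript IDs
# of several kallisto .tsv files against those of the first file.
check_target_ids <- function(files) {
  # Read each table and keep only the ID column as plain character strings
  ids <- lapply(files, function(f) {
    tab <- read.delim(f, stringsAsFactors = FALSE)
    tab$target_id
  })
  # TRUE where a file has exactly the same IDs, in the same order,
  # as the first file
  same <- vapply(ids, function(x) identical(x, ids[[1]]), logical(1))
  names(same) <- files
  same
}
```

Calling `check_target_ids(files)` returns one logical value per file; any `FALSE` points to a file whose ID column differs from the first.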

Michael Love wrote, 6 days ago:

I wouldn't recommend editing the output files of a quantification method. This isn't necessary, and I would expect it to lead to errors in downstream software. I believe you can just supply a tx2gene table covering only the transcripts that can be grouped to a gene, and the tximport() function will tell you how many transcripts are being left out. Did you try this? You may need to re-run kallisto so you have the unedited versions of the output files.
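As a rough sketch of that suggestion, the two imports could look like the following. The wrapper name `import_both_levels` is made up for illustration; it assumes the tximport package is installed, that `files` points to the unedited kallisto output, and that `tx2gene` is a two-column data frame (transcript ID, gene ID) listing only the transcripts with a known gene.

```r
# Illustrative wrapper (hypothetical name); the tximport() arguments used
# here (type, tx2gene, txOut) are the package's own.
import_both_levels <- function(files, tx2gene) {
  # Gene-level summary: transcripts missing from tx2gene are left out,
  # and tximport prints how many were missing.
  txi_gene <- tximport::tximport(files, type = "kallisto", tx2gene = tx2gene)
  # Transcript-level import: txOut = TRUE skips gene summarization,
  # so every quantified transcript is returned.
  txi_tx <- tximport::tximport(files, type = "kallisto", txOut = TRUE)
  list(gene = txi_gene, tx = txi_tx)
}
```

The transcript-level result still contains the transcripts without a gene, so it would need to be filtered before it is comparable with the gene-level one.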


Hi,

Thank you for the quick reply! When I supply the tx2gene for the transcripts that can be grouped to a gene, tximport() does tell me which ones are left out. However, if I want to run the same analysis without grouping at the gene level, using only the transcripts that could be assigned to a gene, how can I exclude the transcripts with no gene assignment from the transcript-level analysis?

The idea is to determine whether the gene-level approach shows the same trends as the transcript-level approach. Because the transcript-level approach includes unidentified transcripts, it is not an exactly fair comparison.

Thank you,

Clara

Reply written 5 days ago by Clara

You can keep/remove transcripts from the matrices that are imported. E.g. if you have the names of transcripts that are in tx2gene, you can do:

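# assumes txi is a transcript-level import, i.e. tximport(..., txOut=TRUE)
# keep only the transcripts listed in the first column of tx2gene
idx <- rownames(txi$counts) %in% tx2gene[,1]
txi.sub <- txi
for (mat in c("abundance","counts","length")) {
  # drop=FALSE keeps a matrix even if only one row is selected
  txi.sub[[mat]] <- txi[[mat]][idx,,drop=FALSE]
}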
Reply written 5 days ago by Michael Love
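To see what this subsetting does without needing real kallisto output, here is a small self-contained toy version. The list below only mimics the shape of a transcript-level tximport result; the transcript names, sample names, and numbers are all made up for illustration.

```r
# Toy stand-in for a transcript-level tximport result (txOut = TRUE);
# every value here is invented for the example.
tx_ids  <- c("tx1", "tx2", "tx3", "tx4")
samples <- c("s1", "s2")
make_mat <- function(vals) {
  matrix(vals, nrow = 4, dimnames = list(tx_ids, samples))
}
txi <- list(
  abundance = make_mat(c(10, 0, 5, 2, 12, 1, 4, 3)),
  counts    = make_mat(c(100, 0, 50, 20, 120, 10, 40, 30)),
  length    = make_mat(rep(1000, 8)),
  countsFromAbundance = "no"
)

# tx3 has no gene assignment, so it does not appear in tx2gene
tx2gene <- data.frame(tx   = c("tx1", "tx2", "tx4"),
                      gene = c("gA", "gA", "gB"))

# Keep only the rows whose transcript ID is listed in tx2gene
idx <- rownames(txi$counts) %in% tx2gene[, 1]
txi.sub <- txi
for (mat in c("abundance", "counts", "length")) {
  txi.sub[[mat]] <- txi[[mat]][idx, , drop = FALSE]
}

rownames(txi.sub$counts)
# "tx1" "tx2" "tx4"
```

The three matrices are filtered with the same index, so they stay aligned row by row, and the other list elements are carried over unchanged. With a real object, the subset list could then presumably be handed to DESeq2 in the same way as the full one, for example via DESeqDataSetFromTximport().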

Great!! Thank you!

Clara

Reply written 5 days ago by Clara