Question

tximport and de novo transcriptome


RMRG ▴ 10

@rmrg-13708

Hi,

I typically run DE analyses through the Trinity pipeline, but due to certain features of the transcriptome I'm currently working on, I've found it's essential to use a k-mer value higher than Trinity allows. I'm not familiar with the various software for doing DE analyses outside of Trinity.

I am working with a de novo assembly from a non-model organism (2 genotypes), and I don't have a high-quality genome assembly to go with it. I have estimated transcript-level abundances with Salmon, and I'd like to get gene-level abundances, but I'm not quite sure how to do this. tximport seems to be commonly used for this, but in all of the examples I've seen, it appears to require a known set of genes, presumably from a sequenced genome project.

Is it possible to do what I want to do with tximport? Or some other program?

Thanks for your help!

tximport salmon rna seq de novo • 2.2k views

Updated 7.3 years ago by James W. MacDonald • written 7.3 years ago by RMRG

score 1 · Answer 1 · 2018-09-27

Michael Love 43k

@mikelove

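You can set txOut=TRUE to import transcript-level estimates only. Or, if you want to do any summarization to gene level, you can make up your own table: tx2gene is just a two-column data.frame, with transcript IDs in the first column and gene IDs in the second.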
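To make the "it's just a table" point concrete, here is a minimal sketch in Python (not part of tximport) that writes a two-column transcript-to-gene table and shows what summing per gene looks like. The transcript IDs, gene labels, counts, and the column names TXNAME/GENEID are all made up for illustration. As far as I know only the column order matters to tximport, and tximport itself does more than this plain sum (for example, it also tracks average transcript length).

```python
import csv
import io
from collections import defaultdict

# Hypothetical transcript-to-gene pairs, for illustration only.
tx2gene = [
    ("tx1", "geneA"),
    ("tx2", "geneA"),
    ("tx3", "geneB"),
]


def write_tx2gene(pairs, handle):
    """Write a two-column table: transcript ID first, gene ID second."""
    writer = csv.writer(handle)
    writer.writerow(["TXNAME", "GENEID"])  # header labels are arbitrary
    writer.writerows(pairs)


def sum_by_gene(tx_counts, pairs):
    """Sum per-transcript estimated counts into per-gene totals."""
    gene_of = dict(pairs)
    totals = defaultdict(float)
    for tx, count in tx_counts.items():
        totals[gene_of[tx]] += count
    return dict(totals)


# Write the table to an in-memory buffer; a real run would open a file.
buffer = io.StringIO()
write_tx2gene(tx2gene, buffer)
table_text = buffer.getvalue()

# Made-up transcript-level counts, summed to gene level.
gene_counts = sum_by_gene({"tx1": 10.0, "tx2": 5.5, "tx3": 2.0}, tx2gene)
```

A CSV written this way can be read into R as a data.frame and passed as tx2gene.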

Thanks!

But could you direct me to the software I could use to actually do the summarization? I.e., to get 'genes' from transcripts in a similar way to what Trinity does for de novo assemblies?

Reply from RMRG, 7.3 years ago

score 1 · Answer 2 · 2018-09-27

Trinity outputs both the transcript and gene ID, so you can use all the transcripts for each gene, just as you normally would with a more comprehensive transcriptome/genome. I generally use a two-step approach: I first generate a tx2gene data.frame based on the transcript/gene combinations that Trinity reports, and then import using tximport. The next step is generally to get rid of genes that have consistently low counts (of which there will be many, due to Trinity's greedy algorithm; lots of those transcripts aren't real). I then usually go back and use BLAST+ to align the filtered transcripts against some reasonable database of sequences, which invariably results in many transcripts being matched to the same gene. I then use that information to make an updated tx2gene and read the data back in.
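As a rough sketch of the first step (in Python, just to build the table), the transcript/gene pairs can be derived from the IDs themselves, assuming they follow the usual Trinity naming pattern in which the gene ID is the transcript ID without its trailing _i<N> isoform suffix. The function names and example IDs below are hypothetical, and an assembly from a different assembler would need its own rule.

```python
import re

# Trinity-style transcript IDs end in an isoform suffix such as "_i1".
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_gene_id(transcript_id):
    """Return the Trinity 'gene' ID by dropping the trailing _i<N> suffix."""
    gene_id, n_subs = _ISOFORM_SUFFIX.subn("", transcript_id)
    if n_subs == 0:
        # Fail loudly rather than silently mapping an unexpected ID.
        raise ValueError(f"not a Trinity-style transcript ID: {transcript_id!r}")
    return gene_id


def build_tx2gene(transcript_ids):
    """Return (transcript ID, gene ID) pairs in tx2gene column order."""
    return [(tx, trinity_gene_id(tx)) for tx in transcript_ids]


# Made-up IDs in the Trinity naming style.
example_ids = [
    "TRINITY_DN1000_c115_g5_i1",
    "TRINITY_DN1000_c115_g5_i2",
    "TRINITY_DN42_c0_g1_i1",
]
tx2gene = build_tx2gene(example_ids)
```

Here the first two transcripts share one gene ID and the third gets its own, which is the grouping tximport would then use for gene-level summarization.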

That usually gets you to a reasonable set of genes, with many actually having some sort of tenuous annotation from BLAST.
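The BLAST-based update could look something like the following sketch, assuming BLAST+ was run with tabular output (-outfmt 6, where column 1 is the query ID, column 2 the subject ID, and column 12 the bit score). Two choices here are my own assumptions rather than anything stated above: keeping only the highest-bit-score hit per transcript, and leaving transcripts without a hit on their original gene ID. Using the subject ID directly as the gene label is also a simplification, and the hit table is invented.

```python
import csv
import io


def best_hits(blast_tsv_handle):
    """Return {query ID: subject ID}, keeping the top bit-score hit per query."""
    best = {}
    for row in csv.reader(blast_tsv_handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def update_tx2gene(tx2gene, hits):
    """Relabel transcripts that have a hit; keep the old gene ID otherwise."""
    return [(tx, hits.get(tx, gene)) for tx, gene in tx2gene]


# Made-up BLAST+ tabular output (12 standard columns), for illustration only.
blast_text = (
    "tx1\tprotX\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "tx1\tprotY\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t200\n"
    "tx2\tprotX\t95.0\t280\t14\t0\t1\t280\t5\t284\t1e-140\t500\n"
)

old_tx2gene = [("tx1", "g1"), ("tx2", "g2"), ("tx3", "g3")]
updated_tx2gene = update_tx2gene(old_tx2gene, best_hits(io.StringIO(blast_text)))
```

In this toy example tx1 and tx2 collapse onto the same label because they hit the same subject, while tx3 keeps its original gene ID.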