tximport and de novo transcriptome
RMRG ▴ 10
@rmrg-13708
Last seen 6.3 years ago

Hi,

I typically run DE analyses through the Trinity pipeline, but due to certain features of the transcriptome I'm currently working on, I've found it's essential to use a k-mer size larger than Trinity allows. I'm not used to the various software for doing DE analyses outside of Trinity.

I am working with a de novo assembly from a non-model organism (2 genotypes), and I don't have a high-quality genome assembly to go with it. I have estimated transcript-level abundances with salmon, and I'd like to get gene-level abundances, but I'm not quite sure how to do this. tximport seems to be commonly used for this, but in all of the examples I've seen it appears to require a known set of genes, presumably from a sequenced genome project.

Is it possible to do what I want to do with tximport? Or some other program?

Thanks for your help!

tximport salmon rna seq de novo

@mikelove
Last seen 7 days ago
United States

You can set txOut = TRUE to import transcript-level estimates only. Or, if you want to do any summarization, you can make up your own table: tx2gene is just a two-column data.frame, with transcript IDs in the first column and gene IDs in the second.


Thanks!

But could you direct me to the software I could use to actually do the summarization, i.e., to get 'genes' from transcripts in a similar way to what Trinity does for de novo assemblies?

@james-w-macdonald-5106
Last seen 4 days ago
United States

Trinity outputs both the transcript and gene ID, and you can use all the transcripts for each gene, just as you normally would with a more comprehensive transcriptome/genome. I generally use a two-step approach: I first generate a tx2gene data.frame based on what Trinity says are the transcript/gene combinations, and then import using tximport. The next step is generally to get rid of genes that have consistently low counts (of which there will be many, due to Trinity's greedy algorithm; lots of those transcripts aren't real). I then usually go back and use BLAST+ to align the filtered transcripts against some reasonable database of sequences, which invariably results in many transcripts being matched to the same gene. I then use that information to make an updated tx2gene and read the data back in.
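As a rough illustration of the first step, here is a minimal Python sketch that derives a transcript-to-gene table from Trinity-style FASTA headers. It assumes the usual Trinity naming, where the transcript ID ends in an isoform suffix such as _i1 and the gene ID is everything before that suffix. The function names, the example headers, and the TXNAME/GENEID column labels are illustrative choices, not anything prescribed by the answer above. The resulting tab-separated table could then be read into R as a data.frame and passed to tximport as tx2gene.

```python
import csv
import io
import re

# Assumption: Trinity transcript IDs end in an isoform suffix (_i1, _i2, ...);
# dropping that suffix leaves the Trinity "gene" ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def tx2gene_from_fasta_headers(lines):
    """Return (transcript_id, gene_id) pairs from FASTA header lines."""
    pairs = []
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        if not tokens:
            continue  # skip a bare ">" with no ID
        tx_id = tokens[0]  # the ID is the first token after ">"
        gene_id = ISOFORM_SUFFIX.sub("", tx_id)
        pairs.append((tx_id, gene_id))
    return pairs


def write_tx2gene(pairs, handle):
    """Write the pairs as a two-column TSV: transcript first, gene second."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["TXNAME", "GENEID"])  # hypothetical column labels
    writer.writerows(pairs)


# Made-up headers in Trinity's naming style, for illustration only.
example_fasta = [
    ">TRINITY_DN1000_c115_g5_i1 len=247",
    "ACGTACGT",
    ">TRINITY_DN1000_c115_g5_i2 len=300",
    "ACGTACGT",
    ">TRINITY_DN2_c0_g1_i1 len=500",
    "ACGTACGT",
]

pairs = tx2gene_from_fasta_headers(example_fasta)
buf = io.StringIO()
write_tx2gene(pairs, buf)
print(buf.getvalue(), end="")
```

If the assembly comes from a different assembler, the same two-column layout applies; only the rule for mapping a transcript ID to a gene ID would change.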

That usually gets you to a reasonable set of genes, with many actually having some sort of tenuous annotation from BLAST.
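The BLAST-based update can be sketched along the same lines. This is only one possible way to do it, under stated assumptions: the BLAST+ results are in the standard 12-column tabular format (-outfmt 6), each transcript is assigned to its highest-bitscore subject, and transcripts without any hit keep their previous gene ID. The helper names and the subject IDs (protA, protB) are hypothetical.

```python
import csv
import io


def best_hits(blast_handle):
    """Map each query transcript to its highest-bitscore subject.

    Assumes standard BLAST+ tabular output (-outfmt 6): query ID in
    column 1, subject ID in column 2, bitscore in column 12.
    """
    best = {}
    for row in csv.reader(blast_handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def update_tx2gene(tx2gene, hits):
    """Replace the gene ID with the BLAST subject where a hit exists."""
    return [(tx, hits.get(tx, gene)) for tx, gene in tx2gene]


# Made-up data for illustration: two Trinity "genes" hit the same subject.
old_tx2gene = [
    ("TRINITY_DN1_c0_g1_i1", "TRINITY_DN1_c0_g1"),
    ("TRINITY_DN2_c0_g1_i1", "TRINITY_DN2_c0_g1"),
    ("TRINITY_DN3_c0_g1_i1", "TRINITY_DN3_c0_g1"),
]
blast_rows = [
    ["TRINITY_DN1_c0_g1_i1", "protA", "98.0", "200", "4", "0",
     "1", "200", "1", "200", "1e-100", "350"],
    ["TRINITY_DN1_c0_g1_i1", "protB", "80.0", "150", "30", "0",
     "1", "150", "1", "150", "1e-40", "120"],
    ["TRINITY_DN2_c0_g1_i1", "protA", "95.0", "180", "9", "0",
     "1", "180", "5", "184", "1e-90", "300"],
]
blast_text = "\n".join("\t".join(row) for row in blast_rows) + "\n"

hits = best_hits(io.StringIO(blast_text))
new_tx2gene = update_tx2gene(old_tx2gene, hits)
for tx, gene in new_tx2gene:
    print(tx, gene, sep="\t")
```

In this toy input the first two transcripts collapse onto the same gene, which is the "many transcripts being matched to the same gene" effect described above, while the third keeps its Trinity gene ID because it has no hit.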


Oh, and if you are using salmon to do the alignment, particularly a newer version, you will probably want to set --incompatPrior to some non-zero value. With de novo transcriptomes in particular, it seems that you end up with lots of mappings that are incompatible with the library type, and if --incompatPrior is set to zero (the default now) you may end up with really low mapping rates.
