Question

How to match salmon output to TXNAME


camilia.savitri • 0

@6db15c42


Japan

Hello. I am new to RNA-seq analysis and I have been having a hard time resolving the mismatch between the transcript names in my salmon output and the 'TXNAME' column of my tx2gene table.

I use salmon in alignment-based mode, with a transcript FASTA file extracted using the GFF3 file downloaded from NCBI.

# command used to generate the transcript FASTA:
gffread -w transcripts.fasta -g tn2-sequence.fasta tn2-sequence.gff3
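As far as I know, salmon names each target after the FASTA header text up to the first whitespace, so the names in the salmon output come straight from the headers gffread wrote. A quick way to see those names is sketched below in R; `fasta_ids()` is a hypothetical helper and the toy lines are made up, not taken from the real file. The "gene-" prefix most likely comes from the `ID=` attribute in the NCBI GFF3.

```r
# Hypothetical helper: return the sequence IDs from FASTA lines, i.e. the text
# after ">" up to the first whitespace (assumed to match how salmon names targets).
fasta_ids <- function(lines) {
  headers <- grep("^>", lines, value = TRUE)
  sub("^>([^[:space:]]+).*$", "\\1", headers)
}

# Toy FASTA lines for illustration only (sequences are made up)
toy_fasta <- c(">gene-DO80_00010 toy description", "ATGAAACCC",
               ">gene-DO80_00020", "ATGGGGTTT")
ids <- fasta_ids(toy_fasta)
print(ids)

# With the real file: fasta_ids(readLines("transcripts.fasta"))
```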

# salmon output (quant.sf) looks like this:
Name    Length  EffectiveLength TPM NumReads
gene-DO80_00010 417 167.000 53.992327   7.000
gene-DO80_00020 471 221.000 17.485556   3.000

I am preparing the data for DESeq2, so for tximport I need a tx2gene file, which I made with the code below:

library(GenomicFeatures)
gff_file <- "tn2-sequence.gff3"
stopifnot(file.exists(gff_file))
txdb <- makeTxDbFromGFF(gff_file, format = "gff3")
keytypes(txdb)
columns(txdb)

# map every transcript name (TXNAME) to its gene ID (GENEID)
k <- keys(txdb, keytype = "TXNAME")
tx_map <- AnnotationDbi::select(txdb, keys = k,
                                columns = "GENEID", keytype = "TXNAME")
tx2gene <- tx_map
write.csv(tx2gene, file = "tx2gene.csv", row.names = FALSE, quote = FALSE)
View(tx2gene)  # View() with a capital V; lower-case view() is not part of base R
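Before running tximport, it can help to check directly how many salmon names appear in `tx2gene$TXNAME`. The sketch below uses only the rows shown in this question; `check_name_overlap()` is a hypothetical helper, and with real data `quant_names` would come from the Name column of quant.sf.

```r
# Hypothetical helper: count how many salmon target names (quant.sf "Name")
# are present in tx2gene$TXNAME, and show a few that are missing.
check_name_overlap <- function(quant_names, tx2gene) {
  matched <- quant_names %in% tx2gene$TXNAME
  list(n_matched = sum(matched),
       n_total = length(quant_names),
       unmatched_examples = head(quant_names[!matched], 5))
}

# Rows copied from the salmon output and the tx2gene table above
quant_names <- c("gene-DO80_00010", "gene-DO80_00020")
tx2gene <- data.frame(TXNAME = c("DO80_00050", "panC"),
                      GENEID = c("DO80_00050", "DO80_00060"),
                      stringsAsFactors = FALSE)

res <- check_name_overlap(quant_names, tx2gene)
cat(res$n_matched, "of", res$n_total, "salmon names found in tx2gene$TXNAME\n")
print(res$unmatched_examples)
```

If `n_matched` is zero (or close to it), tximport cannot use this tx2gene table, because it matches on the salmon names exactly.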

It gives me output like this:

      TXNAME     GENEID
1 DO80_00050 DO80_00050
2       panC DO80_00060

Because of this mismatch (e.g. gene-DO80_00010 in salmon versus DO80_00050 or panC in TXNAME), I cannot move on to the downstream analysis. Do you have any solution? Which one should I modify, and how? Thank you.

GenomicFeatures tximport salmon

written 2.2 years ago by camilia.savitri

score 0 · Answer 1 · 2023-01-12


Michael Love 43k

@mikelove


United States

You just need a file that lists the transcript names you used for quantification (the names in the salmon output) and the gene each one is associated with. Maybe you can find someone to consult with on how to do this for your particular dataset.
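One way to build such a file is to start from the salmon names themselves rather than from the TxDb. The R sketch below assumes, based only on the two rows shown in the question, that every quantified name has the form "gene-<locus_tag>"; `make_tx2gene_from_quant()` is a hypothetical helper, and the prefix should be checked against the full quant.sf before relying on it.

```r
# Hypothetical helper: build a tx2gene data frame (TXNAME, GENEID) from the
# salmon "Name" column, assuming names look like "gene-<locus_tag>".
make_tx2gene_from_quant <- function(quant_names, prefix = "gene-") {
  has_prefix <- startsWith(quant_names, prefix)
  gene_ids <- ifelse(has_prefix,
                     substring(quant_names, nchar(prefix) + 1),
                     quant_names)  # names without the prefix are kept as-is
  data.frame(TXNAME = quant_names, GENEID = gene_ids, stringsAsFactors = FALSE)
}

# The two rows of quant.sf shown in the question
quant <- read.table(text = "
Name    Length  EffectiveLength TPM NumReads
gene-DO80_00010 417 167.000 53.992327   7.000
gene-DO80_00020 471 221.000 17.485556   3.000
", header = TRUE, stringsAsFactors = FALSE)

tx2gene <- make_tx2gene_from_quant(quant$Name)
print(tx2gene)

# With real data (paths are assumptions):
# quant <- read.delim("salmon_out/quant.sf")
# txi <- tximport::tximport("salmon_out/quant.sf", type = "salmon", tx2gene = tx2gene)
```

Because each quantified record here appears to already be a gene, this table is essentially one-to-one; an alternative worth considering is `tximport(..., txOut = TRUE)`, which imports the salmon values without a tx2gene table.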

2.2 years ago by Michael Love