Hello. I am new to RNA-seq analysis and I am having a hard time resolving a mismatch between the transcript names in my Salmon output and the TXNAME column of my tx2gene table.
I ran Salmon in alignment-based mode, using a transcript FASTA file extracted with the GFF3 file downloaded from NCBI.
# Command used to generate the transcript FASTA:
gffread -w transcripts.fasta -g tn2-sequence.fasta tn2-sequence.gff3
# The Salmon output looks like this:
Name Length EffectiveLength TPM NumReads
gene-DO80_00010 417 167.000 53.992327 7.000
gene-DO80_00020 471 221.000 17.485556 3.000
I am preparing the data for DESeq2, so for tximport I need a tx2gene file, which I create with the code below:
library(GenomicFeatures)
gff_file <- "tn2-sequence.gff3"
file.exists(gff_file)
txdb <- makeTxDbFromGFF(gff_file)
keytypes(txdb)
columns(txdb)
# Map each transcript name (TXNAME) to its gene ID (GENEID)
k <- keys(txdb, keytype = "TXNAME")
tx_map <- AnnotationDbi::select(txdb, keys = k,
                                columns = "GENEID", keytype = "TXNAME")
# Inspect the table; View() has a capital V (lowercase view() needs the tibble package)
View(tx_map)
tx2gene <- tx_map
write.csv(tx2gene, file = "tx2gene.csv", row.names = FALSE, quote = FALSE)
This gives the following output:
TXNAME GENEID
1 DO80_00050 DO80_00050
2 panC DO80_00060
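To make the mismatch concrete, here is a minimal sketch that compares the two naming schemes. It uses toy copies of the rows shown above rather than the real files, and the object names (quant_demo, tx2gene_demo) are made up for illustration. The two excerpts show different loci, so the point is only the pattern: the Salmon names carry a "gene-" prefix, while TXNAME holds either a bare locus tag or a gene symbol such as panC.

```r
# Toy copies of the two excerpts above (illustrative only, not the full files)
quant_demo <- data.frame(
  Name = c("gene-DO80_00010", "gene-DO80_00020"),
  stringsAsFactors = FALSE
)
tx2gene_demo <- data.frame(
  TXNAME = c("DO80_00050", "panC"),
  GENEID = c("DO80_00050", "DO80_00060"),
  stringsAsFactors = FALSE
)

# Number of Salmon names with an exact match in TXNAME
n_matched <- sum(quant_demo$Name %in% tx2gene_demo$TXNAME)

# Do all Salmon names start with "gene-"?
has_prefix <- all(startsWith(quant_demo$Name, "gene-"))

# Does any TXNAME start with "gene-"?
any_tx_prefix <- any(startsWith(tx2gene_demo$TXNAME, "gene-"))

print(n_matched)
print(has_prefix)
print(any_tx_prefix)
```

On the real data, the same comparison could be run on the Name column of the Salmon output and the TXNAME column of tx2gene.csv.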
Because of this mismatch, I cannot move on to the downstream analysis. Is there a way to fix it? Which of the two should I modify, the Salmon output or the tx2gene table, and how? Thank you.