Hello. I am new to RNA-seq analysis, and I have been having a hard time resolving a mismatch between the transcript names in my salmon output and the TXNAME column of my tx2gene table.
I ran salmon in alignment-based mode, using a transcript FASTA file that I extracted with gffread from the GFF3 file downloaded from NCBI.
Command used to generate the transcripts:
gffread -w transcripts.fasta -g tn2-sequence.fasta tn2-sequence.gff3
The salmon output looks like this:
Name            Length  EffectiveLength  TPM        NumReads
gene-DO80_00010 417     167.000          53.992327  7.000
gene-DO80_00020 471     221.000          17.485556  3.000
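To see exactly which names salmon reports, the table can be loaded into R. This is a minimal sketch that parses the sample rows shown above; with the real file you would read the tab-separated `quant.sf` instead. The `gene-` prefix appears to come from the NCBI GFF3 `ID` attribute (for example `ID=gene-DO80_00010`) that gffread used as the sequence name; that origin is an assumption, not something the output proves.

```r
# Parse the sample salmon rows from the question. With real data, use e.g.
# quant <- read.table("quant.sf", header = TRUE, sep = "\t")
quant_txt <- "Name Length EffectiveLength TPM NumReads
gene-DO80_00010 417 167.000 53.992327 7.000
gene-DO80_00020 471 221.000 17.485556 3.000"

quant <- read.table(text = quant_txt, header = TRUE, stringsAsFactors = FALSE)

# Every name starts with "gene-" (assumed to be the GFF3 ID prefix)
print(all(startsWith(quant$Name, "gene-")))

# Removing the prefix leaves the bare locus tag
locus_tag <- sub("^gene-", "", quant$Name)
print(locus_tag)
```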
I am preparing the data for DESeq2, so I need a tx2gene file for tximport, which I made with the code below:
library(GenomicFeatures)
gff_file <- "tn2-sequence.gff3"
file.exists(gff_file)
txdb <- makeTxDbFromGFF(gff_file)
keytypes(txdb)
columns(txdb)
# map transcript names (TXNAME) to gene IDs (GENEID)
k <- keys(txdb, keytype = "TXNAME")
tx_map <- AnnotationDbi::select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")
View(tx_map)  # base R View(); lowercase view() needs tibble/tidyverse loaded
tx2gene <- tx_map
write.csv(tx2gene, file = "tx2gene.csv", row.names = FALSE, quote = FALSE)
View(tx2gene)
It gives me output like this:
  TXNAME      GENEID
1 DO80_00050  DO80_00050
2 panC        DO80_00060
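One way to pin down the mismatch is to count how many salmon names can be found in each tx2gene column, with and without the `gene-` prefix. This sketch uses the two TxDb rows printed above plus hypothetical salmon names for the same loci (assumed to follow the `gene-<locus_tag>` pattern from the salmon output); `count_matches` is a hypothetical helper, not a library function.

```r
# The two tx_map rows printed above
tx_map <- data.frame(
  TXNAME = c("DO80_00050", "panC"),
  GENEID = c("DO80_00050", "DO80_00060"),
  stringsAsFactors = FALSE
)

# Hypothetical salmon names for the same two loci
salmon_names <- c("gene-DO80_00050", "gene-DO80_00060")

# Hypothetical helper: count hits in each column, before and after
# removing the "gene-" prefix
count_matches <- function(salmon_names, tx_map) {
  stripped <- sub("^gene-", "", salmon_names)
  c(
    txname_as_is    = sum(salmon_names %in% tx_map$TXNAME),
    txname_stripped = sum(stripped %in% tx_map$TXNAME),
    geneid_stripped = sum(stripped %in% tx_map$GENEID)
  )
}

hits <- count_matches(salmon_names, tx_map)
print(hits)  # txname_as_is = 0, txname_stripped = 1, geneid_stripped = 2
```

In this small example, nothing matches TXNAME as-is, and only GENEID lines up once the prefix is removed, because gene names such as `panC` replace the locus tag in TXNAME.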
Because the names in the salmon output (e.g. gene-DO80_00010) do not match the TXNAME values (e.g. DO80_00050, panC), I cannot move on to the downstream analysis. Is there a solution? Which file should I modify, and how? Thank you.
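A possible workaround, sketched under the assumption that each "transcript" in this bacterial annotation corresponds to exactly one gene, is to build tx2gene directly from the names salmon reports. This guarantees TXNAME matches salmon's output exactly. `make_tx2gene_from_salmon` is a hypothetical helper, and using the stripped locus tag as GENEID is an assumption about how you want the genes labelled.

```r
# Hypothetical helper (not part of tximport): build tx2gene from salmon's
# own names so TXNAME matches the quantification file exactly.
# Assumes one transcript per gene, as in a typical bacterial GFF3.
make_tx2gene_from_salmon <- function(salmon_names, prefix = "^gene-") {
  data.frame(
    TXNAME = salmon_names,                   # keep salmon's names unchanged
    GENEID = sub(prefix, "", salmon_names),  # strip "gene-" to get the locus tag
    stringsAsFactors = FALSE
  )
}

# Names taken from the salmon output in the question
salmon_names <- c("gene-DO80_00010", "gene-DO80_00020")
tx2gene <- make_tx2gene_from_salmon(salmon_names)
print(tx2gene)

# With real data, the names would come from quant.sf instead, e.g.
# quant <- read.table("quant.sf", header = TRUE, sep = "\t")
# tx2gene <- make_tx2gene_from_salmon(quant$Name)
```

The resulting table could then be passed as `tx2gene` to `tximport(files, type = "salmon", tx2gene = tx2gene)`. Since the mapping is one-to-one, another option is to skip tx2gene entirely with `txOut = TRUE` and keep salmon's names as the row names.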