Question

Having a hard time creating a txi object from an existing Salmon output file to convert transcript-level TPM to gene-level TPM

slouis • 0

@slouis-23365

Hello everyone. I have been trying to follow a number of walkthroughs on converting a Salmon transcript TPM data frame into gene-level counts. I think the issue is in the following code:

samples <- read.table("samples.txt", header = TRUE)
files <- file.path("quant", samples$sample, "quant.sf")
names(files) <- paste0(samples$sample)
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene)

I have successfully created the tx2gene file just using the EnsDb.Hsapiens.v86 package and the transcripts function seen below:

head(tx2gene)

DataFrame with 6 rows and 2 columns
             txid          geneid
      <character>     <character>
1 ENST00000000233 ENSG00000004059
2 ENST00000000412 ENSG00000003056
3 ENST00000000442 ENSG00000173153
4 ENST00000001008 ENSG00000004478
5 ENST00000001146 ENSG00000003137
6 ENST00000002125 ENSG00000003509

However, the salmon file that I have looks like this pasted below, as opposed to multiple separate files that have transcript TPM counts.

head(salmon_output)
                X  sample1   sample2  sample3   sample4   sample5   sample6   sample7   sample8
1 ENST00000000233 16.28690 20.910300  6.85988  4.889860  8.908700  0.000000  2.223280  8.952680
2 ENST00000000412  9.96427 12.695700 31.22860 37.437700 36.617700 16.729400 34.906800 30.086900
3 ENST00000000442  1.23997  0.847996  0.00000  0.000000  0.889783  0.321261  0.451316  0.000000
4 ENST00000001008  9.23394 11.012300 18.04590 20.538000 17.430500 10.035800 17.035700 17.519500
5 ENST00000001146  1.04069  1.508500  0.00000  0.165007  0.201487  0.000000  0.000000  0.390072
6 ENST00000002125  1.09016  2.310980  3.41563  8.428720  5.931020  4.875550  4.959320  5.771440

As you can see, the sample names already exist as columns in the file I was given, and the transcript Ensembl IDs make up the first column, so the "files <- file.path" and tximport(files) commands all fail for me. I'm not sure how to work around this to make a txi object that maps the transcript IDs to gene IDs and summarizes the counts properly.

I appreciate any help. Thank you!

Salmon tximport tximportdata ensembl • 1.5k views

ADD COMMENT • link updated 4.8 years ago by Michael Love 43k • written 4.8 years ago by slouis • 0

score 0 · Answer 1 · 2020-04-16

Michael Love 43k

@mikelove

United States

Just to clarify, this is not a Salmon output, but some file with some information in it (not sure the content exactly) that has been created by someone else and provided to you. So tximport and the downstream pipelines from the tximport vignette can't be used here.

ADD COMMENT • link 4.8 years ago Michael Love 43k

In that case, how does one go about summarizing the file to gene counts if that is the only output file I currently have? I was told by our bioinformatics core that they simply merged the TPM values from Salmon into one file with the samples and transcript IDs.

The pipeline that was used was

FastQC --> Trimmomatic --> Salmon index using GENCODE GRCh38.p13 transcriptome --> Salmon quant into TPM

and then I was given a merged file.

ADD REPLY • link 4.8 years ago slouis • 0

You need more data. Rather than my describing in words how the software works, you could instead ask the core for the Salmon output, or have them run this for you.

ADD REPLY • link 4.8 years ago Michael Love 43k
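
For readers in the same position, here is a minimal R sketch of what can be done with only the merged table. It assumes the table really holds per-transcript TPM values, that gene-level TPM (not counts) is acceptable, and that the first two columns of tx2gene are the transcript ID and the gene ID. The toy data and its IDs are hypothetical. Because the TPM values of a gene's transcripts add up to the gene's TPM, the sketch sums rows per gene with base R's rowsum(). It does not create a txi object, and it cannot recover the estimated counts or effective lengths that tximport reads from the per-sample quant.sf files.

```r
# Toy stand-ins for the merged TPM table and the tx2gene mapping.
# All IDs and values below are hypothetical, for illustration only.
salmon_output <- data.frame(
  X       = c("tx1", "tx2", "tx3", "tx4"),
  sample1 = c(1.5, 2.5, 4.0, 0.0),
  sample2 = c(0.5, 0.5, 3.0, 1.0)
)
tx2gene <- data.frame(
  txid   = c("tx1", "tx2", "tx3"),
  geneid = c("geneA", "geneA", "geneB")
)

# Look up the gene for each transcript row (assumption: column 1 of tx2gene
# is the transcript ID, column 2 the gene ID). Unmatched transcripts give NA.
gene <- tx2gene[[2]][match(salmon_output$X, tx2gene[[1]])]
keep <- !is.na(gene)

# Sum transcript-level TPM within each gene, separately for every sample.
tpm <- as.matrix(salmon_output[keep, -1])
gene_tpm <- rowsum(tpm, group = gene[keep])

print(gene_tpm)
```

With the real objects, the two toy data frames would be replaced by salmon_output and tx2gene from the question. Transcripts with no match in tx2gene (tx4 in the toy data) are dropped here. GENCODE transcript IDs often carry a version suffix, which would need to be handled before matching if it is present in the real file.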

Yes, I did try asking them how they convert their files, but they said that was the Salmon output file. I will try asking again. Thanks.

ADD REPLY • link 4.8 years ago slouis • 0