Question

tximport and stevia: how to build tx2gene without a reference genome

bahmanik@msu.edu (Michigan State University)

Hi, I'm new to this field and trying to learn, so any advice would be appreciated. In my RNA-seq experiment I used Salmon to map my reads to a transcriptome (there is no reference genome for stevia). Now I have my quant.sf files, which I want to import into DESeq2 using tximport. I have read the vignette (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html), but I am not sure how to do this part:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")

Since there is no reference genome for stevia, how is this part going to work for me?

Thank you,

Tags: deseq2, annotation · written 5.9 years ago

score 1 · Answer 1 · 2020-03-20

Michael Love

Here's a basic answer:

You obtained your reference transcript sequences from somewhere in order to quantify the samples. If that source provides a grouping of transcripts to genes, use that. If, as seems to be the case for you, no transcript-to-gene grouping is available from your reference source, then you need to use a computational method to produce one, and then provide that grouping to tximport.
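One common case, offered only as an assumption since the thread does not say how the stevia transcriptome was built: if it was assembled de novo with Trinity, the transcript names already encode a grouping, because Trinity names isoforms "<gene>_i<N>" (for example, TRINITY_DN1000_c0_g1_i1 is an isoform of the Trinity "gene" TRINITY_DN1000_c0_g1). A minimal base-R sketch that derives a two-column tx2gene table (transcript ID first, gene ID second, the layout tximport expects) by stripping that suffix; the IDs below are invented examples:

```r
# Hypothetical Trinity-style transcript IDs (invented for illustration).
# With real data, take them from the "Name" column of one quant.sf file.
tx_ids <- c("TRINITY_DN1000_c0_g1_i1",
            "TRINITY_DN1000_c0_g1_i2",
            "TRINITY_DN1001_c0_g1_i1",
            "TRINITY_DN1002_c1_g2_i3")

# Trinity names isoforms "<gene>_i<N>"; dropping the trailing "_i<N>"
# leaves the Trinity gene identifier.
gene_ids <- sub("_i[0-9]+$", "", tx_ids)

# tximport expects transcript IDs in column 1 and gene IDs in column 2.
tx2gene <- data.frame(TXNAME = tx_ids, GENEID = gene_ids,
                      stringsAsFactors = FALSE)
print(tx2gene)
```

With real data, tx_ids could be read.delim("quant.sf")$Name, and the table would then be passed as tximport(files, type = "salmon", tx2gene = tx2gene). Assemblies without this kind of naming still need a separate clustering step to define genes, as described above.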

Or you can perform a transcript-level analysis by setting txOut=TRUE.

Reply by Michael Love, 5.9 years ago
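What txOut=TRUE keeps can be illustrated without tximport itself. The base-R sketch below is only a rough approximation of the import, not tximport's code: it writes two tiny quant.sf files with invented values (using Salmon's standard columns Name, Length, EffectiveLength, TPM and NumReads) and collects one transcript-by-sample matrix each for counts (NumReads), abundance (TPM) and length (EffectiveLength). It assumes every file lists the same transcripts in the same order, as is normally the case when all samples were quantified against the same Salmon index.

```r
# Write two toy quant.sf files (invented values) into a temporary directory.
dir <- tempfile("salmon_demo")
dir.create(dir)
make_quant <- function(sample, eff_len, tpm, reads) {
  path <- file.path(dir, paste0(sample, "_quant.sf"))
  quant <- data.frame(Name = c("tx1", "tx2", "tx3"),
                      Length = c(1500, 900, 2100),
                      EffectiveLength = eff_len,
                      TPM = tpm,
                      NumReads = reads)
  write.table(quant, path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}
files <- c(sampleA = make_quant("sampleA", c(1320.5, 725.1, 1901.2),
                                c(500000, 300000, 200000), c(120.0, 40.5, 70.2)),
           sampleB = make_quant("sampleB", c(1298.7, 731.9, 1887.4),
                                c(450000, 350000, 200000), c(98.3, 52.0, 65.7)))

# Read every file, then build one matrix per quantity:
# rows = transcripts, columns = samples (named after names(files)).
quants <- lapply(files, read.delim, stringsAsFactors = FALSE)
pick <- function(col) {
  m <- sapply(quants, function(q) q[[col]])
  rownames(m) <- quants[[1]]$Name
  m
}
txi_like <- list(counts    = pick("NumReads"),
                 abundance = pick("TPM"),
                 length    = pick("EffectiveLength"))
print(txi_like$length)
```

The real call, tximport(files, type = "salmon", txOut = TRUE), returns a list with the same three matrices (counts, abundance, length) plus a countsFromAbundance entry; the length matrix is the part that a plain TPM table does not carry into DESeq2.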

Thank you for the quick and clear answer.

Reply by bahmanik@msu.edu, 5.9 years ago

Sorry, one more question: if I can use txOut=TRUE in tximport (to perform a transcript-level analysis), then what is the point of using tximport? I could just read the quant.sf files directly into R and run DESeq2 on the TPMs. In this post (https://support.bioconductor.org/p/84883/) you said there is no difference between the TPMs from Salmon and the TPMs from tximport. Thank you,

Reply by bahmanik@msu.edu, 5.9 years ago

With tximport, DESeq2 makes use of the effective transcript lengths. If you use Salmon, these can account for, e.g., sample-specific GC biases or transcript length biases.

Reply by Michael Love, 5.9 years ago
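To make the effective-length point more concrete: DESeqDataSetFromTximport stores txi$length alongside the rounded counts, and DESeq2 then uses it to build transcript- and sample-specific normalization factors rather than relying on library size alone. The base-R sketch below uses invented lengths to show the core idea, following the offset construction the tximport vignette gives for edgeR (divide each row of the length matrix by its geometric mean). It is an illustration, not DESeq2's internal code.

```r
# Invented effective lengths: rows = transcripts, columns = samples.
# Sample-specific biases (e.g. GC content) make them differ across samples.
eff_len <- matrix(c(1000, 1250,  800,
                     500,  480,  520),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("tx1", "tx2"), c("s1", "s2", "s3")))

# Divide each row by its geometric mean, so each row is centred on 1 and
# only the sample-to-sample variation in effective length remains.
norm_mat <- eff_len / exp(rowMeans(log(eff_len)))
print(round(norm_mat, 3))

# Under equal expression, expected read counts scale with effective length,
# so these ratios act as per-transcript, per-sample corrections: s2 is
# expected to get about 1.25x as many tx1 reads as s1 without any real change.
```

A TPM table fed straight into DESeq2 would lose exactly this information, which is why the counts plus lengths from tximport are the recommended input.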

That makes sense, so I am going to use tximport (with txOut=TRUE). But how do I define the replicates of each sample for tximport? The section "Importing transcript abundance with tximport" says to create a vector of filenames by reading in a table that contains the sample IDs, but it doesn't say anything about replicates. Thank you,

Reply by bahmanik@msu.edu, 5.9 years ago

This isn’t something covered by tximport. Take a look at the DESeq2 vignette, though: you need to provide a table of sample information, called colData, and you need to make sure the rows of that table are in the same order as the files given to tximport.

Reply by Michael Love, 5.9 years ago
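Replicates are not declared to tximport at all: each replicate is simply one more quant.sf file, and in the DESeq2 colData it is one more row that shares its genotype value with the other replicates. A minimal base-R sketch, in which the run column, the run IDs, the genotype levels and the "quants" directory are all hypothetical, that builds the file vector from the sample table itself, so the two cannot get out of order, and checks the match before anything reaches DESeq2:

```r
# Hypothetical sample table: one row per sequencing run (= per replicate),
# three replicates per genotype.
sampleTable <- data.frame(
  run      = c("run01", "run02", "run03", "run04", "run05", "run06"),
  genotype = factor(rep(c("WT", "mutant"), each = 3), levels = c("WT", "mutant")),
  stringsAsFactors = FALSE
)
rownames(sampleTable) <- sampleTable$run

# Build the file paths FROM the table, one per row, so the order always matches.
dir <- "quants"  # hypothetical directory with one Salmon output folder per run
files <- file.path(dir, sampleTable$run, "quant.sf")
names(files) <- sampleTable$run

# tximport uses names(files) as the column names of its matrices, and DESeq2
# needs those columns to line up with the rows of colData.
stopifnot(identical(names(files), rownames(sampleTable)))
print(table(sampleTable$genotype))  # number of replicates per group
```

With real files, tximport(files, type = "salmon", txOut = TRUE) followed by DESeqDataSetFromTximport(txi, sampleTable, ~genotype) would then receive the samples and the colData rows in the same order.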

Thanks, I got that part. Now I'm trying to get to the next steps. I made a data frame out of txi$counts, mydata.df <- data.frame(txi$counts), to use as the count matrix for the rest of the process. Then I built coldata and countsNonZero from mydata.df, and ran DESeq2 with dds <- DESeqDataSetFromMatrix(countsNonZero, colData = coldata, design = ~ genotype). Is this the right workflow? Thank you,

Reply by bahmanik@msu.edu, 5.9 years ago

You should read over the documentation a bit more.

Reply by Michael Love, 5.9 years ago

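I think I got the workflow right this time (transcript-level analysis, without tx2gene):
library(tximport)
library(DESeq2)
# one quant.sf per run, in the row order of 'samples'
files <- file.path(dir, "salmon", samples$run, "quant.sf")
names(files) <- paste0("sample", 1:18)
# txOut = TRUE keeps transcript-level estimates, so no tx2gene is needed
txi <- tximport(files, type = "salmon", txOut = TRUE)
# only valid if the rows of sampleTable are in the same order as 'files'
rownames(sampleTable) <- colnames(txi$counts)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~genotype)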

Thank you,

Reply by bahmanik@msu.edu, 5.9 years ago