Problem with Tximport to DESeq2
zli1 • 0

Hi all,

I have some raw RNA-seq data from mouse that I would like to analyze with DESeq2. After quantifying with Salmon, I used tximport and DESeqDataSetFromTximport to create the data object, but the count matrix I obtained has only one row per sample, which is a value corresponding to protein-encoding.

I am not sure whether this is because of the reference transcriptome I used for Salmon indexing, which I downloaded from http://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/cdna/ and which should be GRCm38. However, the ID names do not match those in the R package TxDb.Mmusculus.UCSC. The ID names I obtained start with "GENSCAN0000000000", and when I tried to convert them to a tx2gene object, the result had only two columns: one is the ID name and the other is the string 'protein encoding'. Below is the code I used to create the tx2gene object.

gunzip -c Mus.GRCm38.cdna.all.fa.gz | grep '>' | cut -d ' ' -f1,4,7 > temp

paste <(cut -d '>' -f2 temp | cut -d ' ' -f1) <(cut -d ' ' -f2 temp | cut -d ':' -f2) <(cut -d ' ' -f3 temp | cut -d ':' -f2) >> tx2gene.txt

I am wondering whether there is any way to fix the issue or whether there is another fasta file I should use for Salmon index. Any help would be greatly appreciated!
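One possible explanation, offered as an assumption rather than a diagnosis: IDs beginning with GENSCAN are typical of Ensembl's ab initio prediction FASTA, whose headers carry no `gene:` field. Picking header fields by position (`cut -f1,4,7`) would then land on the transcript biotype instead of a gene ID. A minimal sketch that looks up the `gene:` field by name and reports headers that lack it is below; the function name `make_tx2gene` is hypothetical.

```shell
# Build a two-column tx2gene table (transcript ID, gene ID) from Ensembl
# cDNA FASTA headers read on stdin. The gene ID is found by its "gene:" key,
# not by position, so headers without that field are reported on stderr
# instead of being silently mis-parsed.
make_tx2gene() {
  awk '
    /^>/ {
      tx = substr($1, 2)                  # drop the leading ">"
      gene = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^gene:/) gene = substr($i, 6)   # text after "gene:"
      }
      if (gene == "") {
        print "no gene field for " tx > "/dev/stderr"
        next
      }
      print tx "\t" gene
    }
  '
}

# Usage (file name as in the post above):
# gunzip -c Mus.GRCm38.cdna.all.fa.gz | make_tx2gene > tx2gene.txt
```

If every header triggers the warning, the FASTA probably is not the annotated cDNA set, and re-downloading the `cdna.all` file (or switching annotation source) would be the next thing to try.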

TxDb.Mmusculus.UCSC.mm10.ensGene DESeq2 tximport • 138 views
@mikelove

It was common for users with tximport to have difficulty putting together the correct table for gene summarization, which is why we created tximeta which does this for you. Magic!

It works for GENCODE, Ensembl and RefSeq for human, mouse and fly. I'd recommend starting there first.

Thank you for the reply! I tried to run se <- tximeta(coldata) but encountered this error: "Error in nchar(x) : invalid multibyte string, element 4". Is there a way to fix the problem? In addition, I'm new to biostatistics, and I'm wondering whether there is a recommended reference transcriptome for mouse RNA-seq data. Thank you again for your help!

I prefer GENCODE for mouse and human. They do a good job providing relevant files, documenting versions, and keeping permalinks.

Re: your error, I can't help you much without you showing me what coldata looks like.
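For anyone building the transcript-to-gene table by hand from a GENCODE transcript FASTA, a minimal sketch follows. It relies on GENCODE headers being "|"-delimited, with the transcript ID first and the gene ID second. The function name `gencode_tx2gene` and the file name in the usage line are illustrative assumptions.

```shell
# Derive a tx2gene table from GENCODE transcript FASTA headers read on stdin.
# Header layout: >transcript_id|gene_id|...|
gencode_tx2gene() {
  awk -F'|' '/^>/ { print substr($1, 2) "\t" $2 }'   # strip ">" and keep the first two fields
}

# Usage (file name is an assumption):
# gunzip -c gencode.vM25.transcripts.fa.gz | gencode_tx2gene > tx2gene.txt
```

The transcript IDs in this table only match the names in quant.sf if the names were truncated at the first "|", for example by building the Salmon index with `--gencode` or by passing `ignoreAfterBar = TRUE` to tximport. tximeta handles this step itself.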

Below is what my coldata looks like. I also attach my session info below.

enter image description here

R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

Hmm, my first guess is whether those files are readable by tximport? Are those the quant.sf files?
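A quick way to check this from the command line is sketched below: it confirms that each path is a readable file and that its first line is the standard Salmon quant.sf header. The function name `check_quant_files` and the directory layout in the usage line are assumptions. It is not a test of whether tximport itself can parse the files.

```shell
# Report, per path, whether the file is readable and starts with the
# standard Salmon quant.sf header line. Returns non-zero if any check fails.
check_quant_files() {
  expected=$(printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads')
  status=0
  for f in "$@"; do
    if [ ! -f "$f" ] || [ ! -r "$f" ]; then
      echo "MISSING  $f"; status=1
    elif [ "$(head -n 1 "$f")" != "$expected" ]; then
      echo "BADHEAD  $f"; status=1
    else
      echo "OK       $f"
    fi
  done
  return $status
}

# Usage (directory layout is an assumption):
# check_quant_files quants/*/quant.sf
```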

They are quant.sf files. It turned out I solved the problem by using transcripts downloaded from GENCODE. Thank you for your help!!
