Question

Problem with tximport IDs

Maka • 0

Dear all,

I am new to this field and am trying to solve the following issue: when I run tximport, I cannot get rid of the version suffix on the gene IDs in my counts (e.g. ENSMUSG00000000001.5). Here is the code I am running:

library(GenomicFeatures)

txdb <- makeTxDbFromGFF("gencode.vM27.annotation.gtf.gz")

keytypes(txdb)

k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")

> head(tx2gene)
                TXNAME               GENEID
1 ENSMUST00000193812.2 ENSMUSG00000102693.2
2 ENSMUST00000082908.3 ENSMUSG00000064842.3
3 ENSMUST00000192857.2 ENSMUSG00000102851.2
4 ENSMUST00000161581.2 ENSMUSG00000089699.2
5 ENSMUST00000192183.2 ENSMUSG00000103147.2
6 ENSMUST00000193244.2 ENSMUSG00000102348.2

library(tximport)
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)

> head(txi$counts)
                      21L006446 21L006447  21L006449 21L006450 21L006451  21L006452
ENSMUSG00000000001.5    315.138   334.248    294.000   319.226   254.199    434.647
ENSMUSG00000000003.16     0.000     0.000      0.000     0.000     0.000      0.000
ENSMUSG00000000028.16    87.004   110.044    134.774    66.000    46.017    182.001
ENSMUSG00000000031.17  7870.144 49568.878 130934.525  4780.285  3096.264 133446.737
ENSMUSG00000000037.18     1.000     0.000      1.000     0.000     1.000      1.000
ENSMUSG00000000049.12    10.000     7.000      9.000    11.000     6.000      6.000

# The gene IDs still carry the version suffix, so I tried this call instead

library(tximport)
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE)

# but I get this error message

> txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
Error in .local(object, ...) : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Example IDs (file): [ENSMUST00000193812, ENSMUST00000082908, ENSMUST00000162897, ...]

Example IDs (tx2gene): [ENSMUST00000193812.2, ENSMUST00000082908.3, ENSMUST00000192857.2, ...]

  This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.

# I also checked my input files (quant.sf), and their transcript IDs look like ENSMUST00000193812.2

Do you have any suggestions? Or is there a mistake I am not seeing? Thanks a lot.

tximport tx2gene rnas DESeq2 RNASeqData • 1.2k views

23 months ago • Maka • 0

score 1 · Answer 1 · 2022-05-03

ATpoint ★ 4.0k

If you just want to strip the version from the resulting gene-level count table, use something like gsub("\\..*", "", rownames(txi$counts)). Leave the tximport code as it is; the first call (the one that works) is correct. You might also want to edit the colnames: names starting with a digit are not syntactically valid R names, so downstream tools may silently convert them into valid ones. Just so you are aware of that.
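A minimal sketch of both suggestions, using a small made-up matrix as a stand-in for txi$counts (the object name counts and its values are only for illustration). make.names() is one base R way to get syntactically valid column names; it is an assumption that this is the conversion you want, since other tools may rename differently.

```r
# Toy stand-in for txi$counts: versioned gene IDs as row names,
# sample names starting with a digit as column names.
counts <- matrix(
  c(315.138, 0, 87.004, 334.248, 0, 110.044),
  nrow = 3,
  dimnames = list(
    c("ENSMUSG00000000001.5", "ENSMUSG00000000003.16", "ENSMUSG00000000028.16"),
    c("21L006446", "21L006447")
  )
)

# Drop the first period and everything after it.
rownames(counts) <- gsub("\\..*", "", rownames(counts))

# make.names() prepends "X" to names that start with a digit.
colnames(counts) <- make.names(colnames(counts))

print(counts)
```

Renaming the columns yourself, before handing the table to other tools, means the sample names in your results are ones you chose rather than ones created by an automatic conversion.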

23 months ago • ATpoint ★ 4.0k

Agree with ATpoint.

If you want to remove the version information, do that after tximport. If you are heading towards building a DESeqDataSet, a DGEList, etc., wait until you have built that object, and then apply the code ATpoint provided to the row names of the object:

rownames(x) <- gsub("\\..*", "", rownames(x))

This replaces "<period><any other characters>" with "" (the empty string). The resulting strings are then assigned back to the rownames of the object x, which could be, e.g., your dds.
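To see what the pattern does in isolation, here is a small sketch on a plain character vector (the vector ids is hypothetical; in practice it would be rownames(dds)). The uniqueness check is an added precaution, not part of the advice above: if two versioned IDs ever collapsed to the same string, assigning them back as row names would create duplicates.

```r
# Hypothetical versioned IDs; in practice these would come from rownames(dds).
ids <- c("ENSMUSG00000000001.5", "ENSMUSG00000000031.17")

# "\\." matches a literal period, ".*" matches everything after it.
stripped <- gsub("\\..*", "", ids)

# Precaution: make sure stripping did not collapse two IDs into one.
stopifnot(anyDuplicated(stripped) == 0)

print(stripped)
```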

23 months ago • Michael Love 41k

Thank you both! Now it is much clearer. I'll keep the version information in my DESeqDataSet and change it later on.

23 months ago • Maka • 0