tximport gene name
tanyabioinfo ▴ 20
@tanyabioinfo-14091

Hi, I am doing the following to get the tximport count matrix with gene names in the first column:

library(tximport)
library(EnsDb.Mmusculus.v79)

# Build a transcript-to-gene-symbol table from the EnsDb annotation
txdf <- transcripts(EnsDb.Mmusculus.v79, return.type = "DataFrame")
txdf$symbol <- mapIds(EnsDb.Mmusculus.v79, txdf$gene_id, "GENENAME", "GENEID")
tx2gene <- as.data.frame(txdf[, c("tx_id", "symbol")])

txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE, dropInfReps = TRUE)

 

However, when I run head(txi$abundance), I get:

                 0 wt      0 wt     0 wt     0 wt     6 wt     6 wt     6 wt
              71.50353 112.29713 73.64570 73.13216 60.17879 56.01880 57.25439
0610007P14Rik  0.00000  16.73136 69.46050 60.45882 86.66511 27.10330 48.84700
0610009B22Rik  0.00000  16.34480 29.00857 26.11050  0.00000 18.28440 25.29169

I am getting an extra row at the top. Can someone help me rectify this, or let me know if I am doing anything wrong?

Tanya

@mikelove

I believe that’s not a row of data, it’s the column names. Check what is in position [1,1] if you want to see the data values alone.
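One way to run that check, sketched on a small made-up matrix. The object `abund` and its values are illustrative stand-ins for txi$abundance, not real output:

```r
# Toy stand-in for txi$abundance: the first row name is an empty string.
abund <- matrix(
  c(71.5, 0.0, 0.0,
    112.3, 16.7, 16.3),
  nrow = 3,
  dimnames = list(c("", "0610007P14Rik", "0610009B22Rik"),
                  c("0 wt", "6 wt"))
)

first_value   <- abund[1, 1]              # the data value in position [1,1]
first_rowname <- rownames(abund)[1]       # the label attached to that row
n_blank       <- sum(rownames(abund) == "")  # number of rows with an empty name

print(first_value)
print(first_rowname)
print(n_blank)
```

If position [1,1] holds a number and the first row name is "", the line in question is a data row whose name is blank rather than part of the header.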


I believe the "extra row" she means is the one under the column names. 
71.50353 112.29713 73.64570 73.13216 60.17879 56.01880 57.25439

I did the same analysis with the same annotation this week and also have a row with no rowname. I was worried there might be an off-by-one error somewhere, but my results look similar to those from another tool, so that's probably not the case. I guess it's just one tx_id that doesn't have a corresponding symbol. Or could it be many tx_ids that are collapsed to the symbol "" during summarization?


Re “many tx_ids that are collapsed to the symbol ""”: if so, you can go looking in your tx2gene.
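A minimal sketch of what that collapsing would look like, using base R's rowsum() as a stand-in for gene-level summarization. The table, the matrix, and all names in them are hypothetical, and this is not meant to reproduce tximport's internals:

```r
# Hypothetical tx2gene: two transcripts have no gene name.
tx2gene <- data.frame(
  tx_id = c("tx1", "tx2", "tx3", "tx4"),
  gene_name = c("", "", "GeneA", "GeneA"),
  stringsAsFactors = FALSE
)

# Hypothetical transcript-level abundances for two samples.
tx_abund <- matrix(
  c(1, 2, 3, 4,
    10, 20, 30, 40),
  nrow = 4,
  dimnames = list(tx2gene$tx_id, c("s1", "s2"))
)

# Which transcripts map to an empty gene name?
blank_idx <- which(tx2gene$gene_name == "")

# Sum transcripts per gene name; every blank name lands in one row named "".
gene_abund <- rowsum(tx_abund, group = tx2gene$gene_name)

print(blank_idx)
print(gene_abund)
```

In this toy case tx1 and tx2 are summed into a single unnamed row, which would print as a line with no label directly under the column names.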

Ed Siefker ▴ 230
@ed-siefker-5136

Good point, Michael. They weren't hard to find.

> tx2gene <- transcripts(EnsDb.Mmusculus.v79, columns=c("gene_name"), return.type="data.frame")[c(2,1)]
> head(tx2gene,n=12)
                tx_id     gene_name
1  ENSMUST00000077235
2  ENSMUST00000179505
3  ENSMUST00000178343
4  ENSMUST00000187028
5  ENSMUST00000186475
6  ENSMUST00000161472
7  ENSMUST00000182513
8  ENSMUST00000130094 0610005C13Rik
9  ENSMUST00000145208 0610005C13Rik
10 ENSMUST00000133678 0610005C13Rik
11 ENSMUST00000123549 0610005C13Rik
12 ENSMUST00000132138 0610005C13Rik
> which(tx2gene$gene_name =="")
[1] 1 2 3 4 5 6 7
>

tx2gene[1,] is DHRSX.  Both of tx2gene[2:3,] are AC149090.  tx2gene[4,] is Zfp383 and so on.

So, we are collapsing multiple tx_ids to "". Unsurprising, since Ensembl is on v90 and we're using v79 annotations. I haven't tested it, but I'd imagine building the current EnsDb using ensembldb as documented (http://bioconductor.org/packages/release/bioc/vignettes/ensembldb/inst/doc/ensembldb.html#1021_directly_from_ensembl_databases) would fix this.
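Separately from rebuilding the annotation, a possible workaround is to fall back to the gene ID wherever the gene name is blank, so that unrelated genes are not merged into one unnamed row. This is an assumption and was not tested in this thread. The table below is hypothetical, and it assumes a gene_id column is available next to gene_name:

```r
# Hypothetical annotation table with a gene_id column alongside gene_name.
tx2gene_full <- data.frame(
  tx_id = c("tx1", "tx2", "tx3"),
  gene_id = c("gene1", "gene2", "gene3"),
  gene_name = c("", "", "GeneA"),
  stringsAsFactors = FALSE
)

# Use the gene ID wherever the gene name is blank or missing.
blank <- is.na(tx2gene_full$gene_name) | tx2gene_full$gene_name == ""
tx2gene_full$gene_name[blank] <- tx2gene_full$gene_id[blank]

# Keep two columns: transcript ID first, gene label second.
tx2gene_fixed <- tx2gene_full[, c("tx_id", "gene_name")]
print(tx2gene_fixed)
```

The resulting two-column table could then be passed as the tx2gene argument in place of the original one.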
