Question: tximport gene name
tanyabioinfo wrote (19 months ago):

Hi, I am doing the following to get the tximport count matrix with gene names in the first column:

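library(EnsDb.Mmusculus.v79)
library(tximport)

# Transcript table, with gene IDs mapped to gene symbols
txdf <- transcripts(EnsDb.Mmusculus.v79, return.type = "DataFrame")
txdf$symbol <- mapIds(EnsDb.Mmusculus.v79, txdf$gene_id, "GENENAME", "GENEID")
tx2gene <- as.data.frame(txdf[, c("tx_id", "symbol")])

# Summarize the Salmon transcript-level estimates to gene level
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE, dropInfReps = TRUE)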

However, when I run head(txi$abundance), I get:

                 0 wt      0 wt     0 wt     0 wt     6 wt     6 wt     6 wt
              71.50353 112.29713 73.64570 73.13216 60.17879 56.01880 57.25439
0610007P14Rik  0.00000  16.73136 69.46050 60.45882 86.66511 27.10330 48.84700
0610009B22Rik  0.00000  16.34480 29.00857 26.11050  0.00000 18.28440 25.29169

I am getting an extra row at the top. Can someone help me fix this, or tell me if I am doing something wrong?

Tanya

Answer: tximport gene name
Michael Love wrote (19 months ago):

I believe that’s not a row of data, it’s the column names. Check what is in position [1,1] if you want to see the data values alone.


I believe the "extra row" she means is the one under the column names:
71.50353 112.29713 73.64570 73.13216 60.17879 56.01880 57.25439

I did the same analysis with the same annotation this week and also got a row with no row name. I was worried there might be an off-by-one error somewhere, but my results look similar to those from another tool, so that's probably not the case. I guess it is just one tx_id that doesn't have a corresponding symbol. Or could it be many tx_ids that are collapsed to the symbol "" during summarization?

(Reply by Ed Siefker, 19 months ago)
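
One way to confirm that the unlabeled line is a data row with an empty row name, rather than part of the column header, is to compare the row names against the empty string. This is a minimal sketch on a toy matrix that stands in for txi$abundance; the object name `abund` and the rounded values are illustrative assumptions, not the poster's data.

```r
# Toy stand-in for txi$abundance: the first row has an empty row name
abund <- matrix(
  c(71.5, 0.0, 0.0,
    112.3, 16.7, 16.3),
  nrow = 3,
  dimnames = list(c("", "0610007P14Rik", "0610009B22Rik"), c("0 wt", "6 wt"))
)

# An empty row name prints as blank space, so the row looks like a second header line
empty_rows <- which(rownames(abund) == "")
empty_rows

# Show the values held by the unnamed row(s)
abund[empty_rows, , drop = FALSE]
```

On a real tximport result the same check would be `which(rownames(txi$abundance) == "")`.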

Re: “many tx_ids that are collapsed to the symbol ""…”: if so, you can go looking for them in your tx2gene.

(Reply by Michael Love, 19 months ago)
Answer: tximport gene name
Ed Siefker wrote (18 months ago):

Good point, Michael. They weren't hard to find.

> tx2gene <- transcripts(EnsDb.Mmusculus.v79, columns=c("gene_name"), return.type="data.frame")[c(2,1)]
> head(tx2gene,n=12)
                tx_id     gene_name
1  ENSMUST00000077235
2  ENSMUST00000179505
3  ENSMUST00000178343
4  ENSMUST00000187028
5  ENSMUST00000186475
6  ENSMUST00000161472
7  ENSMUST00000182513
8  ENSMUST00000130094 0610005C13Rik
9  ENSMUST00000145208 0610005C13Rik
10 ENSMUST00000133678 0610005C13Rik
11 ENSMUST00000123549 0610005C13Rik
12 ENSMUST00000132138 0610005C13Rik
> which(tx2gene$gene_name =="")
[1] 1 2 3 4 5 6 7
>

tx2gene[1,] is DHRSX.  Both of tx2gene[2:3,] are AC149090.  tx2gene[4,] is Zfp383 and so on.

So we are collapsing multiple tx_ids to "". That's unsurprising, since Ensembl is on v90 and we're using v79 annotations. I haven't tested it, but I'd imagine that building a current EnsDb with ensembldb, as documented
(http://bioconductor.org/packages/release/bioc/vignettes/ensembldb/inst/doc/ensembldb.html#1021_directly_from_ensembl_databases), would fix this.
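
A possible workaround that keeps the v79 annotation is to fall back to the gene ID wherever the symbol is empty, so that unnamed transcripts are no longer pooled into a single "" row. This is only a sketch on a toy table: the `annot` and `label` names and the gene_id values are made-up placeholders, and it assumes the annotation table also carries a gene_id column.

```r
# Toy annotation table; the gene_id values are hypothetical placeholders
annot <- data.frame(
  tx_id     = c("ENSMUST00000077235", "ENSMUST00000179505", "ENSMUST00000130094"),
  gene_id   = c("geneA", "geneB", "geneC"),
  gene_name = c("", "", "0610005C13Rik"),
  stringsAsFactors = FALSE
)

# Treat missing or empty symbols as absent and fall back to the gene ID
no_symbol <- is.na(annot$gene_name) | annot$gene_name == ""
annot$label <- ifelse(no_symbol, annot$gene_id, annot$gene_name)

# Two-column table in the order tximport expects: transcript ID, then gene label
tx2gene <- annot[, c("tx_id", "label")]
tx2gene
```

The resulting row names mix symbols and gene IDs, which may or may not suit downstream steps; rebuilding the annotation remains the cleaner option if it does resolve the missing symbols.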
