From kallisto to DESeq2 analysis
Marianna ▴ 10
@7cc5052f
Last seen 9 days ago
Italy

Hi all,

I'm doing a DE analysis with DESeq2 on a non-model species, so I retrieved the annotation using biomaRt in R. I imported the kallisto counts with tximport, collapsing transcripts to genes (txOut=FALSE). I collapsed transcripts either to "Ensembl gene ID" or to "external gene name", and I got slightly different sets of DE genes. I suppose this is because collapsing to gene names loses all the genes without a definite gene name. This, in turn, affects the total number of genes considered and the FDR calculations. Is that right?

That said, which do you think is the better option: gene IDs or gene names?

Thanks in advance for your help!

Best

Marianna
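
A minimal base-R sketch of the effect described in the question, using made-up counts and a hypothetical annotation table (`tx_counts`, `tx2gene`, and all IDs and names are invented for illustration). It assumes that transcripts without a gene name are dropped before summarization; it does not call tximport itself, it only mimics the summing step with `rowsum()`.

```r
# Toy transcript-level counts: 5 transcripts x 2 samples (made-up numbers)
tx_counts <- matrix(
  c(10, 20, 5, 8, 30,    # sample s1
    12, 18, 7, 9, 25),   # sample s2
  ncol = 2,
  dimnames = list(paste0("tx", 1:5), c("s1", "s2"))
)

# Hypothetical annotation, shaped like a biomaRt result:
# G2 has no gene name, and G3 and G4 share the same name
tx2gene <- data.frame(
  tx_id     = paste0("tx", 1:5),
  gene_id   = c("G1", "G1", "G2", "G3", "G4"),
  gene_name = c("abc1", "abc1", "", "xyz", "xyz"),
  stringsAsFactors = FALSE
)

# Collapse by gene ID: every transcript contributes to exactly one gene
by_id <- rowsum(tx_counts, group = tx2gene$gene_id)

# Collapse by gene name: unnamed transcripts are dropped (assumption),
# and distinct genes that share a name are merged into one row
named   <- tx2gene$gene_name != ""
by_name <- rowsum(tx_counts[named, , drop = FALSE],
                  group = tx2gene$gene_name[named])

nrow(by_id)    # number of genes when collapsing by ID
nrow(by_name)  # number of "genes" when collapsing by name
```

In this toy case the name-based matrix has fewer rows for two separate reasons: one gene disappears because it has no name, and two genes are merged because they share one. Both change the set of tests that goes into the multiple-testing correction.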

biomaRt DESeq2 tximport • 1.7k views
@james-w-macdonald-5106
Last seen 1 hour ago
United States

I wouldn't use a gene name for anything but annotating output so my biologist colleagues can find their gene of interest. Gene IDs are meant to be systematic and unique (although even with long-standing model organisms the annotation services are still collapsing duplicates). Gene symbols are pretty good for humans, a bit less so for mice, and once you get past that it's the wild west.

> z <- mapIds(org.Hs.eg.db, keys(org.Hs.eg.db, "SYMBOL"), "ENTREZID", "SYMBOL", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
> table(sapply(z, length))

    1     2     3 
63882     8     1 
> z <- mapIds(org.Mm.eg.db, keys(org.Mm.eg.db, "SYMBOL"), "ENTREZID", "SYMBOL", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
> table(sapply(z, length))

    1     2     3 
72516   275     5 
> z <- mapIds(org.Mmu.eg.db, keys(org.Mmu.eg.db, "SYMBOL"), "ENTREZID", "SYMBOL", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
> table(sapply(z, length))

    1     2     3     4     5     6     7     8     9    10    11    12    13 
39926    26     2     6     6     5     1     5     3     2     6     1     3 
   15    20 
    1     6

So even an animal like macaque still has a number of symbols that each map to several Entrez Gene IDs, and so probably to different genes.
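
One way to follow this advice can be sketched in base R: run the analysis on gene IDs and attach names to the results table afterwards, flagging names that are shared by more than one ID. The `res` and `annot` tables below are made up for illustration and only stand in for a real DESeq2 results table and a biomaRt annotation.

```r
# Hypothetical results table keyed by gene ID (values are invented)
res <- data.frame(
  gene_id        = c("G1", "G2", "G3", "G4"),
  log2FoldChange = c(1.5, -0.7, 2.1, 0.2),
  padj           = c(0.01, 0.20, 0.003, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical annotation: G2 has no name, G3 and G4 share a name
annot <- data.frame(
  gene_id   = c("G1", "G3", "G4"),
  gene_name = c("abc1", "xyz", "xyz"),
  stringsAsFactors = FALSE
)

# Attach names for display only; rows without a name are kept (NA)
res$gene_name <- annot$gene_name[match(res$gene_id, annot$gene_id)]

# Flag names that occur for more than one gene ID
res$ambiguous_name <- !is.na(res$gene_name) &
  (duplicated(res$gene_name) | duplicated(res$gene_name, fromLast = TRUE))

res
```

No gene is lost or merged this way, because the names never take part in the summarization or the testing; they are just an extra column for readers.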


Thank you James,

you have definitely confirmed my doubts.

Best

Marianna

Marianna ▴ 10

I'm back again....

If I want to run DESeq2 for the DE analysis, should tximport be run with countsFromAbundance="lengthScaledTPM" or not? I had a look at the manuals and at this forum, but this issue is still not completely clear to me.

Thanks again!

Marianna


Any setting is allowed. See the tximport vignette.


Thank you Michael,

so if it is not set, as in the vignette, the default countsFromAbundance="no" applies, and DESeq2 will use the kallisto/Salmon abundance.

Best

Marianna
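
To make the difference between the settings concrete, here is a base-R sketch with made-up gene-level matrices (`counts`, `abundance`, and `len` are invented stand-ins for the matrices tximport returns). It reflects my understanding of the two approaches and is not tximport's own code: with "no" the quantifier's estimated counts are kept as they are, and the length matrix is retained so that DESeqDataSetFromTximport can use average transcript lengths as an offset; with "lengthScaledTPM" new counts are derived from the abundances, scaled by the average length across samples and then to each sample's original library size.

```r
# Toy gene-level matrices: 3 genes x 2 samples (all values are made up)
dn <- list(paste0("G", 1:3), c("s1", "s2"))
counts    <- matrix(c(100, 200, 300, 150, 150, 400), ncol = 2, dimnames = dn)
abundance <- matrix(c(10, 40, 50, 20, 30, 50),       ncol = 2, dimnames = dn)
len       <- matrix(c(1000, 500, 600, 800, 500, 700), ncol = 2, dimnames = dn)

# "no" (the default): estimated counts are passed on unchanged;
# the length matrix is kept alongside them for use as an offset
counts_no <- counts

# "lengthScaledTPM"-style counts (sketch):
# 1. scale abundances by each gene's average length across samples
scaled <- abundance * rowMeans(len)
# 2. rescale each sample so its column sum matches the original library size
counts_lstpm <- t(t(scaled) * (colSums(counts) / colSums(scaled)))

colSums(counts_no)
colSums(counts_lstpm)  # same library sizes, different per-gene values
```

Either route ends in a DESeq2-compatible count matrix. The practical difference is where the length correction happens: as an offset inside DESeq2, or baked into the counts beforehand, in which case no length offset is needed.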
