Question

Best DEG tool for datasets with FPKM counts?

0

Nithisha ▴ 10

@nithisha-14272

Last seen 8.0 years ago

Hi everyone,

I have a dataset that contains FPKM values for 5 samples. Is it all right to log2-transform these values and run limma on them, or is there a better way to do this?

Also, my gene_id column contains values like this: 0610005C13Rik, 0610007L01Rik. Could anyone guide me on how to get the gene names? Whenever I download a dataset, I always have problems mapping the gene IDs to gene names. Could someone advise me on the different methods of getting gene names for any particular dataset, or on whether I can use BioMart for this?

Thanks.

FPKM • 6.6k views

ADD COMMENT • link 8.3 years ago Nithisha ▴ 10

0

Nithisha ▴ 10

@nithisha-14272

Last seen 8.0 years ago

Hi Steve,

Thank you so much for your reply. When I do sample(dat$gene_id, 20), I get this:

[1] Acot8 Tarm1 Aph1b
[4] Cga Ptms Irx3
[7] 1700003F12Rik Six3 Tdh
[10] Olfr67 Gm14327 Stx6
[13] Zfp868 Skap2 9130221H12Rik
[16] Crb1 Gm14005 Zfp616
[19] Yars2 Mif

I guess this means that gene symbols are already there. In that case, what would gene names like "0610005C13Rik, 0610007L01Rik" indicate?

The problem I encounter is that whenever a dataset does not contain gene names, I do not know where to look or how to map the existing information to gene names. For instance, I had a few datasets from Affymetrix arrays and, after asking for help in this forum, realized that I could use library(affycoretools) to do this. After reading some more posts, I realized that BioMart can be used as well. As a result, I am totally lost as to what to look for and which tool to use to map to gene names.

In your post, did you mean that I should look out for columns that contain Ensembl gene identifiers, Entrez IDs, RefSeq IDs, etc., as they can be used to map to gene names?

Thanks so much for your help!

ADD COMMENT • link 8.3 years ago Nithisha ▴ 10

1

Please remember to use ADD COMMENT to add comments.

The gene symbols you show like 1700003F12Rik (and yes, they are gene symbols) are RIKEN genes. Or rather, RIKEN ESTs that the RIKEN project generated way back in the dark old days of 2000 or so. These days those are (probably) mostly speculative content that never turned out to be anything. Otherwise they would have real gene names, because other people would have corroborated their existence.

Do note that you can always either just Google things directly, or go to NCBI and search there. A Google search for one of those RIKEN genes would have told you straight away what it is.

ADD REPLY • link 8.3 years ago James W. MacDonald 68k

0

Thank you for the information.

ADD REPLY • link 8.3 years ago Nithisha ▴ 10

score 3 · Accepted Answer · 2017-11-02

For analysis, you will want to log2-transform the FPKM values and use the limma-trend pipeline; see here:

Differential expression of RNA-seq data using limma and voom()
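
To make the limma-trend suggestion a bit more concrete, here is a minimal sketch in R. Everything specific in it is an assumption rather than something from the original post: the FPKM matrix is simulated, the two-group design with six samples is hypothetical, and the filtering threshold (FPKM >= 1 in at least 3 samples) and the offset of 1 before taking logs are arbitrary illustrative choices. The limma calls used are lmFit(), eBayes(trend = TRUE) and topTable().

```r
# Simulated stand-in for a real FPKM matrix (genes in rows, samples in columns).
set.seed(1)
fpkm <- matrix(rexp(200 * 6, rate = 0.1), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Hypothetical design: three control and three treated samples.
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))

# Drop lowly expressed genes, then log2-transform with an offset
# so that zero FPKM values do not become -Inf. Both values are assumptions.
keep <- rowSums(fpkm >= 1) >= 3
log_fpkm <- log2(fpkm[keep, , drop = FALSE] + 1)

# limma-trend: ordinary linear model fit, then empirical Bayes moderation
# with a mean-variance trend (trend = TRUE).
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ group)
  fit <- limma::lmFit(log_fpkm, design)
  fit <- limma::eBayes(fit, trend = TRUE)
  print(limma::topTable(fit, coef = 2, number = 5))
}
```

With real data you would replace the simulated matrix with your own FPKM table and build the design from your actual sample groups. Note that the model needs replicates in at least one group to estimate variances.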

As for the gene_id column, you're only looking at the "funny looking" gene IDs. Take a random sample of the gene_id values, i.e. if you have a data.frame with a gene_id column, you could run sample(dat$gene_id, 10) to see what the identifiers look like. Showing us the result would make it easier to diagnose what kind of identifiers you are working with.

In general, you should be able to recognize the difference between Ensembl gene identifiers, Entrez IDs, and RefSeq IDs. Being able to look at an identifier and quickly recognize it as one of these will be very handy in your work, as well as at a random bar trivia night (one day, you'll see).
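
As a rough illustration of what "recognizing" these identifier types can look like, here is a small base-R sketch. The function name classify_id is hypothetical, and the regular expressions are simplified approximations of the usual formats, not official definitions. The example identifiers are there only to show the formats.

```r
# Hypothetical helper: guess the identifier type from its shape.
classify_id <- function(ids) {
  ids <- as.character(ids)
  out <- rep("other (possibly a gene symbol)", length(ids))
  # Ensembl gene IDs: "ENS", optional species code, "G", 11 digits, optional version
  out[grepl("^ENS[A-Z]*G[0-9]{11}(\\.[0-9]+)?$", ids)] <- "Ensembl gene ID"
  # RefSeq accessions: NM_, NR_, NP_, XM_, XR_, XP_ followed by digits, optional version
  out[grepl("^[NX][MRP]_[0-9]+(\\.[0-9]+)?$", ids)] <- "RefSeq ID"
  # Entrez Gene IDs are plain integers
  out[grepl("^[0-9]+$", ids)] <- "Entrez Gene ID"
  # RIKEN-derived mouse symbols end in "Rik"
  out[grepl("Rik$", ids)] <- "RIKEN-style gene symbol"
  out
}

example_ids <- c("ENSMUSG00000025902", "NM_011541", "14433",
                 "0610005C13Rik", "Acot8")
data.frame(id = example_ids, type = classify_id(example_ids))
```

Once you know which type you have, that type is what you would pass as the key type to an annotation tool when mapping to gene names.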