Question: Best DEG tool for datasets with FPKM values?
17 days ago, Natasha0 wrote:

Hi everyone,

I have a dataset that contains FPKM values for 5 samples. Is it all right to log2-transform these values and use limma on them, or is there a better way to do this?

Also, my gene_id column contains entries like 0610005C13Rik and 0610007L01Rik. Could anyone tell me how to get the gene names? Whenever I download a dataset, I always seem to have problems mapping the gene IDs to gene names. Could someone advise me on the different ways of getting gene names for a particular dataset, or whether I can use BioMart for this?


17 days ago, Steve Lianoglou wrote:

For analysis, you will want to log2-transform the FPKM values and use the limma-trend pipeline; see here:

Differential expression of RNA-seq data using limma and voom()
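limma itself is an R/Bioconductor package, so the sketch below only illustrates the preprocessing that comes before it, in plain Python. Two assumptions are worth stating loudly: FPKM tables usually contain zeros and log2(0) is undefined, so a pseudocount (here 1) is added before taking logs; and the low-expression filter thresholds (min_log, min_samples) are illustrative placeholders, not recommended values. The function names are hypothetical.

```python
import math

def log2_fpkm(fpkm, pseudocount=1.0):
    """Return log2(FPKM + pseudocount) for a {gene: [value per sample]} mapping."""
    return {gene: [math.log2(v + pseudocount) for v in values]
            for gene, values in fpkm.items()}

def filter_expressed(log_expr, min_log=1.0, min_samples=2):
    """Keep genes whose log2 value is >= min_log in at least min_samples samples.

    Both thresholds are illustrative placeholders, not recommendations.
    """
    return {gene: values for gene, values in log_expr.items()
            if sum(v >= min_log for v in values) >= min_samples}

if __name__ == "__main__":
    # Made-up FPKM values for 5 samples, for illustration only.
    fpkm = {
        "Acot8": [10.0, 12.0, 8.0, 30.0, 31.0],
        "Olfr67": [0.0, 0.0, 0.1, 0.0, 0.0],
    }
    kept = filter_expressed(log2_fpkm(fpkm))
    print(sorted(kept))  # ['Acot8']
```

Back in R, the resulting log2 matrix would go into lmFit() with your design matrix, followed by eBayes(fit, trend = TRUE); the trend = TRUE argument is what makes this limma-trend rather than plain limma.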

As for the gene_id column, the examples you picked happen to be some of the "funny looking" gene IDs. Take a random sample of the gene_id values: if you have a data.frame with a gene_id column, you could run sample(dat$gene_id, 10) to see what the identifiers look like. Showing us the result would make it easier to diagnose what kind of identifiers you are working with.

In general, you should learn to recognize the difference between Ensembl gene identifiers, Entrez IDs, and RefSeq IDs. Being able to look at an identifier and quickly recognize it as one of these will be very handy in your work, as well as at a random bar trivia night (one day, you'll see).
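As a rough aid for telling these apart, here is a heuristic classifier. The regular expressions are assumptions that cover the common shapes only: Ensembl gene IDs (ENS, an optional species code such as MUS, then G and 11 digits, with an optional version suffix), RefSeq transcript/protein accessions (NM_, NR_, XM_, XR_, NP_, XP_), and Entrez Gene IDs as plain integers. Anything else falls through to "symbol_or_other". The function name guess_id_type is hypothetical.

```python
import re

# Heuristic patterns, checked in order; an assumption covering common cases only.
ID_PATTERNS = [
    ("ensembl_gene", re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")),  # e.g. ENSMUSG00000000001
    ("refseq", re.compile(r"^[NX][MRP]_\d+(\.\d+)?$")),           # e.g. NM_007294.4, XR_001234
    ("entrez", re.compile(r"^\d+$")),                               # Entrez Gene IDs are integers
]

def guess_id_type(identifier):
    """Return a best-guess label for one identifier string."""
    identifier = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return name
    return "symbol_or_other"

if __name__ == "__main__":
    for ident in ["ENSMUSG00000000001", "NM_007294.4", "12345", "Acot8", "0610005C13Rik"]:
        print(ident, guess_id_type(ident))
```

Running this over a random sample of a gene_id column usually tells you quickly which kind of identifier a dataset uses.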

17 days ago, Natasha0 wrote:

Hi Steve,

Thank you so much for your reply. When I run sample(dat$gene_id, 20), I get this:

 [1] Acot8         Tarm1         Aph1b        
 [4] Cga           Ptms          Irx3         
 [7] 1700003F12Rik Six3          Tdh          
[10] Olfr67        Gm14327       Stx6         
[13] Zfp868        Skap2         9130221H12Rik
[16] Crb1          Gm14005       Zfp616       
[19] Yars2         Mif          

I guess this means that the gene symbols are already there. In that case, what do gene names like "0610005C13Rik" and "0610007L01Rik" indicate?

The problem I run into is that every time a dataset does not contain gene IDs, I do not know where to look or how to map the existing information to gene names. For instance, I had a few datasets from Affymetrix arrays and, after asking for help in this forum, realized that I could use the affycoretools package to do this. After reading some more posts, I realized that BioMart can be used as well. As a result, I am totally lost as to what to look for and which tool to use to map to gene names.

In your post, did you mean that I should look for columns containing Ensembl gene identifiers, Entrez IDs, RefSeq IDs, etc., since those can be used to map to gene names?
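Whichever tool produces it (BioMart, affycoretools, or an annotation package in R), the end result of mapping is a lookup table keyed by one identifier type. A minimal sketch, assuming you have exported a tab-separated table with a header row from a service such as BioMart; the column headers used below ("Gene stable ID", "Gene name") and the single data row are hypothetical placeholders, so pass whatever headers your own export actually has. The functions load_mapping and annotate are hypothetical helpers.

```python
import csv
import io

def load_mapping(handle, key_col, value_col):
    """Build {key: value} from a tab-separated export with a header row."""
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row.get(key_col) or "").strip()
        value = (row.get(value_col) or "").strip()
        if key and value:
            # Keep the first hit when a key appears more than once.
            mapping.setdefault(key, value)
    return mapping

def annotate(ids, mapping, missing="NA"):
    """Pair each identifier with its mapped name, or `missing` if unmapped."""
    return [(i, mapping.get(i, missing)) for i in ids]

if __name__ == "__main__":
    # Hypothetical one-row export, standing in for a real downloaded file.
    export = io.StringIO("Gene stable ID\tGene name\nENSMUSG00000000001\tGnai3\n")
    mapping = load_mapping(export, "Gene stable ID", "Gene name")
    for gene_id, name in annotate(["ENSMUSG00000000001", "ENSMUSG00000099999"], mapping):
        print(gene_id, name)
```

Keeping only the first hit for duplicated keys is a simplification; one-to-many mappings are common in practice and deserve a deliberate decision.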

Thanks so much for your help!


Please remember to use ADD COMMENT to add comments.

The gene symbols you show, like 1700003F12Rik (and yes, they are gene symbols), are RIKEN genes. Or rather, RIKEN ESTs that the RIKEN project generated back in the dark old days of 2000 or so. These days they are (probably) mostly speculative transcripts that never turned out to be anything; otherwise they would have real gene names, because other people would have corroborated their existence.
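To flag these in a results table, a simple pattern match is usually enough. The regular expression below is a heuristic assumption based on the shape of the examples in this thread (0610005C13Rik, 1700003F12Rik, 9130221H12Rik): one digit or letter, six digits, a letter, two digits, then "Rik". It is not an official nomenclature rule, and is_riken_symbol is a hypothetical helper.

```python
import re

# Heuristic for RIKEN clone-based symbols; the leading character may be a letter
# as well as a digit, which is an assumption beyond the examples above.
RIKEN_SYMBOL = re.compile(r"^[0-9A-Z]\d{6}[A-Z]\d{2}Rik$")

def is_riken_symbol(symbol):
    """True if the symbol looks like a RIKEN clone-based name."""
    return bool(RIKEN_SYMBOL.match(symbol.strip()))

if __name__ == "__main__":
    sampled = ["Acot8", "Tarm1", "1700003F12Rik", "Olfr67", "9130221H12Rik", "Mif"]
    print([s for s in sampled if is_riken_symbol(s)])  # ['1700003F12Rik', '9130221H12Rik']
```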

Note that you can always just Google things directly, or go to NCBI and search there. A Google search for one of those RIKEN genes would have told you straight away what it is.
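If you have many symbols to look up, building the NCBI Gene search links programmatically saves some typing. This sketch assumes the NCBI Gene web search accepts a term query using the Entrez field tags [sym] (gene symbol) and [orgn] (organism); check the NCBI Gene help pages if a query misbehaves. The function name is hypothetical.

```python
from urllib.parse import urlencode

def ncbi_gene_search_url(symbol, organism="Mus musculus"):
    """Build an NCBI Gene search URL for one symbol, restricted to one organism."""
    term = f'{symbol}[sym] AND "{organism}"[orgn]'
    # urlencode percent-encodes brackets and quotes and turns spaces into '+'.
    return "https://www.ncbi.nlm.nih.gov/gene/?" + urlencode({"term": term})

if __name__ == "__main__":
    for sym in ["0610005C13Rik", "0610007L01Rik"]:
        print(ncbi_gene_search_url(sym))
```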

Written 17 days ago by James W. MacDonald

Thank you for the information.

Written 16 days ago by Natasha0

