Question: read counts for gene families
3 months ago, zhou_hye wrote:

Hi, I am using edgeR to analyze RNA-seq data. I have generated a count table with counts for each gene under each condition. However, my goal is not to look at differential expression of specific genes. Instead, I want to look at differential expression of gene families. For example, I have genes A1, A2, ..., An in gene family A, and I want to see whether gene family A is differentially expressed. My question is whether the expression of gene family A is simply the sum of the normalized read counts of genes A1 to An. Note that gene lengths differ slightly within a family. I am fairly sure that FPKM values are additive but read counts are not. If read counts are not additive, how should I analyze DE of gene families? And even if they are additive, can I simply generate a new count table for gene families and follow the edgeR guide, treating each family as a gene? Thank you for your time.

 

 

3 months ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

As Ryan has already pointed out, you don't need to add anything, counts or otherwise. Just do a gene set test for your family of genes.

Suppose your DGEList object is dge, and suppose you have an annotation column called "Symbol". Then

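# Symbols of the genes that make up the family
Family <- c("A1","A2","A3")
# geneid names the annotation column that the symbols in Family are matched against
fry(dge, index=Family, design=design, geneid="Symbol")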

will test whether the gene family is differentially expressed.
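For readers who want to try this end to end, here is a minimal sketch on simulated data. The counts, the gene symbols, the two-group layout and the object names other than dge, design and Family are assumptions made up for illustration, and the dispersion step is included on the understanding that the DGEList needs estimated dispersions before a gene set test is run on it.

```r
library(edgeR)

set.seed(1)
ngenes <- 200
nsamples <- 6

# Simulated counts (illustrative only): 3 control and 3 treated samples
counts <- matrix(rnbinom(ngenes * nsamples, mu = 100, size = 10),
                 nrow = ngenes, ncol = nsamples)
group <- factor(rep(c("ctrl", "trt"), each = 3))

# Hypothetical annotation: the first three genes form family A
genes <- data.frame(Symbol = c("A1", "A2", "A3", paste0("G", 4:ngenes)))

dge <- DGEList(counts = counts, group = group, genes = genes)
dge <- calcNormFactors(dge)

# Two-group design; the last coefficient (treated vs control) is tested by default
design <- model.matrix(~ group)
dge <- estimateDisp(dge, design)

# Gene set test for the family, matching symbols against the "Symbol" column
Family <- c("A1", "A2", "A3")
res <- fry(dge, index = Family, design = design, geneid = "Symbol")
print(res)
```

The result is a one-row table giving the number of genes in the set, the direction of change and the p-values. No counts are summed at any point; the individual genes stay as separate rows of the DGEList.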

3 months ago, Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote:

Generally I wouldn't recommend trying to combine counts from multiple different genes. What precisely is your null hypothesis? If you want to know whether any genes in a family are differentially expressed, you can use fry (edited from roast; see the comment below). If you want to know whether a family is enriched for differentially expressed genes, you can use camera.
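To make the two hypotheses concrete, here is a rough sketch that runs both tests on the same simulated data. The counts, gene names and group layout are invented for illustration, and family_idx, fry_res and camera_res are hypothetical names, not anything from the original poster's analysis.

```r
library(edgeR)

set.seed(2)
ngenes <- 200

# Simulated counts (illustrative only): 3 control and 3 treated samples
counts <- matrix(rnbinom(ngenes * 6, mu = 100, size = 10), nrow = ngenes)
rownames(counts) <- c("A1", "A2", "A3", paste0("G", 4:ngenes))
group <- factor(rep(c("ctrl", "trt"), each = 3))

dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge)
design <- model.matrix(~ group)
dge <- estimateDisp(dge, design)

# Row indices of the (hypothetical) family members
family_idx <- which(rownames(dge$counts) %in% c("A1", "A2", "A3"))

# Are any genes in the family differentially expressed?
fry_res <- fry(dge, index = list(familyA = family_idx), design = design)

# Is the family enriched for DE genes relative to the remaining genes?
camera_res <- camera(dge, index = list(familyA = family_idx), design = design)

print(fry_res)
print(camera_res)
```

Both calls take the same index and design, so the only thing that changes between them is the null hypothesis being tested.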


For an edgeR analysis, fry() is quicker and better than roast(). It's actually equivalent to using roast() with nrot=Inf.

written 3 months ago by Gordon Smyth

Thanks for the correction. I've done most of my gene set testing with limma, so I'm less familiar with the options for edgeR.

written 3 months ago by Ryan C. Thompson