I am using edgeR to find DE genes in my data. To reduce the sequencing depth required, I have been enriching my sequencing libraries for 300 genes of interest before sequencing (described here: http://www.ncbi.nlm.nih.gov/pubmed/24705597) and then running the edgeR DE analysis on only those genes. I am comparing gene expression between two experimental conditions.
I have been getting a list of DE genes using the following function:
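library(edgeR)

edgeRDElist = function(subset, groups){
  y = DGEList(counts = subset, group = groups)
  y = calcNormFactors(y)      # TMM normalization
  y = estimateCommonDisp(y)
  y = estimateTagwiseDisp(y)
  et = exactTest(y)           # compares the two levels of groups
  print(topTags(et, 20))      # explicit print(); a bare call shows nothing inside a function
  de = decideTestsDGE(et, adjust.method = "fdr", p.value = 0.05)
  down = as.vector(de == -1)  # -1 keeps only genes lower in the second group; de != 0 would keep all DE genes
  print(subset[down, , drop = FALSE])
  return(rownames(subset)[down])
}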
Where subset is a matrix of read counts for a set of samples (consisting of biological replicates) and groups identifies the experimental condition.
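To make the expected inputs concrete, here is a minimal sketch of what subset and groups might look like. The gene names, sample names, replicate count (three per condition), and simulated counts are all placeholders for illustration, not the real data.

```r
# Toy inputs with the shape edgeRDElist() expects (all values are hypothetical).
set.seed(1)
n_genes <- 300                                     # number of captured genes
samples <- c("A1", "A2", "A3", "B1", "B2", "B3")   # assumed: 3 replicates per condition

# Rows are genes, columns are samples, entries are raw read counts.
subset <- matrix(rnbinom(n_genes * length(samples), mu = 100, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# One group label per column of the count matrix.
groups <- factor(c("A", "A", "A", "B", "B", "B"))
```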
Of the 300 genes that I am capturing (and therefore running the analysis on), 270 are “test” genes that could be DE between the two experimental conditions and 30 are “control” genes that I am reasonably sure should not change.
I am concerned that what I'm currently doing is not appropriate for this analysis because it assumes that most of the genes are not DE. I am also concerned that analyzing only a small number of genes is not compatible with this approach. Are these valid concerns? If so, is there any way to get around them using edgeR or another program?
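One possibility might be to scale each sample by its total control-gene count rather than by its total over all 300 genes. The sketch below only illustrates that idea in base R. The index vector control_idx, the toy counts, and the choice to treat the control totals as library sizes are all assumptions, not something validated for this capture protocol.

```r
# Hypothetical sketch: per-sample scaling derived from the control genes only.
set.seed(1)
counts <- matrix(rnbinom(300 * 6, mu = 100, size = 10), nrow = 300,
                 dimnames = list(paste0("gene", 1:300), paste0("s", 1:6)))
control_idx <- 271:300   # assumed: the last 30 rows are the control genes

# Total reads per sample over the control genes only.
control_lib_size <- colSums(counts[control_idx, , drop = FALSE])

# Counts per million control reads, so samples are compared on the control scale.
cpm_control <- t(t(counts) / control_lib_size) * 1e6
```

In edgeR, totals like these could in principle be assigned to y$samples$lib.size instead of calling calcNormFactors, though I do not know whether 30 genes give a stable enough estimate.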
Thanks!
Kelsey
