Combining differential gene expression on 2 reference transcriptomes: EdgeR analysis

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

Dear Eshita,

> Date: Mon, 13 May 2013 15:25:28 +0200
> From: Eshita <eshita.sharma at tuebingen.mpg.de>
> To: bioconductor at r-project.org
> Subject: [BioC] Combining differential gene expression on 2 reference
> transcriptomes: EdgeR analysis
>
> Hi,
>
> I have assembled 2 reference transcriptomes of the same species using
> a) genome-guided assembler on an incomplete draft-genome
> and
> b) genome-independent assembler.
>
> Q.1) Since, each assembly has its own limitations making it difficult
> to combine the datasets, so I would like to know your suggestions for
> the two strategies:
>
> a) doing each differential expression analysis independently (on
> full-length transcripts and predicted ORFs) and combining the results
> only after identification of genes.
>
> or
>
> b) Take the genome-guided assembly, add missing data from the genome
> independent assembly and do mapping, read counting and differential
> expression analysis on predicted ORFs from this one assembly.
>
> Q.2) I used the eXpress package for counting reads, and this reports the
> raw counts as well as effective counts (after correction for
> distribution biases). Since edgeR recommends using raw counts, I have
> used these and obtained expected results for genes that pass a min. cpm
> cutoff. However, eXpress developers recommend the use of rounded
> effective counts over raw counts even for edgeR.

As you already know, the edgeR developers recommend raw counts, because the methodology presupposes counts. I haven't specifically evaluated eXpress's recommendation, but the onus is on the eXpress developers to justify it and to provide good evidence that it is a good idea to enter non-counts into a statistical method intended for counts. If you really must use effective counts, then my suggestion would be to use voom instead of edgeR, because it works fine with fractional counts.
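As a rough illustration of why fractional counts are unproblematic for voom: its first step converts counts to log2 counts per million using the offsets 0.5 and 1, and that transform is defined for any non-negative number, not only integers. The sketch below reproduces only that transform in plain Python rather than R. It is not voom itself, which goes on to fit a mean-variance trend and derive precision weights. The function name `voom_log_cpm` and the count values are hypothetical.

```python
import math

def voom_log_cpm(counts, lib_size):
    """log2 counts-per-million with the offsets used by voom:
    log2((count + 0.5) / (lib_size + 1) * 1e6).
    Only the transform is shown; no trend fitting or weights."""
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]

# Hypothetical effective counts for one sample; fractional values are fine.
effective_counts = [0.0, 3.7, 120.25, 4821.9]
lib_size = sum(effective_counts)

log_cpm = voom_log_cpm(effective_counts, lib_size)
for count, value in zip(effective_counts, log_cpm):
    print(f"{count:10.2f} -> {value:8.3f}")
```

The 0.5 offset keeps the logarithm finite for a zero count, so no rounding of the effective counts is needed before the transform.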
> From what I see the maximum difference I would see would be in the
> removal of lowly expressed genes from the dataset and large-biases in
> genes with very high no. of mapped reads (which is a problem in my
> dataset). It would be informative to have some input from the developers
> on the issue of these biases and on using the normalised count value.

I am unclear what "biases" you are referring to, and we (the edgeR developers) have already unambiguously told you that we do not recommend normalized counts. Have you read Section 2.5 of the edgeR User's Guide?

Best wishes
Gordon

> Thanks
> Eshita Sharma
>
> ---------------------------
> Graduate Student
> Max Planck Institute for Developmental Biology
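To illustrate the point about normalized counts: in edgeR the normalization factors (as produced by `calcNormFactors`) rescale the library sizes, and the counts themselves are left untouched. A minimum-CPM filter can therefore be applied without ever creating a normalized count table. The following is a minimal sketch of that idea in plain Python rather than R. The counts, library sizes, factors, thresholds, and the helper names `cpm` and `keep_expressed` are all hypothetical.

```python
def cpm(counts, lib_sizes, norm_factors):
    """Counts per million for a genes-by-samples table.
    Each library size is scaled by its normalization factor;
    the raw counts are read but never modified."""
    effective = [size * factor for size, factor in zip(lib_sizes, norm_factors)]
    return [[c / e * 1e6 for c, e in zip(row, effective)] for row in counts]

def keep_expressed(counts, lib_sizes, norm_factors, min_cpm=1.0, min_samples=2):
    """True for each gene whose CPM exceeds min_cpm in at least min_samples samples."""
    table = cpm(counts, lib_sizes, norm_factors)
    return [sum(v > min_cpm for v in row) >= min_samples for row in table]

# Hypothetical raw counts: 3 genes (rows) by 3 samples (columns).
raw_counts = [[0, 1, 0], [50, 60, 40], [5, 0, 7]]
lib_sizes = [1_000_000, 2_000_000, 1_500_000]   # hypothetical sequencing depths
norm_factors = [1.0, 0.9, 1.1]                  # hypothetical normalization factors

keep = keep_expressed(raw_counts, lib_sizes, norm_factors)
filtered = [row for row, flag in zip(raw_counts, keep) if flag]
print(keep)
print(filtered)
```

The filtered table still holds the original integer counts, which is what a count-based method expects as input.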
