Dear Eshita,
> Date: Mon, 13 May 2013 15:25:28 +0200
> From: Eshita <eshita.sharma at tuebingen.mpg.de>
> To: bioconductor at r-project.org
> Subject: [BioC] Combining differential gene expression on 2 reference
> transcriptomes: EdgeR analysis
>
> Hi,
>
> I have assembled 2 reference transcriptomes of the same species using
> a) genome-guided assembler on an incomplete draft-genome
> and
> b) genome-independent assembler.
>
> Q.1) Since each assembly has its own limitations, making it difficult
> to combine the datasets, I would like to know your suggestions for
> the two strategies:
>
> a) doing each differential expression analysis independently (on
> full-length transcripts and predicted ORFs) and combining the results
> only after identification of genes.
>
> or
>
> b) Take the genome-guided assembly, add missing data from the
> genome-independent assembly and do mapping, read counting and differential
> expression analysis on predicted ORFs from this one assembly.
>
> Q.2) I used the eXpress package for counting reads, and this reports the
> raw counts as well as effective counts (after correction for
> distribution biases). Since edgeR recommends using raw counts, I have
> used these and obtained expected results for genes that pass a min. cpm
> cutoff. However, eXpress developers recommend the use of rounded
> effective counts over raw counts even for edgeR.
As you already know, the edgeR developers recommend raw counts, because
the methodology presupposes counts. I haven't specifically evaluated
eXpress's recommendation, but the onus is on the eXpress developers to
justify it, that is, to provide good evidence that it is a good idea to
enter non-counts into a statistical method intended for counts.
If you really must use effective counts, then my suggestion would be to
use voom instead of edgeR because it works fine with fractional counts.
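To illustrate why fractional values are not a problem for that approach,
here is a minimal sketch of the log2 counts-per-million transform that voom
applies as its first step, written in plain Python purely to show the
arithmetic. The function name log_cpm and the example numbers are made up
for illustration. The sketch does not estimate the mean-variance trend or
the precision weights, so it is not a substitute for the voom function in
limma.

```python
import math

def log_cpm(counts, lib_size=None):
    """log2 counts-per-million for one sample, using offsets of 0.5 on the
    count and 1 on the library size (hypothetical helper, illustration only)."""
    if lib_size is None:
        # Assumption: use the column total as the library size.
        lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]

# Made-up fractional "effective counts" for four transcripts in one sample.
effective = [0.0, 12.7, 153.2, 4831.9]
values = log_cpm(effective)

for c, v in zip(effective, values):
    print(f"{c:8.1f} -> {v:6.2f} log2-cpm")
```

Nothing in this transform requires integers, and the 0.5 offset keeps a
zero count finite on the log scale.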
> From what I see, the maximum difference would be in the
> removal of lowly expressed genes from the dataset and large biases in
> genes with a very high no. of mapped reads (which is a problem in my
> dataset). It would be informative to have some input from the developers
> on the issue of these biases and on using the normalised count value.
I am unclear what "biases" you are referring to, and we (edgeR developers)
have already unambiguously told you that we do not recommend normalized
counts.
Have you read Section 2.5 of the edgeR User's Guide?
Best wishes
Gordon
> Thanks
> Eshita Sharma
>
> ---------------------------
> Graduate Student
> Max Planck Institute for Developmental Biology
>