Eshita
Hi,
I have assembled two reference transcriptomes of the same species using
a) a genome-guided assembler on an incomplete draft genome
and
b) a genome-independent assembler.
Q.1) Each assembly has its own limitations, which makes it difficult
to combine the datasets, so I would like your suggestions on these
two strategies:
a) Run each differential expression analysis independently (on
full-length transcripts and predicted ORFs) and combine the results
only after the genes have been identified.
or
b) Take the genome-guided assembly, add the missing data from the
genome-independent assembly, and then do mapping, read counting and
differential expression analysis on the predicted ORFs of this single
merged assembly.
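For strategy (a), the final "combine after gene identification" step could be sketched roughly as below. This is a minimal sketch under hypothetical assumptions, not a prescribed workflow: it assumes the transcripts/ORFs of both assemblies have already been annotated to a shared gene ID, and that each edgeR run has been reduced to a per-gene (logFC, FDR) pair. The function name, labels, FDR threshold and example values are all illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input shape: gene_id -> (logFC, FDR) from one edgeR run.
Result = Dict[str, Tuple[float, float]]


def _call(table: Result, gene: str, fdr_max: float) -> Optional[str]:
    """Return 'up'/'down'/'ns' for a gene, or None if the assembly lacks it."""
    if gene not in table:
        return None  # gene was not recovered in this assembly
    logfc, fdr = table[gene]
    if fdr > fdr_max:
        return "ns"  # not significant at the chosen FDR
    return "up" if logfc > 0 else "down"


def combine_de_results(guided: Result, denovo: Result,
                       fdr_max: float = 0.05) -> Dict[str, str]:
    """Label each gene by how the two independent DE analyses compare."""
    summary = {}
    for gene in sorted(set(guided) | set(denovo)):
        g = _call(guided, gene, fdr_max)
        d = _call(denovo, gene, fdr_max)
        if d is None:
            summary[gene] = "guided_only"
        elif g is None:
            summary[gene] = "denovo_only"
        elif g == d:
            summary[gene] = "agree_" + g
        else:
            summary[gene] = "disagree"
    return summary


if __name__ == "__main__":
    # Made-up values purely for illustration.
    guided = {"geneA": (2.1, 0.001), "geneB": (-0.3, 0.60), "geneC": (1.5, 0.01)}
    denovo = {"geneA": (1.8, 0.004), "geneB": (-1.2, 0.02), "geneD": (3.0, 0.0001)}
    for gene, status in combine_de_results(guided, denovo).items():
        print(gene, status)
```

Genes reported by only one assembly are kept as a separate category rather than dropped, since those are exactly the cases where the two assemblies' limitations differ.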
Q.2) I used the eXpress package to count reads; it reports raw counts
as well as effective counts (corrected for distribution biases).
Since edgeR recommends using raw counts, I used these and obtained the
expected results for genes that pass a minimum CPM (counts per
million) cutoff.
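As a rough illustration of the minimum-CPM filter mentioned above, here is a Python sketch of the common edgeR idiom `keep <- rowSums(cpm(y) > 1) >= 2`, applied to rounded eXpress counts. Assumptions: counts are fractional and must be rounded to integers first, library size is simply the column sum (edgeR's default before normalisation factors), and the thresholds and example numbers are illustrative only. In practice edgeR's own `cpm()` would do this in R.

```python
from typing import Dict, List, Sequence

Counts = Dict[str, List[int]]


def round_counts(est: Dict[str, Sequence[float]]) -> Counts:
    """Round fractional per-gene counts to integers for a count-based model.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return {g: [round(x) for x in row] for g, row in est.items()}


def cpm(counts: Counts) -> Dict[str, List[float]]:
    """Counts per million, using each sample's column sum as library size."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {g: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for g, row in counts.items()}


def filter_by_cpm(counts: Counts, min_cpm: float = 1.0,
                  min_samples: int = 2) -> Counts:
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    values = cpm(counts)
    return {g: row for g, row in counts.items()
            if sum(v > min_cpm for v in values[g]) >= min_samples}


if __name__ == "__main__":
    # Made-up fractional counts for three samples.
    est = {
        "bulk":  [999_000.3, 1_998_000.0, 1_499_000.0],
        "geneA": [800.4, 1_600.2, 1_200.0],
        "geneB": [0.4, 3.2, 0.0],
        "geneC": [2.6, 0.0, 2.4],
    }
    print(sorted(filter_by_cpm(round_counts(est))))  # ['bulk', 'geneA', 'geneC']
```

Here geneB passes the cutoff in only one sample and is dropped, while geneC passes in two and is kept.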
However, the eXpress developers recommend using rounded effective
counts rather than raw counts, even for edgeR. As far as I can see,
the largest differences would be in which lowly expressed genes are
removed from the dataset, and large biases for genes with a very high
number of mapped reads (which is a problem in my dataset).
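To see the first of these differences directly, one option is to run the same CPM filter on both count types and compare which genes survive. The sketch below does that; it assumes the raw and effective values come from the eXpress output (for example its `est_counts` and `eff_counts` columns, an assumption about the column names) already summed per gene. The numbers are invented so that one low-count gene, geneX, crosses the cutoff only with effective counts.

```python
from typing import Dict, Sequence, Set


def passing_genes(est: Dict[str, Sequence[float]], min_cpm: float = 1.0,
                  min_samples: int = 2) -> Set[str]:
    """Genes whose rounded counts exceed min_cpm in >= min_samples samples."""
    counts = {g: [round(x) for x in row] for g, row in est.items()}
    n = len(next(iter(counts.values())))
    lib = [sum(row[j] for row in counts.values()) for j in range(n)]
    return {g for g, row in counts.items()
            if sum(row[j] / lib[j] * 1e6 > min_cpm for j in range(n)) >= min_samples}


def compare_count_types(raw: Dict[str, Sequence[float]],
                        eff: Dict[str, Sequence[float]],
                        **kw) -> Dict[str, Set[str]]:
    """Split genes by whether they pass the filter with raw, effective, or both."""
    raw_keep = passing_genes(raw, **kw)
    eff_keep = passing_genes(eff, **kw)
    return {
        "raw_only": raw_keep - eff_keep,
        "eff_only": eff_keep - raw_keep,
        "both": raw_keep & eff_keep,
    }


if __name__ == "__main__":
    # Invented values for two samples; not real eXpress output.
    raw = {"bulk": [1_999_000.0, 1_999_000.0], "geneX": [1.2, 0.8], "geneY": [50.3, 60.1]}
    eff = {"bulk": [2_001_000.0, 2_001_000.0], "geneX": [3.4, 2.6], "geneY": [50.9, 61.2]}
    print(compare_count_types(raw, eff))
```

Inspecting the `raw_only` and `eff_only` sets on real data would show how much the choice of count type actually changes the filtered gene list.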
I would appreciate some input from the developers on these biases and
on whether to use the normalised count values.
Thanks
Eshita Sharma
---------------------------
Graduate Student
Max Planck Institute for Developmental Biology