Alejandro Reyes <alejandro.reyes at ...> writes:
>
> Dear Julien, dear Mar, and everyone interested in DEXSeq,
>
> You recently reported some problems in DEXSeq that had to do with the
> way the HTSeq python scripts deal with exons that overlap with more
> than one gene ID.
>
> The solution that we had taken so far was to merge the gene IDs
> sharing an exon into an "aggregate gene" ID. From the input of some
> users and our own experience, we know that this was not the most
> appropriate solution: when the merged genes were differentially
> expressed, DEXSeq falsely called differential usage in other exons of
> the aggregate gene. We have included a "-r" parameter in the script
> "dexseq_prepare_annotation.py" that lets the user decide what to do
> with these exons: either ignore the exons associated with more than
> one gene ID and treat each gene separately, or merge the genes and
> take these exons into account.
>
> Additionally, we have implemented R/Bioconductor functions equivalent
> to the python scripts. These functions were implemented using code
> contributed by Mike Love.
>
> All these changes are available in the latest svn version (1.5.9).
>
> Best regards,
> Alejandro Reyes
>
> Hi Alejandro,
> Just to let you know that adding the junctions to the test of
> differential expression of DEXSeq worked fine! The "hack" was actually
> straightforward, I just had to modify the counts files taken as input.
>
> On a different note, I noticed that many false positives were
> generated because of "aggregate" gene models that were composed of
> different overlapping genes. When these overlapping genes behave
> differently in different conditions, this is interpreted as
> differential expression of some exons, while it is really differential
> expression of genes... See the attached picture, which might make this
> easier to understand.
> Did you notice this behavior of DEXSeq, and do you have any comment on
> this?
>
> Thanks again for your work on DEXSeq
> Julien
>
>
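The aggregate-gene problem described in the quoted messages can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions, not the code of dexseq_prepare_annotation.py: the function names (assign_exons, exon_share), the gene and exon labels, the "+"-joined aggregate label, and the toy counts are all hypothetical. It shows the two policies for exons shared between genes (merge the genes, or drop the shared exons) and why, after merging, a change in one gene's expression can look like a change in the relative usage of another gene's exon.

```python
from collections import defaultdict


def assign_exons(exon_genes, aggregate=True):
    """Map each exon to a gene label.

    exon_genes: dict of exon ID -> set of gene IDs annotated on that exon.
    aggregate=True  merges genes that share an exon into one aggregate gene.
    aggregate=False drops exons that belong to more than one gene.
    (Hypothetical helper for illustration only.)
    """
    if not aggregate:
        return {exon: next(iter(genes))
                for exon, genes in exon_genes.items() if len(genes) == 1}

    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for genes in exon_genes.values():
        ordered = sorted(genes)
        for gene in ordered:
            find(gene)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    members = defaultdict(set)
    for gene in list(parent):
        members[find(gene)].add(gene)

    return {exon: "+".join(sorted(members[find(next(iter(genes)))]))
            for exon, genes in exon_genes.items()}


def exon_share(counts, assignment, exon, condition):
    """Fraction of a gene's reads (in one condition) that fall on one exon."""
    gene = assignment[exon]
    total = sum(c[condition] for e, c in counts.items()
                if assignment.get(e) == gene)
    return counts[exon][condition] / total


# Toy annotation: one exon is shared by geneA and geneB.
exon_genes = {"A1": {"geneA"}, "shared": {"geneA", "geneB"}, "B1": {"geneB"}}
# Toy counts (condition 0, condition 1): geneA goes up 4x, geneB is unchanged.
counts = {"A1": (100, 400), "B1": (100, 100)}

agg = assign_exons(exon_genes, aggregate=True)
sep = assign_exons(exon_genes, aggregate=False)

# Merged: B1's share of the aggregate gene drops although geneB did not change.
print(exon_share(counts, agg, "B1", 0), exon_share(counts, agg, "B1", 1))  # 0.5 0.2
# Separate: B1's share of geneB is constant.
print(exon_share(counts, sep, "B1", 0), exon_share(counts, sep, "B1", 1))  # 1.0 1.0
```

The share computed here is a simplification of what DEXSeq actually tests, but it captures the effect Julien describes: in the merged model the apparent usage of exon B1 changes only because the other gene in the aggregate changed its expression.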
Hi Alejandro,

We recently used the python script dexseq_prepare_annotation.py to
generate exon.gtf from genes.gtf. Similar to the situation in your
previous discussion, we found some overlapping exons that share part or
all of their region while belonging to different genes, as you can see
below in the 1st and the 4th line. Even with the "-r yes" parameter,
which should be the default, we still see this happening.

We thought that merging the genes and taking all these exons into
account would produce a non-overlapping exon.gtf in the end, but the gtf
still has some overlapping exons. We are concerned that the count files
generated downstream could be biased in these regions. Do you have any
update on a solution to this problem?

In your last message, you mentioned R functions equivalent to the python
scripts, implemented using code contributed by Mike Love. Could you
provide more details about how we can find these functions, such as
their names or a website?

Thank you very much in advance!
Thanks,
Xiayu

1	dexseq_prepare_annotation.py	exonic_part	13671	14409	.	+	.	transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "018"; gene_id "ENSG00000223972"
1	dexseq_prepare_annotation.py	exonic_part	14410	14412	.	+	.	transcripts "ENST00000515242"; exonic_part_number "019"; gene_id "ENSG00000223972"
1	dexseq_prepare_annotation.py	aggregate_gene	14363	29806	.	-	.	gene_id "ENSG00000227232"
1	dexseq_prepare_annotation.py	exonic_part	14363	14403	.	-	.	transcripts "ENST00000541675+ENST00000423562+ENST00000438504"; exonic_part_number "001"; gene_id "ENSG00000227232"
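
One way to check an exon.gtf for this kind of overlap is sketched below in Python. It is a rough illustration only: the helper names (parse_exonic_parts, overlapping_pairs) are hypothetical, the input is the four records quoted above, and the pairwise comparison is fine for a handful of lines but too slow for a whole annotation. The sketch reports overlaps twice, once ignoring strand and once requiring the same strand. In the records above, the 1st line is on the "+" strand and the 4th on the "-" strand, so they overlap only when strand is ignored. Whether that is the reason they were not merged is an assumption that this thread does not confirm.

```python
import re
from itertools import combinations

# The four records quoted above, rebuilt as tab-separated GTF lines.
GTF_LINES = [
    "\t".join(["1", "dexseq_prepare_annotation.py", "exonic_part", "13671", "14409", ".", "+", ".",
               'transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; '
               'exonic_part_number "018"; gene_id "ENSG00000223972"']),
    "\t".join(["1", "dexseq_prepare_annotation.py", "exonic_part", "14410", "14412", ".", "+", ".",
               'transcripts "ENST00000515242"; exonic_part_number "019"; '
               'gene_id "ENSG00000223972"']),
    "\t".join(["1", "dexseq_prepare_annotation.py", "aggregate_gene", "14363", "29806", ".", "-", ".",
               'gene_id "ENSG00000227232"']),
    "\t".join(["1", "dexseq_prepare_annotation.py", "exonic_part", "14363", "14403", ".", "-", ".",
               'transcripts "ENST00000541675+ENST00000423562+ENST00000438504"; '
               'exonic_part_number "001"; gene_id "ENSG00000227232"']),
]


def parse_exonic_parts(lines):
    """Return one dict per exonic_part record; other feature types are skipped."""
    parts = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exonic_part":
            continue
        attrs = dict(re.findall(r'(\w+) "([^"]*)"', fields[8]))
        parts.append({
            "chrom": fields[0],
            "start": int(fields[3]),
            "end": int(fields[4]),
            "strand": fields[6],
            "gene_id": attrs["gene_id"],
            "part": attrs["exonic_part_number"],
        })
    return parts


def overlapping_pairs(parts, same_strand_only):
    """List pairs of exonic parts whose closed intervals [start, end] overlap."""
    hits = []
    for a, b in combinations(parts, 2):
        if a["chrom"] != b["chrom"]:
            continue
        if same_strand_only and a["strand"] != b["strand"]:
            continue
        if a["start"] <= b["end"] and b["start"] <= a["end"]:
            hits.append((a["gene_id"], a["part"], b["gene_id"], b["part"]))
    return hits


parts = parse_exonic_parts(GTF_LINES)
print("ignoring strand:", overlapping_pairs(parts, same_strand_only=False))
print("same strand only:", overlapping_pairs(parts, same_strand_only=True))
```

If overlaps appear only when strand is ignored, whether they bias the counts presumably depends on whether the library and the counting step are strand-specific.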