Dear Steve,

Thanks for your fast reply. About the CC: sorry, I didn't realize. I
will repeat it now with the new information you provided. We realized
that the number of DE genes was too high, so we only took the highest
ones into account for later analysis. Also, the exons are not properly
named. In this organism there are no introns, so there is no need to
analyze splice variants. For some reason, whoever made the GFF file
that I used for the alignment labelled them like this, but they are
considered genes. As for edgeR, the analysis was done some time ago,
so I will run it again with the new version. Again, thank you for your
patience and the long reply.

Sandra

> From: lianoglou.steve@gene.com
> To: dedeusan@hotmail.com
> Subject: Re: [BioC] Question about median of replicates
> Date: Thu, 31 Jul 2014 14:19:29 -0700
>
> Sandra,
>
> First, as I mentioned previously: PLEASE include the bioconductor list
> when seeking and replying to help; this way others can benefit from
> the help, and also help you better than I can.
>
> I would normally just CC the list in this reply, but I won't here.
>
> Your analysis looks (more or less) correct, but:
>
> (1) you have missed an "estimateTagwiseDisp" call after your
> "estimateCommonDisp"
>
> (2) one normally filters out rows by logCPM, and not by the raw counts
>
> (3) you *still* haven't provided your sessionInfo so we can verify you
> are using the latest versions of the software, but if you aren't --
> you should upgrade.
>
> You end with asking:
>
> > So that is why we asked ourselves what is the basics of EdgeR, because
> > now we have in all our data at least 1 fold less than before, but the
> > most important thing is that we still don't know why. So I was relieved
> > because I think it is not such a big deal and that the analysis is
> > getting real results, but I still don't understand why this
> > difference exists. Can you give me a small explanation if it is
> > possible? Maybe I put something wrong in the analysis...Sandra
>
> One thing to understand is that edgeR (or DESeq2, or limma) is not
> basic, so it's hard to understand "the basics" without a certain
> degree of statistical sophistication.
>
> I didn't quite follow the math example that you provided as the
> formatting came through weird, so I'm not sure what the "1 logFC
> difference" you are describing is.
>
> James' email to you outlined an easy example (with real/simulated data
> that you can generate using the same code he wrote in his email) of how
> the logFC's calculated by edgeR will be different than those you
> calculate by hand -- simply because you can't just calculate it by hand.
>
> The details of why don't really matter for you. edgeR is a widely used
> piece of software written by card-carrying statisticians and published
> under peer review, so as far as you should be concerned, its results
> are correct as long as you perform the analysis correctly.
>
> I'll end with just pointing out that the number of genes you are
> identifying as DE is quite high, so if it were me, I'd be suspicious
> of something and double check lots of things.
>
> Also, I just reviewed your code again and it looks like you are
> counting exon expression instead of gene expression? If this is the
> case you should have mentioned this from the get go! And also you are
> doing it wrong. You can try to use edgeR::spliceVariants or the DEXSeq
> package.
>
> -steve
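
Point (2) in the quoted reply, filtering rows by (log)CPM rather than by
raw counts, can be illustrated with a small sketch. This is plain Python,
not edgeR code. The function names, the toy count matrix, and the cutoffs
(1 CPM, which corresponds to a log2 CPM of 0, in at least 2 samples, and
a raw count of 5) are all assumptions chosen for illustration only. The
idea is that a raw-count cutoff treats shallow and deep libraries the
same, while CPM scales each count by its library size first.

```python
def cpm(counts, lib_sizes):
    """Counts per million: each count divided by its sample's library size."""
    return [[c / n * 1e6 for c, n in zip(row, lib_sizes)] for row in counts]


def filter_by_cpm(counts, min_cpm=1.0, min_samples=2):
    """Return indices of rows with CPM above min_cpm in >= min_samples samples.

    The cutoffs are illustrative assumptions, not recommended defaults.
    """
    lib_sizes = [sum(col) for col in zip(*counts)]  # column sums
    keep = []
    for i, row in enumerate(cpm(counts, lib_sizes)):
        if sum(v > min_cpm for v in row) >= min_samples:
            keep.append(i)
    return keep


def filter_by_raw(counts, min_count=5, min_samples=2):
    """Return indices of rows with a raw count >= min_count in >= min_samples samples."""
    return [i for i, row in enumerate(counts)
            if sum(c >= min_count for c in row) >= min_samples]


# Made-up data: rows are genes, columns are samples.
# Samples 1-2 have 1 million reads each, samples 3-4 have 10 million each.
counts = [
    [999_997, 999_997, 9_999_992, 9_999_992],  # row 0: highly expressed gene
    [3, 3, 0, 0],                              # row 1: 3 CPM in the shallow samples
    [0, 0, 8, 8],                              # row 2: 0.8 CPM in the deep samples
]

kept_cpm = filter_by_cpm(counts)
kept_raw = filter_by_raw(counts)
print("kept by CPM filter:", kept_cpm)
print("kept by raw-count filter:", kept_raw)
```

With this toy matrix the two filters disagree on rows 1 and 2: the raw
cutoff keeps the gene that only looks well covered because its samples
were sequenced more deeply, and drops the gene whose relative abundance
is actually higher.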