hi Jianming,
Let's keep the discussion on the bioconductor list.
On Wed, Sep 4, 2013 at 10:43 AM, 邵建明 <jianmingshao1987@gmail.com> wrote:
> Hi Mike,
> Thank you for your mail about my question. The 83 genes were selected
> based on the GWAS results, so I do not know the genes' expression
> pattern between cases and controls; that is what I want to learn from
> RNA-capture sequencing of the 83 genes. The sequencing depth could be
> normalized by RPKM, the traditional RNA-seq normalization method, which
> normalizes gene expression by dividing by gene length and total read
> number.
>
You can go ahead with a differential expression analysis, but keep in
mind the following problem, given that you selected a small set of genes
which are candidates for differential expression.
Say you have the following counts, for 2 control and 2 case samples:
gene_A: 2 2 2 2
gene_B: 2 2 1 1
What you cannot tell apart is whether the size factors should be 2,2,1,1
(in which case gene A has a fold change of 2 and gene B is unchanged) or
2,2,2,2 (in which case gene A is unchanged and gene B has a fold change
of 1/2).
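To make the ambiguity concrete, here is a minimal sketch in plain Python (not DESeq itself; the function and variable names are made up for this illustration). It divides the counts above by each candidate set of size factors and computes the case/control fold change.

```python
from statistics import mean

# Toy counts from the example above; sample order is control, control, case, case.
counts = {"gene_A": [2, 2, 2, 2], "gene_B": [2, 2, 1, 1]}

def fold_changes(counts, size_factors):
    """Case/control fold change per gene after dividing counts by size factors."""
    result = {}
    for gene, raw in counts.items():
        normalized = [c / s for c, s in zip(raw, size_factors)]
        control, case = mean(normalized[:2]), mean(normalized[2:])
        result[gene] = case / control
    return result

# Hypothesis 1: the controls were sequenced twice as deeply as the cases.
fc_depth_differs = fold_changes(counts, [2, 2, 1, 1])
# Hypothesis 2: all four samples were sequenced to the same depth.
fc_depth_equal = fold_changes(counts, [2, 2, 2, 2])

print(fc_depth_differs)  # gene_A: 2.0, gene_B: 1.0
print(fc_depth_equal)    # gene_A: 1.0, gene_B: 0.5
```

Both sets of size factors are consistent with the same raw counts, so the counts alone cannot say which gene changed.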
In typical RNA-Seq experiments, all genes are assayed, so you can more
reliably estimate size factors by assuming that some subset of genes does
not change expression level (by using medians or trimmed means). See for
instance the section on estimating size factors with median ratios here
http://genomebiology.com/2010/11/10/r106
or the TMM normalization method here
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2864565/
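A minimal sketch of the median-of-ratios idea, written in plain Python rather than taken from DESeq's own code. The function name and the toy counts are invented for illustration, and the real implementation may differ in details. Each sample is compared gene by gene against a pseudo-reference (the per-gene geometric mean), and the median ratio is taken as that sample's size factor.

```python
from math import exp, log
from statistics import mean, median

def median_ratio_size_factors(counts):
    """counts: dict of gene -> list of per-sample counts.
    Returns one size factor per sample (median-of-ratios sketch)."""
    rows = list(counts.values())
    n_samples = len(rows[0])
    # Keep only genes with all-positive counts so the geometric mean is defined.
    usable = [row for row in rows if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_geo_means = [mean([log(c) for c in row]) for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors

# Hypothetical data: the last two samples are sequenced twice as deeply,
# and g5 is additionally 4x higher in those samples.
toy_counts = {
    "g1": [10, 10, 20, 20],
    "g2": [50, 50, 100, 100],
    "g3": [8, 8, 16, 16],
    "g4": [30, 30, 60, 60],
    "g5": [20, 20, 160, 160],
}
size_factors = median_ratio_size_factors(toy_counts)
print(size_factors[2] / size_factors[0])  # close to 2: the depth difference
```

The median ignores g5 here only because most genes are unchanged. With a small panel of candidate genes that assumption may not hold.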
If you were to perform a differential analysis with DESeq and then
examine the results with plotMA(), you wouldn't know exactly where the 0
on the y axis should go. One solution to this would be to use spike-in
controls. However, given the dataset you have, hopefully most of the log2
fold changes will be small (e.g. less than 0.1). If the log2 fold changes
are spread out farther, it will be difficult to draw definitive
conclusions.
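As a rough illustration of why the zero of the y axis is undetermined, the plain-Python sketch below (hypothetical numbers and names, not DESeq output) shows that changing the assumed relative sequencing depth of the case samples shifts every log2 fold change by the same constant.

```python
from math import log2

# Hypothetical mean counts for three genes (not from the thread).
control_means = [100.0, 200.0, 50.0]
case_means = [110.0, 190.0, 52.0]

def log2_fold_changes(control, case, case_depth=1.0):
    """log2(case / control) per gene, after dividing the case values by an
    assumed relative sequencing depth of the case samples."""
    return [log2((k / case_depth) / c) for c, k in zip(control, case)]

lfc_equal_depth = log2_fold_changes(control_means, case_means, case_depth=1.0)
lfc_double_depth = log2_fold_changes(control_means, case_means, case_depth=2.0)

# The two sets of fold changes differ by a constant offset for every gene.
shifts = [a - b for a, b in zip(lfc_equal_depth, lfc_double_depth)]
print(shifts)  # every entry is log2(2) = 1, up to floating-point rounding
```

The spread of the points is unaffected; only their common vertical position depends on the normalization.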
Mike
> 2013/9/4 Michael Love <michaelisaiahlove@gmail.com>
>
>> hi Jianming,
>>
>> How many of these 83 candidate genes do you expect to be differentially
>> expressed?
>>
>> The problem is this: imagine that in the case samples, all 83 candidate
>> genes are upregulated. Without some way of assessing the sequencing
>> depth (i.e. normalization using spike-in controls), it would be
>> impossible to tell differential expression apart from a difference in
>> sequencing depth.
>>
>> Mike
>>
>>
>> On Tue, Sep 3, 2013 at 10:35 AM, 邵建明 <jianmingshao1987@gmail.com> wrote:
>>
>>> Dear all,
>>> I am a PhD candidate at the Beijing Institute of Genomics, Chinese
>>> Academy of Sciences. Recently I have been working on data analysis
>>> for RNA capture followed by high-throughput sequencing. Four samples,
>>> 2 cases and 2 controls, were sequenced. My library preparation
>>> protocol is similar to the exome capture workflow, except that the
>>> material used for capture was cDNA and the capture library consisted
>>> of customized probes synthesized by Agilent. After mapping, I want to
>>> do DEG analysis with DESeq, and I found that the number of genes
>>> affects the results given by DESeq. So my question is whether DESeq
>>> is compatible with a limited number of genes (83 candidate genes for
>>> my project). Could you please give me some suggestions about DEG
>>> analysis for RNA-seq of candidate genes? Or, for my project, could I
>>> just calculate the RPKM value for each gene and identify DEGs simply
>>> by fold change > 2? Thank you!
>>> Sincerely,
>>>
>>> Jianming SHAO
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>