Dear All,
I have RNAseq counts for 400 human donors. This is a random sampling
of human subjects from a population-wide study. I am not interested in
differential expression of genes between certain groups. The read
counts differ across samples because of library size, so I need
to normalize the counts.
I have been reading the postings on this list regarding the
normalization methods in DESeq and edgeR, and I have looked at the
reference manuals of both packages. I understand that the two packages
use different normalization approaches. My understanding is that while
both use the sample information (i.e. whether a sample is from the
control or the treatment condition) to create a list object as a
first step, this information is not used in the normalization step but
only in the differential expression analysis step. Is this correct?
I am also curious to hear others' opinions on suitable
approaches for a study design like mine. Perhaps I have been looking
in the wrong places, but the papers I found in the literature seem to
be concerned with differential expression of transcripts between
groups.
Mete
hi Mete,
On Mon, Jul 29, 2013 at 3:22 AM, Mete Civelek <mcivelek at mednet.ucla.edu> wrote:
> Dear All,
>
> I have RNAseq counts for 400 human donors. This is a random sampling
> of human subjects from a population-wide study. I am not interested in
> differential expression of genes between certain groups. There are
> differences in sequence reads because of library size therefore I need
> to normalize the counts.
>
> I have been reading the postings on this list regarding the
> normalization methods in DEseq and edgeR. I looked at the reference
> manuals of both of these packages. I understand that they both use
> different normalization approaches. My understanding is that while
> both approaches use the sample information (i.e. whether they are from
> control or treatment condition) in order to create a list object as a
> first step, this information is not used in the normalization step but
> only in the differential expression analysis step. Is this correct?
>
Yes, this is true for DESeq/DESeq2. The transformations in DESeq2 have
an argument blind, which defaults to TRUE. With blind=TRUE, the
dispersion for the transformation is estimated without using any
information about the experimental design.
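As an illustration of why no group labels are needed: the size factors
that DESeq/DESeq2 use for library-size normalization are, per sample, the
median over genes of the ratio between that sample's count and the gene's
geometric mean across all samples. The Python sketch below re-implements
that median-of-ratios idea on made-up toy counts. It is not the package's
code; the function names size_factors and normalize and the toy matrix
are assumptions for illustration only.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative re-implementation).

    counts: list of genes, each a list of raw counts, one per sample.
    Only the count matrix is used; no condition labels are involved.
    """
    n_samples = len(counts[0])
    # Keep genes with no zero counts and store their log geometric means.
    kept_rows, log_geo_means = [], []
    for row in counts:
        if all(c > 0 for c in row):
            kept_rows.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    if not kept_rows:
        raise ValueError("every gene has at least one zero count")
    # Per sample: median over genes of log(count / geometric mean).
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(kept_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Made-up toy data: sample 2 is sample 1 sequenced twice as deep.
toy = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # contains a zero, so it is skipped when estimating factors
]
sf = size_factors(toy)
norm = normalize(toy, sf)
print(sf)
print(norm[0])
```

The functions take only the count matrix, which is the point of the
example. In the R packages themselves the corresponding step is
estimateSizeFactors().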
What to use depends on what you want to do with the normalized data,
but the VST or rlog transformation should help if, for instance, you
want to cluster samples or genes in a large data set, because both
stabilize the variance across the range of mean counts.
If there are large differences in library sizes, we recommend using
rlogTransformation(). Furthermore, the rlog implementation in the
devel branch seems to perform qualitatively better than the one in the
release branch. The difference is that in the devel branch, the rlog
transformation uses the fitted dispersion values rather than the
shrunken dispersion estimates. This makes the rlog perform more like
the VST and avoids squashing what could be large, true differences
across samples for high-count genes.
Mike
Hi Mike,
Thank you for the suggestions. I will try out the log transformation
function.
Mete
-------- Original message --------
From: Michael Love <michaelisaiahlove@gmail.com>
Date:
To: "Civelek, Mete" <mcivelek at mednet.ucla.edu>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] RNAseq data normalization without differential
expression