Dear Colleagues,
I have an RNA-seq library with 27 samples. 9 of these samples are what
we
call input as the other 18 are derived from them through a biochemical
process. As such, I have already scaled the genes in those 18 samples
to
roughly add-up to the corresponding inputs.
To manage this at the calcNormFactors step I have the following code:
y$samples
y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))]
norm_factor_inputs <- calcNormFactors(y_inputs)
y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] =
norm_factor_inputs
y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] =
norm_factor_inputs
y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] =
norm_factor_inputs
y$samples
So what I'm basically trying to do is calcNormFactors for just the
inputs
and then copy them to be the NormFactors for the p and r samples. Is
copying them like this telling edgeR to re-compute the library sizes?
It
doesn't seem to be happening because the lib.size before and after
these
commands are the same.
Best,
Scott Daniel
Zarnescu Lab
MCB Ph.D. Program
University of Arizona
[[alternative HTML version deleted]]
Dear Scott,
Normalization scaling factors are specific to each library, so you
cannot
meaningfully compute them for one set of libraries and copy the
results to
a larger set of libraries.
If you try to explain what you are actually trying to achieve, we
might be
able to give you some advice.
As it is, I don't yet understand why you aren't simply using
calcNormFactors() in the way it is intended, which would be to apply
it to
all libraries at once, or why you need to do any prior ad hoc scaling.
Not sure what you mean by an 'input' sample, because 'input' is
usually
associated with ChIP-seq rather than RNA-seq.
Best wishes
Gordon
> Date: Tue, 15 Apr 2014 16:40:53 -0700
> From: Scott Daniel <scottdaniel at="" email.arizona.edu="">
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR and calcNormFactors manually
>
> Dear Colleagues,
>
> I have an RNA-seq library with 27 samples. 9 of these samples are
what we
> call input as the other 18 are derived from them through a
biochemical
> process. As such, I have already scaled the genes in those 18
samples to
> roughly add-up to the corresponding inputs.
>
> To manage this at the calcNormFactors step I have the following
code:
> y$samples
> y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))]
> norm_factor_inputs <- calcNormFactors(y_inputs)
> y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] =
> norm_factor_inputs
> y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] =
> norm_factor_inputs
> y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] =
> norm_factor_inputs
> y$samples
>
> So what I'm basically trying to do is calcNormFactors for just the
inputs
> and then copy them to be the NormFactors for the p and r samples. Is
> copying them like this telling edgeR to re-compute the library
sizes? It
> doesn't seem to be happening because the lib.size before and after
these
> commands are the same.
>
> Best,
> Scott Daniel
> Zarnescu Lab
> MCB Ph.D. Program
> University of Arizona
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
Hi,
First off, I determined that assigning the normalization factors
manually
as-it-were does affect the results (# of DE genes). I did this by
comparing
it to a set where I just set p and r to a scaling factor of 1. The
library
sizes don't change in the y$samples but the results change.
To go back to the general discussion of why we did ad-hoc scaling,
I'll try
to explain. The 'input' I am speaking of is the RNA sample preceding a
polysome fractionation. We then take the resultant products and split
them
into 'RNP' (which includes the monosome, free RNA and ribo-nucelo-
proteins)
and 'Poly' (which includes actively-translating, polysomal RNA). We've
then
done RNA-seq on all three samples (input, rnp, poly).
The idea is that we scale the poly and rnp counts to the input using a
simple linear model. We had another mathematician explain it like
this:
"So then what you have to do is basically construct a mx2 matrix A,
where m
= total number of genes. Then you fill the first column of A with the
counts for all genes in your polysome sub-sample, and the second
column of
A with the counts for all genes in your RNP sub-sample. The you make a
mx1
matrix Y, which is the counts for all genes in the input sub-sample.
Then
you find the least-squares solution (in R I think it's lsqfit(A,Y) or
something) to the equation Y = Ax and solve for x, which will be a 2x1
vector with your scaling factors n_p and n_r. Then, you replace your
polysome counts with P*n_p and your RNP counts with R*n_r"
So, since we have done this manual scaling, our mathematician
recommended
we calculate normalization factors for input only and then apply these
factors to the corresponding polysome and rnp samples.
Thanks.
Best,
Scott Daniel
Zarnescu Lab
MCB Ph.D. Program
University of Arizona
On Thu, Apr 17, 2014 at 6:26 PM, Gordon K Smyth <smyth@wehi.edu.au>
wrote:
> Dear Scott,
>
> Normalization scaling factors are specific to each library, so you
cannot
> meaningfully compute them for one set of libraries and copy the
results to
> a larger set of libraries.
>
> If you try to explain what you are actually trying to achieve, we
might be
> able to give you some advice.
>
> As it is, I don't yet understand why you aren't simply using
> calcNormFactors() in the way it is intended, which would be to apply
it to
> all libraries at once, or why you need to do any prior ad hoc
scaling.
>
> Not sure what you mean by an 'input' sample, because 'input' is
usually
> associated with ChIP-seq rather than RNA-seq.
>
> Best wishes
> Gordon
>
> Date: Tue, 15 Apr 2014 16:40:53 -0700
>> From: Scott Daniel <scottdaniel@email.arizona.edu>
>> To: bioconductor@r-project.org
>> Subject: [BioC] edgeR and calcNormFactors manually
>>
>> Dear Colleagues,
>>
>> I have an RNA-seq library with 27 samples. 9 of these samples are
what we
>> call input as the other 18 are derived from them through a
biochemical
>> process. As such, I have already scaled the genes in those 18
samples to
>> roughly add-up to the corresponding inputs.
>>
>> To manage this at the calcNormFactors step I have the following
code:
>> y$samples
>> y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))]
>> norm_factor_inputs <- calcNormFactors(y_inputs)
>> y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] =
>> norm_factor_inputs
>> y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] =
>> norm_factor_inputs
>> y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] =
>> norm_factor_inputs
>> y$samples
>>
>> So what I'm basically trying to do is calcNormFactors for just the
inputs
>> and then copy them to be the NormFactors for the p and r samples.
Is
>> copying them like this telling edgeR to re-compute the library
sizes? It
>> doesn't seem to be happening because the lib.size before and after
these
>> commands are the same.
>>
>> Best,
>> Scott Daniel
>> Zarnescu Lab
>> MCB Ph.D. Program
>> University of Arizona
>>
>
>
______________________________________________________________________
> The information in this email is confidential and
inte...{{dropped:10}}
Dear Scott,
I'm afraid I still have little idea of what you're trying to achieve.
I'm
not even quite sure whether you are looking for differential exprssion
and, if so, between which samples.
Your mathematician is describing least squares regression though the
origin. I don't see that that would be sensible thing to do in the
edgeR
context.
Have you considered
?normalizeChIPtoInput
or
?calcNormOffsetsforChIP
Best
Gordon
On Fri, 25 Apr 2014, Scott Daniel wrote:
> Hi,
>
> First off, I determined that assigning the normalization factors
manually
> as-it-were does affect the results (# of DE genes). I did this by
comparing
> it to a set where I just set p and r to a scaling factor of 1. The
library
> sizes don't change in the y$samples but the results change.
>
> To go back to the general discussion of why we did ad-hoc scaling,
I'll try
> to explain. The 'input' I am speaking of is the RNA sample preceding
a
> polysome fractionation. We then take the resultant products and
split them
> into 'RNP' (which includes the monosome, free RNA and ribo-nucelo-
proteins)
> and 'Poly' (which includes actively-translating, polysomal RNA).
We've then
> done RNA-seq on all three samples (input, rnp, poly).
>
> The idea is that we scale the poly and rnp counts to the input using
a
> simple linear model. We had another mathematician explain it like
this:
> "So then what you have to do is basically construct a mx2 matrix A,
where m
> = total number of genes. Then you fill the first column of A with
the
> counts for all genes in your polysome sub-sample, and the second
column of
> A with the counts for all genes in your RNP sub-sample. The you make
a mx1
> matrix Y, which is the counts for all genes in the input sub-sample.
Then
> you find the least-squares solution (in R I think it's lsqfit(A,Y)
or
> something) to the equation Y = Ax and solve for x, which will be a
2x1
> vector with your scaling factors n_p and n_r. Then, you replace your
> polysome counts with P*n_p and your RNP counts with R*n_r"
>
> So, since we have done this manual scaling, our mathematician
recommended
> we calculate normalization factors for input only and then apply
these
> factors to the corresponding polysome and rnp samples.
>
> Thanks.
>
> Best,
> Scott Daniel
> Zarnescu Lab
> MCB Ph.D. Program
> University of Arizona
>
>
> On Thu, Apr 17, 2014 at 6:26 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote:
>
>> Dear Scott,
>>
>> Normalization scaling factors are specific to each library, so you
cannot
>> meaningfully compute them for one set of libraries and copy the
results to
>> a larger set of libraries.
>>
>> If you try to explain what you are actually trying to achieve, we
might be
>> able to give you some advice.
>>
>> As it is, I don't yet understand why you aren't simply using
>> calcNormFactors() in the way it is intended, which would be to
apply it to
>> all libraries at once, or why you need to do any prior ad hoc
scaling.
>>
>> Not sure what you mean by an 'input' sample, because 'input' is
usually
>> associated with ChIP-seq rather than RNA-seq.
>>
>> Best wishes
>> Gordon
>>
>> Date: Tue, 15 Apr 2014 16:40:53 -0700
>>> From: Scott Daniel <scottdaniel at="" email.arizona.edu="">
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] edgeR and calcNormFactors manually
>>>
>>> Dear Colleagues,
>>>
>>> I have an RNA-seq library with 27 samples. 9 of these samples are
what we
>>> call input as the other 18 are derived from them through a
biochemical
>>> process. As such, I have already scaled the genes in those 18
samples to
>>> roughly add-up to the corresponding inputs.
>>>
>>> To manage this at the calcNormFactors step I have the following
code:
>>> y$samples
>>> y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))]
>>> norm_factor_inputs <- calcNormFactors(y_inputs)
>>> y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] =
>>> norm_factor_inputs
>>> y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] =
>>> norm_factor_inputs
>>> y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] =
>>> norm_factor_inputs
>>> y$samples
>>>
>>> So what I'm basically trying to do is calcNormFactors for just the
inputs
>>> and then copy them to be the NormFactors for the p and r samples.
Is
>>> copying them like this telling edgeR to re-compute the library
sizes? It
>>> doesn't seem to be happening because the lib.size before and after
these
>>> commands are the same.
>>>
>>> Best,
>>> Scott Daniel
>>> Zarnescu Lab
>>> MCB Ph.D. Program
>>> University of Arizona
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}