Hi,
I have replicate sample counts for 2 groups, but one sample has roughly
4x as many mapped reads as the other samples:
528,428
625,889
498,569
2,328,333
I divided all the mapped transcript reads by 4 and then did the
normalization and analysis with edgeR.
What do you recommend doing with the 4th sample's counts?
Lana Schaffer
Biostatistics, Informatics
DNA Array Core Facility
858-784-2263
Dear Lana,
edgeR has no difficulty with uneven library sizes and will adjust for
this automatically during the analysis. There is no need for you to do
anything other than follow a standard analysis pipeline.
You do not need to standardize the 4th sample by dividing its counts
by 4. In fact, you must not do this, since it changes the
mean-variance relationship for your data and invalidates the subsequent
analysis. You need to input the true read counts into edgeR.
Best wishes
Gordon
> Date: Fri, 21 Oct 2011 15:27:25 -0700
> From: Lana Schaffer <schaffer at scripps.edu>
> To: "'bioconductor at r-project.org'" <bioconductor at r-project.org>
> Subject: [BioC] uneven counts for edgeR
______________________________________________________________________
The information in this email is confidential and intend...
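Gordon's warning about the mean-variance relationship can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: one hypothetical gene whose counts are pure Poisson with mean 400 (no biological variation), and a hand-written Poisson sampler (`rpois`, Knuth's method), which is not part of edgeR. edgeR actually fits a negative binomial model, but dividing by 4 distorts the Poisson component of that variance in the same way.

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Fine for moderate means like the one used here; exp(-lam) underflows
    for lam above roughly 700.
    """
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def variance_to_mean(values):
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    return statistics.variance(values) / statistics.mean(values)


rng = random.Random(2011)

# Hypothetical gene in the deep library: Poisson counts with mean 400.
raw = [rpois(400, rng) for _ in range(5000)]

# The "divide by 4" adjustment from the original question.
scaled = [y / 4 for y in raw]

raw_ratio = variance_to_mean(raw)        # close to 1
scaled_ratio = variance_to_mean(scaled)  # close to 0.25
print(f"raw:    mean={statistics.mean(raw):.1f}  var/mean={raw_ratio:.2f}")
print(f"scaled: mean={statistics.mean(scaled):.1f}  var/mean={scaled_ratio:.2f}")
```

Dividing by 4 shrinks the variance by a factor of 16 but the mean only by a factor of 4, so the rescaled counts look far less variable than real counts at that level. edgeR avoids this by keeping the library sizes (times any normalization factors) as a multiplicative factor in the model mean, which is why the raw counts can be supplied unchanged.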
Gordon,
Thank you for this information.
Is the same true for DESeq?
Lana
-----Original Message-----
From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU]
Sent: Saturday, October 22, 2011 5:42 PM
To: Lana Schaffer
Cc: Bioconductor mailing list
Subject: uneven counts for edgeR
Gordon Smyth wrote:
> edgeR has no difficulty with uneven library sizes and will adjust for
> this automatically during the analysis. There is no need for you to do
> anything other than follow a standard analysis pipeline.
Lana Schaffer wrote:
> Is the same true for DESeq?
Yes, it is.
Simon
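Simon's answer reflects that DESeq, like edgeR, works on raw counts and accounts for sequencing depth through one scale factor per sample; in DESeq these size factors are estimated by `estimateSizeFactors` using a median-of-ratios approach. The sketch below re-implements that idea in plain Python purely for illustration. It is not DESeq's code, and the count matrix is hypothetical, constructed so that the fourth sample is sequenced about four times as deeply as the others.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: one list per gene, holding that gene's raw counts in each sample.
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        # Log of this gene's geometric mean across samples (the pseudo-reference).
        log_ref = sum(math.log(y) for y in gene) / n_samples
        for j, y in enumerate(gene):
            log_ratios[j].append(math.log(y) - log_ref)
    # Each sample's factor is the median ratio of its counts to the reference.
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Hypothetical raw counts (genes x 4 samples); sample 4 is sequenced ~4x deeper.
counts = [
    [10, 12, 9, 44],
    [100, 110, 95, 410],
    [50, 48, 55, 200],
    [0, 3, 1, 8],      # skipped: contains a zero
    [20, 25, 18, 90],
]
sf = size_factors(counts)
print([round(s, 2) for s in sf])
```

The fourth size factor comes out roughly four times the others, so the deeper library is accounted for in the model while the counts themselves stay untouched.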
Gordon,
An unnamed company is claiming that the RPKM counts, and/or some
transformation of the RPKM counts, are 90% normal, 5% NB, and 5% Poisson
distributed according to the Akaike Information Criterion (AIC).
Can you explain why this is or is not plausible?
Lana
Dear Lana,
It sounds strange, but it would be unwise for me to comment without
knowing what they mean. It is of course technically impossible for RPKM
values to follow a negative binomial or Poisson distribution, because
RPKM values are not integers.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
On Wed, 26 Oct 2011, Lana Schaffer wrote:
> Gordon,
> An unnamed company is claiming that the RPKM counts, and/or some
> transformation of the RPKM counts, are 90% normal, 5% NB, and 5% Poisson
> distributed according to the Akaike Information Criterion (AIC).
> Can you explain why this is or is not plausible?
>
> Lana
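Gordon's integer argument follows from how RPKM is conventionally defined: count × 10^9 / (gene length in bp × total mapped reads). A minimal sketch, assuming that standard definition; the library sizes are the mapped-read totals from the original question, while the gene length (1,500 bp) and the per-sample counts are hypothetical.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


# Mapped-read totals for the four samples in the original question.
library_sizes = [528_428, 625_889, 498_569, 2_328_333]

# Hypothetical raw counts for a single 1,500 bp gene in those samples.
gene_counts = [37, 45, 33, 160]

values = [rpkm(y, 1500, n) for y, n in zip(gene_counts, library_sizes)]
for y, v in zip(gene_counts, values):
    print(f"count={y:4d}  RPKM={v:.3f}  integer={v.is_integer()}")
```

Integer counts go in, but non-integer RPKM values come out. Because Poisson and negative binomial probabilities are defined only at non-negative integers, values like these cannot literally follow either distribution.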