RNAseq data normalization without differential expression
Mete Civelek ▴ 180
@mete-civelek-4566
Last seen 10.2 years ago
Dear All,

I have RNAseq counts for 400 human donors. This is a random sampling of human subjects from a population-wide study. I am not interested in differential expression of genes between certain groups. The samples differ in sequence read counts because of library size, so I need to normalize the counts.

I have been reading the postings on this list regarding the normalization methods in DESeq and edgeR, and I looked at the reference manuals of both packages. I understand that the two packages use different normalization approaches. My understanding is that while both approaches use the sample information (i.e. whether they are from control or treatment condition) in order to create a list object as a first step, this information is not used in the normalization step but only in the differential expression analysis step. Is this correct?

I am also curious to hear others' opinions regarding different approaches for a study design like mine. Perhaps I have been looking in the wrong places, but the papers I found in the literature seem to be concerned with differential expression of transcripts between groups.

Mete
RNASeq Normalization edgeR DESeq • 1.1k views
@mikelove
Last seen 15 hours ago
United States
Hi Mete,

On Mon, Jul 29, 2013 at 3:22 AM, Mete Civelek <mcivelek at mednet.ucla.edu> wrote:

> [...] My understanding is that while both approaches use the sample information (i.e. whether they are from control or treatment condition) in order to create a list object as a first step, this information is not used in the normalization step but only in the differential expression analysis step. Is this correct?

Yes, this is true for DESeq/DESeq2. The transformations in DESeq2 have an argument, blind, which defaults to TRUE; with this setting, the dispersion used for the transformation is estimated without any information about the experimental design.

It depends on what you want to do with the normalized data, but the VST or rlog transformation should help you, for instance, to cluster samples or genes in a large data set, by stabilizing the variance across the range of mean counts. If there are large differences in library sizes, we recommend using rlogTransformation().

Furthermore, the rlog implementation in the devel branch seems to perform qualitatively better than the one in the release branch. The difference is that in the devel branch, the rlog transformation uses the fitted dispersion values rather than the shrunken dispersion estimates. This makes the rlog perform more like the VST, and avoids squashing what could be large, true differences across samples for high-count genes.

Mike
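To illustrate why library-size normalization needs no group labels, here is a minimal Python sketch of a median-of-ratios size factor estimate, the general idea behind DESeq's normalization. It is not the package's actual code. The function names `size_factors` and `normalize` and the toy counts are made up for this example, and genes with a zero count in any sample are simply skipped when estimating the factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts: list of genes, each a list of raw counts (one per sample).
    No condition or group labels are used anywhere.
    """
    n_samples = len(counts[0])
    kept = []            # genes with nonzero counts in every sample
    log_geo_means = []   # log of each kept gene's geometric mean
    for row in counts:
        if all(c > 0 for c in row):
            kept.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / len(row))
    if not kept:
        raise ValueError("no gene has nonzero counts in every sample")
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to its geometric mean, on the log scale
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(kept, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide every count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (hypothetical): sample 2 is sequenced about twice as deeply
toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(toy)
norm = normalize(toy, sf)
```

In this toy case the second size factor comes out at twice the first, so the normalized counts of the first three genes agree across the two samples. The only input is the count matrix itself, which is why a design with no groups poses no problem for this step.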
Hi Mike,

Thank you for the suggestions. I will try out the log transformation function.

Mete