Question

RNAseq data normalization without differential expression

Mete Civelek ▴ 180

@mete-civelek-4566

Last seen 10.2 years ago

Dear All,

I have RNAseq counts for 400 human donors. This is a random sampling of human subjects from a population-wide study. I am not interested in differential expression of genes between certain groups. The samples differ in total sequence reads because of library size, so I need to normalize the counts.

I have been reading the postings on this list regarding the normalization methods in DESeq and edgeR, and I looked at the reference manuals of both packages. I understand that they use different normalization approaches. My understanding is that while both approaches use the sample information (i.e. whether samples are from the control or treatment condition) to create a list object as a first step, this information is not used in the normalization step but only in the differential expression analysis step. Is this correct?

I am also curious to hear others' opinions regarding different approaches for a study design like mine. Perhaps I have been looking in the wrong places, but the papers I found in the literature seem to be concerned with differential expression of transcripts between groups.

Mete

RNASeq Normalization edgeR DESeq • 1.1k views

updated 11.3 years ago by Michael Love 43k • written 11.3 years ago by Mete Civelek ▴ 180

score 0 · Answer 1 · 2013-07-29

hi Mete,

On Mon, Jul 29, 2013 at 3:22 AM, Mete Civelek <mcivelek at mednet.ucla.edu> wrote:

> Dear All,
>
> I have RNAseq counts for 400 human donors. This is a random sampling of human subjects from a population-wide study. I am not interested in differential expression of genes between certain groups. There are differences in sequence reads because of library size therefore I need to normalize the counts.
>
> I have been reading the postings on this list regarding the normalization methods in DEseq and edgeR. I looked at the reference manuals of both of these packages. I understand that they both use different normalization approaches. My understanding is that while both approaches use the sample information (i.e. whether they are from control or treatment condition) in order to create a list object as a first step, this information is not used in the normalization step but only in the differential expression analysis step. Is this correct?

Yes, this is true for DESeq/DESeq2. The transformations in DESeq2 have an argument, blind, which defaults to TRUE. With this setting, the dispersion used for the transformation is estimated without using any information about the experimental design.

It depends on what you want to do with the normalized data, but the VST or rlog transformation should help you, for instance, to cluster samples or genes in a large data set, by stabilizing the variance across the range of mean counts. If there are large differences in library sizes, we recommend using rlogTransformation().

Furthermore, the rlog implementation in the devel branch seems to perform qualitatively better than the one in the release branch. The difference is that in the devel branch, the rlog transformation uses the fitted dispersion values rather than the shrunken dispersion estimates. This makes the rlog perform more like the VST, and avoids squashing what could be large, true differences across samples for high count genes.

Mike
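To illustrate why the library-size normalization step needs no group labels, here is a minimal sketch of the median-of-ratios idea behind DESeq's size factors, written in plain Python rather than R. It is not the package's implementation: the function names size_factors and normalize and the toy count matrix are made up for illustration, and it covers only the size-factor step, not the VST or rlog transformation.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of gene rows, each a list of raw counts per sample.
    Only the count matrix is used; no condition labels are required.
    """
    n_samples = len(counts[0])
    # Keep genes with a positive count in every sample, so that the
    # geometric mean across samples is defined and non-zero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in all samples")
    # Log geometric mean of each gene across samples (a pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the pseudo-reference, per gene.
        log_ratios = [math.log(row[j]) - ref
                      for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide every raw count by its sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy matrix (4 genes x 3 samples): sample 2 is sequenced
# twice as deep as sample 1, sample 3 one and a half times as deep.
toy = [
    [10, 20, 15],
    [100, 200, 150],
    [5, 10, 0],     # contains a zero, so it is skipped for the factors
    [40, 80, 60],
]
sf = size_factors(toy)
norm = normalize(toy, sf)
```

In this toy case the size factors come out in the ratio 1 : 2 : 1.5, so after division the first gene has the same normalized value in all three samples. With 400 donors the same calculation applies unchanged, since each sample is compared only against the per-gene geometric mean across all samples.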