Question

How do I remove bad samples/probes before normalization and SWAN?


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hi there,

I am learning minfi using a dataset containing 24 samples. I know there are 2 QC samples, 2 duplicated samples, and one bad sample that I identified with minfi. My question is: what is the proper procedure for removing these samples from the data? Should I remove their file names from the sample sheet and rebuild the RGSet?

A similar question applies to probes with detection p-values higher than 0.01, and to CpGs on chromosomes X and Y. I think these CpGs should be excluded before normalization and SWAN, but I really don't know how.

One thing I have tried is to remove those probes (and also the 5 samples I want to remove) from MSet.raw, and then use the reduced MSet.raw.reduced for SWAN:

    MSet.swan <- preprocessSWAN(RGSet, mSet = MSet.raw.reduced)

Here RGSet is still the original one with 24 samples and all 485512 probes, but MSet.raw.reduced has only 19 samples and about 470K CpGs. The MSet.swan I got has the same dimensions as MSet.raw.reduced, but I don't know whether this method is valid. I do know it cannot be applied to get MSet.norm. If this is not a valid method, what is the correct way to do it?

I really appreciate your help and wish you a happy holiday season!

Qin

-- output of sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Normalization minfi • 3.5k views

updated 10.4 years ago by Kasper Daniel Hansen ★ 6.5k • written 10.4 years ago by Guest User ★ 13k

score 1 · Answer 1 · 2013-12-10

Bad samples: in general, I believe most people would recommend removing bad samples prior to normalization and analysis. You don't have to rebuild the RGChannelSet; you can subset the existing one by column:

    RGset2 = RGset[, goodSamples]

I would not exclude probes based on their detection p-values.

Many people remove the sex chromosomes prior to analysis, and that makes some sense. For ease of use, I would do it after normalization, although it might depend on the exact normalization strategy I use. In minfi::preprocessQuantile we treat the sex chromosomes in a special fashion, but this is not done for SWAN.

Best,
Kasper
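For anyone who wants to see the whole sequence in one place, here is a rough sketch of the workflow described in this answer: drop samples by subsetting the RGChannelSet, run SWAN on the reduced set, then remove X/Y CpGs afterwards. It is only an illustration under some assumptions: it uses the example data from the minfiData package in place of the 24-sample dataset, the "bad" sample is an arbitrary placeholder, and it assumes the 450k annotation package that minfi looks up through getAnnotation() is installed.

```r
library(minfi)
library(minfiData)   # assumption: example data standing in for the real RGSet

RGset <- RGsetEx     # 450k RGChannelSet shipped with minfiData

# Placeholder: pretend the first sample is one of the samples to drop.
badSamples  <- colnames(RGset)[1]
goodSamples <- !(colnames(RGset) %in% badSamples)

# Subset by column (sample); no need to rebuild from the sample sheet.
RGset2 <- RGset[, goodSamples]

# SWAN draws a random subset of probes, so fix the seed for reproducibility.
set.seed(1)
MSet.swan <- preprocessSWAN(RGset2)

# After normalization, drop CpGs annotated to the sex chromosomes.
ann       <- getAnnotation(MSet.swan)
sexProbes <- rownames(ann)[ann$chr %in% c("chrX", "chrY")]
MSet.swan.auto <- MSet.swan[!(rownames(MSet.swan) %in% sexProbes), ]
```

Matching on probe names rather than on row position avoids relying on the annotation rows being in the same order as the MethylSet rows.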