How do I remove bad samples/probes before normalization and SWAN?
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hi there,

I am learning minfi using a dataset containing 24 samples. I know there are 2 QC samples, 2 duplicated samples, and one bad sample that I identified with minfi. My question is: what is the proper procedure to remove these samples from the data? Should I remove these sample file names from the sample sheet and rebuild the RGSet?

A similar question applies to probes with detection p-values higher than 0.01, and to CpGs on chromosomes X and Y. I think these CpGs should be excluded before doing normalization and SWAN, but I really don't know how.

One thing I have tried is to remove those probes (and also the 5 samples I want to remove) from MSet.raw, and then use this reduced MSet.raw.reduced to do SWAN:

    MSet.swan <- preprocessSWAN(RGSet, mSet = MSet.raw.reduced)

Here RGSet is still the original one with 24 samples and all 485512 probes, but MSet.raw.reduced has only 19 samples and about 470K CpGs. The MSet.swan I got has the same dimensions as MSet.raw.reduced, but I don't know whether this method is valid. I do know it cannot be applied to get MSet.norm. If this is not a valid method, what is the correct way to do it?

I really appreciate your help and wish you a happy holiday season!

Qin

-- output of sessionInfo():

    R version 2.15.2 (2012-10-26)
    Platform: x86_64-redhat-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

-- Sent via the guest posting facility at bioconductor.org.
Normalization minfi • 3.5k views
@kasper-daniel-hansen-2979
Last seen 10 months ago
United States
Bad samples: in general, I believe most people would recommend removing bad samples prior to normalization and analysis. You don't have to rebuild the RGChannelSet; you can subset the existing one by column, where goodSamples indexes the samples to keep:

    RGset2 = RGset[, goodSamples]

I would not exclude probes that fail the detection p-value cutoff.

Many people remove the sex chromosomes prior to analysis, and that makes some sense. For ease of use, I would do it after normalization, although it might depend on the exact normalization strategy I use. minfi::preprocessQuantile treats the sex chromosomes in a special fashion, but SWAN does not.

Best,
Kasper

In reply to Qin [guest] <guest@bioconductor.org>, Tue, Dec 10, 2013 at 2:45 PM.
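The sample subsetting described in this answer can be sketched as below. This is only an illustration under stated assumptions: the minfiData package and its example object RGsetEx stand in for the 24-sample RGSet from the question, and badSamples is a hypothetical list of sample names to drop (here simply the first sample). The key point is that the RGChannelSet is subset first, so that preprocessRaw and preprocessSWAN both see the same set of samples.

```r
library(minfi)
library(minfiData)  # assumption: example data used in place of the real RGSet

RGset <- RGsetEx

# Hypothetical: names of the QC, duplicated, and bad samples to remove.
badSamples <- colnames(RGset)[1]

# Logical index of the samples to keep, then subset by column.
goodSamples <- !(colnames(RGset) %in% badSamples)
RGset2 <- RGset[, goodSamples]

# Preprocess the reduced object, so the RGChannelSet and the MethylSet agree.
MSet.raw <- preprocessRaw(RGset2)

# SWAN draws a random subset of probes, so fix the seed for reproducibility.
set.seed(1)
MSet.swan <- preprocessSWAN(RGset2, mSet = MSet.raw)
```

Passing the already reduced RGset2 to both steps avoids the mismatch in the question, where the full RGSet was combined with a reduced MethylSet.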

Kasper, just to clarify: do you mean that we can leave the sex chromosomes in when using minfi::preprocessQuantile?

Also, is there a simple command to remove sex chromosomes from an RGset?
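One possible way to drop sex-chromosome probes is sketched below. An RGChannelSet is indexed by probe address rather than by CpG, so this kind of filter is usually applied to a CpG-level object after preprocessing, in line with the answer above. The helper dropSexChr and the toy data are hypothetical and exist only for illustration; with real data, the annotation table would typically come from minfi's getAnnotation(), which provides Name and chr columns.

```r
# Hypothetical helper: drop the rows of `x` whose probe IDs are annotated
# to chrX or chrY. `ann` needs a `Name` column (probe IDs) and a `chr` column.
dropSexChr <- function(x, ann) {
  sexProbes <- ann$Name[ann$chr %in% c("chrX", "chrY")]
  x[!(rownames(x) %in% sexProbes), , drop = FALSE]
}

# Toy stand-ins for a normalized data set and its annotation (made-up IDs).
beta <- matrix((1:8) / 10, nrow = 4,
               dimnames = list(c("cg01", "cg02", "cg03", "cg04"),
                               c("S1", "S2")))
ann <- data.frame(Name = c("cg01", "cg02", "cg03", "cg04"),
                  chr  = c("chr1", "chrX", "chr7", "chrY"),
                  stringsAsFactors = FALSE)

betaAuto <- dropSexChr(beta, ann)

# With minfi objects, the same idea would look something like:
#   ann <- getAnnotation(MSet.swan)
#   MSet.auto <- dropSexChr(MSet.swan, ann)
```

Matching by probe name rather than by row position means the filter does not depend on the annotation being in the same order as the data.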
