MINFI missing samples in qcreports and densityplots
3
0
Entering edit mode
@annabellecongras-12119
Last seen 7.8 years ago

Hello everybody, 

I am using Minfi (1.20.2) to analyse data from Illumina MethylEPIC 850k beadchip. I have 24 samples on 3 arrays. 

I am pretty new to this kinf of analysis, and R in general, so my question may seem really naïve, but I would really appreciate your help. 

When I run qcReport, I have 6 missing samples in the report for DensityPlot and DensityBeanPlot but all samples are present in all others control graphs. I observe the same by running directly densityplot. It seems that the problem come from sampGroups and sampName, because all samples are present when I only run densiPlots without sampName or SampGroup. I don't understant what went wrong... any ideas? Moreover the previous and following steps seem to works fine in the script. 

Here is my code : 

## Set working directory 
setwd(dir="C:/Users/Laure/Documents/iScan/AC166/Minfi_analysis")

## Load packages
require(minfi)
require(limma)
require(IlluminaHumanMethylationEPICmanifest)
require(IlluminaHumanMethylationEPICanno.ilm10b2.hg19)
require(RColorBrewer)

ann850k = getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b2.hg19)

## set directory that contains the idat files
idatdir = "C:/Users/Laure/Documents/iScan/AC166/IDAT" 
list.files(path = idatdir) ##OK found the 3 folder and the sample sheet
list.files(file.path(path=idatdir,"200989060234")) ##OK found 8 green and 8 red

## read the sample sheet that describes the experiment
targets = read.metharray.sheet(idatdir,pattern="samplesheetminfi.csv") 
targets ## OK

## read IDAT files specified in targets file
rgSet = read.metharray.exp(targets = targets)
head(rgSet) ##OK
pheno <- pData(rgSet)
pheno

## calculate the detection p-values
pVals = detectionP(rgSet)
failed.01<- pVals > 0.01
failedProbes <-rownames(failed.01)[rowMeans(failed.01)>0]
sum(rowMeans(failed.01)>0)  # how many probes failed in more thant 0% of samples : 13604
failedProbes

## plot mean detection p-values for all samples
jpeg(filename="Mean_Detection_pVal.jpg")
dotchart(apply(pVals,2,mean),las=2, main="Mean Detection p-values")
dev.off()
#abline(h=0.05,col="red")

## generate a quality control report for the raw data
qcReport(rgSet, sampNames = pheno$SampleName, sampGroups = pheno$Group, maxSamplesPerPage = 24, pdf="qcReport.pdf", controls = c("BISULFITE CONVERSION I", "BISULFITE CONVERSION II", "EXTENSION", "HYBRIDIZATION","NON-POLYMORPHIC", "SPECIFICITY I", "SPECIFICITY II", "TARGET REMOVAL"))
## MISSING 6 SAMPLES!

## Densityplot before normalisation 
densityPlot(rgSet, sampGroups=pheno$Group, add=TRUE, legend = TRUE, main = "Density plots before normalisation") ### MISSING 6 SAMPLES!
densityBeanPlot(rgSet, sampGroups=targets$Group, numPositions = 1000, main = "Bean plots before normalisation")### MISSING 6 SAMPLES!
densityPlot(rgSet, main = "Density plots before normalisation") ## WORKING
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2                                  IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0
 [3] IlluminaHumanMethylationEPICmanifest_0.3.0          limma_3.30.7                                       
 [5] minfi_1.20.2                                        bumphunter_1.14.0                                  
 [7] locfit_1.5-9.1                                      iterators_1.0.8                                    
 [9] foreach_1.4.3                                       Biostrings_2.42.1                                  
[11] XVector_0.14.0                                      SummarizedExperiment_1.4.0                         
[13] GenomicRanges_1.26.2                                GenomeInfoDb_1.10.2                                
[15] IRanges_2.8.1                                       S4Vectors_0.12.1                                   
[17] Biobase_2.34.0                                      BiocGenerics_0.20.0                                

loaded via a namespace (and not attached):
 [1] genefilter_1.56.0        splines_3.3.2            lattice_0.20-34          beanplot_1.2             rtracklayer_1.34.1      
 [6] GenomicFeatures_1.26.2   XML_3.98-1.5             survival_2.39-5          DBI_0.5-1                BiocParallel_1.8.1      
[11] registry_0.3             rngtools_1.2.4           doRNG_1.6                matrixStats_0.51.0       plyr_1.8.4              
[16] pkgmaker_0.22            stringr_1.1.0            zlibbioc_1.20.0          codetools_0.2-15         memoise_1.0.0           
[21] biomaRt_2.30.0           AnnotationDbi_1.36.0     illuminaio_0.16.0        preprocessCore_1.36.0    Rcpp_0.12.8             
[26] xtable_1.8-2             openssl_0.9.6            base64_2.0               annotate_1.52.1          Rsamtools_1.26.1        
[31] digest_0.6.11            stringi_1.1.2            nor1mix_1.2-2            grid_3.3.2               GEOquery_2.40.0         
[36] quadprog_1.5-5           tools_3.3.2              bitops_1.0-6             magrittr_1.5             siggenes_1.48.0         
[41] RCurl_1.95-4.8           RSQLite_1.1-1            MASS_7.3-45              Matrix_1.2-7.1           data.table_1.10.0       
[46] httr_1.2.1               reshape_0.8.6            R6_2.2.0                 mclust_5.2.1             nlme_3.1-128            
[51] GenomicAlignments_1.10.0 multtest_2.30.0 

 

minfi illumina methylationEPIC densityplot • 1.9k views
ADD COMMENT
0
Entering edit mode
@kasper-daniel-hansen-2979
Last seen 16 months ago
United States
Look at the pheno object; perhaps there are missing values? On Fri, Jan 6, 2017 at 2:47 PM, annabelle.congras [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User annabelle.congras <https: support.bioconductor.org="" u="" 12119=""/> wrote Question: > MINFI missing samples in qcreports and densityplots > <https: support.bioconductor.org="" p="" 90855=""/>: > > Hello everybody, > > I am using Minfi (1.20.2) to analyse data from Illumina MethylEPIC 850k > beadchip. I have 24 samples on 3 arrays. > > I am pretty new to this kinf of analysis, and R in general, so my question > may seem really naïve, but I would really appreciate your help. > > When I run qcReport, I have 6 missing samples in the report for > DensityPlot and DensityBeanPlot but all samples are present in all others > control graphs. I observe the same by running directly densityplot. It > seems that the problem come from sampGroups and sampName, because all > samples are present when I only run densiPlots without sampName or > SampGroup. I don't understant what went wrong... any ideas? Moreover the > previous and following steps seem to works fine in the script. > > Here is my code : > > ## Set working directory > setwd(dir="C:/Users/Laure/Documents/iScan/AC166/Minfi_analysis") > > ## Load packages > require(minfi) > require(limma) > require(IlluminaHumanMethylationEPICmanifest) > require(IlluminaHumanMethylationEPICanno.ilm10b2.hg19) > require(RColorBrewer) > > ann850k = getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b2.hg19) > > ## set directory that contains the idat files > idatdir = "C:/Users/Laure/Documents/iScan/AC166/IDAT" > list.files(path = idatdir) ##OK found the 3 folder and the sample sheet > list.files(file.path(path=idatdir,"200989060234")) ##OK found 8 green and 8 red > > ## read the sample sheet that describes the experiment > targets = read.metharray.sheet(idatdir,pattern="samplesheetminfi.csv") > targets ## OK > > ## read IDAT files specified in targets file > rgSet = read.metharray.exp(targets = targets) > head(rgSet) ##OK > pheno <- pData(rgSet) > pheno > > ## calculate the detection p-values > pVals = detectionP(rgSet) > failed.01<- pVals > 0.01 > failedProbes <-rownames(failed.01)[rowMeans(failed.01)>0] > sum(rowMeans(failed.01)>0) # how many probes failed in more thant 0% of samples : 13604 > failedProbes > > ## plot mean detection p-values for all samples > jpeg(filename="Mean_Detection_pVal.jpg") > dotchart(apply(pVals,2,mean),las=2, main="Mean Detection p-values") > dev.off() > #abline(h=0.05,col="red") > > ## generate a quality control report for the raw data > qcReport(rgSet, sampNames = pheno$SampleName, sampGroups = pheno$Group, maxSamplesPerPage = 24, pdf="qcReport.pdf", controls = c("BISULFITE CONVERSION I", "BISULFITE CONVERSION II", "EXTENSION", "HYBRIDIZATION","NON-POLYMORPHIC", "SPECIFICITY I", "SPECIFICITY II", "TARGET REMOVAL")) > ## MISSING 6 SAMPLES! > > ## Densityplot before normalisation > densityPlot(rgSet, sampGroups=pheno$Group, add=TRUE, legend = TRUE, main = "Density plots before normalisation") ### MISSING 6 SAMPLES! > densityBeanPlot(rgSet, sampGroups=targets$Group, numPositions = 1000, main = "Bean plots before normalisation")### MISSING 6 SAMPLES! > densityPlot(rgSet, main = "Density plots before normalisation") ## WORKING > > > sessionInfo() > R version 3.3.2 (2016-10-31) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] stats4 parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] RColorBrewer_1.1-2 IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0 > [3] IlluminaHumanMethylationEPICmanifest_0.3.0 limma_3.30.7 > [5] minfi_1.20.2 bumphunter_1.14.0 > [7] locfit_1.5-9.1 iterators_1.0.8 > [9] foreach_1.4.3 Biostrings_2.42.1 > [11] XVector_0.14.0 SummarizedExperiment_1.4.0 > [13] GenomicRanges_1.26.2 GenomeInfoDb_1.10.2 > [15] IRanges_2.8.1 S4Vectors_0.12.1 > [17] Biobase_2.34.0 BiocGenerics_0.20.0 > > loaded via a namespace (and not attached): > [1] genefilter_1.56.0 splines_3.3.2 lattice_0.20-34 beanplot_1.2 rtracklayer_1.34.1 > [6] GenomicFeatures_1.26.2 XML_3.98-1.5 survival_2.39-5 DBI_0.5-1 BiocParallel_1.8.1 > [11] registry_0.3 rngtools_1.2.4 doRNG_1.6 matrixStats_0.51.0 plyr_1.8.4 > [16] pkgmaker_0.22 stringr_1.1.0 zlibbioc_1.20.0 codetools_0.2-15 memoise_1.0.0 > [21] biomaRt_2.30.0 AnnotationDbi_1.36.0 illuminaio_0.16.0 preprocessCore_1.36.0 Rcpp_0.12.8 > [26] xtable_1.8-2 openssl_0.9.6 base64_2.0 annotate_1.52.1 Rsamtools_1.26.1 > [31] digest_0.6.11 stringi_1.1.2 nor1mix_1.2-2 grid_3.3.2 GEOquery_2.40.0 > [36] quadprog_1.5-5 tools_3.3.2 bitops_1.0-6 magrittr_1.5 siggenes_1.48.0 > [41] RCurl_1.95-4.8 RSQLite_1.1-1 MASS_7.3-45 Matrix_1.2-7.1 data.table_1.10.0 > [46] httr_1.2.1 reshape_0.8.6 R6_2.2.0 mclust_5.2.1 nlme_3.1-128 > [51] GenomicAlignments_1.10.0 multtest_2.30.0 > > > > ------------------------------ > > Post tags: minfi, illumina, methylationEPIC, densityplot > > You may reply via email or visit MINFI missing samples in qcreports and densityplots >
ADD COMMENT
0
Entering edit mode
@annabellecongras-12119
Last seen 7.8 years ago

Hi Kasper, thanks for your answer.

The pheno object is complete. Moreover I have the same issue when I use targets$SampleName instead  pheno$SampleName in qcReport or densityPlots. 

About that object, is this normal that I obtain 2 different output when using pData (data.frame) or phenoData (Formal Call AnnotatedDataFrame)? 

 

 

 

ADD COMMENT
0
Entering edit mode
@annabellecongras-12119
Last seen 7.8 years ago

Hi again, 

Actually if I don't provide any of the sampName and sampGroup options, the densityplots and densitybeanplots in my qcreport are all empty.. does it help understanding the problem?

ADD COMMENT

Login before adding your answer.

Traffic: 755 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6