Summarisation in oligo package after exclusion of probes containing SNPs

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Dear Benilton/BioC list, Query: Is it possible to perform summarisation using the oligo package after exclusion of probes containing SNPs? Background: I've performed an eQTL analysis where the expression data has been obtained on Affymetrix HuGene ST 1.1 microarrays. After reading in the CEL files, I pre-processed the GeneFeatureSet doing background correction, quantile normalisation, and summarisation to gene-level using the rma function from the oligo package. # What my real data looks like: ( cels <- read.celfiles(pathToFiles) ) # generates a GeneFeatureSet GeneFeatureSet (storageMode: lockedEnvironment) assayData: 1178100 features, 90 samples element names: exprs protocolData rowNames: 75_459_CD16.CEL 76_460_CD16.CEL ... triad0078_H7_498_CD16.CEL (90 total) varLabels: exprs dates varMetadata: labelDescription channel phenoData rowNames: 75_459_CD16.CEL 76_460_CD16.CEL ... triad0078_H7_498_CD16.CEL (90 total) varLabels: cell_type info.batch.name ... age (29 total) varMetadata: labelDescription channel featureData: none experimentData: use 'experimentData(object)' Annotation: pd.hugene.1.1.st.v1 #Example workfow using oligoData for reproducibilty: library(oligo) library(oligoData) data(affyGeneFS) affyGeneFS # a GeneFeatureSet GeneFeatureSet (storageMode: lockedEnvironment) assayData: 1102500 features, 33 samples element names: exprs protocolData rowNames: TisMap_Brain_01_v1_WTGene1.CEL TisMap_Brain_02_v1_WTGene1.CEL ... TisMap_Thyroid_03_v1_WTGene1.CEL (33 total) varLabels: exprs dates varMetadata: labelDescription channel phenoData rowNames: TisMap_Brain_01_v1_WTGene1.CEL TisMap_Brain_02_v1_WTGene1.CEL ... TisMap_Thyroid_03_v1_WTGene1.CEL (33 total) varLabels: index varMetadata: labelDescription channel featureData: none experimentData: use 'experimentData(object)' Annotation: pd.hugene.1.0.st.v1 # generate gene-level expression values: eset <- rma(affyGeneFS) # equivalent to: bgCorrected <- backgroundCorrect(affyGeneFS) # Background correct normalized <- normalize(bgCorrected, method="quantile") # Quantile normalise eset2 <- rma(normalized, background=F, normalize=F, subset=NULL) # Summarize with median polish I would like to re-run the eQTL analysis, this time excluding probes from the expression data that contain SNPs in >1% of CEU individuals to see whether this has any substantial effect on my findings. My concern is the possibility of apparent eQTLs which are in fact artefacts due to less efficient binding between probes and transcripts containing the minor allele. My questions are- 1) Having obtained a list of the probes to exclude, is it valid to simply perform BG correction and quantile normalisation on the whole GeneFeatureSet, but to then remove probes containing SNPs prior to summarisation? Would the validity of mapping the resulting probesets to Entrez ids etc. be questionable if some probes had been excluded? 2) If such an approach is reasonable, how can I map Affymetrix probe- level identifiers to the GeneFeatureSet? # I know how to get feature ids for PM probes and their groupings into gene-level probesets: featureInfo <- oligo:::stArrayPmInfo(normalized, target = 'core') head(featureInfo,10) fid fsetid 1 116371 7892501 2 943979 7892501 3 493089 7892501 4 907039 7892501 5 1033309 7892502 6 653512 7892502 7 690769 7892502 8 997409 7892502 9 379979 7892503 10 469846 7892503 The fsetid s correspond to gene-level probesets. Are the fid s row indices in the GeneFeatureSet, and how can they be linked to Affy probe level ids? Assuming I can map Affy probe ids to the fid column, here is my proposed solution for summarisation after removal of selected probes, but I wonder if there is a more straightforward way? # for demo purposes generate a random list of fids to remove # some fid s are duplicated in the data.frame as they map to multiple probesets, so use 'unique' set.seed(1234) fidToCut <- sample(unique(featureInfo$fid), 10000, replace=F) # remove rows from GeneFeatureSet corressponding to fid s we want to exclude normalized2 <- normalized[-fidToCut, ] # create mapping for the new GeneFeatureSet of how old row indices map to new indices map <- cbind(new_ind= 1:nrow(normalized2), old_ind= featureNames(normalized2) ) # pmi gives row indices of pm probes in 'normalized', not 'normalized2' pmi <- featureInfo[["fid"]] # re-order dataframe so 'old_ind' column is the same as pmi map1 <- map[match(pmi, map[,"old_ind"]),] map1 <- cbind(map1, featureInfo) tail(map1) new_ind old_ind fid fsetid 861488 961136 969962 969962 8180418 861489 208831 210719 210719 8180418 861490 870109 878113 878113 8180418 861491 598190 603636 603636 8180418 861492 858752 866642 866642 8180418 861493 713834 720353 720353 8180418 # we need to get rid of rows with NA... e.g. where 'fid' but no 'new_ind' exists: # effectively recreating a new 'featureInfo' dataframe map1 <- na.omit(map1) # now we can get the new row indices, and the probesets they belong to pmi <- as.numeric( as.character( map1[["new_ind"]] )) # 'new_ind' got coerced to a factor pnVec <- as.character(map1[["fsetid"]]) # subset the normalized probe-level values to keep probes by indexes in pmi pms <- exprs(normalized2)[pmi, , drop = FALSE] dimnames(pms) <- NULL colnames(pms) <- sampleNames(normalized2) # get a matrix of summarized values, which can be used to create an ExpressionSet theExprs <- basicRMA(pms, pnVec, normalize=F, background=F) I'd greatly appreciate any advice. I'm a biologist by background so apologies if this is rather basic. Thanks very much Jimmy Peters, Smith Lab, Cambridge Institute for Medical Research. -- output of sessionInfo(): R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] pd.hugene.1.0.st.v1_3.8.0 oligoData_1.8.0 biomaRt_2.18.0 pd.hugene.1.1.st.v1_3.8.0 RSQLite_0.11.4 DBI_0.2-7 [7] oligo_1.26.0 Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] affxparser_1.34.0 affyio_1.30.0 BiocInstaller_1.12.0 Biostrings_2.30.0 bit_1.1-10 codetools_0.2-8 ff_2.2-12 [8] foreach_1.4.1 GenomicRanges_1.14.1 IRanges_1.20.0 iterators_1.0.6 preprocessCore_1.24.0 RCurl_1.95-4.1 splines_3.0.2 [15] stats4_3.0.2 tools_3.0.2 XML_3.95-0.2 XVector_0.2.0 zlibbioc_1.8.0 -- Sent via the guest posting facility at bioconductor.org.

probe affy oligo probe affy oligo • 1.8k views

ADD COMMENT • link updated 10.5 years ago by Daniel Bottomly ▴ 20 • written 10.5 years ago by Guest User ★ 13k

0

Entering edit mode

Daniel Bottomly ▴ 20

@daniel-bottomly-6224

Last seen 5.9 years ago

United States

Hi Jimmy: 1.) We do variations on this procedure quite frequently in the field of mouse genetics and do find it to be useful in practice. The most common way that I have seen is to remove the probes ahead of time prior to background correction. That being said, it would be interesting to compare the effect of probe removal ahead of time to probe removal after background correction and normalization. I suppose there will be cases where the removal of a probe from a probeset would remove an assignment to an Entrez gene ID such as the case where the probeset is removed in its entirety. You could always use (re)alignments of the probes to directly determine which genes are interrogated. 2.) I will defer to Benilton or others in terms of the best way to do this. If you are interested and willing to get your hands a little dirty I could point you to our approach based on the oligo package which is still under development but does contain documentation, unit tests, etc. Thanks, Dan On 11/3/13 7:00 AM, "Jimmy Peters [guest]" <guest at="" bioconductor.org=""> wrote: > >Dear Benilton/BioC list, > >Query: Is it possible to perform summarisation using the oligo package >after exclusion of probes containing SNPs? > >Background: I've performed an eQTL analysis where the expression data has >been obtained on Affymetrix HuGene ST 1.1 microarrays. >After reading in the CEL files, I pre-processed the GeneFeatureSet doing >background correction, quantile normalisation, and summarisation to >gene-level using the rma function from the oligo package. > ># What my real data looks like: >( cels <- read.celfiles(pathToFiles) ) # generates a GeneFeatureSet > >GeneFeatureSet (storageMode: lockedEnvironment) >assayData: 1178100 features, 90 samples >element names: exprs >protocolData >rowNames: 75_459_CD16.CEL 76_460_CD16.CEL ... triad0078_H7_498_CD16.CEL >(90 total) >varLabels: exprs dates >varMetadata: labelDescription channel >phenoData >rowNames: 75_459_CD16.CEL 76_460_CD16.CEL ... triad0078_H7_498_CD16.CEL >(90 total) >varLabels: cell_type info.batch.name ... age (29 total) >varMetadata: labelDescription channel >featureData: none >experimentData: use 'experimentData(object)' >Annotation: pd.hugene.1.1.st.v1 > >#Example workfow using oligoData for reproducibilty: > >library(oligo) >library(oligoData) > >data(affyGeneFS) >affyGeneFS # a GeneFeatureSet > >GeneFeatureSet (storageMode: lockedEnvironment) >assayData: 1102500 features, 33 samples >element names: exprs >protocolData >rowNames: TisMap_Brain_01_v1_WTGene1.CEL TisMap_Brain_02_v1_WTGene1.CEL >... TisMap_Thyroid_03_v1_WTGene1.CEL (33 total) >varLabels: exprs dates >varMetadata: labelDescription channel >phenoData >rowNames: TisMap_Brain_01_v1_WTGene1.CEL TisMap_Brain_02_v1_WTGene1.CEL >... TisMap_Thyroid_03_v1_WTGene1.CEL (33 total) >varLabels: index >varMetadata: labelDescription channel >featureData: none >experimentData: use 'experimentData(object)' >Annotation: pd.hugene.1.0.st.v1 > ># generate gene-level expression values: >eset <- rma(affyGeneFS) > ># equivalent to: >bgCorrected <- backgroundCorrect(affyGeneFS) # >Background correct >normalized <- normalize(bgCorrected, method="quantile") # >Quantile normalise >eset2 <- rma(normalized, background=F, normalize=F, subset=NULL) # >Summarize with median polish > >I would like to re-run the eQTL analysis, this time excluding probes from >the expression data that contain >SNPs in >1% of CEU individuals to see whether this has any substantial >effect on my findings. My concern is >the possibility of apparent eQTLs which are in fact artefacts due to less >efficient binding between probes >and transcripts containing the minor allele. > >My questions are- > >1) Having obtained a list of the probes to exclude, is it valid to simply >perform BG correction and quantile normalisation on the whole >GeneFeatureSet, >but to then remove probes containing SNPs prior to summarisation? Would >the validity of mapping the resulting probesets to Entrez ids etc. >be questionable if some probes had been excluded? > >2) If such an approach is reasonable, how can I map Affymetrix >probe-level identifiers to the GeneFeatureSet? > ># I know how to get feature ids for PM probes and their groupings into >gene-level probesets: >featureInfo <- oligo:::stArrayPmInfo(normalized, target = 'core') > >head(featureInfo,10) > > fid fsetid >1 116371 7892501 >2 943979 7892501 >3 493089 7892501 >4 907039 7892501 >5 1033309 7892502 >6 653512 7892502 >7 690769 7892502 >8 997409 7892502 >9 379979 7892503 >10 469846 7892503 > >The fsetid s correspond to gene-level probesets. Are the fid s row >indices in the GeneFeatureSet, >and how can they be linked to Affy probe level ids? > >Assuming I can map Affy probe ids to the fid column, here is my proposed >solution for >summarisation after removal of selected probes, but I wonder if there is >a more straightforward way? > ># for demo purposes generate a random list of fids to remove ># some fid s are duplicated in the data.frame as they map to multiple >probesets, so use 'unique' >set.seed(1234) >fidToCut <- sample(unique(featureInfo$fid), 10000, replace=F) > ># remove rows from GeneFeatureSet corressponding to fid s we want to >exclude >normalized2 <- normalized[-fidToCut, ] > ># create mapping for the new GeneFeatureSet of how old row indices map to >new indices >map <- cbind(new_ind= 1:nrow(normalized2), old_ind= >featureNames(normalized2) ) > ># pmi gives row indices of pm probes in 'normalized', not 'normalized2' >pmi <- featureInfo[["fid"]] > ># re-order dataframe so 'old_ind' column is the same as pmi >map1 <- map[match(pmi, map[,"old_ind"]),] >map1 <- cbind(map1, featureInfo) > >tail(map1) > new_ind old_ind fid fsetid >861488 961136 969962 969962 8180418 >861489 208831 210719 210719 8180418 >861490 870109 878113 878113 8180418 >861491 598190 603636 603636 8180418 >861492 858752 866642 866642 8180418 >861493 713834 720353 720353 8180418 > ># we need to get rid of rows with NA... e.g. where 'fid' but no 'new_ind' >exists: ># effectively recreating a new 'featureInfo' dataframe >map1 <- na.omit(map1) > ># now we can get the new row indices, and the probesets they belong to >pmi <- as.numeric( as.character( map1[["new_ind"]] )) # 'new_ind' got >coerced to a factor >pnVec <- as.character(map1[["fsetid"]]) > ># subset the normalized probe-level values to keep probes by indexes in >pmi >pms <- exprs(normalized2)[pmi, , drop = FALSE] >dimnames(pms) <- NULL >colnames(pms) <- sampleNames(normalized2) ># get a matrix of summarized values, which can be used to create an >ExpressionSet >theExprs <- basicRMA(pms, pnVec, normalize=F, background=F) > >I'd greatly appreciate any advice. I'm a biologist by background so >apologies if this is rather basic. >Thanks very much >Jimmy Peters, Smith Lab, Cambridge Institute for Medical Research. > > -- output of sessionInfo(): > >R version 3.0.2 (2013-09-25) >Platform: x86_64-apple-darwin10.8.0 (64-bit) > >locale: >[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 > >attached base packages: >[1] parallel stats graphics grDevices utils datasets methods >base > >other attached packages: > [1] pd.hugene.1.0.st.v1_3.8.0 oligoData_1.8.0 biomaRt_2.18.0 > pd.hugene.1.1.st.v1_3.8.0 RSQLite_0.11.4 DBI_0.2-7 > > [7] oligo_1.26.0 Biobase_2.22.0 >oligoClasses_1.24.0 BiocGenerics_0.8.0 > >loaded via a namespace (and not attached): > [1] affxparser_1.34.0 affyio_1.30.0 BiocInstaller_1.12.0 >Biostrings_2.30.0 bit_1.1-10 codetools_0.2-8 >ff_2.2-12 > [8] foreach_1.4.1 GenomicRanges_1.14.1 IRanges_1.20.0 >iterators_1.0.6 preprocessCore_1.24.0 RCurl_1.95-4.1 >splines_3.0.2 >[15] stats4_3.0.2 tools_3.0.2 XML_3.95-0.2 >XVector_0.2.0 zlibbioc_1.8.0 > >-- >Sent via the guest posting facility at bioconductor.org. > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.5 years ago Daniel Bottomly ▴ 20

0

Entering edit mode

Hi Dan, Thanks very much for your reply. Yes I'd be very interested in trying to implement your approach Best wishes Jimmy On 2013-11-05 16:59, Daniel Bottomly wrote: > Hi Jimmy: > > 1.) We do variations on this procedure quite frequently in the field of > mouse genetics and do find it to be useful in practice. The most common > way that I have seen is to remove the probes ahead of time prior to > background correction. That being said, it would be interesting to > compare the effect of probe removal ahead of time to probe removal > after > background correction and normalization. I suppose there will be cases > where the removal of a probe from a probeset would remove an assignment > to > an Entrez gene ID such as the case where the probeset is removed in its > entirety. You could always use (re)alignments of the probes to > directly > determine which genes are interrogated. > > 2.) I will defer to Benilton or others in terms of the best way to do > this. If you are interested and willing to get your hands a little > dirty > I could point you to our approach based on the oligo package which is > still under development but does contain documentation, unit tests, > etc. > > Thanks, > > Dan > > On 11/3/13 7:00 AM, "Jimmy Peters [guest]" <guest at="" bioconductor.org=""> > wrote: > >> >> Dear Benilton/BioC list, >> >> Query: Is it possible to perform summarisation using the oligo package >> after exclusion of probes containing SNPs? >> >> Background: I've performed an eQTL analysis where the expression data >> has >> been obtained on Affymetrix HuGene ST 1.1 microarrays. >> After reading in the CEL files, I pre-processed the GeneFeatureSet >> doing >> background correction, quantile normalisation, and summarisation to >> gene-level using the rma function from the oligo package. >> >> # What my real data looks like: >> ( cels <- read.celfiles(pathToFiles) ) # generates a GeneFeatureSet >> >> GeneFeatureSet (storageMode: lockedEnvironment) >> assayData: 1178100 features, 90 samples >> element names: exprs >> protocolData >> rowNames: 75_459_CD16.CEL 76_460_CD16.CEL ... >> triad0078_H7_498_CD16.CEL >> (90 total) >> varLabels: exprs dates >> varMetadata: labelDescription channel >> phenoData >> rowNames: 75_459_CD16.CEL 76_460_CD16.CEL ... >> triad0078_H7_498_CD16.CEL >> (90 total) >> varLabels: cell_type info.batch.name ... age (29 total) >> varMetadata: labelDescription channel >> featureData: none >> experimentData: use 'experimentData(object)' >> Annotation: pd.hugene.1.1.st.v1 >> >> #Example workfow using oligoData for reproducibilty: >> >> library(oligo) >> library(oligoData) >> >> data(affyGeneFS) >> affyGeneFS # a GeneFeatureSet >> >> GeneFeatureSet (storageMode: lockedEnvironment) >> assayData: 1102500 features, 33 samples >> element names: exprs >> protocolData >> rowNames: TisMap_Brain_01_v1_WTGene1.CEL >> TisMap_Brain_02_v1_WTGene1.CEL >> ... TisMap_Thyroid_03_v1_WTGene1.CEL (33 total) >> varLabels: exprs dates >> varMetadata: labelDescription channel >> phenoData >> rowNames: TisMap_Brain_01_v1_WTGene1.CEL >> TisMap_Brain_02_v1_WTGene1.CEL >> ... TisMap_Thyroid_03_v1_WTGene1.CEL (33 total) >> varLabels: index >> varMetadata: labelDescription channel >> featureData: none >> experimentData: use 'experimentData(object)' >> Annotation: pd.hugene.1.0.st.v1 >> >> # generate gene-level expression values: >> eset <- rma(affyGeneFS) >> >> # equivalent to: >> bgCorrected <- backgroundCorrect(affyGeneFS) >> # >> Background correct >> normalized <- normalize(bgCorrected, method="quantile") >> # >> Quantile normalise >> eset2 <- rma(normalized, background=F, normalize=F, subset=NULL) >> # >> Summarize with median polish >> >> I would like to re-run the eQTL analysis, this time excluding probes >> from >> the expression data that contain >> SNPs in >1% of CEU individuals to see whether this has any substantial >> effect on my findings. My concern is >> the possibility of apparent eQTLs which are in fact artefacts due to >> less >> efficient binding between probes >> and transcripts containing the minor allele. >> >> My questions are- >> >> 1) Having obtained a list of the probes to exclude, is it valid to >> simply >> perform BG correction and quantile normalisation on the whole >> GeneFeatureSet, >> but to then remove probes containing SNPs prior to summarisation? >> Would >> the validity of mapping the resulting probesets to Entrez ids etc. >> be questionable if some probes had been excluded? >> >> 2) If such an approach is reasonable, how can I map Affymetrix >> probe-level identifiers to the GeneFeatureSet? >> >> # I know how to get feature ids for PM probes and their groupings into >> gene-level probesets: >> featureInfo <- oligo:::stArrayPmInfo(normalized, target = 'core') >> >> head(featureInfo,10) >> >> fid fsetid >> 1 116371 7892501 >> 2 943979 7892501 >> 3 493089 7892501 >> 4 907039 7892501 >> 5 1033309 7892502 >> 6 653512 7892502 >> 7 690769 7892502 >> 8 997409 7892502 >> 9 379979 7892503 >> 10 469846 7892503 >> >> The fsetid s correspond to gene-level probesets. Are the fid s row >> indices in the GeneFeatureSet, >> and how can they be linked to Affy probe level ids? >> >> Assuming I can map Affy probe ids to the fid column, here is my >> proposed >> solution for >> summarisation after removal of selected probes, but I wonder if there >> is >> a more straightforward way? >> >> # for demo purposes generate a random list of fids to remove >> # some fid s are duplicated in the data.frame as they map to multiple >> probesets, so use 'unique' >> set.seed(1234) >> fidToCut <- sample(unique(featureInfo$fid), 10000, replace=F) >> >> # remove rows from GeneFeatureSet corressponding to fid s we want to >> exclude >> normalized2 <- normalized[-fidToCut, ] >> >> # create mapping for the new GeneFeatureSet of how old row indices map >> to >> new indices >> map <- cbind(new_ind= 1:nrow(normalized2), old_ind= >> featureNames(normalized2) ) >> >> # pmi gives row indices of pm probes in 'normalized', not >> 'normalized2' >> pmi <- featureInfo[["fid"]] >> >> # re-order dataframe so 'old_ind' column is the same as pmi >> map1 <- map[match(pmi, map[,"old_ind"]),] >> map1 <- cbind(map1, featureInfo) >> >> tail(map1) >> new_ind old_ind fid fsetid >> 861488 961136 969962 969962 8180418 >> 861489 208831 210719 210719 8180418 >> 861490 870109 878113 878113 8180418 >> 861491 598190 603636 603636 8180418 >> 861492 858752 866642 866642 8180418 >> 861493 713834 720353 720353 8180418 >> >> # we need to get rid of rows with NA... e.g. where 'fid' but no >> 'new_ind' >> exists: >> # effectively recreating a new 'featureInfo' dataframe >> map1 <- na.omit(map1) >> >> # now we can get the new row indices, and the probesets they belong to >> pmi <- as.numeric( as.character( map1[["new_ind"]] )) # 'new_ind' got >> coerced to a factor >> pnVec <- as.character(map1[["fsetid"]]) >> >> # subset the normalized probe-level values to keep probes by indexes >> in >> pmi >> pms <- exprs(normalized2)[pmi, , drop = FALSE] >> dimnames(pms) <- NULL >> colnames(pms) <- sampleNames(normalized2) >> # get a matrix of summarized values, which can be used to create an >> ExpressionSet >> theExprs <- basicRMA(pms, pnVec, normalize=F, background=F) >> >> I'd greatly appreciate any advice. I'm a biologist by background so >> apologies if this is rather basic. >> Thanks very much >> Jimmy Peters, Smith Lab, Cambridge Institute for Medical Research. >> >> -- output of sessionInfo(): >> >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-apple-darwin10.8.0 (64-bit) >> >> locale: >> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets >> methods >> base >> >> other attached packages: >> [1] pd.hugene.1.0.st.v1_3.8.0 oligoData_1.8.0 biomaRt_2.18.0 >> pd.hugene.1.1.st.v1_3.8.0 RSQLite_0.11.4 DBI_0.2-7 >> >> [7] oligo_1.26.0 Biobase_2.22.0 >> oligoClasses_1.24.0 BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] affxparser_1.34.0 affyio_1.30.0 BiocInstaller_1.12.0 >> Biostrings_2.30.0 bit_1.1-10 codetools_0.2-8 >> ff_2.2-12 >> [8] foreach_1.4.1 GenomicRanges_1.14.1 IRanges_1.20.0 >> iterators_1.0.6 preprocessCore_1.24.0 RCurl_1.95-4.1 >> splines_3.0.2 >> [15] stats4_3.0.2 tools_3.0.2 XML_3.95-0.2 >> XVector_0.2.0 zlibbioc_1.8.0 >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 10.5 years ago Dr J. Peters ▴ 30

0

Entering edit mode

Hi Daniel,

in order to remove probes before normalization, I filtered out some probes ( ~300000) from the cfs and pgf files (clarim r1 analysis files) and then created a new pd.clariom.d.human library. Around 0.5% and 2% of the transcript clusters and probesets, respectively, were removed.

However, when I read the CEL files, the probes I remove before making the library are still present. In this way, after normalizing (rma), probesets and transcript clusters that should have been removed are indeed present, too, although with lower intensities.

I would appreciate a lot if you could tell me how to avoid having such probes, probesets and transcript clusters that should not be there after excluding them from the library (the common approach you say).

Thanks a lot

ADD REPLY • link 7.1 years ago jacorvar ▴ 40

Login before adding your answer.