limma for homemade microarray - question on NAs and multiple probes for one gene

0

Entering edit mode

zhengyu jiang ▴ 50

@zhengyu-jiang-5661

Last seen 11.2 years ago

Dear Bioconductor experts, We have data from a homemade one-channel microarray that I tried to apply limma for differential expression analysis between matched paired Normal (N) and Tumor (Tumor) samples - 8 biological replicates (one tech replicate has been averaged after normalization). All samples are formatted in one matrix (M). Signals have been quantile normalized between each paired normal and tumor. Signal values below 5 (log scale) have been replaced by "NA" since they are potentially noises. So there are many NAs in M. I followed the user manual and made the codes below. I think the code is correct? My questions are (1) how to deal with NAs - as I did a search but no clear idea (2) how do people do the statistics at the gene level for one gene having multiple probes - averaging or taking median? Thanks, Zhengyu > head(M) N1 N2 N3 N4 N5 N6 N7 N8 T1 T2 T3 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA 7.876259 7.856707 NA T4 T5 T6 T7 T8 2 NA 7.720018 NA 7.752550 NA > eset<-as.matrix(M) > Pair=factor(targets$Pair) > Treat=factor(targets$Treatment,levels=c("N","T")) # compared matched normal to tumors > design<-model.matrix(~Pair+Treat) > targets FilenName Pair Treatment 1 N1 1 N 2 N2 2 N 3 N3 3 N 4 N4 4 N 5 N5 5 N 6 N6 6 N 7 N7 7 N 8 N8 8 N 9 T1 1 T 10 T2 2 T 11 T3 3 T 12 T4 4 T 13 T5 5 T 14 T6 6 T 15 T7 7 T 16 T8 8 T fit_pair<-lmFit(eset,design) fit_pair<-eBayes(fit_pair) R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top 30 [[alternative HTML version deleted]]

Microarray Microarray • 1.9k views

ADD COMMENT • link updated 12.3 years ago by Gordon Smyth 53k • written 12.3 years ago by zhengyu jiang ▴ 50

0

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 12 months ago

United States

Hi Zhengyu, for turning probe-level data into gene-level data, you may want to look at the function collapseRows in the WGCNA package (on CRAN). The relevant citation is Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, Salomon DR, Horvath S (2011) Strategies for aggregating gene expression data: The collapseRows R function. BMC Bioinformatics12:322. PMID: 21816037, PMCID: PMC3166942 http://www.biomedcentral.com/1471-2105/12/322 As for dealing with the missing data, well, I personally would try to go back and restore the original values unless they are outliers. Even knowing that something is below 5 is better than a missing datum. To remove noise, I would possibly remove probes whose expression consistently remains below 5 but not individual expression values. For example, if a probe is consistently above 7 in cancer samples and below 5 (unexpressed) in normals, you want to identify it - but you will completely miss it if you turn all values below 5 into NA. Hope this helps, Peter On Mon, Jul 8, 2013 at 5:40 PM, zhengyu jiang <zhyjiang2006 at="" gmail.com=""> wrote: > Dear Bioconductor experts, > > We have data from a homemade one-channel microarray that I tried to apply > limma for differential expression analysis between matched paired Normal > (N) and Tumor (Tumor) samples - 8 biological replicates (one tech replicate > has been averaged after normalization). All samples are formatted in one > matrix (M). > > Signals have been quantile normalized between each paired normal and tumor. > Signal values below 5 (log scale) have been replaced by "NA" since they are > potentially noises. So there are many NAs in M. > > I followed the user manual and made the codes below. > > I think the code is correct? My questions are (1) how to deal with NAs - as > I did a search but no clear idea (2) how do people do the statistics at the > gene level for one gene having multiple probes - averaging or taking median? > > Thanks, > Zhengyu > > > > head(M) > N1 N2 N3 N4 N5 N6 N7 > N8 T1 T2 T3 > 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA > 7.876259 7.856707 NA > T4 T5 T6 T7 T8 > 2 NA 7.720018 NA 7.752550 NA > >> eset<-as.matrix(M) >> Pair=factor(targets$Pair) >> Treat=factor(targets$Treatment,levels=c("N","T")) # compared matched > normal to tumors >> design<-model.matrix(~Pair+Treat) >> targets > FilenName Pair Treatment > 1 N1 1 N > 2 N2 2 N > 3 N3 3 N > 4 N4 4 N > 5 N5 5 N > 6 N6 6 N > 7 N7 7 N > 8 N8 8 N > 9 T1 1 T > 10 T2 2 T > 11 T3 3 T > 12 T4 4 T > 13 T5 5 T > 14 T6 6 T > 15 T7 7 T > 16 T8 8 T > fit_pair<-lmFit(eset,design) > fit_pair<-eBayes(fit_pair) > > R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top 30 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.3 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 10 hours ago

WEHI, Melbourne, Australia

Dear Zhengyu, > Date: Mon, 8 Jul 2013 20:40:14 -0400 > From: zhengyu jiang <zhyjiang2006 at="" gmail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] limma for homemade microarray - question on NAs and > multiple probes for one gene > > Dear Bioconductor experts, > > We have data from a homemade one-channel microarray that I tried to apply > limma for differential expression analysis between matched paired Normal > (N) and Tumor (Tumor) samples - 8 biological replicates (one tech replicate > has been averaged after normalization). All samples are formatted in one > matrix (M). This is a standard problem, well covered in the limma User's Guide. > Signals have been quantile normalized between each paired normal and tumor. I don't know what you mean by this. I recommend ordinary quantile normalization of all the arrays together, without regard to which sample is paired with which. > Signal values below 5 (log scale) have been replaced by "NA" since they are > potentially noises. So there are many NAs in M. Please don't do this. limma can deal with low intensities perfectly weel, and can figure out whether they are noise or not. You are simply removing valid data. If you are concerned about high variability of low intensity probes, then look at plotSA(fit_pair) to examine the mean-variance trend, and use eBayes(fit_pair, trend=TRUE) if there is a strong trend. > I followed the user manual and made the codes below. > > I think the code is correct? Looks ok. > My questions are (1) how to deal with NAs - as I did a search but no > clear idea Don't create them in the first place. This has been said many times before! > (2) how do people do the statistics at the gene level for one gene > having multiple probes - averaging or taking median? Usually we do not find it necessary to aggregate the multiple probes. The multiple probes might represent different transcripts or variants, and some of the probes might not work. Why do you need to take any special action? However, if you feel that you must, the best method to aggregate the multiple probes is probably to retain the probe for each gene with highest fit_pair$Amean value. We have used this strategy in a number of published papers. The rationale for this is to choose the probe corresponding to the most highly expressed transcript for the gene. Alternatively, you could average the probes using the avereps() function in limma. Best wishes Gordon > Thanks, > Zhengyu > > > > head(M) > N1 N2 N3 N4 N5 N6 N7 > N8 T1 T2 T3 > 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA > 7.876259 7.856707 NA > T4 T5 T6 T7 T8 > 2 NA 7.720018 NA 7.752550 NA > >> eset<-as.matrix(M) >> Pair=factor(targets$Pair) >> Treat=factor(targets$Treatment,levels=c("N","T")) # compared matched > normal to tumors >> design<-model.matrix(~Pair+Treat) >> targets > FilenName Pair Treatment > 1 N1 1 N > 2 N2 2 N > 3 N3 3 N > 4 N4 4 N > 5 N5 5 N > 6 N6 6 N > 7 N7 7 N > 8 N8 8 N > 9 T1 1 T > 10 T2 2 T > 11 T3 3 T > 12 T4 4 T > 13 T5 5 T > 14 T6 6 T > 15 T7 7 T > 16 T8 8 T > fit_pair<-lmFit(eset,design) > fit_pair<-eBayes(fit_pair) > > R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top 30 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 12.3 years ago Gordon Smyth 53k

0

Entering edit mode

Hi Gordon, Sorry to bother you again. I have two questions. (1) I started the "eset<-as.matrix(log(M)) #### take log intensities" with a matrix and I have a separate annotation file which contains columns of "ID", "Sample Name", "Chr", "Coordinate", ""GeneSymbol". The following code to top list is below. How can I add the annotation to the final toplist output ? eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile normalization for all arrays design<-model.matrix(~Pair+Treat) targets<-readTargets("targets.txt",row.names=1) ## or row name =1 Pair<-factor(targets$Pair) Treat<-factor(targets$Treatment,levels=c("N","T")) fit_pair<-lmFit(eset,design) plotSA(fit_pair) fit= eBayes(fit_pair, trend=TRUE) R=topTable(fit, coef="TreatT", adjust="BH",number=30 (2) I have 6 sample data (3 control and 3 treatment) in one file. It contains columns of "ID", "Sample Name", "Chr", "Coordinate", ""GeneSymbol", "XRaw" (expression raw data). I don't have control probes. If I want to use the Spot quality weights function "wt.fun" to read in these data as described in the limma manual to define all XRaw values below 1000 as 0 weight, is that possible to modify the code? I can change all column names if needed. But I don't know what columns are required for read.illmn? myfun <- function(x) as.numeric(x$XRaw <1000) read.ilmn("062813_data.txt",probeid="ID",annotation=c("Chr","Coordinat e","GeneSymbol",expr="XRaw"), wt.fun=myfun) Thanks, Zhengyu On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Zhengyu, > > Date: Mon, 8 Jul 2013 20:40:14 -0400 >> From: zhengyu jiang <zhyjiang2006@gmail.com> >> To: bioconductor@r-project.org >> Subject: [BioC] limma for homemade microarray - question on NAs and >> multiple probes for one gene >> >> Dear Bioconductor experts, >> >> We have data from a homemade one-channel microarray that I tried to apply >> limma for differential expression analysis between matched paired Normal >> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech >> replicate >> has been averaged after normalization). All samples are formatted in one >> matrix (M). >> > > This is a standard problem, well covered in the limma User's Guide. > > Signals have been quantile normalized between each paired normal and >> tumor. >> > > I don't know what you mean by this. I recommend ordinary quantile > normalization of all the arrays together, without regard to which sample is > paired with which. > > Signal values below 5 (log scale) have been replaced by "NA" since they >> are >> potentially noises. So there are many NAs in M. >> > > Please don't do this. limma can deal with low intensities perfectly weel, > and can figure out whether they are noise or not. You are simply removing > valid data. > > If you are concerned about high variability of low intensity probes, then > look at > > plotSA(fit_pair) > > to examine the mean-variance trend, and use > > eBayes(fit_pair, trend=TRUE) > > if there is a strong trend. > > I followed the user manual and made the codes below. >> >> I think the code is correct? >> > > Looks ok. > > My questions are (1) how to deal with NAs - as I did a search but no >> clear idea >> > > Don't create them in the first place. This has been said many times > before! > > (2) how do people do the statistics at the gene level for one gene having >> multiple probes - averaging or taking median? >> > > Usually we do not find it necessary to aggregate the multiple probes. The > multiple probes might represent different transcripts or variants, and some > of the probes might not work. Why do you need to take any special action? > > However, if you feel that you must, the best method to aggregate the > multiple probes is probably to retain the probe for each gene with highest > fit_pair$Amean value. We have used this strategy in a number of published > papers. The rationale for this is to choose the probe corresponding to the > most highly expressed transcript for the gene. Alternatively, you could > average the probes using the avereps() function in limma. > > Best wishes > Gordon > > Thanks, >> Zhengyu >> >> >> > head(M) >> N1 N2 N3 N4 N5 N6 N7 >> N8 T1 T2 T3 >> 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA >> 7.876259 7.856707 NA >> T4 T5 T6 T7 T8 >> 2 NA 7.720018 NA 7.752550 NA >> >> eset<-as.matrix(M) >>> Pair=factor(targets$Pair) >>> Treat=factor(targets$**Treatment,levels=c("N","T")) # compared >>> matched >>> >> normal to tumors >> >>> design<-model.matrix(~Pair+**Treat) >>> targets >>> >> FilenName Pair Treatment >> 1 N1 1 N >> 2 N2 2 N >> 3 N3 3 N >> 4 N4 4 N >> 5 N5 5 N >> 6 N6 6 N >> 7 N7 7 N >> 8 N8 8 N >> 9 T1 1 T >> 10 T2 2 T >> 11 T3 3 T >> 12 T4 4 T >> 13 T5 5 T >> 14 T6 6 T >> 15 T7 7 T >> 16 T8 8 T >> fit_pair<-lmFit(eset,design) >> fit_pair<-eBayes(fit_pair) >> >> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top >> 30 >> > > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:10}}

ADD REPLY • link 12.3 years ago zhengyu jiang ▴ 50

0

Entering edit mode

Dear Zhengyu, On Mon, 22 Jul 2013, zhengyu jiang wrote: > Hi Gordon, > > Sorry to bother you again. I have two questions. > > (1) I started the "eset<-as.matrix(log(M)) #### take log intensities" > with a matrix and I have a separate annotation file which contains > columns of "ID", "Sample Name", "Chr", "Coordinate", ""GeneSymbol". > The following code to top list is below. How can I add the annotation > to the final toplist output ? The first and best way to answer questions is to read the documentation. If you type ?topTable, you will see that there is an argument for a data.frame containing gene annotation. So you read the annotation into a data.frame, then use topTable(fit, ..., genelist=yourdatannotation) > eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile > normalization for all arrays > design<-model.matrix(~Pair+Treat) > targets<-readTargets("targets.txt",row.names=1) ## or row name =1 > Pair<-factor(targets$Pair) > Treat<-factor(targets$Treatment,levels=c("N","T")) > fit_pair<-lmFit(eset,design) > plotSA(fit_pair) > fit= eBayes(fit_pair, trend=TRUE) > R=topTable(fit, coef="TreatT", adjust="BH",number=30 > > (2) I have 6 sample data (3 control and 3 treatment) in one file. Is this an Illumina BeadChip. You don't say but, if not, using read.ilmn is not appropriate. > It contains columns of "ID", "Sample Name", "Chr", "Coordinate", > ""GeneSymbol", "XRaw" (expression raw data). I don't have control > probes. If I want to use the Spot quality weights function "wt.fun" to > read in these data as described in the limma manual to define all XRaw > values below 1000 as 0 weight, And yet I have already asked you please not to do this. > is that possible to modify the code? I can change all column names if > needed. But I don't know what columns are required for read.illmn? Again, please read the documentation. Type ?read.ilmn and you will see that there is no argument called wt.fun. You can only use read.ilmn for Illumina BeadChips. This type of array does not have spots, so trying to set spot weights would obviously make no sense. The Biconductor posting guide says "Read the relevant R documentation ... If you are having trouble with function somefunc, try ?somefunc." The limma User's Guide says "Mailing list etiquette requires that you read the relevant help page carefully before posting a problem to the list." Best wishes Gordon > myfun <- function(x) as.numeric(x$XRaw <1000) > read.ilmn("062813_data.txt",probeid="ID",annotation=c("Chr","Coordin ate","GeneSymbol",expr="XRaw"), > wt.fun=myfun) > Thanks, > Zhengyu > > > On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Zhengyu, >> >> Date: Mon, 8 Jul 2013 20:40:14 -0400 >>> From: zhengyu jiang <zhyjiang2006 at="" gmail.com=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] limma for homemade microarray - question on NAs and >>> multiple probes for one gene >>> >>> Dear Bioconductor experts, >>> >>> We have data from a homemade one-channel microarray that I tried to apply >>> limma for differential expression analysis between matched paired Normal >>> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech >>> replicate >>> has been averaged after normalization). All samples are formatted in one >>> matrix (M). >>> >> >> This is a standard problem, well covered in the limma User's Guide. >> >> Signals have been quantile normalized between each paired normal and >>> tumor. >>> >> >> I don't know what you mean by this. I recommend ordinary quantile >> normalization of all the arrays together, without regard to which sample is >> paired with which. >> >> Signal values below 5 (log scale) have been replaced by "NA" since they >>> are >>> potentially noises. So there are many NAs in M. >>> >> >> Please don't do this. limma can deal with low intensities perfectly weel, >> and can figure out whether they are noise or not. You are simply removing >> valid data. >> >> If you are concerned about high variability of low intensity probes, then >> look at >> >> plotSA(fit_pair) >> >> to examine the mean-variance trend, and use >> >> eBayes(fit_pair, trend=TRUE) >> >> if there is a strong trend. >> >> I followed the user manual and made the codes below. >>> >>> I think the code is correct? >>> >> >> Looks ok. >> >> My questions are (1) how to deal with NAs - as I did a search but no >>> clear idea >>> >> >> Don't create them in the first place. This has been said many times >> before! >> >> (2) how do people do the statistics at the gene level for one gene having >>> multiple probes - averaging or taking median? >>> >> >> Usually we do not find it necessary to aggregate the multiple probes. The >> multiple probes might represent different transcripts or variants, and some >> of the probes might not work. Why do you need to take any special action? >> >> However, if you feel that you must, the best method to aggregate the >> multiple probes is probably to retain the probe for each gene with highest >> fit_pair$Amean value. We have used this strategy in a number of published >> papers. The rationale for this is to choose the probe corresponding to the >> most highly expressed transcript for the gene. Alternatively, you could >> average the probes using the avereps() function in limma. >> >> Best wishes >> Gordon >> >> Thanks, >>> Zhengyu >>> >>> >>>> head(M) >>> N1 N2 N3 N4 N5 N6 N7 >>> N8 T1 T2 T3 >>> 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 NA >>> 7.876259 7.856707 NA >>> T4 T5 T6 T7 T8 >>> 2 NA 7.720018 NA 7.752550 NA >>> >>> eset<-as.matrix(M) >>>> Pair=factor(targets$Pair) >>>> Treat=factor(targets$**Treatment,levels=c("N","T")) # compared >>>> matched >>>> >>> normal to tumors >>> >>>> design<-model.matrix(~Pair+**Treat) >>>> targets >>>> >>> FilenName Pair Treatment >>> 1 N1 1 N >>> 2 N2 2 N >>> 3 N3 3 N >>> 4 N4 4 N >>> 5 N5 5 N >>> 6 N6 6 N >>> 7 N7 7 N >>> 8 N8 8 N >>> 9 T1 1 T >>> 10 T2 2 T >>> 11 T3 3 T >>> 12 T4 4 T >>> 13 T5 5 T >>> 14 T6 6 T >>> 15 T7 7 T >>> 16 T8 8 T >>> fit_pair<-lmFit(eset,design) >>> fit_pair<-eBayes(fit_pair) >>> >>> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top >>> 30 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 12.3 years ago Gordon Smyth 53k

0

Entering edit mode

sorry, i have a question. why did you do log(M) did you only extract matched probe signals? shan On Thu, Jul 25, 2013 at 7:21 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Zhengyu, > > > On Mon, 22 Jul 2013, zhengyu jiang wrote: > > Hi Gordon, >> >> Sorry to bother you again. I have two questions. >> >> (1) I started the "eset<-as.matrix(log(M)) #### take log intensities" >> with a matrix and I have a separate annotation file which contains columns >> of "ID", "Sample Name", "Chr", "Coordinate", ""GeneSymbol". The following >> code to top list is below. How can I add the annotation to the final >> toplist output ? >> > > The first and best way to answer questions is to read the documentation. > If you type ?topTable, you will see that there is an argument for a > data.frame containing gene annotation. So you read the annotation into a > data.frame, then use > > topTable(fit, ..., genelist=yourdatannotation) > > > eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile >> normalization for all arrays >> design<-model.matrix(~Pair+**Treat) >> targets<-readTargets("targets.**txt",row.names=1) ## or row name =1 >> Pair<-factor(targets$Pair) >> Treat<-factor(targets$**Treatment,levels=c("N","T")) >> fit_pair<-lmFit(eset,design) >> plotSA(fit_pair) >> fit= eBayes(fit_pair, trend=TRUE) >> R=topTable(fit, coef="TreatT", adjust="BH",number=30 >> >> (2) I have 6 sample data (3 control and 3 treatment) in one file. >> > > Is this an Illumina BeadChip. You don't say but, if not, using read.ilmn > is not appropriate. > > > It contains columns of "ID", "Sample Name", "Chr", "Coordinate", >> ""GeneSymbol", "XRaw" (expression raw data). I don't have control probes. >> If I want to use the Spot quality weights function "wt.fun" to read in >> these data as described in the limma manual to define all XRaw values below >> 1000 as 0 weight, >> > > And yet I have already asked you please not to do this. > > > is that possible to modify the code? I can change all column names if >> needed. But I don't know what columns are required for read.illmn? >> > > Again, please read the documentation. Type ?read.ilmn and you will see > that there is no argument called wt.fun. You can only use read.ilmn for > Illumina BeadChips. This type of array does not have spots, so trying to > set spot weights would obviously make no sense. > > The Biconductor posting guide says > > "Read the relevant R documentation ... If you are having trouble with > function somefunc, try ?somefunc." > > The limma User's Guide says > > "Mailing list etiquette requires that you read the relevant help page > carefully before posting a problem to the list." > > Best wishes > Gordon > > > myfun <- function(x) as.numeric(x$XRaw <1000) >> read.ilmn("062813_data.txt",**probeid="ID",annotation=c("** >> Chr","Coordinate","GeneSymbol"**,expr="XRaw"), >> wt.fun=myfun) >> Thanks, >> Zhengyu >> >> >> On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: >> >> Dear Zhengyu, >>> >>> Date: Mon, 8 Jul 2013 20:40:14 -0400 >>> >>>> From: zhengyu jiang <zhyjiang2006@gmail.com> >>>> To: bioconductor@r-project.org >>>> Subject: [BioC] limma for homemade microarray - question on NAs and >>>> multiple probes for one gene >>>> >>>> Dear Bioconductor experts, >>>> >>>> We have data from a homemade one-channel microarray that I tried to >>>> apply >>>> limma for differential expression analysis between matched paired Normal >>>> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech >>>> replicate >>>> has been averaged after normalization). All samples are formatted in one >>>> matrix (M). >>>> >>>> >>> This is a standard problem, well covered in the limma User's Guide. >>> >>> Signals have been quantile normalized between each paired normal and >>> >>>> tumor. >>>> >>>> >>> I don't know what you mean by this. I recommend ordinary quantile >>> normalization of all the arrays together, without regard to which sample >>> is >>> paired with which. >>> >>> Signal values below 5 (log scale) have been replaced by "NA" since they >>> >>>> are >>>> potentially noises. So there are many NAs in M. >>>> >>>> >>> Please don't do this. limma can deal with low intensities perfectly >>> weel, >>> and can figure out whether they are noise or not. You are simply >>> removing >>> valid data. >>> >>> If you are concerned about high variability of low intensity probes, then >>> look at >>> >>> plotSA(fit_pair) >>> >>> to examine the mean-variance trend, and use >>> >>> eBayes(fit_pair, trend=TRUE) >>> >>> if there is a strong trend. >>> >>> I followed the user manual and made the codes below. >>> >>>> >>>> I think the code is correct? >>>> >>>> >>> Looks ok. >>> >>> My questions are (1) how to deal with NAs - as I did a search but no >>> >>>> clear idea >>>> >>>> >>> Don't create them in the first place. This has been said many times >>> before! >>> >>> (2) how do people do the statistics at the gene level for one gene >>> having >>> >>>> multiple probes - averaging or taking median? >>>> >>>> >>> Usually we do not find it necessary to aggregate the multiple probes. The >>> multiple probes might represent different transcripts or variants, and >>> some >>> of the probes might not work. Why do you need to take any special >>> action? >>> >>> However, if you feel that you must, the best method to aggregate the >>> multiple probes is probably to retain the probe for each gene with >>> highest >>> fit_pair$Amean value. We have used this strategy in a number of >>> published >>> papers. The rationale for this is to choose the probe corresponding to >>> the >>> most highly expressed transcript for the gene. Alternatively, you could >>> average the probes using the avereps() function in limma. >>> >>> Best wishes >>> Gordon >>> >>> Thanks, >>> >>>> Zhengyu >>>> >>>> >>>> head(M) >>>>> >>>> N1 N2 N3 N4 N5 N6 N7 >>>> N8 T1 T2 T3 >>>> 2 8.622724 7.423568 NA NA 7.487174 NA 8.516293 >>>> NA >>>> 7.876259 7.856707 NA >>>> T4 T5 T6 T7 T8 >>>> 2 NA 7.720018 NA 7.752550 NA >>>> >>>> eset<-as.matrix(M) >>>> >>>>> Pair=factor(targets$Pair) >>>>> Treat=factor(targets$****Treatment,levels=c("N","T")) # compared >>>>> matched >>>>> >>>>> normal to tumors >>>> >>>> design<-model.matrix(~Pair+****Treat) >>>>> targets >>>>> >>>>> FilenName Pair Treatment >>>> 1 N1 1 N >>>> 2 N2 2 N >>>> 3 N3 3 N >>>> 4 N4 4 N >>>> 5 N5 5 N >>>> 6 N6 6 N >>>> 7 N7 7 N >>>> 8 N8 8 N >>>> 9 T1 1 T >>>> 10 T2 2 T >>>> 11 T3 3 T >>>> 12 T4 4 T >>>> 13 T5 5 T >>>> 14 T6 6 T >>>> 15 T7 7 T >>>> 16 T8 8 T >>>> fit_pair<-lmFit(eset,design) >>>> fit_pair<-eBayes(fit_pair) >>>> >>>> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top >>>> 30 >>>> >>> > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:26}}

ADD REPLY • link 12.3 years ago wang peter ★ 2.0k

Login before adding your answer.