[Bioc-devel] help with limma design

James W. MacDonald 66k

@james-w-macdonald-5106

United States

Hi Kaiyu,

First off, this isn't an appropriate question for Bioc-devel. That list is intended for questions about developing Bioconductor packages, not questions about how to use the packages. I have redirected your question to the correct list.

Kaiyu Shen wrote:
> Hello, folks:
> I am now using the limma package to analyze two-color arrays. Here are
> the six arrays that I have:
>
> #        Cy3   Cy5
> Array1   MU1   WT
> Array2   WT    MU1
> Array3   MU2   WT
> Array4   WT    MU2
> Array5   MU3   WT
> Array6   WT    MU3
>
> What I want to study is MU1 vs WT.
> I tried two approaches. To keep things simple, I have not introduced any
> pre-processing methods:
>
> A. Use only the first two arrays
>
> #        Cy3   Cy5
> Array1   MU1   WT
> Array2   WT    MU1
>
> object=readTargets("limma.txt")
> RG=read.maimages(object,source="agilent")
> MA=normalizeWithinArrays(RG)
> design=c(1,-1)
> fit=lmFit(MA,design)
> fit=eBayes(fit)
> topTable(fit)
>
>
> B. Include all six arrays so the other comparisons can be analyzed at the same time
>
> #        Cy3   Cy5
> Array1   MU1   WT
> Array2   WT    MU1
> Array3   MU2   WT
> Array4   WT    MU2
> Array5   MU3   WT
> Array6   WT    MU3
>
> object=readTargets("limma.txt")
> RG=read.maimages(object,source="agilent")
> MA=normalizeWithinArrays(RG)
> design=cbind(mu1=c(1,-1,0,0,0,0),mu2=c(0,0,1,-1,0,0),mu3=c(0,0,0,0,1,-1))
> cont.matrix=makeContrasts(mu1,mu2,mu3,levels=design)
> fit=lmFit(MA,design)
> fit2=contrasts.fit(fit,cont.matrix)
> fit2=eBayes(fit2)
> topTable(fit2,coef=1) #to get the first comparison (array1 vs array2)
>
>
> However, these two methods do not give me the same results.
> Could somebody give me some advice about these two methods?

The differences arise mainly because you are fitting a linear model, so the denominator of your t-statistic is a measure of the residual variability across all of the arrays in the model. In the first case that variability is estimated from only two arrays, whereas in the second case it is estimated from all six. How this affects your results depends on the data.

In the second case you have increased the amount of data used to compute the sum of squared errors (SSE), which gives the variance estimate more degrees of freedom. This can make the denominator smaller and might result in more genes being significant (smaller denominator => larger t-statistic => more genes). However, if the variability within the additional arrays (the MU2 and MU3 pairs) is much higher than within the MU1 pair, this will tend to inflate the SSE, and you will get fewer genes.

Best,

Jim

> Thank you very much
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
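
The effect described in the answer can be reproduced for a single gene with base R. This is only a sketch under stated assumptions, not the limma computation itself: the six M-values are invented for illustration, `lm()` stands in for `lmFit()`, and the t-statistics are ordinary ones, without the variance moderation that `eBayes()` adds on top. The names `M`, `designA`, `designB`, `fitA`, `fitB`, `tA` and `tB` are hypothetical.

```r
# Invented log-ratios (M-values) for ONE gene on the six dye-swap arrays.
# The third pair (arrays 5 and 6) is deliberately the noisiest.
M <- c(1.2, -0.8, 0.5, -0.1, 2.0, -0.4)

# Design A: arrays 1 and 2 only, one coefficient
yA <- M[1:2]
designA <- cbind(mu1 = c(1, -1))
fitA <- lm(yA ~ 0 + designA)

# Design B: all six arrays, three coefficients
designB <- cbind(mu1 = c(1, -1, 0, 0, 0, 0),
                 mu2 = c(0, 0, 1, -1, 0, 0),
                 mu3 = c(0, 0, 0, 0, 1, -1))
fitB <- lm(M ~ 0 + designB)

# The mu1 estimate is the same in both fits, because the mu1 column
# only involves arrays 1 and 2 and is orthogonal to the other columns.
coef(fitA)[[1]]
coef(fitB)[[1]]

# The residual degrees of freedom and residual standard deviation differ:
# fit B pools the residuals from all three dye-swap pairs.
df.residual(fitA)       # 1
df.residual(fitB)       # 3
summary(fitA)$sigma
summary(fitB)$sigma

# Ordinary t-statistics for mu1: same numerator, different denominator.
tA <- coef(summary(fitA))[1, "t value"]
tB <- coef(summary(fitB))[1, "t value"]
c(A = tA, B = tB)
```

With these made-up values the noisy third pair inflates the pooled residual variance, so the t-statistic for `mu1` is smaller in the six-array fit even though the estimate itself is unchanged. With quieter additional arrays the opposite can happen, which is the "depends on the data" point in the answer.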

limma limma • 640 views

14.9 years ago • James W. MacDonald
