Paired limma analysis

0

Entering edit mode

mb3058@columbia.edu ▴ 10

@mb3058columbiaedu-5436

Last seen 11.4 years ago

I am running a paired analysis with limma. My platform is affymetrix mouse gene st.1.0 array I have a paired group (hi, lo) with duplicates per group. My script so far is this: exprs <- read.table("data.txt",as.is=T,sep="\t",header=T,row.names=1) targets<-readTargets("pairedTargetFile.txt") Pair <- factor(targets$Group) Treat <- factor(targets$Treatment, levels=c("hi","lo")) design <- model.matrix(~Pair+Treat) fit <- lmFit(exprs, design) fit <- eBayes(fit) #I want to print out the limma results for all the probes Results<- topTable(fit,number=241425,adjust.method="BH") write.table(Results,"paired_Results.txt",sep="\t") ID X.Intercept. Pair2 Treatlo AveExpr F P.Value adj.P.Val 10469363 10.590048 2.186194 -0.636504 11.364893 1566.532105 0.000000469 0.00000942 10365564 10.12985925 2.3047915 -0.2214415 11.17153425 1528.543391 0.000000495 0.00000942 10469367 9.8800415 2.440832 -0.652667 10.774124 1417.807515 0.000000583 0.00000942 10517519 10.1252575 2.183145 0.212485 11.3230725 1403.521849 0.000000596 0.00000942 10469380 9.61192575 2.9290725 -0.8228465 10.66503875 1396.225476 0.000000603 0.00000942 Q1.I am trying to interpret some of the fields. Could someone help me understand what the fields X.Intercept. Pair2 Treatlo stand for? Q2. When I run the following line to get results Results <- topTable(fit, coef="Treatlo"), I get the following results: ID logFC AveExpr t P.Value adj.P.Val B 10354461 4.390143 5.666826 14.06184 8.62E-05 0.5279716 -0.7055664 10583079 4.249853 6.16608 13.51989 1.02E-04 0.5279716 -0.7296509 10587318 4.180087 4.887135 13.34857 1.08E-04 0.5279716 -0.7378131 10370298 4.0321 5.502014 12.69299 1.33E-04 0.5279716 -0.7717809 10392551 4.259277 4.614051 12.44214 1.45E-04 0.5279716 -0.7860399 10368320 4.387814 5.911313 12.36954 1.49E-04 0.5279716 -0.7903085 10587807 3.805685 4.826062 12.30147 1.52E-04 0.5279716 -0.7943703 10504815 3.793085 5.657937 12.27763 1.54E-04 0.5279716 -0.7958069 10357489 3.968974 4.666258 12.23582 1.56E-04 0.5279716 -0.7983445 10481521 3.757882 7.856306 12.15789 1.60E-04 0.5279716 -0.8031341 However when I run the following command to get results for all the probes, I get different values for the same probes. See PValue, Adj. P value ID X.Intercept. Pair2 Treatlo AveExpr F P.Value adj.P.Val 10354461 2.963005 1.0175 4.390143 5.6668265 441.9193584 7.37E-06 1.48E-05 10583079 5.2467745 -2.411241 4.249853 6.1660805 516.5649337 5.25E-06 1.24E-05 10587318 2.88690775 -0.1796325 4.1800875 4.88713525 334.0547737 1.35E-05 2.19E-05 10370298 4.54191225 -2.1118965 4.0321005 5.50201425 408.7688335 8.74E-06 1.64E-05 10392551 2.4626765 0.043472 4.259277 4.614051 261.1260083 2.31E-05 3.27E-05 10368320 3.76314325 -0.0914745 4.3878135 5.91131275 377.3505032 1.04E-05 1.83E-05 10587807 2.69620575 0.4540285 3.8056845 4.82606225 325.4919972 1.43E-05 2.28E-05 10504815 5.27270925 -3.0226305 3.7930855 5.65793675 458.5242305 6.81E-06 1.42E-05 10357489 2.90779925 -0.4520565 3.9689735 4.66625775 286.4197106 1.89E-05 2.80E-05 10481521 6.637718 -1.320705 3.757882 7.8563065 794.2002708 2.06E-06 9.55E-06

limma limma • 1.9k views

ADD COMMENT • link updated 13.5 years ago by James W. MacDonald 68k • written 13.5 years ago by mb3058@columbia.edu ▴ 10

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi mb3058 at columbia.edu, On 8/2/2012 6:59 PM, mb3058 at columbia.edu wrote: > I am running a paired analysis with limma. > My platform is affymetrix mouse gene st.1.0 array > I have a paired group (hi, lo) with duplicates per group. > My script so far is this: > exprs <- read.table("data.txt",as.is=T,sep="\t",header=T,row.names=1) > targets<-readTargets("pairedTargetFile.txt") > Pair <- factor(targets$Group) > Treat <- factor(targets$Treatment, levels=c("hi","lo")) > design <- model.matrix(~Pair+Treat) > fit <- lmFit(exprs, design) > fit <- eBayes(fit) > #I want to print out the limma results for all the probes > Results<- topTable(fit,number=241425,adjust.method="BH") > write.table(Results,"paired_Results.txt",sep="\t") > > ID X.Intercept. Pair2 Treatlo AveExpr F P.Value > adj.P.Val > 10469363 10.590048 2.186194 -0.636504 11.364893 > 1566.532105 0.000000469 0.00000942 > 10365564 10.12985925 2.3047915 -0.2214415 11.17153425 > 1528.543391 0.000000495 0.00000942 > 10469367 9.8800415 2.440832 -0.652667 10.774124 > 1417.807515 0.000000583 0.00000942 > 10517519 10.1252575 2.183145 0.212485 11.3230725 > 1403.521849 0.000000596 0.00000942 > 10469380 9.61192575 2.9290725 -0.8228465 10.66503875 > 1396.225476 0.000000603 0.00000942 > > Q1.I am trying to interpret some of the fields. Could someone help me > understand what the fields X.Intercept. Pair2 Treatlo stand for? Well, I can tell you what the coefficients are estimating, but make no warranties concerning your understanding. The X.Intercept can be thought of as a baseline. Technically it estimates the mean expression of the hi treated samples for the first of the pairs. The Pair2 coefficient estimates the difference between the pairs, and the Treatlo coefficient estimates the difference between the hi and lo treatment. I would assume that you want to find those genes that are differentially expressed between hi and lo, after adjusting for the pairing. For that you want the Treatlo coefficient. > > > > Q2. When I run the following line to get results > Results <- topTable(fit, coef="Treatlo"), I get the following results: > ID logFC AveExpr t P.Value adj.P.Val B > 10354461 4.390143 5.666826 14.06184 8.62E-05 > 0.5279716 -0.7055664 > 10583079 4.249853 6.16608 13.51989 1.02E-04 > 0.5279716 -0.7296509 > 10587318 4.180087 4.887135 13.34857 1.08E-04 > 0.5279716 -0.7378131 > 10370298 4.0321 5.502014 12.69299 1.33E-04 0.5279716 > -0.7717809 > 10392551 4.259277 4.614051 12.44214 1.45E-04 > 0.5279716 -0.7860399 > 10368320 4.387814 5.911313 12.36954 1.49E-04 > 0.5279716 -0.7903085 > 10587807 3.805685 4.826062 12.30147 1.52E-04 > 0.5279716 -0.7943703 > 10504815 3.793085 5.657937 12.27763 1.54E-04 > 0.5279716 -0.7958069 > 10357489 3.968974 4.666258 12.23582 1.56E-04 > 0.5279716 -0.7983445 > 10481521 3.757882 7.856306 12.15789 1.60E-04 > 0.5279716 -0.8031341 > > However when I run the following command to get results for all the > probes, I get different values for the same probes. See PValue, Adj. P > value Well, you don't show the command you use. I would assume you ran topTable() without specifying a coefficient, in which case the following portions from ?topTable might be instructive: coef: column number or column name specifying which coefficient or contrast of the linear model is of interest. For 'topTable', can also be a vector of column subscripts, in which case the gene ranking is by F-statistic for that set of contrasts. and 'topTableF' ranks genes on the basis of moderated F-statistics for one or more coefficients. If 'topTable' is called with 'coef' has length greater than 1, then the specified columns will be extracted from 'fit' and 'topTableF' called on the result. 'topTable' with 'coef=NULL' is the same as 'topTableF', unless the fitted model 'fit' has only one column. The default to topTable() is coef=NULL, in which case you are calling topTableF(), and computing the F-statistic over all coefficients, which I highly doubt is what you want to do. Best, Jim > ID X.Intercept. Pair2 Treatlo AveExpr F P.Value > adj.P.Val > 10354461 2.963005 1.0175 4.390143 5.6668265 > 441.9193584 7.37E-06 1.48E-05 > 10583079 5.2467745 -2.411241 4.249853 6.1660805 > 516.5649337 5.25E-06 1.24E-05 > 10587318 2.88690775 -0.1796325 4.1800875 4.88713525 > 334.0547737 1.35E-05 2.19E-05 > 10370298 4.54191225 -2.1118965 4.0321005 5.50201425 > 408.7688335 8.74E-06 1.64E-05 > 10392551 2.4626765 0.043472 4.259277 4.614051 > 261.1260083 2.31E-05 3.27E-05 > 10368320 3.76314325 -0.0914745 4.3878135 5.91131275 > 377.3505032 1.04E-05 1.83E-05 > 10587807 2.69620575 0.4540285 3.8056845 4.82606225 > 325.4919972 1.43E-05 2.28E-05 > 10504815 5.27270925 -3.0226305 3.7930855 5.65793675 > 458.5242305 6.81E-06 1.42E-05 > 10357489 2.90779925 -0.4520565 3.9689735 4.66625775 > 286.4197106 1.89E-05 2.80E-05 > 10481521 6.637718 -1.320705 3.757882 7.8563065 > 794.2002708 2.06E-06 9.55E-06 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 13.5 years ago James W. MacDonald 68k

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Please don't take things off-list. On 8/3/2012 12:21 PM, mb3058 at columbia.edu wrote: > Hi Jim, > > Thank you for replying to my questions. Your answers have somewhat > cleared the doubts that I had about the results. I will be obliged if > you could reply to some follow up questions I had. > > 1.Out of Pair2 and Treatlo, which one is the logFC column (my input > data is log2 transformed)? I have no idea what you are asking here. > 2.What threshold should I be taking for Treatlo coefficient to get > genes that are significantly differentially expressed between hi and lo. That's up to you, isn't it? If you are doing the analysis, then you have to take responsibility for the choices you make. Why would you abdicate this responsibility to some random dude on a listserv? If I said you need a p-value of 0.5 and a -1.5 fold change, would you use that advice? I have to admit, I am somewhat boggled. Best, Jim > > Thanks again > > > > Quoting "James W. MacDonald" <jmacdon at="" uw.edu="">: > >> Hi mb3058 at columbia.edu, >> >> On 8/2/2012 6:59 PM, mb3058 at columbia.edu wrote: >>> I am running a paired analysis with limma. >>> My platform is affymetrix mouse gene st.1.0 array >>> I have a paired group (hi, lo) with duplicates per group. >>> My script so far is this: >>> exprs <- read.table("data.txt",as.is=T,sep="\t",header=T,row.names=1) >>> targets<-readTargets("pairedTargetFile.txt") >>> Pair <- factor(targets$Group) >>> Treat <- factor(targets$Treatment, levels=c("hi","lo")) >>> design <- model.matrix(~Pair+Treat) >>> fit <- lmFit(exprs, design) >>> fit <- eBayes(fit) >>> #I want to print out the limma results for all the probes >>> Results<- topTable(fit,number=241425,adjust.method="BH") >>> write.table(Results,"paired_Results.txt",sep="\t") >>> >>> ID X.Intercept. Pair2 Treatlo AveExpr F P.Value >>> adj.P.Val >>> 10469363 10.590048 2.186194 -0.636504 11.364893 >>> 1566.532105 0.000000469 0.00000942 >>> 10365564 10.12985925 2.3047915 -0.2214415 11.17153425 >>> 1528.543391 0.000000495 0.00000942 >>> 10469367 9.8800415 2.440832 -0.652667 10.774124 >>> 1417.807515 0.000000583 0.00000942 >>> 10517519 10.1252575 2.183145 0.212485 11.3230725 >>> 1403.521849 0.000000596 0.00000942 >>> 10469380 9.61192575 2.9290725 -0.8228465 10.66503875 >>> 1396.225476 0.000000603 0.00000942 >>> >>> Q1.I am trying to interpret some of the fields. Could someone help >>> me understand what the fields X.Intercept. Pair2 Treatlo >>> stand for? >> >> Well, I can tell you what the coefficients are estimating, but make no >> warranties concerning your understanding. >> >> The X.Intercept can be thought of as a baseline. Technically it >> estimates the mean expression of the hi treated samples for the first >> of the pairs. The Pair2 coefficient estimates the difference between >> the pairs, and the Treatlo coefficient estimates the difference between >> the hi and lo treatment. >> >> I would assume that you want to find those genes that are >> differentially expressed between hi and lo, after adjusting for the >> pairing. For that you want the Treatlo coefficient. >> >>> >>> >>> >>> Q2. When I run the following line to get results >>> Results <- topTable(fit, coef="Treatlo"), I get the following results: >>> ID logFC AveExpr t P.Value adj.P.Val B >>> 10354461 4.390143 5.666826 14.06184 8.62E-05 >>> 0.5279716 -0.7055664 >>> 10583079 4.249853 6.16608 13.51989 1.02E-04 >>> 0.5279716 -0.7296509 >>> 10587318 4.180087 4.887135 13.34857 1.08E-04 >>> 0.5279716 -0.7378131 >>> 10370298 4.0321 5.502014 12.69299 1.33E-04 0.5279716 >>> -0.7717809 >>> 10392551 4.259277 4.614051 12.44214 1.45E-04 >>> 0.5279716 -0.7860399 >>> 10368320 4.387814 5.911313 12.36954 1.49E-04 >>> 0.5279716 -0.7903085 >>> 10587807 3.805685 4.826062 12.30147 1.52E-04 >>> 0.5279716 -0.7943703 >>> 10504815 3.793085 5.657937 12.27763 1.54E-04 >>> 0.5279716 -0.7958069 >>> 10357489 3.968974 4.666258 12.23582 1.56E-04 >>> 0.5279716 -0.7983445 >>> 10481521 3.757882 7.856306 12.15789 1.60E-04 >>> 0.5279716 -0.8031341 >>> >>> However when I run the following command to get results for all the >>> probes, I get different values for the same probes. See PValue, >>> Adj. P value >> >> Well, you don't show the command you use. I would assume you ran >> topTable() without specifying a coefficient, in which case the >> following portions from ?topTable might be instructive: >> >> coef: column number or column name specifying which coefficient or >> contrast of the linear model is of interest. For 'topTable', >> can also be a vector of column subscripts, in which case the >> gene ranking is by F-statistic for that set of contrasts. >> >> and >> >> 'topTableF' ranks genes on the basis of moderated F-statistics for >> one or more coefficients. If 'topTable' is called with 'coef' has >> length greater than 1, then the specified columns will be >> extracted from 'fit' and 'topTableF' called on the result. >> 'topTable' with 'coef=NULL' is the same as 'topTableF', unless the >> fitted model 'fit' has only one column. >> >> The default to topTable() is coef=NULL, in which case you are calling >> topTableF(), and computing the F-statistic over all coefficients, which >> I highly doubt is what you want to do. >> >> Best, >> >> Jim >> >> >> >>> ID X.Intercept. Pair2 Treatlo AveExpr F P.Value >>> adj.P.Val >>> 10354461 2.963005 1.0175 4.390143 5.6668265 >>> 441.9193584 7.37E-06 1.48E-05 >>> 10583079 5.2467745 -2.411241 4.249853 6.1660805 >>> 516.5649337 5.25E-06 1.24E-05 >>> 10587318 2.88690775 -0.1796325 4.1800875 4.88713525 >>> 334.0547737 1.35E-05 2.19E-05 >>> 10370298 4.54191225 -2.1118965 4.0321005 5.50201425 >>> 408.7688335 8.74E-06 1.64E-05 >>> 10392551 2.4626765 0.043472 4.259277 4.614051 >>> 261.1260083 2.31E-05 3.27E-05 >>> 10368320 3.76314325 -0.0914745 4.3878135 5.91131275 >>> 377.3505032 1.04E-05 1.83E-05 >>> 10587807 2.69620575 0.4540285 3.8056845 4.82606225 >>> 325.4919972 1.43E-05 2.28E-05 >>> 10504815 5.27270925 -3.0226305 3.7930855 5.65793675 >>> 458.5242305 6.81E-06 1.42E-05 >>> 10357489 2.90779925 -0.4520565 3.9689735 4.66625775 >>> 286.4197106 1.89E-05 2.80E-05 >>> 10481521 6.637718 -1.320705 3.757882 7.8563065 >>> 794.2002708 2.06E-06 9.55E-06 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 13.5 years ago James W. MacDonald 68k

Login before adding your answer.