Deciding on a cut off after QC

0

Entering edit mode

Ankit Pal ▴ 230

@ankit-pal-1242

Last seen 9.6 years ago

Dear All, I'm using LIMMA to analyse a set of GPR files. I used the weight fuction to apply QC parameter threshold values recommended by Genepix. The code for the same is >myfun <- function(x,threshold=55){ + okred <- abs(x[,"% > B635+2SD"]) < threshold + okgreen <- abs(x[,"% > B532+2SD"]) < threshold + as.numeric(okgreen & okred) } On completion of the analysis, all the spots showed up in the results file inspite of being flagged off. I understand that on being flagged off by limma (wt = 0), the spots are not considered for further analysis. Is there any way they can be excluded from the final result file. Also, if I get an output of all the spots (38000 in my case) how do I decide on a cut off. Do I use the rank or something else? Waiting eagerly for a reply, thank you, -Ankit Stay connected, organized, and protected. Take the tour:

limma limma • 1.7k views

ADD COMMENT • link 19.0 years ago Ankit Pal ▴ 230

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 39 minutes ago

WEHI, Melbourne, Australia

> Date: Sun, 15 May 2005 21:34:18 -0700 (PDT) > From: Ankit Pal <pal_ankit2000@yahoo.com> > Subject: [BioC] Deciding on a cut off after QC > To: bioconductor@stat.math.ethz.ch > > Dear All, > I'm using LIMMA to analyse a set of GPR files. > I used the weight fuction to apply QC parameter > threshold values recommended by Genepix. > The code for the same is > >>myfun <- function(x,threshold=55){ > + okred <- abs(x[,"% > B635+2SD"]) < threshold > + okgreen <- abs(x[,"% > B532+2SD"]) < threshold > + as.numeric(okgreen & okred) > } > > On completion of the analysis, all the spots showed > up in the results file inspite of being flagged off. I > understand that on being flagged off by limma (wt = > 0), the spots are not considered for further analysis. > Is there any way they can be excluded from the final > result file. What result file? > Also, if I get an output of all the spots (38000 in my > case) how do I decide on a cut off. Do I use the rank > or something else? What output? Gordon > Waiting eagerly for a reply, > thank you, > -Ankit

ADD COMMENT • link 19.0 years ago Gordon Smyth 50k

0

Entering edit mode

Ankit Pal ▴ 230

@ankit-pal-1242

Last seen 9.6 years ago

Dear Dr Smyth, I'm sorry for not having specified which result file. It is the final result summary we get after we give the command Resultfile <- topTable(fit,n=200, adjust="fdr") A sample result file has been attached. The code I used for my analysis is > targets <- readTargets("target.txt") #The QC filter > myfun <- function(x,threshold=55){ + okred <- abs(x[,"% > B635+2SD"]) > threshold + okgreen <- abs(x[,"% > B532+2SD"]) > threshold + okflag <- abs(x[,"Flags"]) > 0 + okRGN <- abs(x[,"Rgn R˛"]) > 0.6 + as.numeric(okgreen || okred || okflag || okRGN) + } #end of QC filter > RG_7 <- read.maimages(targets$FileName, source="genepix",wt.fun=myfun) > RG_7$genes <- readGAL() > RG_7$printer <- getLayout(RG_7$genes) > MA_7 <- normalizeWithinArrays(RG_7,method="loess") > MA_7 <- normalizeBetweenArrays(MA_7) > fit_7 <- lmFit(MA_7, design=c(1,-1,1,-1)) > fit_7 <- eBayes(fit_7) > options(digits=3) > Resultfile_7 <- topTable(fit_7, n=39000, adjust="fdr") > Resdat_7 <-data.frame(Resultfile_7) > write.table(Resdat_,file='Result.csv',quote = FALSE, sep = "\t") I understand that the spots that do not qualify the QC filter are given a weight of "0" by limma and are not considered for normalization and will not affect the analysis. The result file I get contains all the spots (38000) in my case. Didn't the spots that were bad get removed from the final result? If not what is the cut off value (B, p etc) that I need to use to get a set of reliable spots(I cant use all the 38000) from my result file for my analysis. Is there a fixed formula to derive the same as the values vary with the analysis. Waiting for your reply, Thank you, -Ankit --- Gordon K Smyth <smyth@wehi.edu.au> wrote: > > Date: Sun, 15 May 2005 21:34:18 -0700 (PDT) > > From: Ankit Pal <pal_ankit2000@yahoo.com> > > Subject: [BioC] Deciding on a cut off after QC > > To: bioconductor@stat.math.ethz.ch > > > > Dear All, > > I'm using LIMMA to analyse a set of GPR files. > > I used the weight fuction to apply QC parameter > > threshold values recommended by Genepix. > > The code for the same is > > > >>myfun <- function(x,threshold=55){ > > + okred <- abs(x[,"% > B635+2SD"]) < threshold > > + okgreen <- abs(x[,"% > B532+2SD"]) < threshold > > + as.numeric(okgreen & okred) > > } > > > > On completion of the analysis, all the spots > showed > > up in the results file inspite of being flagged > off. I > > understand that on being flagged off by limma (wt > = > > 0), the spots are not considered for further > analysis. > > Is there any way they can be excluded from the > final > > result file. > > What result file? > > > Also, if I get an output of all the spots (38000 > in my > > case) how do I decide on a cut off. Do I use the > rank > > or something else? > > What output? > > Gordon > > > Waiting eagerly for a reply, > > thank you, > > -Ankit > > > Stay connected, organized, and protected. Take the tour: -------------- next part -------------- Block Row Column ID Name M A t P.Value B 4004 5 29 12 NC_001552 9627219_516_rc | Sendai virus -2.33 2.57 -5.07 0.679 -2.72 4013 5 29 21 - MJ-2000-88_501 -2.72 4.05 -4.95 0.679 -2.76 34570 43 22 25 NM_019694 scl0056384.1_230 2.24 5.51 4.87 0.679 -2.78 3502 5 10 23 - scl13898.1.1_259 -2.60 4.37 -4.86 0.679 -2.78 32222 40 25 23 NM_153458 scl0001924.1_45 2.09 5.64 4.82 0.679 -2.79 31332 39 22 23 NM_025574 scl0066459.2_45 2.04 4.39 4.81 0.679 -2.80 7238 9 29 10 NC_001552 9627219_435_rc | Sendai virus -2.17 2.72 -4.71 0.679 -2.83 27862 35 14 5 - scl21854.2_284 2.05 3.97 4.70 0.679 -2.83 32256 40 27 3 NM_146139 scl0001929.1_72 2.00 5.97 4.66 0.679 -2.84 26620 33 28 3 NM_029357 scl0002178.1_304 2.45 6.12 4.60 0.679 -2.86 3275 5 2 12 XM_135092 scl36514.13.4_62 -3.29 3.07 -4.55 0.679 -2.88 34632 43 25 6 - EMPTY 2.20 7.70 4.55 0.679 -2.88 3490 5 10 11 - scl30794.6.1_93 -2.68 5.65 -4.54 0.679 -2.88 167 1 7 5 - scl37855.3.1_1 -2.11 3.18 -4.52 0.679 -2.89 31373 39 24 10 NM_007692 scl0012651.2_322 2.25 6.42 4.52 0.679 -2.89 31919 40 14 17 XM_130346 scl19143.19.1_31 2.29 4.18 4.51 0.679 -2.89 31455 39 27 11 NM_139236 scl0002764.1_60 2.00 5.00 4.51 0.679 -2.89 21694 27 25 12 NM_010325 scl000715.1_17 2.26 3.15 4.50 0.679 -2.89 4003 5 29 11 NC_001503 9626965_214_rc | Mouse mammary tumor virus -2.71 2.81 -4.48 0.679 -2.90 22370 28 20 14 NM_008645 scl017836.18_18 2.70 6.33 4.43 0.679 -2.92 244 1 10 1 - scl45289.2.1_35 -2.32 3.48 -4.42 0.679 -2.92 33082 41 27 20 BC042842 scl0003611.1_1554 2.04 5.38 4.41 0.679 -2.93 3278 5 2 15 XM_135940 scl54814.1.2_130 -2.62 3.57 -4.40 0.679 -2.93 32204 40 25 5 NM_008183 scl0014863.1_0 2.02 5.65 4.37 0.679 -2.94 7216 9 28 15 NM_008697 scl0002340.1_98 -2.42 3.53 -4.36 0.679 -2.94 29998 38 3 11 NM_011361 scl39091.7.1_0 2.40 3.78 4.35 0.679 -2.95 31475 39 28 4 M33467 scl0003417.1_10 1.83 4.12 4.30 0.679 -2.97 274 1 11 4 NM_178646 scl47812.1_441 -2.12 3.56 -4.28 0.679 -2.97 29838 37 27 12 NM_009372 scl0001623.1_60 2.52 5.75 4.28 0.679 -2.97 30621 38 26 13 NM_029620 scl0003537.1_10 2.17 5.43 4.27 0.679 -2.98 3926 5 26 15 NM_018865 scl0002564.1_1395 -2.16 3.13 -4.25 0.679 -2.98 3299 5 3 9 NM_173735 scl42940.8_203 -2.42 2.78 -4.24 0.679 -2.99 10018 13 12 13 NM_153786 scl38936.3.1_27 -2.73 2.57 -4.22 0.679 -3.00 37140 46 28 6 NM_009759 scl0002909.1_40 1.89 5.01 4.22 0.679 -3.00 3516 5 11 10 NM_173423 scl51436.1.1_135 -2.04 3.03 -4.19 0.679 -3.01 3461 5 9 9 - scl24020.1.1_3 -2.67 3.99 -4.19 0.679 -3.01 30367 38 17 2 NM_027853 scl071664.1_311 2.06 3.69 4.18 0.679 -3.01 7265 9 30 10 - MJ-250-11_13 -2.57 2.83 -4.16 0.679 -3.02 791 1 30 8 - MJ-1000-72_435 -2.39 3.26 -4.15 0.679 -3.02 28154 35 24 27 NM_010325 scl0014719.1_330 1.80 5.64 4.14 0.679 -3.02 31503 39 29 5 AF304551 IGHV1S131|AF304551|Ig_heavy_variable_1S131_48 2.18 6.61 4.14 0.679 -3.02 25577 32 19 12 NM_010561 scl016201.19_34 2.64 6.13 4.14 0.679 -3.02 33092 41 28 3 NM_028990 scl0001077.1_623 2.32 6.10 4.13 0.679 -3.03 26519 33 24 10 NM_018823 scl0054446.1_138 2.42 6.60 4.12 0.679 -3.03 254 1 10 11 NM_146446 scl28511.1.1_86 -2.08 3.04 -4.12 0.679 -3.03 3973 5 28 8 NM_023655 scl0003501.1_700 -1.97 2.70 -4.11 0.679 -3.03 7240 9 29 12 NC_001846 9629812_535 | Murine hepatitis virus -1.75 2.58 -4.11 0.679 -3.03 169 1 7 7 NM_023608 scl54748.8.1_13 -2.08 3.79 -4.10 0.679 -3.04 36918 46 19 27 NM_011949 scl026413.2_16 2.18 8.92 4.09 0.679 -3.04 31417 39 25 27 NM_184053 scl0001200.1_31 1.78 4.72 4.08 0.679 -3.05 26621 33 28 4 NM_183336 scl0002922.1_48 2.11 3.59 4.08 0.679 -3.05 3835 5 23 5 NM_013560 scl0015507.1_320 -2.16 2.86 -4.08 0.679 -3.05 3497 5 10 18 - scl43880.2_317 -1.97 3.93 -4.06 0.679 -3.05 37929 47 27 13 BC019769 scl0002315.1_12 1.89 4.69 4.06 0.679 -3.06 28828 36 19 27 XM_358370 scl017364.14_266 2.25 5.99 4.05 0.679 -3.06 38757 48 28 5 NM_177089 scl000578.1_190 2.03 4.57 4.05 0.679 -3.06 23424 29 29 16 - MJ-3000-114_1506 2.56 5.87 4.04 0.679 -3.06 29003 36 26 13 BC066093 scl0002019.1_208 2.77 8.09 4.04 0.679 -3.06 13281 17 13 13 NM_025872 scl1412.1.1_281 -1.77 3.17 -4.01 0.679 -3.08 25725 32 24 25 NM_019927 scl0023806.2_110 1.99 3.80 3.99 0.679 -3.08 35186 44 15 21 NM_175347 scl0106393.1_61 1.76 3.80 3.99 0.679 -3.08 32853 41 19 7 XM_488538 scl0319934.1_56 3.07 6.50 3.97 0.679 -3.09 171 1 7 9 - scl41474.1.1_55 -3.29 2.83 -3.97 0.679 -3.09 38779 48 28 27 AE000663 TRBV5|AE000663|T_cell_receptor_beta_variable_5_143 2.13 4.85 3.97 0.679 -3.09 37127 46 27 20 AK031926 scl0003274.1_34 2.08 6.41 3.96 0.679 -3.09 3761 5 20 12 NM_008852 scl018742.1_85 -1.79 2.58 -3.96 0.679 -3.09 308 1 12 11 XM_488860 scl14835.1.1_101 -2.82 2.90 -3.95 0.679 -3.10 97 1 4 16 NM_013633 scl50775.4.1_25 -2.34 3.36 -3.94 0.679 -3.10 33223 42 2 27 NM_019420 scl50036.1.4_192 3.04 8.01 3.94 0.679 -3.10 34200 43 9 6 - scl25225.3.1_30 1.79 5.43 3.93 0.679 -3.11 32104 40 21 13 - scl0077969.1_1 2.09 5.59 3.93 0.679 -3.11 35566 44 29 23 - MJ-1000-76_241 1.95 7.52 3.90 0.679 -3.12 59 1 3 5 NM_025696 scl52928.27_344 -1.93 3.95 -3.89 0.679 -3.12 31204 39 18 3 NM_207550 scl0404310.1_281 3.68 9.19 3.89 0.679 -3.12 30594 38 25 13 AF124385 scl0001994.1_3 2.97 6.47 3.89 0.679 -3.12 27142 34 17 13 - scl0320791.1_13 2.38 6.91 3.89 0.679 -3.12 7021 9 21 9 NM_172824 scl00239839.2_254 -2.23 4.00 -3.89 0.679 -3.12 25810 32 28 2 NM_023061 scl0003554.1_4 1.85 6.25 3.88 0.679 -3.13 25635 32 21 16 NM_207298 scl0099151.1_201 2.40 6.52 3.87 0.679 -3.13 31258 39 20 3 NM_009686 scl011787.1_19 1.79 4.94 3.87 0.679 -3.13 6700 9 9 12 NM_027614 scl43167.2.1_10 -2.01 3.32 -3.87 0.679 -3.13 15674 20 12 6 NM_173421 scl47999.3_178 1.84 5.89 3.86 0.679 -3.13 231 1 9 15 - scl53117.1.4_247 -2.24 3.82 -3.86 0.679 -3.14 3282 5 2 19 NM_018871 scl25917.6_37 -2.95 3.95 -3.85 0.679 -3.14 217 1 9 1 NM_029681 scl25936.3.1_82 -1.72 3.71 -3.85 0.679 -3.14 30669 38 28 7 BC044883 scl0004073.1_36 2.61 7.36 3.84 0.679 -3.14 32274 40 27 21 NM_029945 scl0001884.1_59 1.72 4.92 3.84 0.679 -3.14 37809 47 23 1 XM_129809 scl0070155.1_99 2.29 8.15 3.83 0.679 -3.15 35928 45 13 8 NM_170758 scl40740.8_132 -2.30 3.37 -3.83 0.679 -3.15 32641 41 11 11 XM_484710 scl52179.17.1_47 2.15 6.46 3.83 0.679 -3.15 21676 27 24 21 NM_175657 scl00319161.1_8 2.12 5.11 3.82 0.679 -3.15 29680 37 21 16 - scl0066491.1_300 1.66 5.04 3.81 0.679 -3.15 8796 11 27 4 NM_133798 scl0002392.1_125 3.35 5.83 3.80 0.679 -3.16 87 1 4 6 XM_355937 scl32097.7.1_17 -2.36 4.67 -3.79 0.679 -3.16 28118 35 23 18 NM_009567 scl0022755.1_133 1.73 4.77 3.78 0.679 -3.17 4107 6 3 8 NM_013465 scl49315.8.1_6 -2.93 4.70 -3.78 0.679 -3.17 37052 46 24 26 NM_026293 scl0067652.2_149 1.86 4.99 3.77 0.679 -3.17 33140 41 29 24 - MJ-3000-103_2290 1.66 5.11 3.77 0.679 -3.17 32208 40 25 9 - EMPTY 1.85 6.74 3.77 0.679 -3.17 14776 19 8 25 - scl3899.5.1_15 2.48 6.50 3.77 0.679 -3.17 33200 42 2 4 NM_011118 scl0018812.1_77 -2.16 3.47 -3.76 0.679 -3.18 36336 45 28 11 BC054076 scl0003827.1_21 1.67 5.36 3.75 0.679 -3.18 30701 38 29 12 NC_002512 20198505_5605 | Rat cytomegalovirus 1.65 4.92 3.73 0.679 -3.19 38722 48 26 24 NM_025578 scl0001137.1_14 2.12 7.26 3.73 0.679 -3.19 30244 38 12 14 NM_011079 scl25995.10_72 1.75 5.06 3.71 0.679 -3.20 746 1 28 17 NM_134046 scl000016.1_3 -1.73 3.41 -3.71 0.679 -3.20

ADD COMMENT • link 19.0 years ago Ankit Pal ▴ 230

0

Entering edit mode

On Mon, May 16, 2005 9:49 pm, Ankit Pal said: > Dear Dr Smyth, > I'm sorry for not having specified which result file. > It is the final result summary we get after we give > the command > Resultfile <- topTable(fit,n=200, adjust="fdr") > A sample result file has been attached. > The code I used for my analysis is > >> targets <- readTargets("target.txt") > > #The QC filter >> myfun <- function(x,threshold=55){ > + okred <- abs(x[,"% > B635+2SD"]) > threshold > + okgreen <- abs(x[,"% > B532+2SD"]) > threshold > + okflag <- abs(x[,"Flags"]) > 0 > + okRGN <- abs(x[,"Rgn R?"]) > 0.6 > + as.numeric(okgreen || okred || okflag || okRGN) > + } > #end of QC filter > >> RG_7 <- read.maimages(targets$FileName, > source="genepix",wt.fun=myfun) >> RG_7$genes <- readGAL() As I said last week, this command is not needed with GenePix data. You should omit it. >> RG_7$printer <- getLayout(RG_7$genes) >> MA_7 <- normalizeWithinArrays(RG_7,method="loess") >> MA_7 <- normalizeBetweenArrays(MA_7) >> fit_7 <- lmFit(MA_7, design=c(1,-1,1,-1)) >> fit_7 <- eBayes(fit_7) >> options(digits=3) >> Resultfile_7 <- topTable(fit_7, n=39000, > adjust="fdr") >> Resdat_7 <-data.frame(Resultfile_7) Resultfile_7 is already a data.frame. >> write.table(Resdat_,file='Result.csv',quote = FALSE, > sep = "\t") > > I understand that the spots that do not qualify the QC > filter are given a weight of "0" by limma and are not > considered for normalization and will not affect the > analysis. > The result file I get contains all the spots (38000) > in my case. Mmm, this doesn't make sense. You have 4 arrays with 39000 spots on each array. Hence you have 4*39000 spots, not 39000. The output from topTable() gives a summarized log-ratio for each probe, not a result for each spot. Gordon > Didn't the spots that were bad get removed from the > final result? > If not what is the cut off value (B, p etc) that I > need to use to get a set of reliable spots(I cant use > all the 38000) from my result file for my analysis. > Is there a fixed formula to derive the same as the > values vary with the analysis. > Waiting for your reply, > Thank you, > -Ankit

ADD REPLY • link 19.0 years ago Gordon Smyth 50k

0

Entering edit mode

Ankit Pal ▴ 230

@ankit-pal-1242

Last seen 9.6 years ago

Dear Dr Smyth, I guess there has been a mistake in communication. There are 39000 spots on the array but the number of probes are lesser as there are replicates(not a fixed number) on the array. The output from toptable() gives me a total of 39000 rows, one for each spot. I need to give an output of the spots most significant in this list. What is the cut off value(p or B or any other) I need to use above which I can consider a set of differentially expressed genes significant enough for further use (eg :function based clustering of all significantly overexpressed genes). I cannot use all the genes in the topTable() output. Where do I put the cut off? Is there a way I can calculate it? sincerely, -Ankit --- Gordon K Smyth <smyth@wehi.edu.au> wrote: > On Mon, May 16, 2005 9:49 pm, Ankit Pal said: > > Dear Dr Smyth, > > I'm sorry for not having specified which result > file. > > It is the final result summary we get after we > give > > the command > > Resultfile <- topTable(fit,n=200, adjust="fdr") > > A sample result file has been attached. > > The code I used for my analysis is > > > >> targets <- readTargets("target.txt") > > > > #The QC filter > >> myfun <- function(x,threshold=55){ > > + okred <- abs(x[,"% > B635+2SD"]) > threshold > > + okgreen <- abs(x[,"% > B532+2SD"]) > threshold > > + okflag <- abs(x[,"Flags"]) > 0 > > + okRGN <- abs(x[,"Rgn R˛"]) > 0.6 > > + as.numeric(okgreen || okred || okflag || okRGN) > > + } > > #end of QC filter > > > >> RG_7 <- read.maimages(targets$FileName, > > source="genepix",wt.fun=myfun) > >> RG_7$genes <- readGAL() > > As I said last week, this command is not needed with > GenePix data. You should omit it. > > >> RG_7$printer <- getLayout(RG_7$genes) > >> MA_7 <- > normalizeWithinArrays(RG_7,method="loess") > >> MA_7 <- normalizeBetweenArrays(MA_7) > >> fit_7 <- lmFit(MA_7, design=c(1,-1,1,-1)) > >> fit_7 <- eBayes(fit_7) > >> options(digits=3) > >> Resultfile_7 <- topTable(fit_7, n=39000, > > adjust="fdr") > >> Resdat_7 <-data.frame(Resultfile_7) > > Resultfile_7 is already a data.frame. > > >> write.table(Resdat_,file='Result.csv',quote = > FALSE, > > sep = "\t") > > > > I understand that the spots that do not qualify > the QC > > filter are given a weight of "0" by limma and are > not > > considered for normalization and will not affect > the > > analysis. > > The result file I get contains all the spots > (38000) > > in my case. > > Mmm, this doesn't make sense. You have 4 arrays > with 39000 spots on each array. Hence you have > 4*39000 spots, not 39000. The output from > topTable() gives a summarized log-ratio for each > probe, > not a result for each spot. > > Gordon > > > Didn't the spots that were bad get removed from > the > > final result? > > If not what is the cut off value (B, p etc) that I > > need to use to get a set of reliable spots(I cant > use > > all the 38000) from my result file for my > analysis. > > Is there a fixed formula to derive the same as the > > values vary with the analysis. > > Waiting for your reply, > > Thank you, > > -Ankit > > >

ADD COMMENT • link 19.0 years ago Ankit Pal ▴ 230

0

Entering edit mode

See comments below. On Mon, 2005-05-16 at 05:42 -0700, Ankit Pal wrote: > Dear Dr Smyth, > I guess there has been a mistake in communication. > There are 39000 spots on the array but the number of > probes are lesser as there are replicates(not a fixed > number) on the array. > The output from toptable() gives me a total of 39000 > rows, one for each spot. > I need to give an output of the spots most significant > in this list. I do not understand this clearly. Do you mean you have some duplicated spots or is there redundancy when you convert to something common like unigene ID ? If the former, I believe there are some facilities within LIMMA for equal number duplications per spots but I am not sure. For the latter, you could use tapply or something to find the min/max of the statistics. > What is the cut off value(p or B or any other) I need > to use above which I can consider a set of > differentially expressed genes significant enough for > further use (eg :function based clustering of all > significantly overexpressed genes). > I cannot use all the genes in the topTable() output. > Where do I put the cut off? > Is there a way I can calculate it? Selection of p-value threshold is arbitrary. However it would be a good idea to adjust for multiple comparison. My preferred method is the FDR by Benjamini and Hochberg but there are many other methods too each with a different meaning. I will illustrate FDR using the slightly modified example from help(topTable) : set.seed(123) # for reproducibility only - do not use this in real life M <- matrix(rnorm(10000*6,sd=0.3), 10000, 6) M[1:100, 1:3] <- M[1:100, 1:3] + 2 design <- cbind( First3Arrays=c(1,1,1,0,0,0), Last3Arrays=c(0,0,0,1,1,1)) fit <- lmFit(M, design=design) fit <- eBayes(fit) tab <- topTable( fit, adjust.method="fdr", sort.by="p", n=nrow(M) ) positives <- rownames(tab)[ which(tab[ , "P.Value"] < 0.05) ] The "positives" are the names of genes that we declare as differentially expression and it is expected that 5% of these genes are false positives on average. Note that in this case we can explicitly calculate the FP rate as follows because we have simulated the datasets. mean( ifelse( as.numeric(positives) <= 100, 0, 1 ) ) 0.06542056 I am sure Gordon Symth has a far more elegant and flexible version of this documented somewhere in his user guide. Regards, Adai > sincerely, > -Ankit > > --- Gordon K Smyth <smyth@wehi.edu.au> wrote: > > On Mon, May 16, 2005 9:49 pm, Ankit Pal said: > > > Dear Dr Smyth, > > > I'm sorry for not having specified which result > > file. > > > It is the final result summary we get after we > > give > > > the command > > > Resultfile <- topTable(fit,n=200, adjust="fdr") > > > A sample result file has been attached. > > > The code I used for my analysis is > > > > > >> targets <- readTargets("target.txt") > > > > > > #The QC filter > > >> myfun <- function(x,threshold=55){ > > > + okred <- abs(x[,"% > B635+2SD"]) > threshold > > > + okgreen <- abs(x[,"% > B532+2SD"]) > threshold > > > + okflag <- abs(x[,"Flags"]) > 0 > > > + okRGN <- abs(x[,"Rgn R"]) > 0.6 > > > + as.numeric(okgreen || okred || okflag || okRGN) > > > + } > > > #end of QC filter > > > > > >> RG_7 <- read.maimages(targets$FileName, > > > source="genepix",wt.fun=myfun) > > >> RG_7$genes <- readGAL() > > > > As I said last week, this command is not needed with > > GenePix data. You should omit it. > > > > >> RG_7$printer <- getLayout(RG_7$genes) > > >> MA_7 <- > > normalizeWithinArrays(RG_7,method="loess") > > >> MA_7 <- normalizeBetweenArrays(MA_7) > > >> fit_7 <- lmFit(MA_7, design=c(1,-1,1,-1)) > > >> fit_7 <- eBayes(fit_7) > > >> options(digits=3) > > >> Resultfile_7 <- topTable(fit_7, n=39000, > > > adjust="fdr") > > >> Resdat_7 <-data.frame(Resultfile_7) > > > > Resultfile_7 is already a data.frame. > > > > >> write.table(Resdat_,file='Result.csv',quote = > > FALSE, > > > sep = "\t") > > > > > > I understand that the spots that do not qualify > > the QC > > > filter are given a weight of "0" by limma and are > > not > > > considered for normalization and will not affect > > the > > > analysis. > > > The result file I get contains all the spots > > (38000) > > > in my case. > > > > Mmm, this doesn't make sense. You have 4 arrays > > with 39000 spots on each array. Hence you have > > 4*39000 spots, not 39000. The output from > > topTable() gives a summarized log-ratio for each > > probe, > > not a result for each spot. > > > > Gordon > > > > > Didn't the spots that were bad get removed from > > the > > > final result? > > > If not what is the cut off value (B, p etc) that I > > > need to use to get a set of reliable spots(I cant > > use > > > all the 38000) from my result file for my > > analysis. > > > Is there a fixed formula to derive the same as the > > > values vary with the analysis. > > > Waiting for your reply, > > > Thank you, > > > -Ankit > > > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor >

ADD REPLY • link 19.0 years ago Adaikalavan Ramasamy ★ 1.8k

0

Entering edit mode

Dear Adai, Thank you for the detailed response. Correct me if I'm wrong. What you are saying is that after applying the "fdr" I give a cut off of 0.05 for the p-value and write all those spots into an object (positives). That means, I do not consider the B values. In essence, it is just a one sample t test afetr adjusting the p-value and taking into consideration all those spots that have a p-value < 0.05? >From my code in a previous mail to Gordon Smyth, I have done a quality control (QC) using parameters and threshold values prescribed by genepix. I know from experience, a large number of spots do not get through the filter. As I understand from LIMMA, a weight of "0" is given to any spot that does not get through the QC filter. How do I exclude these spots from the final result summary file? I need to get a set of significantly differentially expressed genes that have got through the QC filter. How do I do it using LIMMA? Thank you, -Ankit --- Adaikalavan Ramasamy <ramasamy@cancer.org.uk> wrote: > See comments below. > > On Mon, 2005-05-16 at 05:42 -0700, Ankit Pal wrote: > > Dear Dr Smyth, > > I guess there has been a mistake in communication. > > There are 39000 spots on the array but the number > of > > probes are lesser as there are replicates(not a > fixed > > number) on the array. > > The output from toptable() gives me a total of > 39000 > > rows, one for each spot. > > I need to give an output of the spots most > significant > > in this list. > > I do not understand this clearly. Do you mean you > have some duplicated > spots or is there redundancy when you convert to > something common like > unigene ID ? > > If the former, I believe there are some facilities > within LIMMA for > equal number duplications per spots but I am not > sure. For the latter, > you could use tapply or something to find the > min/max of the statistics. > > > What is the cut off value(p or B or any other) I > need > > to use above which I can consider a set of > > differentially expressed genes significant enough > for > > further use (eg :function based clustering of all > > significantly overexpressed genes). > > I cannot use all the genes in the topTable() > output. > > Where do I put the cut off? > > Is there a way I can calculate it? > > Selection of p-value threshold is arbitrary. However > it would be a good > idea to adjust for multiple comparison. My preferred > method is the FDR > by Benjamini and Hochberg but there are many other > methods too each with > a different meaning. I will illustrate FDR using the > slightly modified > example from help(topTable) : > > set.seed(123) # for reproducibility only - do not > use this in real life > > M <- matrix(rnorm(10000*6,sd=0.3), 10000, 6) > M[1:100, 1:3] <- M[1:100, 1:3] + 2 > design <- cbind( First3Arrays=c(1,1,1,0,0,0), > Last3Arrays=c(0,0,0,1,1,1)) > fit <- lmFit(M, design=design) > fit <- eBayes(fit) > > tab <- topTable( fit, adjust.method="fdr", > sort.by="p", n=nrow(M) ) > > positives <- rownames(tab)[ which(tab[ , "P.Value"] > < 0.05) ] > > The "positives" are the names of genes that we > declare as differentially > expression and it is expected that 5% of these genes > are false positives > on average. > > Note that in this case we can explicitly calculate > the FP rate as > follows because we have simulated the datasets. > > mean( ifelse( as.numeric(positives) <= 100, 0, 1 ) ) > 0.06542056 > > I am sure Gordon Symth has a far more elegant and > flexible version of > this documented somewhere in his user guide. > > Regards, Adai > > > > sincerely, > > -Ankit > > > > --- Gordon K Smyth <smyth@wehi.edu.au> wrote: > > > On Mon, May 16, 2005 9:49 pm, Ankit Pal said: > > > > Dear Dr Smyth, > > > > I'm sorry for not having specified which > result > > > file. > > > > It is the final result summary we get after we > > > give > > > > the command > > > > Resultfile <- topTable(fit,n=200, > adjust="fdr") > > > > A sample result file has been attached. > > > > The code I used for my analysis is > > > > > > > >> targets <- readTargets("target.txt") > > > > > > > > #The QC filter > > > >> myfun <- function(x,threshold=55){ > > > > + okred <- abs(x[,"% > B635+2SD"]) > threshold > > > > + okgreen <- abs(x[,"% > B532+2SD"]) > > threshold > > > > + okflag <- abs(x[,"Flags"]) > 0 > > > > + okRGN <- abs(x[,"Rgn R"]) > 0.6 > > > > + as.numeric(okgreen || okred || okflag || > okRGN) > > > > + } > > > > #end of QC filter > > > > > > > >> RG_7 <- read.maimages(targets$FileName, > > > > source="genepix",wt.fun=myfun) > > > >> RG_7$genes <- readGAL() > > > > > > As I said last week, this command is not needed > with > > > GenePix data. You should omit it. > > > > > > >> RG_7$printer <- getLayout(RG_7$genes) > > > >> MA_7 <- > > > normalizeWithinArrays(RG_7,method="loess") > > > >> MA_7 <- normalizeBetweenArrays(MA_7) > > > >> fit_7 <- lmFit(MA_7, design=c(1,-1,1,-1)) > > > >> fit_7 <- eBayes(fit_7) > > > >> options(digits=3) > > > >> Resultfile_7 <- topTable(fit_7, n=39000, > > > > adjust="fdr") > > > >> Resdat_7 <-data.frame(Resultfile_7) > > > > > > Resultfile_7 is already a data.frame. > > > > > > >> write.table(Resdat_,file='Result.csv',quote = > > > FALSE, > > > > sep = "\t") > > > > > > > > I understand that the spots that do not > qualify > > > the QC > > > > filter are given a weight of "0" by limma and > are > > > not > > > > considered for normalization and will not > affect > > > the > > > > analysis. > > > > The result file I get contains all the spots > > > (38000) > > > > in my case. > > > > > > Mmm, this doesn't make sense. You have 4 arrays > > > with 39000 spots on each array. Hence you have > > > 4*39000 spots, not 39000. The output from > > > topTable() gives a summarized log-ratio for each > > > probe, > > > not a result for each spot. > > > > > > Gordon > > > > > > > Didn't the spots that were bad get removed > from > > > the > > > > final result? > > > > If not what is the cut off value (B, p etc) > that I > > > > need to use to get a set of reliable spots(I > cant > > > use > > > > all the 38000) from my result file for my > > > analysis. > > > > Is there a fixed formula to derive the same as > the > > > > values vary with the analysis. > > > > Waiting for your reply, > > > > Thank you, > > > > -Ankit > > > > > > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > >

ADD REPLY • link 19.0 years ago Ankit Pal ▴ 230

0

Entering edit mode

At 01:48 PM 17/05/2005, Ankit Pal wrote: >Dear Adai, >Thank you for the detailed response. >Correct me if I'm wrong. >What you are saying is that after applying the "fdr" >I give a cut off of 0.05 for the p-value and write all >those spots into an object (positives). >That means, I do not consider the B values. This is one way to proceed. Have you read the section in the User's Guide on "Statistics for differential expression" which discusses which statistic to use? >In essence, it is just a one sample t test afetr >adjusting the p-value and taking into consideration >all those spots that have a p-value < 0.05? > > From my code in a previous mail to Gordon Smyth, I >have done a quality control (QC) using parameters and >threshold values prescribed by genepix. >I know from experience, a large number of spots do not >get through the filter. >As I understand from LIMMA, a weight of "0" is given >to any spot that does not get through the QC filter. >How do I exclude these spots from the final result >summary file? These spots are already excluded. You seem to be confused by the fact that excluding a spot does not exclude the corresponding probe, since there are four spots for each probe. You will get a valid t-statistic and p-value for a probe if any one of the spots for that probe get a positive weight. (Note you can get a Bayesian t-statistic even for a probe with no residual df.) >I need to get a set of significantly differentially >expressed genes that have got through the QC filter. >How do I do it using LIMMA? This is what you have been getting all along. Gordon >Thank you, >-Ankit

ADD REPLY • link 19.0 years ago Gordon Smyth 50k

0

Entering edit mode

See comments below. If anyone finds them erroneous, please correct me. On Mon, 2005-05-16 at 20:48 -0700, Ankit Pal wrote: > Dear Adai, > Thank you for the detailed response. > Correct me if I'm wrong. > What you are saying is that after applying the "fdr" > I give a cut off of 0.05 for the p-value and write all > those spots into an object (positives). > That means, I do not consider the B values. In some sense, yes. In paragraph 2 of section 7 of [1], Symth says that posterior odds (B) depend on the estimate of the proportion of genes that are differentially expressed while moderated t-test (whose p-values are given in topTable output) does not. The B values are useful if you want to calculate posterior probabilities of differential expression. [1] Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3. http://www.bepress.com/sagmb/vol3/iss1/art3/ > In essence, it is just a one sample t test afetr > adjusting the p-value and taking into consideration > all those spots that have a p-value < 0.05? No, this is based on moderated t-test where the variance in the denominator is shrunk towards a global variance like what SAM does. But unlike SAM, LIMMA has 1) a better theoretical derivation and shrinks the variance instead of standard deviation 2) shows that the moderated t-test follows a t-distribution with augmented degrees of freedom and thus no need to obtain p-values by permutation testing, hence computationally faster 3) more flexible to extend to other designs > >From my code in a previous mail to Gordon Smyth, I > have done a quality control (QC) using parameters and > threshold values prescribed by genepix. > I know from experience, a large number of spots do not > get through the filter. > As I understand from LIMMA, a weight of "0" is given > to any spot that does not get through the QC filter. > How do I exclude these spots from the final result > summary file? > I need to get a set of significantly differentially > expressed genes that have got through the QC filter. > How do I do it using LIMMA? I assume you have read the section 5.4 of the User Guide http://bioinf.wehi.edu.au/limma/usersguide.pdf I am not familiar with spot filtering nor genepix, so I cannot help you much here. But re-reading your previous mail, I noticed that you used something like "% > B635+2SD". Are you sure this is not "% > B635 + 2 SD". Note that the spaces matter ! Otherwise you can try renaming the columns to something like "green2sd" and "red2sd" just in case R is mis-reading the spaces somehow. Best wishes, Adai > Thank you, > -Ankit > > > > > --- Adaikalavan Ramasamy <ramasamy@cancer.org.uk> > wrote: > > > See comments below. > > > > On Mon, 2005-05-16 at 05:42 -0700, Ankit Pal wrote: > > > Dear Dr Smyth, > > > I guess there has been a mistake in communication. > > > There are 39000 spots on the array but the number > > of > > > probes are lesser as there are replicates(not a > > fixed > > > number) on the array. > > > The output from toptable() gives me a total of > > 39000 > > > rows, one for each spot. > > > I need to give an output of the spots most > > significant > > > in this list. > > > > I do not understand this clearly. Do you mean you > > have some duplicated > > spots or is there redundancy when you convert to > > something common like > > unigene ID ? > > > > If the former, I believe there are some facilities > > within LIMMA for > > equal number duplications per spots but I am not > > sure. For the latter, > > you could use tapply or something to find the > > min/max of the statistics. > > > > > What is the cut off value(p or B or any other) I > > need > > > to use above which I can consider a set of > > > differentially expressed genes significant enough > > for > > > further use (eg :function based clustering of all > > > significantly overexpressed genes). > > > I cannot use all the genes in the topTable() > > output. > > > Where do I put the cut off? > > > Is there a way I can calculate it? > > > > Selection of p-value threshold is arbitrary. However > > it would be a good > > idea to adjust for multiple comparison. My preferred > > method is the FDR > > by Benjamini and Hochberg but there are many other > > methods too each with > > a different meaning. I will illustrate FDR using the > > slightly modified > > example from help(topTable) : > > > > set.seed(123) # for reproducibility only - do not > > use this in real life > > > > M <- matrix(rnorm(10000*6,sd=0.3), 10000, 6) > > M[1:100, 1:3] <- M[1:100, 1:3] + 2 > > design <- cbind( First3Arrays=c(1,1,1,0,0,0), > > Last3Arrays=c(0,0,0,1,1,1)) > > fit <- lmFit(M, design=design) > > fit <- eBayes(fit) > > > > tab <- topTable( fit, adjust.method="fdr", > > sort.by="p", n=nrow(M) ) > > > > positives <- rownames(tab)[ which(tab[ , "P.Value"] > > < 0.05) ] > > > > The "positives" are the names of genes that we > > declare as differentially > > expression and it is expected that 5% of these genes > > are false positives > > on average. > > > > Note that in this case we can explicitly calculate > > the FP rate as > > follows because we have simulated the datasets. > > > > mean( ifelse( as.numeric(positives) <= 100, 0, 1 ) ) > > 0.06542056 > > > > I am sure Gordon Symth has a far more elegant and > > flexible version of > > this documented somewhere in his user guide. > > > > Regards, Adai > > > > > > > sincerely, > > > -Ankit > > > > > > --- Gordon K Smyth <smyth@wehi.edu.au> wrote: > > > > On Mon, May 16, 2005 9:49 pm, Ankit Pal said: > > > > > Dear Dr Smyth, > > > > > I'm sorry for not having specified which > > result > > > > file. > > > > > It is the final result summary we get after we > > > > give > > > > > the command > > > > > Resultfile <- topTable(fit,n=200, > > adjust="fdr") > > > > > A sample result file has been attached. > > > > > The code I used for my analysis is > > > > > > > > > >> targets <- readTargets("target.txt") > > > > > > > > > > #The QC filter > > > > >> myfun <- function(x,threshold=55){ > > > > > + okred <- abs(x[,"% > B635+2SD"]) > threshold > > > > > + okgreen <- abs(x[,"% > B532+2SD"]) > > > threshold > > > > > + okflag <- abs(x[,"Flags"]) > 0 > > > > > + okRGN <- abs(x[,"Rgn R"]) > 0.6 > > > > > + as.numeric(okgreen || okred || okflag || > > okRGN) > > > > > + } > > > > > #end of QC filter > > > > > > > > > >> RG_7 <- read.maimages(targets$FileName, > > > > > source="genepix",wt.fun=myfun) > > > > >> RG_7$genes <- readGAL() > > > > > > > > As I said last week, this command is not needed > > with > > > > GenePix data. You should omit it. > > > > > > > > >> RG_7$printer <- getLayout(RG_7$genes) > > > > >> MA_7 <- > > > > normalizeWithinArrays(RG_7,method="loess") > > > > >> MA_7 <- normalizeBetweenArrays(MA_7) > > > > >> fit_7 <- lmFit(MA_7, design=c(1,-1,1,-1)) > > > > >> fit_7 <- eBayes(fit_7) > > > > >> options(digits=3) > > > > >> Resultfile_7 <- topTable(fit_7, n=39000, > > > > > adjust="fdr") > > > > >> Resdat_7 <-data.frame(Resultfile_7) > > > > > > > > Resultfile_7 is already a data.frame. > > > > > > > > >> write.table(Resdat_,file='Result.csv',quote = > > > > FALSE, > > > > > sep = "\t") > > > > > > > > > > I understand that the spots that do not > > qualify > > > > the QC > > > > > filter are given a weight of "0" by limma and > > are > > > > not > > > > > considered for normalization and will not > > affect > > > > the > > > > > analysis. > > > > > The result file I get contains all the spots > > > > (38000) > > > > > in my case. > > > > > > > > Mmm, this doesn't make sense. You have 4 arrays > > > > with 39000 spots on each array. Hence you have > > > > 4*39000 spots, not 39000. The output from > > > > topTable() gives a summarized log-ratio for each > > > > probe, > > > > not a result for each spot. > > > > > > > > Gordon > > > > > > > > > Didn't the spots that were bad get removed > > from > > > > the > > > > > final result? > > > > > If not what is the cut off value (B, p etc) > > that I > > > > > need to use to get a set of reliable spots(I > > cant > > > > use > > > > > all the 38000) from my result file for my > > > > analysis. > > > > > Is there a fixed formula to derive the same as > > the > > > > > values vary with the analysis. > > > > > Waiting for your reply, > > > > > Thank you, > > > > > -Ankit > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@stat.math.ethz.ch > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com >

ADD REPLY • link 19.0 years ago Adaikalavan Ramasamy ★ 1.8k

0

Entering edit mode

Ankit Pal ▴ 230

@ankit-pal-1242

Last seen 9.6 years ago

Dear Dr Smyth, I would prefer using the p-value with a threshold value of 0.05. In the case of an experiment(I sent you the code) I did, the reult file contains p-values (after fdr) in range of 0.96 - 0.97. How did I get such values and where do I place a cut off in this case? Am I doing something wrong? In the user guide it says "If none of the raw p-value are less than 1/G, where G is the number of genes included in the fit, then all of the adjusted p-values will the equal to 1.Since 1/G is about the expected size of the smallest p-values given purely random variation and uniform p-values, this means that there is no overall evidence of differential expression." I am not too sure I understand what it means. Should there be atleast one p-value < 1/G? I am attaching a sample data output file to this mail. What should be done in my case? Thank you, sincerely, -Ankit --- Gordon Smyth <smyth@wehi.edu.au> wrote: > At 01:48 PM 17/05/2005, Ankit Pal wrote: > >Dear Adai, > >Thank you for the detailed response. > >Correct me if I'm wrong. > >What you are saying is that after applying the > "fdr" > >I give a cut off of 0.05 for the p-value and write > all > >those spots into an object (positives). > >That means, I do not consider the B values. > > This is one way to proceed. Have you read the > section in the User's Guide > on "Statistics for differential expression" which > discusses which statistic > to use? > > >In essence, it is just a one sample t test afetr > >adjusting the p-value and taking into consideration > >all those spots that have a p-value < 0.05? > > > > From my code in a previous mail to Gordon Smyth, I > >have done a quality control (QC) using parameters > and > >threshold values prescribed by genepix. > >I know from experience, a large number of spots do > not > >get through the filter. > >As I understand from LIMMA, a weight of "0" is > given > >to any spot that does not get through the QC > filter. > >How do I exclude these spots from the final result > >summary file? > > These spots are already excluded. You seem to be > confused by the fact that > excluding a spot does not exclude the corresponding > probe, since there are > four spots for each probe. You will get a valid > t-statistic and p-value for > a probe if any one of the spots for that probe get a > positive weight. (Note > you can get a Bayesian t-statistic even for a probe > with no residual df.) > > >I need to get a set of significantly differentially > >expressed genes that have got through the QC > filter. > >How do I do it using LIMMA? > > This is what you have been getting all along. > > Gordon > > >Thank you, > >-Ankit > > __________________________________ -------------- next part -------------- Block Row Column ID Name M A t P.Value B 15254 19 26 17 NM_033613 scl0001747.1_14 1.4052 5.986 5.654 0.9627 -4.360 29822 37 26 23 AB117944 scl0003290.1_30 1.5988 5.571 5.557 0.9627 -4.362 18221 23 16 18 NM_033578 scl093703.1_214 1.3846 7.291 5.498 0.9627 -4.363 33734 42 21 25 NM_175677 scl00319236.1_170 1.2535 7.807 5.123 0.9627 -4.370 30787 39 2 18 - scl11230.1.1_125 1.5640 10.169 4.974 0.9627 -4.374 21676 27 24 21 NM_175657 scl00319161.1_8 1.3489 6.576 4.839 0.9627 -4.377 10983 14 18 7 NM_007999 scl014156.1_86 1.2005 4.078 4.821 0.9627 -4.377 26138 33 10 7 NM_028209 scl24036.11_204 1.4833 10.811 4.772 0.9627 -4.378 25635 32 21 16 NM_207298 scl0099151.1_201 1.2067 7.145 4.751 0.9627 -4.379 33449 42 11 10 - scl42410.1.3_0 -1.1950 6.997 -4.685 0.9627 -4.381 11683 15 14 6 - scl35668.26.1_1 1.3209 2.752 4.499 0.9627 -4.385 13694 17 28 21 - scl000055.1_1_REVCOMP 1.1755 6.008 4.496 0.9627 -4.386 14944 19 15 4 - scl27546.1.1069_39 -1.0644 6.182 -4.489 0.9627 -4.386 29762 37 24 17 NM_025284 scl0019240.2_329 1.2948 9.347 4.479 0.9627 -4.386 11789 15 18 4 XM_148353 scl067609.1_307 1.6845 3.191 4.433 0.9627 -4.387 2764 4 13 13 NM_178934 scl39088.5.1_30 1.1694 6.481 4.367 0.9627 -4.389 29678 37 21 14 NM_146075 scl00224640.2_275 1.2598 6.786 4.359 0.9627 -4.389 33435 42 10 23 - scl11006.1.1_298 -1.5154 4.933 -4.258 0.9627 -4.393 12481 16 13 22 NM_173731 scl36691.12_254 1.3582 2.918 4.248 0.9627 -4.393 19597 25 7 19 NM_147027 scl46618.1.187_149 2.2705 9.797 4.190 0.9627 -4.395 32290 40 28 10 NM_153798 scl0000113.1_1 1.0130 4.491 4.157 0.9627 -4.396 11762 15 17 4 - scl073362.2_30 1.4282 2.531 4.128 0.9627 -4.397 15866 20 19 9 - scl077449.2_20 1.3121 2.870 4.127 0.9627 -4.397 14762 19 8 11 NM_025536 scl47782.2_279 1.0271 5.729 4.109 0.9627 -4.397 27464 34 29 11 NC_001819 9629514_329 | Rauscher murine leukemia virus 1.2519 9.932 4.048 0.9627 -4.399 32482 41 5 14 NM_016875 scl41354.10.1_7 1.3216 10.076 4.030 0.9627 -4.400 9365 12 18 7 - scl069350.1_47 1.2922 4.009 4.003 0.9627 -4.401 15841 20 18 11 NM_027268 scl069938.2_27 1.7431 2.031 4.000 0.9627 -4.401 7426 10 6 10 - scl28349.3_240 1.1551 7.934 3.981 0.9627 -4.402 33257 42 4 7 NM_028243 scl32376.11_593 0.9688 6.692 3.977 0.9627 -4.402 15479 20 4 27 NM_146551 scl5601.1.1_211 0.9451 4.314 3.962 0.9627 -4.402 5896 8 9 17 NM_028108 scl49079.8_50 -1.2436 6.015 -3.962 0.9627 -4.402 21375 27 13 17 NM_007830 scl16375.7_22 0.9112 6.972 3.909 0.9627 -4.404 21793 27 29 3 K00604 IGHV1S16|K00604|Ig_heavy_variable_1S16_234 0.9863 6.598 3.904 0.9627 -4.404 29994 38 3 7 NM_018827 scl33727.4.1_6 1.0797 7.621 3.891 0.9627 -4.405 13393 17 17 17 NM_019737 scl056386.2_4 1.2250 3.139 3.871 0.9627 -4.406 15345 19 29 27 - AFFX- BioB-M-at_62 0.9412 6.040 3.864 0.9627 -4.406 29758 37 24 13 NM_013655 scl0020315.1_134 0.9538 5.124 3.861 0.9627 -4.406 15716 20 13 21 NM_175528 scl29095.10_195 1.3016 2.642 3.847 0.9627 -4.406 10670 14 6 18 XM_134476 scl34486.9.1_200 1.7255 9.395 3.839 0.9627 -4.407

ADD COMMENT • link 19.0 years ago Ankit Pal ▴ 230

0

Entering edit mode

At 02:58 PM 17/05/2005, Ankit Pal wrote: >Dear Dr Smyth, >I would prefer using the p-value with a threshold >value of 0.05. >In the case of an experiment(I sent you the code) I >did, the reult file contains p-values (after fdr) in >range of 0.96 - 0.97. Clear result. No selected genes. End of story. >How did I get such values and where do I place a cut >off in this case? >Am I doing something wrong? >In the user guide it says >"If none of the raw p-value are less than 1/G, >where G is the number of genes included in the fit, >then all of the adjusted p-values will the >equal to 1.Since 1/G is about the expected size of the >smallest p-values given purely random variation and >uniform p-values, this means that there >is no overall evidence of differential expression." > >I am not too sure I understand what it means. >Should there be atleast one p-value < 1/G? No there need not be. The above text which you have quoted already tells you so. What would be the meaning of statistical significance, if you were guaranteed to get a significant result in every single experiment that you did? >I am attaching a sample data output file to this mail. >What should be done in my case? Perhaps take a more sophisticated approach to background correction and spot filtering. Or else accept that you don't have statistical significance for this data. Gordon >Thank you, >sincerely, >-Ankit

ADD REPLY • link 19.0 years ago Gordon Smyth 50k

Login before adding your answer.