use of voom function with attract package
2
0
Entering edit mode
Guest User ★ 12k
@guest-user-4897
Last seen 8.1 years ago
Dear Bioconductor, I am trying to use the attract package to find the processes that are differentially activated between cell types of the same lineage, using RNA-Seq data. Since the attract package is designed to work with microarray data, I decided to use the voom function to transform my data and change the findAttractors() function accordingly, to accommodate this type of data. Since this is not trivial, I want to make sure that I am using the output from the voom function correctly. The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene: fstat <- apply(dat.detect.wkegg, 1, function(y, x) { anova(lm(y ~ x))[[4]][1]}, x = group) where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column. (To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. ) What I want to do is change the above to: counts_data <- DGEList(counts=rnaseq,group=celltype) counts_data_norm <- calcNormFactors(counts_data) design <- model.matrix( ~ celltype) anal_voom <- voom(counts_data_norm, design) dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E))) voom_weights <- as.list(as.data.frame(t(anal_voom$weights))) fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]}, dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype)) Is this the way to go with using the weights from voom, or am I getting this very wrong? Many thanks in advance for your reply! Best wishes, Emmanouela -- output of sessionInfo(): sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1 LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] attract_1.14.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 GO.db_2.10.1 Matrix_1.1-3 cluster_1.15.2 annotate_1.40.1 org.Mm.eg.db_2.10.1 [10] KEGG.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.13 loaded via a namespace (and not attached): [1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.1 GSEABase_1.24.0 IRanges_1.20.7 lattice_0.20-29 RBGL_1.38.0 splines_3.0.1 [9] stats4_3.0.1 survival_2.37-7 tcltk_3.0.1 tools_3.0.1 XML_3.98-1.1 xtable_1.7-3 -- Sent via the guest posting facility at bioconductor.org.
Pathways GO attract Pathways GO attract • 2.3k views
0
Entering edit mode
@ryan-c-thompson-5618
Last seen 2.1 years ago
Scripps Research, La Jolla, CA
Hi Emmanouela, I believe that is the correct way to use the voom weights in a call to lm. However, if you are using voom, you might also want to use limma as you normally would to compute moderated F statistics. Just use topTable with n=Inf and sort="none" to get the full table with no reordering of rows, and pull out the F column. -Ryan On Tue Apr 15 09:44:42 2014, Emmanouela Repapi [guest] wrote: > > Dear Bioconductor, > > I am trying to use the attract package to find the processes that are differentially activated between cell types of the same lineage, using RNA-Seq data. Since the attract package is designed to work with microarray data, I decided to use the voom function to transform my data and change the findAttractors() function accordingly, to accommodate this type of data. Since this is not trivial, I want to make sure that I am using the output from the voom function correctly. > > The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene: > fstat <- apply(dat.detect.wkegg, 1, function(y, x) { > anova(lm(y ~ x))[[4]][1]}, x = group) > where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column. > (To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. ) > > What I want to do is change the above to: > > counts_data <- DGEList(counts=rnaseq,group=celltype) > counts_data_norm <- calcNormFactors(counts_data) > > design <- model.matrix( ~ celltype) > anal_voom <- voom(counts_data_norm, design) > dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E))) > voom_weights <- as.list(as.data.frame(t(anal_voom$weights))) > > fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]}, > dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype)) > > Is this the way to go with using the weights from voom, or am I getting this very wrong? > > Many thanks in advance for your reply! > > Best wishes, > Emmanouela > > > > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1 LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 > [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] attract_1.14.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 GO.db_2.10.1 Matrix_1.1-3 cluster_1.15.2 annotate_1.40.1 org.Mm.eg.db_2.10.1 > [10] KEGG.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.13 > > loaded via a namespace (and not attached): > [1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.1 GSEABase_1.24.0 IRanges_1.20.7 lattice_0.20-29 RBGL_1.38.0 splines_3.0.1 > [9] stats4_3.0.1 survival_2.37-7 tcltk_3.0.1 tools_3.0.1 XML_3.98-1.1 xtable_1.7-3 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
0
Entering edit mode
@gordon-smyth
Last seen 26 minutes ago
WEHI, Melbourne, Australia
Dear Emmanouela, The limma package is designed to fit linear models, and it can compute t-statistics and F-statistics faster than making your own loop to lm(). If you want F-statistics for distinguishing the cell types, why not: fit <- lmFit(anal_voom, design) fit <- eBayes(fit[,-1]) Then the F-statistics will be in fit$F. If you want to know whether a particular KEGG pathway tends to have larger F-statistics, you could also use: geneSetTest(index, fit$F) where index selects genes in the pathway. If there are only two cell types, a better way would be: camera(anal_voom, index, design) With camera, index could be a list of index vectors for all the KEGG pathways at once. Best wishes Gordon > Date: Tue, 15 Apr 2014 09:44:42 -0700 (PDT) > From: "Emmanouela Repapi [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, emmanouela.repapi at imm.ox.ac.uk > Subject: [BioC] use of voom function with attract package > > > Dear Bioconductor, > > I am trying to use the attract package to find the processes that are > differentially activated between cell types of the same lineage, using > RNA-Seq data. Since the attract package is designed to work with > microarray data, I decided to use the voom function to transform my data > and change the findAttractors() function accordingly, to accommodate > this type of data. Since this is not trivial, I want to make sure that I > am using the output from the voom function correctly. > > The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene: > fstat <- apply(dat.detect.wkegg, 1, function(y, x) { > anova(lm(y ~ x))[[4]][1]}, x = group) > where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column. > (To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. ) > > What I want to do is change the above to: > > counts_data <- DGEList(counts=rnaseq,group=celltype) > counts_data_norm <- calcNormFactors(counts_data) > > design <- model.matrix( ~ celltype) > anal_voom <- voom(counts_data_norm, design) > dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E))) > voom_weights <- as.list(as.data.frame(t(anal_voom$weights))) > > fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]}, > dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype)) > > Is this the way to go with using the weights from voom, or am I getting this very wrong? > > Many thanks in advance for your reply! > > Best wishes, > Emmanouela > > > > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1 LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 > [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] attract_1.14.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 GO.db_2.10.1 Matrix_1.1-3 cluster_1.15.2 annotate_1.40.1 org.Mm.eg.db_2.10.1 > [10] KEGG.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.13 > > loaded via a namespace (and not attached): > [1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.1 GSEABase_1.24.0 IRanges_1.20.7 lattice_0.20-29 RBGL_1.38.0 splines_3.0.1 > [9] stats4_3.0.1 survival_2.37-7 tcltk_3.0.1 tools_3.0.1 XML_3.98-1.1 xtable_1.7-3 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
0
Entering edit mode
Dear Gordon and Ryan, Thank you both very much for your replies, using the F statistic from the lmFit and the geneSetTest seems like the right way to go for what I need to do. On a similar note, do you have a function in mind for finding the genes with similar expression profiles within a specific pathway (as a replacement of the other function of attract, findSynexprs)? Thank you again for your help. Best wishes, Emmanouela On 18 Apr 2014, at 02:16, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Emmanouela, > > The limma package is designed to fit linear models, and it can compute t-statistics and F-statistics faster than making your own loop to lm(). If you want F-statistics for distinguishing the cell types, why not: > > fit <- lmFit(anal_voom, design) > fit <- eBayes(fit[,-1]) > > Then the F-statistics will be in fit$F. > > If you want to know whether a particular KEGG pathway tends to have larger F-statistics, you could also use: > > geneSetTest(index, fit$F) > > where index selects genes in the pathway. If there are only two cell types, a better way would be: > > camera(anal_voom, index, design) > > With camera, index could be a list of index vectors for all the KEGG pathways at once. > > Best wishes > Gordon > >> Date: Tue, 15 Apr 2014 09:44:42 -0700 (PDT) >> From: "Emmanouela Repapi [guest]" <guest@bioconductor.org> >> To: bioconductor@r-project.org, emmanouela.repapi@imm.ox.ac.uk >> Subject: [BioC] use of voom function with attract package >> >> >> Dear Bioconductor, >> >> I am trying to use the attract package to find the processes that are differentially activated between cell types of the same lineage, using RNA-Seq data. Since the attract package is designed to work with microarray data, I decided to use the voom function to transform my data and change the findAttractors() function accordingly, to accommodate this type of data. Since this is not trivial, I want to make sure that I am using the output from the voom function correctly. >> >> The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene: >> fstat <- apply(dat.detect.wkegg, 1, function(y, x) { >> anova(lm(y ~ x))[[4]][1]}, x = group) >> where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column. >> (To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. ) >> >> What I want to do is change the above to: >> >> counts_data <- DGEList(counts=rnaseq,group=celltype) >> counts_data_norm <- calcNormFactors(counts_data) >> >> design <- model.matrix( ~ celltype) >> anal_voom <- voom(counts_data_norm, design) >> dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E))) >> voom_weights <- as.list(as.data.frame(t(anal_voom$weights))) >> >> fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]}, >> dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype)) >> >> Is this the way to go with using the weights from voom, or am I getting this very wrong? >> >> Many thanks in advance for your reply! >> >> Best wishes, >> Emmanouela >> >> >> >> >> -- output of sessionInfo(): >> >> sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1 LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 >> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] attract_1.14.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 GO.db_2.10.1 Matrix_1.1-3 cluster_1.15.2 annotate_1.40.1 org.Mm.eg.db_2.10.1 >> [10] KEGG.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.13 >> >> loaded via a namespace (and not attached): >> [1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.1 GSEABase_1.24.0 IRanges_1.20.7 lattice_0.20-29 RBGL_1.38.0 splines_3.0.1 >> [9] stats4_3.0.1 survival_2.37-7 tcltk_3.0.1 tools_3.0.1 XML_3.98-1.1 xtable_1.7-3 > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
0
Entering edit mode
Hi Emmanouela, I don't know of an off-the-shelf function to do something similar to findSynexprs but using precision weights. I guess one would do hiearchical clustering of the genes, and would define the distance measure to be a Pearson correlation calculation but using weights. Not hard to do, but not yet ready off-the-shelf as far as I know. Best wishes Gordon On Thu, 24 Apr 2014, Emmanouela Repapi wrote: > Dear Gordon and Ryan, > > Thank you both very much for your replies, using the F statistic from > the lmFit and the geneSetTest seems like the right way to go for what I > need to do. > On a similar note, do you have a function in mind for finding the genes > with similar expression profiles within a specific pathway (as a > replacement of the other function of attract, findSynexprs)? > > Thank you again for your help. > > Best wishes, > Emmanouela > > On 18 Apr 2014, at 02:16, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Emmanouela, >> >> The limma package is designed to fit linear models, and it can compute t-statistics and F-statistics faster than making your own loop to lm(). If you want F-statistics for distinguishing the cell types, why not: >> >> fit <- lmFit(anal_voom, design) >> fit <- eBayes(fit[,-1]) >> >> Then the F-statistics will be in fit$F. >> >> If you want to know whether a particular KEGG pathway tends to have larger F-statistics, you could also use: >> >> geneSetTest(index, fit$F) >> >> where index selects genes in the pathway. If there are only two cell types, a better way would be: >> >> camera(anal_voom, index, design) >> >> With camera, index could be a list of index vectors for all the KEGG pathways at once. >> >> Best wishes >> Gordon >> >>> Date: Tue, 15 Apr 2014 09:44:42 -0700 (PDT) >>> From: "Emmanouela Repapi [guest]" <guest at="" bioconductor.org=""> >>> To: bioconductor at r-project.org, emmanouela.repapi at imm.ox.ac.uk >>> Subject: [BioC] use of voom function with attract package >>> >>> >>> Dear Bioconductor, >>> >>> I am trying to use the attract package to find the processes that are differentially activated between cell types of the same lineage, using RNA-Seq data. Since the attract package is designed to work with microarray data, I decided to use the voom function to transform my data and change the findAttractors() function accordingly, to accommodate this type of data. Since this is not trivial, I want to make sure that I am using the output from the voom function correctly. >>> >>> The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene: >>> fstat <- apply(dat.detect.wkegg, 1, function(y, x) { >>> anova(lm(y ~ x))[[4]][1]}, x = group) >>> where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column. >>> (To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. ) >>> >>> What I want to do is change the above to: >>> >>> counts_data <- DGEList(counts=rnaseq,group=celltype) >>> counts_data_norm <- calcNormFactors(counts_data) >>> >>> design <- model.matrix( ~ celltype) >>> anal_voom <- voom(counts_data_norm, design) >>> dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E))) >>> voom_weights <- as.list(as.data.frame(t(anal_voom$weights))) >>> >>> fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]}, >>> dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype)) >>> >>> Is this the way to go with using the weights from voom, or am I getting this very wrong? >>> >>> Many thanks in advance for your reply! >>> >>> Best wishes, >>> Emmanouela >>> >>> >>> >>> >>> -- output of sessionInfo(): >>> >>> sessionInfo() >>> R version 3.0.1 (2013-05-16) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1 LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 >>> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] attract_1.14.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 GO.db_2.10.1 Matrix_1.1-3 cluster_1.15.2 annotate_1.40.1 org.Mm.eg.db_2.10.1 >>> [10] KEGG.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.13 >>> >>> loaded via a namespace (and not attached): >>> [1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.1 GSEABase_1.24.0 IRanges_1.20.7 lattice_0.20-29 RBGL_1.38.0 splines_3.0.1 >>> [9] stats4_3.0.1 survival_2.37-7 tcltk_3.0.1 tools_3.0.1 XML_3.98-1.1 xtable_1.7-3 >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the addressee. >> You must not disclose, forward, print or use it without the permission of the sender. >> ______________________________________________________________________ > > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
0
Entering edit mode
Dear Gordon, Ok, thats what I thought. Thank you again so much for all your help! Best wishes, Emmanouela Emmanouela Repapi, PhD Computational Biology Research Group Weatherall Institute of Molecular Medicine University of Oxford, OX3 9DS Tel: (01865 2)22253 On 25 Apr 2014, at 00:25, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Hi Emmanouela, > > I don't know of an off-the-shelf function to do something similar to findSynexprs but using precision weights. I guess one would do hiearchical clustering of the genes, and would define the distance measure to be a Pearson correlation calculation but using weights. Not hard to do, but not yet ready off-the-shelf as far as I know. > > Best wishes > Gordon > > On Thu, 24 Apr 2014, Emmanouela Repapi wrote: > >> Dear Gordon and Ryan, >> >> Thank you both very much for your replies, using the F statistic from the lmFit and the geneSetTest seems like the right way to go for what I need to do. > >> On a similar note, do you have a function in mind for finding the genes with similar expression profiles within a specific pathway (as a replacement of the other function of attract, findSynexprs)? >> >> Thank you again for your help. >> >> Best wishes, >> Emmanouela >> >> On 18 Apr 2014, at 02:16, Gordon K Smyth <smyth@wehi.edu.au> wrote: >> >>> Dear Emmanouela, >>> >>> The limma package is designed to fit linear models, and it can compute t-statistics and F-statistics faster than making your own loop to lm(). If you want F-statistics for distinguishing the cell types, why not: >>> >>> fit <- lmFit(anal_voom, design) >>> fit <- eBayes(fit[,-1]) >>> >>> Then the F-statistics will be in fit$F. >>> >>> If you want to know whether a particular KEGG pathway tends to have larger F-statistics, you could also use: >>> >>> geneSetTest(index, fit$F) >>> >>> where index selects genes in the pathway. If there are only two cell types, a better way would be: >>> >>> camera(anal_voom, index, design) >>> >>> With camera, index could be a list of index vectors for all the KEGG pathways at once. >>> >>> Best wishes >>> Gordon >>> >>>> Date: Tue, 15 Apr 2014 09:44:42 -0700 (PDT) >>>> From: "Emmanouela Repapi [guest]" <guest@bioconductor.org> >>>> To: bioconductor@r-project.org, emmanouela.repapi@imm.ox.ac.uk >>>> Subject: [BioC] use of voom function with attract package >>>> >>>> >>>> Dear Bioconductor, >>>> >>>> I am trying to use the attract package to find the processes that are differentially activated between cell types of the same lineage, using RNA-Seq data. Since the attract package is designed to work with microarray data, I decided to use the voom function to transform my data and change the findAttractors() function accordingly, to accommodate this type of data. Since this is not trivial, I want to make sure that I am using the output from the voom function correctly. >>>> >>>> The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene: >>>> fstat <- apply(dat.detect.wkegg, 1, function(y, x) { >>>> anova(lm(y ~ x))[[4]][1]}, x = group) >>>> where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column. >>>> (To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. ) >>>> >>>> What I want to do is change the above to: >>>> >>>> counts_data <- DGEList(counts=rnaseq,group=celltype) >>>> counts_data_norm <- calcNormFactors(counts_data) >>>> >>>> design <- model.matrix( ~ celltype) >>>> anal_voom <- voom(counts_data_norm, design) >>>> dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E))) >>>> voom_weights <- as.list(as.data.frame(t(anal_voom$weights))) >>>> >>>> fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]}, >>>> dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype)) >>>> >>>> Is this the way to go with using the weights from voom, or am I getting this very wrong? >>>> >>>> Many thanks in advance for your reply! >>>> >>>> Best wishes, >>>> Emmanouela >>>> >>>> >>>> >>>> >>>> -- output of sessionInfo(): >>>> >>>> sessionInfo() >>>> R version 3.0.1 (2013-05-16) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1 LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 >>>> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] attract_1.14.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 GO.db_2.10.1 Matrix_1.1-3 cluster_1.15.2 annotate_1.40.1 org.Mm.eg.db_2.10.1 >>>> [10] KEGG.db_2.10.1 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.13 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] AnnotationForge_1.4.4 genefilter_1.44.0 grid_3.0.1 GSEABase_1.24.0 IRanges_1.20.7 lattice_0.20-29 RBGL_1.38.0 splines_3.0.1 >>>> [9] stats4_3.0.1 survival_2.37-7 tcltk_3.0.1 tools_3.0.1 XML_3.98-1.1 xtable_1.7-3 >>> >>> ______________________________________________________________________ >>> The information in this email is confidential and intended solely for the addressee. >>> You must not disclose, forward, print or use it without the permission of the sender. >>> ______________________________________________________________________ >> >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}