error with retrieving dn and ds using biomart
lemon tree ▴ 40
@lemon-tree-5473
Last seen 10.3 years ago
Dear All,

I have a list of human genes with EntrezGene IDs, and I want to retrieve their dN and dS values using biomaRt. I used the following commands:

    library(biomaRt)
    ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    inputfile <- "gene_list.txt"
    geneIDs <- read.table(inputfile, header = FALSE)
    dnds <- getBM(attributes = c("entrezgene", "mmusculus_homolog_dn", "mmusculus_homolog_ds"),
                  filters = c("entrezgene"), values = geneIDs, mart = ensembl)

This gives the following error:

    Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple attribute pages are not allowed

Could anyone give some suggestions on this? Thanks very much!

Best all!

--
David Wang
biomaRt biomaRt • 2.0k views
@james-w-macdonald-5106
Last seen 3 days ago
United States
Hi David,

On 3/19/13 12:26 PM, David Wang wrote:
> There is an error as the following:
> Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple
> attribute pages are not allowed

The Biomart server groups attributes into pages, and a single query cannot combine attributes from more than one page.

    > x <- listAttributes(mart, what=c("name","description","page"))
    > x[x[,1] %in% c("entrezgene","mmusculus_homolog_dn","mmusculus_homolog_ds"),]
                        name   description         page
    47            entrezgene EntrezGene ID feature_page
    577 mmusculus_homolog_dn            dN     homologs
    578 mmusculus_homolog_ds            dS     homologs

So the homolog data you want come from the homologs page, but the entrezgene data come from the feature_page, hence you cannot get all three at one time.

Best,

Jim
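A minimal sketch of one possible workaround, assuming `ensembl_gene_id` is offered on both the feature page and the homologs page (an assumption to confirm with `listAttributes()` on your own mart): map the Entrez IDs to Ensembl gene IDs in one query, fetch dN/dS in a second, and merge the two. The helper names `pages_for()` and `get_mouse_dnds()` are hypothetical, `attr_table` is hand-built from the listing above plus two illustrative `ensembl_gene_id` rows, and the attribute names are the ones used in this thread, which may differ on the current server.

```r
# Hand-built stand-in for listAttributes(mart, what = c("name", "page")).
# The two ensembl_gene_id rows are an assumption made for illustration.
attr_table <- data.frame(
  name = c("ensembl_gene_id", "entrezgene", "ensembl_gene_id",
           "mmusculus_homolog_dn", "mmusculus_homolog_ds"),
  page = c("feature_page", "feature_page", "homologs", "homologs", "homologs"),
  stringsAsFactors = FALSE
)

# Return the pages on which every requested attribute is available.
# An empty result means the attributes cannot be combined in one query.
pages_for <- function(attr_table, wanted) {
  per_attr <- lapply(wanted, function(a) attr_table$page[attr_table$name == a])
  Reduce(intersect, per_attr)
}

pages_for(attr_table, c("entrezgene", "mmusculus_homolog_dn"))       # no common page
pages_for(attr_table, c("ensembl_gene_id", "mmusculus_homolog_dn"))  # "homologs"

# Hypothetical two-step query. It is only defined here, not called,
# because calling it needs a live connection to the Biomart server.
get_mouse_dnds <- function(entrez_ids, mart) {
  # Step 1: feature page only (Entrez ID -> Ensembl gene ID).
  id_map <- biomaRt::getBM(attributes = c("entrezgene", "ensembl_gene_id"),
                           filters = "entrezgene", values = entrez_ids,
                           mart = mart)
  # Step 2: homologs page only (Ensembl gene ID -> dN, dS).
  dnds <- biomaRt::getBM(attributes = c("ensembl_gene_id",
                                        "mmusculus_homolog_dn",
                                        "mmusculus_homolog_ds"),
                         filters = "ensembl_gene_id",
                         values = unique(id_map$ensembl_gene_id),
                         mart = mart)
  # Join the two results on the shared key.
  merge(id_map, dnds, by = "ensembl_gene_id")
}
```

With a mart object in hand this might be called as `get_mouse_dnds(geneIDs[[1]], ensembl)`; passing the ID column as a plain vector rather than the whole data frame from `read.table()` is probably the safer choice.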
Dear Jim and all,

Thank you! I have switched from EntrezGene IDs to Ensembl IDs and tried again. A simple example is as follows:

    dnds <- getBM(attributes = c("ensembl_gene_id", "mmusculus_homolog_dn", "mmusculus_homolog_ds"),
                  filters = "ensembl_gene_id", values = "ENSG00000139618", mart = mart)

But now there is the following error:

    Error in `[.data.frame`(result, , attributes) :
      undefined columns selected

Could you give me some suggestions about this? Thank you very much!

--
David Wang
I have 18 RNA-seq libraries for 9 samples in duplicate. When I generated the edgeR object, I found that all the norm.factors are 1 even though the library sizes are so different. Could anyone please explain this and let me know whether it is correct? Thanks.

Here is the code and some results:

    > group=factor(substring(names, 1,2))
    > group
     [1] MP FA FR FW MA FW MM FP MA MW FM MM MW MP MM FP FA FR
    Levels: FA FM FP FR FW MA MM MP MW
    > design=model.matrix(~0+group)
    > y=DGEList(counts=data.used, group=group)
    Calculating library sizes from column totals.
    > y
    An object of class "DGEList"
    $counts
              MP  FA  FR FW  MA FW.1  MM  FP MA.1  MW  FM MM.1 MW.1 MP.1 MM.2 FP.1
    GS_14929 221 284 170 23 267  105 209 106   52 218 158  277  170  926  185  211
    GS_09776  75  32  17 28  41   96  86  15   23  65  23   27   65   73   87   85
    GS_18434  36  22  19  8  21    6  22   7    7  27  13    6   21    2   23   15
    GS_08334  44  77  23 50  41   78  78  14   21  61  29    8   46   59   94  132
    GS_09550  82  92  45 54 105   75  95  18   79 153  41    8  111   11   86  178
             FA.1 FR.1
    GS_14929  159    0
    GS_09776   52  143
    GS_18434   14   23
    GS_08334   92   60
    GS_09550   97   39
    15457 more rows ...

    $samples
       group lib.size norm.factors
    MP    MP  2220863            1
    FA    FA  2179157            1
    FR    FR  1181036            1
    FW    FW  1305802            1
    MA    MA  1780507            1
    13 more rows ...

    > y$samples
         group lib.size norm.factors
    MP      MP  2220863            1
    FA      FA  2179157            1
    FR      FR  1181036            1
    FW      FW  1305802            1
    MA      MA  1780507            1
    FW.1    FW  3388037            1
    MM      MM  2564495            1
    FP      FP   749402            1
    MA.1    MA  1299287            1
    MW      MW  2579787            1
    FM      FM  1355962            1
    MM.1    MM   867100            1
    MW.1    MW  2581031            1
    MP.1    MP  2115532            1
    MM.2    MM  2869116            1
    FP.1    FP  3112223            1
    FA.1    FA  2715113            1
    FR.1    FR  2770152            1

Here I can see that the MP lib.size is 2220863, whereas the FP lib.size is only 749402, but both norm.factors are 1.

I am very confused...
Hi Caprice,

Probably you also want to run:

    y <- calcNormFactors(y)

... then have a look at y$samples ...

Best,
Mark

On 19.03.2013, at 22:38, capricy gao <capricyg at yahoo.com> wrote:
> I have 18 RNAseq libraries for 9 samples in duplicate. When I generated edgeR object, I found all the norm.factors are 1 even though the library size are so different. Could anyone please explain this for me and let me know if it is correct? Thanks.
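The suggestion above can be sketched on simulated data; this is a hedged illustration, not the poster's analysis. The count matrix, sample names and depths below are invented, and only `DGEList()` and `calcNormFactors()` come from the thread. Note that the normalization factors adjust for composition differences between libraries, while sequencing depth is already carried by `lib.size`, so factors close to 1 next to very different library sizes are not in themselves a sign of a problem.

```r
library(edgeR)

set.seed(42)
# Simulated counts (an assumption for illustration): 2000 genes x 4 libraries
# sequenced to deliberately different depths.
depth  <- c(1, 2, 0.5, 1.5)
counts <- sapply(depth, function(d) rnbinom(2000, mu = 50 * d, size = 5))
colnames(counts) <- c("A1", "A2", "B1", "B2")
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
group <- factor(c("A", "A", "B", "B"))

y <- DGEList(counts = counts, group = group)
y$samples$norm.factors    # all 1: DGEList() only initializes the factors

y <- calcNormFactors(y)   # estimates the factors (TMM by default)
y$samples                 # norm.factors are now estimated per library

# Effective library sizes combine depth and the estimated factors.
eff_lib_size <- y$samples$lib.size * y$samples$norm.factors
```

The estimated factors are scaled so that their product is 1, which is why they hover around 1 rather than tracking the raw library sizes.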
Geez, I forgot it. Thanks a lot!!

________________________________
From: Mark Robinson <mark.robinson@imls.uzh.ch>
Cc: "Bioconductor@r-project.org" <bioconductor@r-project.org>
Sent: Tuesday, March 19, 2013 4:48 PM
Subject: Re: [BioC] edgeR strange normalized factor for RNAseq data

Hi Caprice,

Probably you also want to run:

    y <- calcNormFactors(y)

... then have a look at y$samples ...

Best,
Mark
Hi David,

Apparently the Biomart server is doing something crazy and returning Alpaca homologs. You can see this if you first do

    options(error=recover)
    dnds <- getBM(attributes = c("ensembl_gene_id", "mmusculus_homolog_dn", "mmusculus_homolog_ds"),
                  filters = "ensembl_gene_id", values = ensgs, mart = mart)

and then you will see something like

    > getBM(attributes = c("ensembl_gene_id", "mmusculus_homolog_dn","mmusculus_homolog_ds"), filters = "ensembl_gene_id", values = ensgs, mart = mart)
    Error in `[.data.frame`(result, , attributes) :
      undefined columns selected

    Enter a frame number, or 0 to exit

    1: getBM(attributes = c("ensembl_gene_id", "mmusculus_homolog_dn", "mmusculus_
    2: result[, attributes]
    3: `[.data.frame`(result, , attributes)

If you choose 2, you will enter the debugger at that point and can then do something like

    Browse[1]> head(result)
      ensembl_gene_id vpacos_homolog_dn vpacos_homolog_ds
    1 ENSG00000114786            0.1161            0.5425
    2 ENSG00000125954                NA                NA
    3 ENSG00000159403            0.1330            0.6510
    4 ENSG00000251246            0.0765            0.6598
    5 ENSG00000255730                NA                NA
    6 ENSG00000256349            0.0536            0.6109

and then if you grep vpacos in the results from listAttributes, you can see that these are Alpaca homologs. Or at least the header indicates that.

So either Steffen Durinck or somebody at Ensembl will have to look into this, as I am now officially out of my depth.

Best,

Jim
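The mismatch seen in the debugger can be reproduced without the server. The following is a small sketch on a mock data frame built from the first three rows shown in this thread; the variable names `requested`, `missing_cols` and `unexpected_cols` are made up for illustration. It shows that subsetting by column names that are absent triggers the same error, and how one might list which attributes are missing.

```r
# Attributes that were asked for in the getBM() call.
requested <- c("ensembl_gene_id", "mmusculus_homolog_dn", "mmusculus_homolog_ds")

# Mock of the data frame seen in the debugger (values copied from the thread).
result <- data.frame(
  ensembl_gene_id   = c("ENSG00000114786", "ENSG00000125954", "ENSG00000159403"),
  vpacos_homolog_dn = c(0.1161, NA, 0.1330),
  vpacos_homolog_ds = c(0.5425, NA, 0.6510),
  stringsAsFactors  = FALSE
)

# Selecting columns by names that do not exist raises the error from the thread;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(result[, requested], error = function(e) conditionMessage(e))
err

# Which requested attributes did not come back, and what came back instead?
missing_cols    <- setdiff(requested, colnames(result))
unexpected_cols <- setdiff(colnames(result), requested)
missing_cols
unexpected_cols
```

A check of this kind on the raw server response would point at the returned header rather than at the user's query, which matches the diagnosis above.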
Dear David,

I tried the following query on an out-of-date R and biomaRt, and I got the mouse dN and dS values back:

    mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="www.ensembl.org", path="/biomart/martservice", "hsapiens_gene_ensembl")
    dnds <- getBM(attributes = c("ensembl_gene_id","mmusculus_homolog_dn","mmusculus_homolog_ds"),
                  filters = "ensembl_gene_id", values = "ENSG00000139618", mart = mart)

I have just updated R to 2.15.3 and biomaRt to 2.14.0, and I am now getting the following error back:

    Error in `[.data.frame`(result, , attributes) :
      undefined columns selected

This might be coming from the latest version of biomaRt.

Hope this helps,
Thomas

--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
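Since the behaviour here appears to depend on the R and biomaRt versions, it may help to record them alongside any report. A minimal sketch follows; the helper name `pkg_versions()` is hypothetical, and it simply returns `NA` for packages that are not installed.

```r
# Report the installed version of each package, or NA if it is not installed.
pkg_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

cat(R.version.string, "\n")
print(pkg_versions(c("biomaRt", "stats")))
```

`sessionInfo()` gives the same information in more detail and is the usual thing to paste into a mailing-list post.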