I am trying to make a custom package for Acidovorax citrulli.
The two files I am using are:
> head(aac_info2)
GID LOCUSTAG GENENAME PROTEIN_ID
1 4666313 Aave_0001 chromosomal replication initiation protein YP_968393.1
2 4666219 Aave_0002 DNA polymerase III subunit beta YP_968394.1
3 4666217 Aave_0003 DNA gyrase subunit B YP_968395.1
4 4666220 Aave_0004 putative transcriptional regulator YP_968396.1
5 4666222 Aave_0005 putative type I restriction enzyme, R subunit YP_968397.1
6 4666226 Aave_0007 DNA polymerase subunit beta YP_968398.1
and
> head(go_aac2)
GID GO EVIDENCE
1 4666313 GO:0005737 IEA
2 4666219 GO:0005737 IEA
3 4666217 GO:0005694 IEA
4 4666220 GO:0005524 IEA
5 4666222 GO:0000166 IEA
6 4666226 GO:0016779 IEA
But I get this error: "The 1st column must always be the gene ID 'GID'"
> makeOrgPackage(gene_info=aac_info2,go=go_aac2,version="0.1",maintainer="chen",author="chen",ourputDir=".",tax_id="397945",genus="Acidovorax",species="citrulli",goTable="go")
Error in .makeOrgPackage(data, version = version, maintainer = maintainer, : The 1st column must always be the gene ID 'GID'
Actually, when I first tried, I used locus_tag for my GID, but it gave me the same error. Then I found the Gene IDs and put them in the first column, but I got the same thing. What is wrong?
My second question is that my GO file actually has many more lines than my annotation file, because one gene can have multiple GO IDs. Can this work?
Thanks!!!
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
[3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
[5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
[7] LC_PAPER=en_US.iso885915 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] AnnotationForge_1.14.2 AnnotationDbi_1.34.4 IRanges_2.6.1
[4] S4Vectors_0.10.2 Biobase_2.32.0 BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] DBI_0.4-1 RSQLite_1.0.0 XML_3.98-1.4
I fixed that typo, sorry for making silly mistakes. But now I get another error:
> makeOrgPackage(gene_info=aac_info2,go=go_aac2,version="0.1",maintainer="chen<cheng12@uga.edu>",author="chen<cheng12@uga.edu>",outputDir=".",tax_id="397945",genus="Acidovorax",species="citrulli",goTable="go")
Error in structure(res, levels = lv, names = nm, class = "factor") :
'names' attribute [16058] must be the same length as the vector [2]
This is what makes me wonder about the length of the go file. What caused this error?
I don't know. It's not obvious from the error message. What do you get if you run traceback() right after you get the error?
> traceback()
6: structure(res, levels = lv, names = nm, class = "factor")
5: unlist(unname(lapply(data, "[", "GID")))
4: unique(unlist(unname(lapply(data, "[", "GID"))))
3: makeOrgDbFromDataFrames(data, tax_id, genus, species, dbFileName,
goTable)
2: .makeOrgPackage(data, version = version, maintainer = maintainer,
author = author, outputDir = outputDir, tax_id = tax_id,
genus = genus, species = species, goTable = goTable, verbose = verbose)
1: makeOrgPackage(gene_info = aac_info2, go = go_aac2, version = "0.1",
maintainer = "chen<cheng12@uga.edu>", author = "chen<cheng12@uga.edu>",
outputDir = ".", tax_id = "397945", genus = "Acidovorax",
species = "citrulli", goTable = "go")
The problem is that your GIDs are factors rather than numeric, which implies that you have either done something weird when you read those in, or you have some GIDs that R is somehow interpreting as character, which causes it to convert to factor.
In other words, if you read something into R, and you have a column that appears to contain text, R will by default convert that column to factor. As an example:
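A minimal sketch of that behaviour, using made-up two-row inputs passed through the text argument instead of a file. Note that stringsAsFactors = TRUE is written out as an assumption: it was the default in R 3.x (as in the session above) but is FALSE from R 4.0.0 onward.

```r
# Hypothetical inputs; only the second one contains a non-numeric entry.
all_numbers <- read.table(text = "GID\n4666313\n4666219", header = TRUE,
                          stringsAsFactors = TRUE)
has_text <- read.table(text = "GID\n4666313\nAave_0002", header = TRUE,
                       stringsAsFactors = TRUE)

class(all_numbers$GID)  # "integer"
class(has_text$GID)     # "factor"
```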
So if your GID column is all numbers, R will read it in as numbers. But if there are some things in that column that look like strings, the column will be read in as a character vector and then converted to a factor. And this will blow up, giving the error you are seeing:
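To check whether a given set of inputs is affected before calling makeOrgPackage, something like the following may help. It is only a sketch: gene_info and go_table are hypothetical stand-ins for aac_info2 and go_aac2, with invented rows built from the IDs shown above.

```r
# Hypothetical stand-ins for the two inputs; only the GID column matters here.
gene_info <- data.frame(GID = factor(c("4666313", "4666219")),
                        LOCUSTAG = c("Aave_0001", "Aave_0002"),
                        stringsAsFactors = FALSE)
go_table <- data.frame(GID = c(4666313, 4666313, 4666219),
                       GO = c("GO:0005737", "GO:0005694", "GO:0005737"),
                       EVIDENCE = "IEA",
                       stringsAsFactors = FALSE)

# Report, per input, whether GID is stored as a factor.
gid_is_factor <- vapply(list(gene_info = gene_info, go = go_table),
                        function(df) is.factor(df$GID), logical(1))
gid_is_factor
```

Any input flagged TRUE here is a candidate for the problem described above.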
You could read in using stringsAsFactors = FALSE, and that will work:
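For instance, with the same kind of made-up input as before (the text argument stands in for whatever file is actually being read), the column then stays a character vector:

```r
# Hypothetical input with one non-numeric entry in the GID column.
has_text <- read.table(text = "GID\n4666313\nAave_0002", header = TRUE,
                       stringsAsFactors = FALSE)

class(has_text$GID)  # "character"
```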
But all those GIDs should be numeric, so if I were you, I would track down the non-numeric looking things and figure out what's up wit dat.
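One possible way to find them, sketched on a made-up GID vector where "Aave_0003" stands in for whatever stray entry is actually in the data:

```r
# Hypothetical GID column that was read in as a factor.
gid <- factor(c("4666313", "4666219", "Aave_0003", "4666220"))

# Go through character first: as.numeric() on a factor returns the level codes.
gid_chr <- as.character(gid)
gid_num <- suppressWarnings(as.numeric(gid_chr))

# Entries that cannot be converted become NA; list their positions and values.
bad <- which(is.na(gid_num))
data.frame(row = bad, value = gid_chr[bad])
```

Genuinely missing values in the column would show up in this list as well, so the reported rows still need a look by eye.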
Ahhhh I see! I finally got it made. Thank you very much!
Thanks a lot! I've been stuck on it all day.