Question

makeOrgPackage in AnnotationForge "GID" problem

0

samanthapizi • 0

@samanthapizi-12837

Last seen 8.8 years ago

I am trying to make a custom package for Acidovorax citrulli.

The two files I am using are:

> head(aac_info2)
  GID     LOCUSTAG  GENENAME                                      PROTEIN_ID
1 4666313 Aave_0001 chromosomal replication initiation protein    YP_968393.1
2 4666219 Aave_0002 DNA polymerase III subunit beta               YP_968394.1
3 4666217 Aave_0003 DNA gyrase subunit B                          YP_968395.1
4 4666220 Aave_0004 putative transcriptional regulator            YP_968396.1
5 4666222 Aave_0005 putative type I restriction enzyme, R subunit YP_968397.1
6 4666226 Aave_0007 DNA polymerase subunit beta                   YP_968398.1

and

> head(go_aac2)
  GID     GO         EVIDENCE
1 4666313 GO:0005737 IEA
2 4666219 GO:0005737 IEA
3 4666217 GO:0005694 IEA
4 4666220 GO:0005524 IEA
5 4666222 GO:0000166 IEA
6 4666226 GO:0016779 IEA

But I get this error: "The 1st column must always be the gene ID 'GID'"

> makeOrgPackage(gene_info=aac_info2,go=go_aac2,version="0.1",maintainer="chen",author="chen",ourputDir=".",tax_id="397945",genus="Acidovorax",species="citrulli",goTable="go")

Error in .makeOrgPackage(data, version = version, maintainer = maintainer, : The 1st column must always be the gene ID 'GID'

Actually, when I first tried, I used locus_tag for my GID, but it gave me the same error. Then I found the Gene IDs and put them in the first column, but I got the same thing. What is wrong?

My second question: my GO file has many more lines than my annotation file, because one gene can have multiple GO IDs. Can this work?

Thanks!!!

sessionInfo()

R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
[3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
[5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
[7] LC_PAPER=en_US.iso885915 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] AnnotationForge_1.14.2 AnnotationDbi_1.34.4 IRanges_2.6.1
[4] S4Vectors_0.10.2 Biobase_2.32.0 BiocGenerics_0.18.0

loaded via a namespace (and not attached):
[1] DBI_0.4-1 RSQLite_1.0.0 XML_3.98-1.4

annotationforge • 4.2k views

ADD COMMENT • link updated 8.8 years ago by James W. MacDonald 68k • written 8.8 years ago by samanthapizi • 0

score 2 · Accepted Answer · 2017-04-13

2

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

The first argument for makeOrgPackage is the ellipsis argument (...). This means that any argument that doesn't exactly match one of the named arguments will be 'sucked up' by the ellipsis and processed as if it were a data.frame containing your data.

This matters because one of your arguments is ourputDir=".", which doesn't match any of the named arguments (you meant outputDir). Since it doesn't match a named argument exactly, R tries to process it as if it were a data.frame, and, well, you see the result. Fixing that typo should set things right.
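A minimal sketch of that behaviour, using a hypothetical stand-in function (make_pkg is not part of AnnotationForge; it only mimics having ... as the first argument):

```r
# Hypothetical stand-in: `...` comes first, so every later argument
# must be matched by its exact name.
make_pkg <- function(..., outputDir = getwd()) {
  data <- list(...)  # everything that did not match a named argument
  list(data_names = names(data), outputDir = outputDir)
}

# With the typo, "ourputDir" is collected alongside the data arguments.
typo <- make_pkg(gene_info = data.frame(GID = 1:2), ourputDir = ".")
typo$data_names   # "gene_info" "ourputDir"

# With the correct spelling, only the data.frame ends up in `...`.
fixed <- make_pkg(gene_info = data.frame(GID = 1:2), outputDir = ".")
fixed$data_names  # "gene_info"
```

In the stand-in, the misspelled argument is silently treated as data, which is why the real function would then complain about its first column rather than about an unknown argument.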

As to your second question, that's to be expected, and shouldn't pose a problem.
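To make the one-to-many case concrete, here is a small base R sketch on made-up data (the IDs and locus tags below are invented for illustration, and the final check is just an optional sanity check, not something the thread says makeOrgPackage requires):

```r
# Toy data: IDs and tags invented for illustration only.
gene_info <- data.frame(GID = c(101, 102, 103),
                        LOCUSTAG = c("tag_1", "tag_2", "tag_3"),
                        stringsAsFactors = FALSE)
go <- data.frame(GID = c(101, 101, 102, 103, 103, 103),
                 GO = c("GO:0005737", "GO:0005524", "GO:0005694",
                        "GO:0000166", "GO:0016779", "GO:0005737"),
                 EVIDENCE = "IEA",
                 stringsAsFactors = FALSE)

# Number of GO rows per gene; counts above 1 are the one-to-many case.
go_per_gene <- table(go$GID)

# Optional sanity check: GIDs in the GO table that are absent from gene_info.
orphans <- setdiff(go$GID, gene_info$GID)
```

Here the GO table has six rows for three genes, and orphans is empty because every GID in the GO table also appears in gene_info.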

ADD COMMENT • link 8.8 years ago James W. MacDonald 68k

0

I fixed that typo, sorry for making silly mistakes. But now I get another error:

> makeOrgPackage(gene_info=aac_info2,go=go_aac2,version="0.1",maintainer="chen<cheng12@uga.edu>",author="chen<cheng12@uga.edu>",outputDir=".",tax_id="397945",genus="Acidovorax",species="citrulli",goTable="go")
Error in structure(res, levels = lv, names = nm, class = "factor") :
'names' attribute [16058] must be the same length as the vector [2]

This is what makes me wonder about the length of the go file. What caused this error?

ADD REPLY • link 8.8 years ago samanthapizi • 0

0

I don't know. It's not obvious from the error message. What do you get if you run traceback() right after you get the error?

ADD REPLY • link 8.8 years ago James W. MacDonald 68k

0

> traceback()
6: structure(res, levels = lv, names = nm, class = "factor")
5: unlist(unname(lapply(data, "[", "GID")))
4: unique(unlist(unname(lapply(data, "[", "GID"))))
3: makeOrgDbFromDataFrames(data, tax_id, genus, species, dbFileName,
goTable)
2: .makeOrgPackage(data, version = version, maintainer = maintainer,
author = author, outputDir = outputDir, tax_id = tax_id,
genus = genus, species = species, goTable = goTable, verbose = verbose)
1: makeOrgPackage(gene_info = aac_info2, go = go_aac2, version = "0.1",
maintainer = "chen<cheng12@uga.edu>", author = "chen<cheng12@uga.edu>",
outputDir = ".", tax_id = "397945", genus = "Acidovorax",
species = "citrulli", goTable = "go")

ADD REPLY • link 8.8 years ago samanthapizi • 0

1

The problem is that your GIDs are factors rather than numeric, which implies that you have either done something weird when you read those in, or you have some GIDs that R is somehow interpreting as character, which causes it to convert to factor.

In other words, if you read something into R, and you have a column that appears to contain text, R will by default convert that column to factor. As an example:

> df <- data.frame(first = c(1:5, "a"), second = 1:6)
> df
  first second
1     1      1
2     2      2
3     3      3
4     4      4
5     5      5
6     a      6
> df$first
[1] 1 2 3 4 5 a
Levels: 1 2 3 4 5 a
> df$second
[1] 1 2 3 4 5 6

So if your GID column is all numbers, R will read it in as numbers. But if there are some things in that column that look like strings, the column will be read in as a character vector and then converted to a factor. And this will blow up, giving the error you are seeing:

> df1 <- data.frame(GID = letters, LOCUS = letters)
> df2 <- data.frame(GID = c(letters,LETTERS), GO = 1:52)
> lst <- list(df1,df2)
> unique(unlist(unname(lapply(lst, "[", "GID"))))
Error in structure(res, levels = lv, names = nm, class = "factor") :
  'names' attribute [78] must be the same length as the vector [2]

You could read the data in using stringsAsFactors = FALSE, and that will work:

> df1 <- data.frame(GID = letters, LOCUS = letters, stringsAsFactors = FALSE)
> df2 <- data.frame(GID = c(letters,LETTERS), GO = 1:52, stringsAsFactors = FALSE)
> lst <- list(df1,df2)
> unique(unlist(unname(lapply(lst, "[", "GID"))))
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z" "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
[39] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

But all those GIDs should be numeric, so if I were you, I would track down the non-numeric looking entries and figure out what's going on with them.
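One way to hunt for the offending entries, sketched on a toy factor (three IDs are borrowed from the question, and "46662x2" is a deliberately broken entry added for illustration):

```r
# Toy GID column that was read in as a factor; "46662x2" is a made-up bad entry.
gid <- factor(c("4666313", "4666219", "46662x2", "4666220"))

# Go through character first: as.numeric() on a factor returns the level codes.
gid_chr <- as.character(gid)
gid_num <- suppressWarnings(as.numeric(gid_chr))

# Entries that failed to convert are the ones to inspect by hand.
bad <- gid_chr[is.na(gid_num)]
bad  # "46662x2"
```

On your own data you would run the same steps on aac_info2$GID and go_aac2$GID. Note that from R 4.0.0 onward, data.frame() and read.table() no longer convert strings to factors by default, so there such a column would show up as character rather than factor.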

ADD REPLY • link 8.8 years ago James W. MacDonald 68k

0

Ahhhh I see! I finally got it made. Thank you very much!

ADD REPLY • link 8.8 years ago samanthapizi • 0

0

Thanks a lot! I had been stuck on this all day.

ADD REPLY • link 6.8 years ago P.Fei Song • 0