makeChipPackage(): Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows:
Jaqueline • 0
@b381a3e5
Last seen 1 day ago
Chile

Hi there, I'm attempting to create a chip package that includes both the probe IDs from a custom mRNA expression array for Salmo salar and the gene IDs of the genes the probes are linked to.

I created the org.Ssalar.eg.db package with the makeOrgPackageFromNCBI() function.

I have also created a two-column data frame ("probeFrame"), which I have thoroughly checked and which includes a probe.id and a gene.id column, as suggested by the usage description of the function. I have checked it with several commands, and everything seems to be in order with the file and its data.

I also confirmed that all gene IDs from the "probeFrame" file have a match with a gene ID from the "org.Ssalar.eg.db" package.

I have run the command in R versions 4.3 and 4.5.1 (the one I currently have) and with AnnotationForge versions 1.44 and 1.5, and in both scenarios I get the same error message. I've run out of ideas on how to solve it, so I would appreciate any help anyone can provide :)

This is an example of the exact code I'm using:

makeChipPackage(
  prefix = "SalmonChip",
  probeFrame = probeFrame,
  orgPkgName = "org.Ssalar.eg.db",
  version = "1.0.0",
  maintainer = "name <name@mail.com>",
  author = "name <name@mail.com>",
  outputDir = "R_Analysis",
  tax_id = "8030",
  genus = "Salmo",
  species = "salar")

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 29093, 0

AnnotationForge • 541 views

Could you run it again and, right after the error, run traceback() and post the results here?


Thanks for the reply (:

Here is the result (I removed my email, and the probeFrame I created is under a different name "mapping_df", but those are the only changes).

> traceback()
11: stop(gettextf("arguments imply differing number of rows: %s", 
        paste(unique(nrows), collapse = ", ")), domain = NA)
10: data.frame(..., check.names = FALSE)
9: cbind(deparse.level, ...)
8: cbind(...)
7: eval(mc, env)
6: eval(mc, env)
5: eval(mc, env)
4: standardGeneric("cbind")
3: cbind(probeFrame, multiple)
2: makeChipDbFromDataFrame(probeFrame, orgPkgName, tax_id, genus, 
       species, dbFileName, optionalAccessionsFrame)
1: makeChipPackage(prefix = "SalmonChip", probeFrame = mapping_df, 
       orgPkgName = "org.Ssalar.eg.db", version = "1.0.0", maintainer = "Jaqueline <xxxx@mail.com>", 
       author = "Jaqueline <xxxx@mail.com>", outputDir = "R_Analysis", 
       tax_id = "8030", genus = "Salmo", species = "salar", optionalAccessionsFrame = NULL)
@james-w-macdonald-5106
Last seen 7 hours ago
United States

Try changing the column names for your probeFrame to 'probes' and 'genes'.
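
A minimal sketch of that rename, assuming the data frame has exactly two columns, with probe IDs first and gene IDs second. The toy IDs below are made up for illustration and are not from the original data:

```r
# Toy stand-in for the real probeFrame; the IDs are invented for illustration.
probeFrame <- data.frame(probe.id = c("p1", "p2", "p3"),
                         gene.id  = c("100", "200", "200"),
                         stringsAsFactors = FALSE)

# Rename by position. This assumes column 1 holds the probe IDs and
# column 2 holds the gene IDs; check that before renaming real data.
colnames(probeFrame) <- c("probes", "genes")

# Cheap guard to run before calling makeChipPackage().
stopifnot(identical(colnames(probeFrame), c("probes", "genes")))
```

Only the names change here; the contents of the data frame are left as they were.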


Thank you so much for the help!! That solved it alright :DDDDD

How did you figure out it was the names? I ask because it was a rather simple solution, but it never crossed my mind or the minds of others who tried to help me. In case something similar happens again, I would like to add this to my list of things to check when an error persists.

Thanks again!!!


By looking at the code. I searched for makeChipPackage at https://code.bioconductor.org, under Code search. That brings up a results page, and you can see that the function is found in the first entry. If you click on the link, it will bring up the .R file that contains the function plus some helper functions. This code was written by Marc Carlson, and his MO is to put the helper functions (things like .makeProbesTable, which you can tell are internal helpers because of the prepended dot) at the top and the exported functions at the bottom. So scroll down to 'makeChipPackage'.

You can see that makeChipPackage does some checking of the inputs at the beginning, and then there's this part:

    ## The file name for the DB
    dbFileName <- file.path(outputDir,paste0(prefix, ".sqlite"))
    ## Then make the DB
    makeChipDbFromDataFrame(probeFrame, orgPkgName, tax_id,
                            genus, species, dbFileName, optionalAccessionsFrame)

    ## choose the appropriate pkgTemplate (schema)
    orgSchema <- .getOrgSchema(orgPkgName)
    chipSchema <- .getChipSchema(orgSchema)

Which is about the only code that appears to do anything to the probeFrame. We need to find makeChipDbFromDataFrame so we scroll up a bit to find it, and eyeball that code. Now here is the trick; you won't find data.frame(..., check.names = FALSE) anywhere in that function, but it's there, right at the bottom of this code block:

    ## 1st connect to the org package
    require(orgPkgName, character.only = TRUE)
    ## Then make final column for the probeFrame            
    multiple <- unlist(lapply(as.character(probeFrame$probes),
                              .testForMultiples,
                              probes=as.character(probeFrame$probes)))
    probeFrame <- cbind(probeFrame, multiple)

Now here's why I say it's tricky. The cbind function is a special type of function called a 'generic', which means that it behaves differently based on its input (in this case the input is a data.frame); the input-specific versions it hands off to are called 'methods'. There is more than one type of method in R, and they work differently depending on the type. Most of base R and the tidyverse use what are called S3 methods, and Bioconductor mostly uses S4 methods. It's complicated and boring and you don't need to know much about it, so for purposes of this discussion we'll just say that cbind has both S3 and S4 methods, and that an S3 method has the form function.object, so the version of cbind that is meant to operate on a data.frame is called cbind.data.frame. Additionally, not all methods are 'visible', meaning you can't just type the name of the function at an R prompt to see the code. But you can get a listing of all the methods for a function using the methods function:

> methods(cbind)
 [1] cbind,ANY-method                  cbind,Assays-method              
 [3] cbind,DataFrame-method            cbind,DataFrameList-method       
 [5] cbind,DelayedArray-method         cbind,FilterMatrix-method        
 [7] cbind,List-method                 cbind,NaArray-method             
 [9] cbind,RectangularData-method      cbind,Rle-method                 
[11] cbind,RleList-method              cbind,SparseArray-method         
[13] cbind,SparseArraySeed-method      cbind,SummarizedExperiment-method
[15] cbind,Tracks-method               cbind,VCF-method                 
[17] cbind.CompressedMatrix*           cbind.DGEList*                   
[19] cbind.DataFrame                   cbind.EList                      
[21] cbind.EListRaw                    cbind.List                       
[23] cbind.MAList                      cbind.RGList                     
[25] cbind.RectangularData             cbind.data.frame                 
[27] cbind.data.table*                 cbind.grouped_df*                
[29] cbind.gtable*                     cbind.integer64*                 
[31] cbind.ts*                        
see '?methods' for accessing help and source code

That's a lot of methods! You can see that there are three basic forms: cbind.object, cbind.object* (note the asterisk) and cbind,object-method. The first two are S3 methods, the difference being that the asterisked ones are not exported, so you can't just type cbind.object to see the function. The third form is an S4 method. Luckily for us, we already know we want cbind.data.frame, and there is no asterisk, so we can just type the function name to see what's in it.

> cbind.data.frame
function (..., deparse.level = 1) 
data.frame(..., check.names = FALSE)  <---------------- The important part is right here
<bytecode: 0x560a02552130>
<environment: namespace:base>
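
For the entries marked with an asterisk, typing the name will not work. A short sketch of the usual lookups, using only functions from the utils package that ships with R; picking cbind.ts as the non-exported example is my own choice, not something from the original problem:

```r
# Exported S3 method: getS3method() finds it by generic name and class.
f_df <- getS3method("cbind", "data.frame")

# Non-exported S3 method (asterisk in the methods() listing). The same call
# works because it consults the S3 registry, not just the search path.
f_ts <- getS3method("cbind", "ts")

# getAnywhere() searches loaded namespaces by the full function name and
# reports where it found the object.
hit <- getAnywhere("cbind.ts")
print(hit$where)

# S4 methods (the "cbind,object-method" entries) are retrieved differently,
# with getMethod() or selectMethod() from the methods package.
```

Printing f_df or f_ts at the prompt then shows the source in the same way as typing cbind.data.frame did above.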

You can then infer that this line of code

 probeFrame <- cbind(probeFrame, multiple)

will dispatch to cbind.data.frame, which calls data.frame(..., check.names = FALSE), which is what you see in your error, so we know that's the problem step. Having diagnosed where the error comes from, we can then look at the error itself, which says that you are trying to combine two objects, one of which has zero rows (arguments imply differing number of rows: 29093, 0). We already know that's not true for 'probeFrame', so it must be true for 'multiple'. And 'multiple' is generated just prior to the cbind call, using this code:

    multiple <- unlist(lapply(as.character(probeFrame$probes),
                              .testForMultiples,
                              probes=as.character(probeFrame$probes)))

This relies on the 'probeFrame' data.frame having a column called 'probes', and won't work if that name doesn't exist. As an example:

> fakeo <- data.frame(probe.id = letters, gene.id = LETTERS)
> fakeo$probes
NULL

If the column names are any different, probeFrame$probes will return NULL. And you already said you called your columns 'probe.id' and 'gene.id', so I knew that was the problem.
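
The whole chain can be reproduced on a toy data frame. This is only a sketch: testForMultiples below is a hypothetical stand-in for AnnotationForge's internal .testForMultiples (the real helper may do something different), and the probe and gene IDs are invented.

```r
# Hypothetical stand-in for the internal .testForMultiples(); here it simply
# flags a probe ID that occurs more than once. The real helper may differ.
testForMultiples <- function(x, probes) sum(probes == x) > 1

# Mirrors the two lines from makeChipDbFromDataFrame() quoted above.
add_multiple <- function(probeFrame) {
  multiple <- unlist(lapply(as.character(probeFrame$probes),
                            testForMultiples,
                            probes = as.character(probeFrame$probes)))
  cbind(probeFrame, multiple)
}

bad <- data.frame(probe.id = c("p1", "p2", "p2"),
                  gene.id  = c("1", "2", "3"))

# bad$probes is NULL, so lapply() runs over character(0), unlist() returns
# NULL, and cbind() is asked to combine three rows with zero rows.
res <- tryCatch(add_multiple(bad), error = function(e) conditionMessage(e))
print(res)

# With the expected column names the same code runs.
good <- setNames(bad, c("probes", "genes"))
print(add_multiple(good))
```

The tryCatch is only there so the script keeps running after the error; at the prompt you would see the error message directly.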


Hi James, sorry for the late response. Even though some time has passed, I wanted to say thanks again, not only for your kind help with my initial issue but also for the thoughtful explanation of how you reached the answer.

I never would have gotten there by myself. It was a really easy thing to fix, but definitely not as easy to figure out.

Have a great week ahead and thanks again !! (:


Glad I could help!
