Boruta and mtry parameter
erwan.scaon ▴ 10

Dear all,

I am using the Boruta package and want to set non-default values for ntree and mtry. From the vignette, I understood that these parameters can be passed to ranger (the random forest implementation that Boruta calls internally):

You can pass arguments to the importance provider by providing it to the Boruta call; for instance, ranger, the default importance provider, makes use of all available CPU threads, won’t always be the optimal choice. Setting num.threads in the Boruta call will cause it to relay this argument to the ranger function, and hence limit the training process parallelism.
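
As a minimal sketch of what the vignette describes, the call below passes num.threads through Boruta to ranger. It uses the built-in iris data as a stand-in for the data in this question, and it assumes the Boruta and ranger packages are installed.

```r
library(Boruta)

# iris is only a stand-in data set for illustration
set.seed(1)

# num.threads is not a Boruta argument; Boruta relays it to ranger,
# the default importance provider, which limits training to one thread
res <- Boruta(Species ~ ., data = iris, doTrace = 0, num.threads = 1)

print(res)
```

Other ranger arguments, such as num.trees, are passed in the same way.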

As a reminder, the reference manual says this about the default parameter values:

Random Forest methods has two main parameters, number of attributes tried at each split and the number of trees in the forest; first one is called mtry in both implementations, but the second ntree in randomForest and num.trees in ranger. To this end, to maintain compatibility, getImpRf* functions still accept ntree parameter relaying it into num.trees. Still, both parameters take the same defaults in both implementations (square root of the number of all attributes and 500 respectively)
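
The default described above can be worked through in base R. This is only a sketch: default_mtry is a hypothetical helper, not a function from Boruta or ranger, and it assumes the square root is rounded down and never falls below 1.

```r
# Hypothetical helper (not part of Boruta or ranger):
# default mtry taken as the rounded-down square root of the
# number of attributes, with a lower bound of 1 (assumed rule)
default_mtry <- function(n_attributes) {
  max(1L, as.integer(floor(sqrt(n_attributes))))
}

# Default mtry for a few attribute counts
counts <- c(4, 8, 9, 16)
defaults <- vapply(counts, default_mtry, integer(1))
print(data.frame(n_attributes = counts, default_mtry = defaults))
```

Under this assumption, 8 and 9 attributes give different defaults (2 and 3), which is why it matters whether the response column is counted.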

My issue is that, while this works with ntree, I cannot get it to work with mtry. See the examples below, where I compare the first line of the attStats output between different Boruta calls:

ntree example

# Call with default
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0))

> meanImp  medianImp     minImp     maxImp  normHits  decision
> Actinobacteria    **4.4267420**  4.2501325 -0.4203045  8.6944647 0.6868687 Confirmed

# Call specifying ntree with the default value (we expect the same output)
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0, num.trees = 500))

> meanImp  medianImp     minImp     maxImp  normHits  decision
> Actinobacteria    **4.4267420**  4.2501325 -0.4203045  8.6944647 0.6868687 Confirmed

# Call with a non-default value for ntree (we expect different output)
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0, num.trees = 1000))

> meanImp  medianImp    minImp     maxImp  normHits  decision
> Actinobacteria    6.2996109  6.1540756  1.670633 12.0373538 0.6868687 Confirmed

=> It's looking good with ntree

mtry example

# Call with default
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0))

> meanImp  medianImp     minImp     maxImp  normHits  decision
> Actinobacteria    **4.4267420**  4.2501325 -0.4203045  8.6944647 0.6868687 Confirmed

# Call specifying mtry with the default value (we expect the same output)
# PS: I wasn't quite sure whether Boruta was treating this as a regression or classification task,
# nor whether it was using ncol or ncol - 1, so I tested all cases
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0, mtry = sqrt(ncol(data))))
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0, mtry = sqrt(ncol(data) - 1)))
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0, mtry = ncol(data) / 3))
set.seed(54); attStats(Boruta(formule, data = data, doTrace = 0, mtry = (ncol(data) - 1) / 3))

> meanImp  medianImp    minImp     maxImp   normHits  decision
> Actinobacteria    4.3579070  4.2577025  1.341730  7.5233281 0.65656566 Confirmed
> meanImp  medianImp     minImp     maxImp   normHits decision
> Actinobacteria    4.1890915  4.1481305 -0.0775629  8.5946043 0.69696970 Confirmed
> meanImp  medianImp    minImp     maxImp   normHits  decision
> Actinobacteria    4.3579070  4.2577025  1.341730 7.5233281 0.65656566 Confirmed
> meanImp  medianImp     minImp     maxImp   normHits  decision
> Actinobacteria    4.1890915  4.1481305 -0.0775629  8.5946043 0.69696970 Confirmed

# Given that we didn't find the same output, let's try all possible mtry values to find the "default one"
for(i in 0:ncol(data)) {
    print(paste0("mtry set at : ", i))
    set.seed(54); print(attStats(Boruta(formule, data = data, doTrace = 0, mtry = i)))
}

> [1] "mtry set at : 0"
> meanImp  medianImp     minImp     maxImp  normHits  decision
> Actinobacteria    **4.4267420**  4.2501325 -0.4203045  8.6944647 0.6868687 Confirmed
> [1] "mtry set at : 1"
> meanImp   medianImp     minImp     maxImp   normHits  decision
> Actinobacteria    3.75337508  3.65007557  0.6998206 6.5739342 0.64646465 Tentative
> [1] "mtry set at : 2"
> meanImp  medianImp     minImp     maxImp   normHits  decision
> Actinobacteria    4.1890915  4.1481305 -0.0775629 8.5946043 0.69696970 Confirmed
> [1] "mtry set at : 3"
> meanImp  medianImp    minImp     maxImp   normHits  decision
> Actinobacteria    4.3579070  4.2577025  1.341730  7.5233281 0.65656566 Confirmed
> [1] "mtry set at : 4"
> meanImp  medianImp    minImp     maxImp  normHits  decision
> Actinobacteria    4.6081939  4.5531384  1.081664  9.6172220 0.6767677 Confirmed
> [1] "mtry set at : 5"
> meanImp  medianImp     minImp      maxImp   normHits  decision
> Actinobacteria    4.7811405  4.6350621  0.9283142  9.84832090 0.71717172 Confirmed
> [1] "mtry set at : 6"
> meanImp  medianImp     minImp     maxImp   normHits  decision
> Actinobacteria    4.6548880  4.3611922  1.2079338 9.0864045 0.67676768 Confirmed
> [1] "mtry set at : 7"
> meanImp  medianImp     minImp     maxImp  normHits  decision
> Actinobacteria    4.7861728  4.7295293  1.3558568  9.7099331 0.6767677 Confirmed
> [1] "mtry set at : 8"
> meanImp   medianImp     minImp     maxImp  normHits  decision
> Actinobacteria    4.6571985  4.47381402  0.4249924 11.7667187 0.6666667 Confirmed
> [1] "mtry set at : 9"
> meanImp  medianImp     minImp     maxImp  normHits  decision
> Actinobacteria    4.6967805  4.3565461  0.1421514 10.1955589 0.6464646 Tentative

With mtry = 0 we get the same values as the default call. But mtry = 0 just makes ranger fall back to its own default value for mtry, as the example below shows:

(ranger::ranger(response ~ ., data, mtry = 0))$mtry

> 2

(ranger::ranger(response ~ ., data, mtry = 1))$mtry

> 1

(ranger::ranger(response ~ ., data, mtry = 2))$mtry

> 2

What am I missing? How can I set a custom value for mtry if I don't understand how Boruta handles this parameter?

Boruta random forest mtry ranger • 1.3k views
@martin-morgan-1513

Boruta isn't a Bioconductor package, so the maintainer may not monitor this forum. Ask at the URL provided on the CRAN landing page for Boruta.


I did not take into account that it wasn't a Bioconductor package, sorry for that.

The question was forwarded to the author (Miron B. Kursa) of the package, who kindly replied (it did solve my issue).

This question may thus be closed or deleted if it doesn't belong here. Otherwise I will post the answer at some point.
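
Until then, here is one possible explanation, sketched in base R. It is not the package author's reply, and it rests on assumptions about Boruta's algorithm: that each run adds one shuffled "shadow" copy per attribute still under consideration, and that rejected attributes are dropped from later runs. If that holds, the number of attributes the forest sees changes from run to run, so a default mtry derived from that count would change too, and no single fixed mtry would reproduce the default call. The helper default_mtry and the attribute counts are hypothetical.

```r
# Hypothetical helper (not Boruta's or ranger's code):
# default mtry as the rounded-down square root of the attribute count
default_mtry <- function(n_attributes) {
  max(1L, as.integer(floor(sqrt(n_attributes))))
}

# Made-up example: attributes still under consideration in successive
# runs, starting from 8 and shrinking as some are rejected
remaining <- c(8, 8, 6, 4, 3)

# Assumption: one shadow attribute is added per remaining attribute
seen_by_forest <- 2 * remaining

mtry_per_run <- vapply(seen_by_forest, default_mtry, integer(1))

print(data.frame(
  remaining      = remaining,
  seen_by_forest = seen_by_forest,
  default_mtry   = mtry_per_run
))
```

In this made-up sequence the default moves from 4 to 3 to 2 across runs, whereas an explicit mtry in the Boruta call would stay fixed for every run.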
