LightGBM tuned with parallel processing unable to predict #52
Thanks for the issue! Do you see this issue when using LightGBM without the bonsai wrapper? Does the issue persist if you use a different parallel backend? There are some notes on parallel processing with tune here, though you seem to have applied those notes already. :)
I have tried registerDoParallel and registerDoFuture in my reprex above, and both resulted in the same error. Are there any other parallel backends I should try? And no, I have not tried LightGBM without the bonsai wrapper. Taking a look at the related open issues in its GitHub repository, though, I don't see this issue being raised.
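One way to check LightGBM without the bonsai wrapper is to fit a booster inside a PSOCK worker and then try to use it in the parent session. This is a hypothetical sketch, not taken from the thread; it assumes the lightgbm package is installed, and uses the built-in mtcars data rather than the penguins data from the reprexes.

```r
library(parallel)

cl <- makePSOCKcluster(1)

# Fit a LightGBM model inside the worker and pass the booster back.
booster <- clusterEvalQ(cl, {
  library(lightgbm)
  dtrain <- lgb.Dataset(as.matrix(mtcars[, -1]), label = mtcars$mpg)
  lgb.train(params = list(objective = "regression", verbose = -1),
            data = dtrain, nrounds = 5)
})[[1]]
stopCluster(cl)

# The booster wraps an external pointer that is only valid in the worker
# session, so predicting here typically fails unless the model was
# serialized before crossing the session boundary:
predict(booster, as.matrix(mtcars[, -1]))
```

If this reproduces the error, it would suggest the problem lives in LightGBM's handling of cross-session objects rather than in bonsai itself.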
Thanks for the additional info! This one was fun to track down. Just spent a bit more time with this. It makes sense that these backends would lead to downstream failures with LightGBM: by default, they spin up new R sessions on each core, fit the models there, and then pass the models back to the calling session. LightGBM objects can't be passed between R sessions without proper serialization (see our work on bundle and …). This should be fine, as we don't need any of the resampled fits to have access to that instance anymore once we have …

```r
library(tidymodels)
library(bonsai)
library(palmerpenguins)
#>
#> Attaching package: 'palmerpenguins'
#> The following object is masked from 'package:modeldata':
#>
#>     penguins
library(doParallel)
#> Loading required package: foreach
#>
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#>
#>     accumulate, when
#> Loading required package: iterators
#> Loading required package: parallel
library(parallel)

split <- penguins |>
  initial_split(strata = species)
penguins_train <- training(split)
penguins_test <- testing(split)

folds <- vfold_cv(penguins_train, strata = species, v = 3)

recipe_basic <- penguins_train |>
  recipe(species ~ .)

lightgbm_spec <- boost_tree(trees = tune()) |>
  set_engine(
    "lightgbm",
    objective = "multiclass",
    metric = "multi_error",
    num_class = !!length(unique(penguins_train$species))
  ) |>
  set_mode("classification")

lightgbm_wflow <- workflow(preprocessor = recipe_basic,
                           spec = lightgbm_spec)

all_cores <- detectCores(logical = FALSE)
cl <- makePSOCKcluster(all_cores - 1)
registerDoParallel(cl)

training_grid_results <- lightgbm_wflow |>
  tune_grid(resamples = folds,
            grid = 5)

# The workaround: disable parallel processing for the final fit so the
# fitted model never has to cross a session boundary.
ctrl <- control_last_fit()
ctrl$allow_par <- FALSE

last_fit <- lightgbm_wflow |>
  finalize_workflow(select_best(training_grid_results, "roc_auc")) |>
  last_fit(split, control = ctrl)

last_fit |>
  extract_workflow() |>
  predict(head(penguins_test))
#> # A tibble: 6 × 1
#>   .pred_class
#>   <fct>
#> 1 Adelie
#> 2 Adelie
#> 3 Adelie
#> 4 Adelie
#> 5 Adelie
#> 6 Adelie

last_fit |>
  extract_fit_engine() |>
  lightgbm::lgb.importance() |>
  lightgbm::lgb.plot.importance()
```

Created on 2022-09-14 by the reprex package (v2.0.1)

Unfortunately, … So, that's a patch for now, though I think we ought to be able to handle this without that workaround. I'll open a linked issue here in a moment!
Going to close in favor of tidymodels/tune#539. Thanks for documenting this!
The LightGBM model (after using `last_fit()`) isn't able to predict when parallel processing is used. I tried using both `registerDoParallel` and `registerDoFuture`, and both gave me the same error. I tried it on both multiclass and binary classification problems, and both cases with parallel processing still gave me the error.
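For completeness, the doFuture backend mentioned here is registered along these lines. This is a sketch assuming the doFuture package, not code taken verbatim from the reprexes:

```r
library(doFuture)

# Register the future framework as the foreach backend, then choose a
# multisession plan (separate background R sessions, like PSOCK clusters).
registerDoFuture()
plan(multisession,
     workers = parallel::detectCores(logical = FALSE) - 1)
```

Because multisession workers are separate R processes, fitted LightGBM models returned from them hit the same cross-session serialization problem as the doParallel/PSOCK setup.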
Error messages are mentioned below in the reprexes:
Multiclass tuning without parallel processing
No issues: able to predict and extract feature importance.
Created on 2022-09-07 with reprex v2.0.2
Session info
Multiclass tuning with parallel processing (registerDoParallel)
Unable to predict or extract feature importance.
Created on 2022-09-07 with reprex v2.0.2
Session info
Multiclass tuning with parallel processing (registerDoFuture)
Unable to predict or extract feature importance.
Created on 2022-09-07 with reprex v2.0.2
Session info
Binary class tuning with parallel processing (registerDoFuture)
Unable to predict or extract feature importance.
Created on 2022-09-07 with reprex v2.0.2
Session info