Lightgbm tuned with parallel processing unable to predict #52

Closed
DesmondChoy opened this issue Sep 7, 2022 · 5 comments

DesmondChoy commented Sep 7, 2022

A LightGBM model fitted with last_fit() cannot predict when parallel processing is used.
I tried both registerDoParallel and registerDoFuture, and both gave the same error.
I also tried both multiclass and binary classification problems; with parallel processing, both still gave the error.

The error messages are shown in the reprexes below:

Multiclass tuning without parallel processing

No issues. The model can predict, and feature importance can be extracted.

library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.2.1
#> Warning: package 'broom' was built under R version 4.2.1
#> Warning: package 'scales' was built under R version 4.2.1
#> Warning: package 'infer' was built under R version 4.2.1
#> Warning: package 'modeldata' was built under R version 4.2.1
#> Warning: package 'parsnip' was built under R version 4.2.1
#> Warning: package 'rsample' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1
#> Warning: package 'workflows' was built under R version 4.2.1
#> Warning: package 'workflowsets' was built under R version 4.2.1
library(bonsai)
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.2.1
#> 
#> Attaching package: 'palmerpenguins'
#> The following object is masked from 'package:modeldata':
#> 
#>     penguins

split <- penguins |>
  initial_split(strata = species)

penguins_train <- training(split)
penguins_test <- testing(split)
folds <- vfold_cv(penguins_train, strata = species, 3)

recipe_basic <- penguins_train |>
  recipe(species ~ .)

lightgbm_spec <- boost_tree(trees = tune()) |>
  set_engine(
    "lightgbm",
    objective = "multiclass",
    metric = "multi_error",
    num_class = !!length(unique(penguins_train$species))
  ) |>
  set_mode("classification")

lightgbm_wflow <- workflow(preprocessor = recipe_basic,
                           spec = lightgbm_spec)

training_grid_results <- lightgbm_wflow |>
  tune_grid(resamples = folds,
            grid = 5)
#> Warning: package 'lightgbm' was built under R version 4.2.1

last_fit <- lightgbm_wflow |>
  finalize_workflow(select_best(training_grid_results, "roc_auc")) |>
  last_fit(split)

last_fit |>
  extract_workflow() |>
  predict(head(penguins_test))
#> # A tibble: 6 × 1
#>   .pred_class
#>   <fct>      
#> 1 Adelie     
#> 2 Adelie     
#> 3 Adelie     
#> 4 Adelie     
#> 5 Adelie     
#> 6 Adelie

last_fit |>
  extract_fit_engine() |>
  lightgbm::lgb.importance() |>
  lightgbm::lgb.plot.importance()

Created on 2022-09-07 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 19042)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Singapore.utf8
#>  ctype    English_Singapore.utf8
#>  tz       Asia/Kuala_Lumpur
#>  date     2022-09-07
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date (UTC) lib source
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  backports        1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  bonsai         * 0.2.0      2022-08-31 [1] CRAN (R 4.2.0)
#>  broom          * 1.0.1      2022-08-29 [1] CRAN (R 4.2.1)
#>  class            7.3-20     2022-01-16 [1] CRAN (R 4.2.0)
#>  cli              3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  codetools        0.2-18     2020-11-04 [1] CRAN (R 4.2.0)
#>  colorspace       2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon           1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>  curl             4.3.2      2021-06-23 [1] CRAN (R 4.2.0)
#>  data.table       1.14.2     2021-09-27 [1] CRAN (R 4.2.0)
#>  DBI              1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dials          * 1.0.0      2022-06-14 [1] CRAN (R 4.2.0)
#>  DiceDesign       1.9        2021-02-13 [1] CRAN (R 4.2.0)
#>  digest           0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr          * 1.0.9      2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate         0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi            1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap          1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  foreach          1.5.2      2022-02-02 [1] CRAN (R 4.2.0)
#>  fs               1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  furrr            0.3.1      2022-08-15 [1] CRAN (R 4.2.1)
#>  future           1.27.0     2022-07-22 [1] CRAN (R 4.2.1)
#>  future.apply     1.9.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  generics         0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
#>  ggplot2        * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
#>  globals          0.16.1     2022-08-28 [1] CRAN (R 4.2.0)
#>  glue             1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  gower            1.0.0      2022-02-03 [1] CRAN (R 4.2.0)
#>  GPfit            1.0-8      2019-02-08 [1] CRAN (R 4.2.0)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.2.0)
#>  hardhat          1.2.0      2022-06-30 [1] CRAN (R 4.2.1)
#>  highr            0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools        0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>  httr             1.4.3      2022-05-04 [1] CRAN (R 4.2.0)
#>  infer          * 1.0.2      2022-06-10 [1] CRAN (R 4.2.1)
#>  ipred            0.9-13     2022-06-02 [1] CRAN (R 4.2.1)
#>  iterators        1.0.14     2022-02-05 [1] CRAN (R 4.2.0)
#>  jsonlite         1.8.0      2022-02-22 [1] CRAN (R 4.2.0)
#>  knitr            1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lattice          0.20-45    2021-09-22 [1] CRAN (R 4.2.0)
#>  lava             1.6.10     2021-09-02 [1] CRAN (R 4.2.0)
#>  lhs              1.1.5      2022-03-22 [1] CRAN (R 4.2.0)
#>  lifecycle        1.0.1      2021-09-24 [1] CRAN (R 4.2.0)
#>  lightgbm       * 3.3.2      2022-01-14 [1] CRAN (R 4.2.1)
#>  listenv          0.8.0      2019-12-05 [1] CRAN (R 4.2.0)
#>  lubridate        1.8.0      2021-10-07 [1] CRAN (R 4.2.0)
#>  magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  MASS             7.3-57     2022-04-22 [1] CRAN (R 4.2.0)
#>  Matrix           1.4-1      2022-03-23 [1] CRAN (R 4.2.0)
#>  mime             0.12       2021-09-28 [1] CRAN (R 4.2.0)
#>  modeldata      * 1.0.0      2022-07-01 [1] CRAN (R 4.2.1)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  nnet             7.3-17     2022-01-16 [1] CRAN (R 4.2.0)
#>  palmerpenguins * 0.1.1      2022-08-15 [1] CRAN (R 4.2.1)
#>  parallelly       1.32.1     2022-07-21 [1] CRAN (R 4.2.1)
#>  parsnip        * 1.0.1      2022-08-18 [1] CRAN (R 4.2.1)
#>  pillar           1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  prodlim          2019.11.13 2019-11-17 [1] CRAN (R 4.2.0)
#>  purrr          * 0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
#>  R6             * 2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp             1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
#>  recipes        * 1.0.1      2022-07-07 [1] CRAN (R 4.2.0)
#>  reprex           2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang            1.0.4      2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown        2.16       2022-08-24 [1] CRAN (R 4.2.1)
#>  rpart            4.1.16     2022-01-24 [1] CRAN (R 4.2.0)
#>  rsample        * 1.1.0      2022-08-08 [1] CRAN (R 4.2.1)
#>  rstudioapi       0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  scales         * 1.2.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi          1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr          1.4.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  survival         3.4-0      2022-08-09 [1] CRAN (R 4.2.1)
#>  tibble         * 3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
#>  tidymodels     * 1.0.0      2022-07-13 [1] CRAN (R 4.2.1)
#>  tidyr          * 1.2.0      2022-02-01 [1] CRAN (R 4.2.0)
#>  tidyselect       1.1.2      2022-02-21 [1] CRAN (R 4.2.0)
#>  timeDate         4021.104   2022-07-19 [1] CRAN (R 4.2.1)
#>  tune           * 1.0.0      2022-07-07 [1] CRAN (R 4.2.0)
#>  utf8             1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs            0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
#>  withr            2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  workflows      * 1.0.0      2022-07-05 [1] CRAN (R 4.2.1)
#>  workflowsets   * 1.0.0      2022-07-12 [1] CRAN (R 4.2.1)
#>  xfun             0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  xml2             1.3.3      2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml             2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#>  yardstick      * 1.0.0      2022-06-06 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/dchoy/AppData/Local/Programs/R/R-4.2.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Multiclass tuning with parallel processing (registerDoParallel)

Unable to predict or extract feature importance.

library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.2.1
#> Warning: package 'broom' was built under R version 4.2.1
#> Warning: package 'scales' was built under R version 4.2.1
#> Warning: package 'infer' was built under R version 4.2.1
#> Warning: package 'modeldata' was built under R version 4.2.1
#> Warning: package 'parsnip' was built under R version 4.2.1
#> Warning: package 'rsample' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1
#> Warning: package 'workflows' was built under R version 4.2.1
#> Warning: package 'workflowsets' was built under R version 4.2.1
library(bonsai)
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.2.1
#> 
#> Attaching package: 'palmerpenguins'
#> The following object is masked from 'package:modeldata':
#> 
#>     penguins
library(doParallel)
#> Loading required package: foreach
#> 
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#> 
#>     accumulate, when
#> Loading required package: iterators
#> Loading required package: parallel
library(parallel)

split <- penguins |>
  initial_split(strata = species)

penguins_train <- training(split)
penguins_test <- testing(split)
folds <- vfold_cv(penguins_train, strata = species, 3)

recipe_basic <- penguins_train |>
  recipe(species ~ .)

lightgbm_spec <- boost_tree(trees = tune()) |>
  set_engine(
    "lightgbm",
    objective = "multiclass",
    metric = "multi_error",
    num_class = !!length(unique(penguins_train$species))
  ) |>
  set_mode("classification")

lightgbm_wflow <- workflow(preprocessor = recipe_basic,
                           spec = lightgbm_spec)

all_cores <- detectCores(logical = FALSE)
cl <- makePSOCKcluster(all_cores - 1)
registerDoParallel(cl)

training_grid_results <- lightgbm_wflow |>
  tune_grid(resamples = folds,
            grid = 5)

last_fit <- lightgbm_wflow |>
  finalize_workflow(select_best(training_grid_results, "roc_auc")) |>
  last_fit(split)

last_fit |>
  extract_workflow() |>
  predict(head(penguins_test))
#> Error in predictor$predict(data = data, start_iteration = start_iteration, : Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.

last_fit |>
  extract_fit_engine() |>
  lightgbm::lgb.importance() |>
  lightgbm::lgb.plot.importance()
#> Error in booster$dump_model(num_iteration = num_iteration): Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.

Created on 2022-09-07 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 19042)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Singapore.utf8
#>  ctype    English_Singapore.utf8
#>  tz       Asia/Kuala_Lumpur
#>  date     2022-09-07
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date (UTC) lib source
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  backports        1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  bonsai         * 0.2.0      2022-08-31 [1] CRAN (R 4.2.0)
#>  broom          * 1.0.1      2022-08-29 [1] CRAN (R 4.2.1)
#>  class            7.3-20     2022-01-16 [1] CRAN (R 4.2.0)
#>  cli              3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  codetools        0.2-18     2020-11-04 [1] CRAN (R 4.2.0)
#>  colorspace       2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon           1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>  data.table       1.14.2     2021-09-27 [1] CRAN (R 4.2.0)
#>  DBI              1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dials          * 1.0.0      2022-06-14 [1] CRAN (R 4.2.0)
#>  DiceDesign       1.9        2021-02-13 [1] CRAN (R 4.2.0)
#>  digest           0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  doParallel     * 1.0.17     2022-02-07 [1] CRAN (R 4.2.0)
#>  dplyr          * 1.0.9      2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate         0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi            1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap          1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  foreach        * 1.5.2      2022-02-02 [1] CRAN (R 4.2.0)
#>  fs               1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  furrr            0.3.1      2022-08-15 [1] CRAN (R 4.2.1)
#>  future           1.27.0     2022-07-22 [1] CRAN (R 4.2.1)
#>  future.apply     1.9.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  generics         0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
#>  ggplot2        * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
#>  globals          0.16.1     2022-08-28 [1] CRAN (R 4.2.0)
#>  glue             1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  gower            1.0.0      2022-02-03 [1] CRAN (R 4.2.0)
#>  GPfit            1.0-8      2019-02-08 [1] CRAN (R 4.2.0)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.2.0)
#>  hardhat          1.2.0      2022-06-30 [1] CRAN (R 4.2.1)
#>  highr            0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools        0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>  infer          * 1.0.2      2022-06-10 [1] CRAN (R 4.2.1)
#>  ipred            0.9-13     2022-06-02 [1] CRAN (R 4.2.1)
#>  iterators      * 1.0.14     2022-02-05 [1] CRAN (R 4.2.0)
#>  jsonlite         1.8.0      2022-02-22 [1] CRAN (R 4.2.0)
#>  knitr            1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lattice          0.20-45    2021-09-22 [1] CRAN (R 4.2.0)
#>  lava             1.6.10     2021-09-02 [1] CRAN (R 4.2.0)
#>  lhs              1.1.5      2022-03-22 [1] CRAN (R 4.2.0)
#>  lifecycle        1.0.1      2021-09-24 [1] CRAN (R 4.2.0)
#>  lightgbm         3.3.2      2022-01-14 [1] CRAN (R 4.2.1)
#>  listenv          0.8.0      2019-12-05 [1] CRAN (R 4.2.0)
#>  lubridate        1.8.0      2021-10-07 [1] CRAN (R 4.2.0)
#>  magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  MASS             7.3-57     2022-04-22 [1] CRAN (R 4.2.0)
#>  Matrix           1.4-1      2022-03-23 [1] CRAN (R 4.2.0)
#>  modeldata      * 1.0.0      2022-07-01 [1] CRAN (R 4.2.1)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  nnet             7.3-17     2022-01-16 [1] CRAN (R 4.2.0)
#>  palmerpenguins * 0.1.1      2022-08-15 [1] CRAN (R 4.2.1)
#>  parallelly       1.32.1     2022-07-21 [1] CRAN (R 4.2.1)
#>  parsnip        * 1.0.1      2022-08-18 [1] CRAN (R 4.2.1)
#>  pillar           1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  prodlim          2019.11.13 2019-11-17 [1] CRAN (R 4.2.0)
#>  purrr          * 0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp             1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
#>  recipes        * 1.0.1      2022-07-07 [1] CRAN (R 4.2.0)
#>  reprex           2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang            1.0.4      2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown        2.16       2022-08-24 [1] CRAN (R 4.2.1)
#>  rpart            4.1.16     2022-01-24 [1] CRAN (R 4.2.0)
#>  rsample        * 1.1.0      2022-08-08 [1] CRAN (R 4.2.1)
#>  rstudioapi       0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  scales         * 1.2.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi          1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr          1.4.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  survival         3.4-0      2022-08-09 [1] CRAN (R 4.2.1)
#>  tibble         * 3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
#>  tidymodels     * 1.0.0      2022-07-13 [1] CRAN (R 4.2.1)
#>  tidyr          * 1.2.0      2022-02-01 [1] CRAN (R 4.2.0)
#>  tidyselect       1.1.2      2022-02-21 [1] CRAN (R 4.2.0)
#>  timeDate         4021.104   2022-07-19 [1] CRAN (R 4.2.1)
#>  tune           * 1.0.0      2022-07-07 [1] CRAN (R 4.2.0)
#>  utf8             1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs            0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
#>  withr            2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  workflows      * 1.0.0      2022-07-05 [1] CRAN (R 4.2.1)
#>  workflowsets   * 1.0.0      2022-07-12 [1] CRAN (R 4.2.1)
#>  xfun             0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml             2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#>  yardstick      * 1.0.0      2022-06-06 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/dchoy/AppData/Local/Programs/R/R-4.2.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
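
The error text recommends Booster$save_model() over plain saveRDS(). As a minimal sketch of that route outside tidymodels, the snippet below trains a small booster, writes it with lgb.save(), reloads it with lgb.load(), and predicts from the reloaded copy. The agaricus data bundled with lightgbm, the parameter values, and the temporary file are assumptions chosen for illustration; none of them come from the reprex above.

```r
library(lightgbm)

# Assumption: use the small binary dataset that ships with lightgbm.
data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

# Train a tiny booster; the parameters are illustrative only.
booster <- lgb.train(
  params = list(objective = "binary", verbose = -1L),
  data = dtrain,
  nrounds = 5L
)

# Persist the model in LightGBM's own text format, then reload it.
model_file <- tempfile(fileext = ".txt")
lgb.save(booster, model_file)
restored <- lgb.load(filename = model_file)

# Both the original and the reloaded booster can predict.
original_pred <- predict(booster, agaricus.train$data)
restored_pred <- predict(restored, agaricus.train$data)
```

This only illustrates the persistence functions named in the error message; it does not change what last_fit() returns.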

Multiclass tuning with parallel processing (registerDoFuture)

Unable to predict or extract feature importance.

library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.2.1
#> Warning: package 'broom' was built under R version 4.2.1
#> Warning: package 'scales' was built under R version 4.2.1
#> Warning: package 'infer' was built under R version 4.2.1
#> Warning: package 'modeldata' was built under R version 4.2.1
#> Warning: package 'parsnip' was built under R version 4.2.1
#> Warning: package 'rsample' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1
#> Warning: package 'workflows' was built under R version 4.2.1
#> Warning: package 'workflowsets' was built under R version 4.2.1
library(bonsai)
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.2.1
#> 
#> Attaching package: 'palmerpenguins'
#> The following object is masked from 'package:modeldata':
#> 
#>     penguins
library(doFuture)
#> Warning: package 'doFuture' was built under R version 4.2.1
#> Loading required package: foreach
#> 
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#> 
#>     accumulate, when
#> Loading required package: future
#> Warning: package 'future' was built under R version 4.2.1
library(parallel)

split <- penguins |>
  initial_split(strata = species)

penguins_train <- training(split)
penguins_test <- testing(split)
folds <- vfold_cv(penguins_train, strata = species, 3)

recipe_basic <- penguins_train |>
  recipe(species ~ .)

lightgbm_spec <- boost_tree(trees = tune()) |>
  set_engine(
    "lightgbm",
    objective = "multiclass",
    metric = "multi_error",
    num_class = !!length(unique(penguins_train$species))
  ) |>
  set_mode("classification")

lightgbm_wflow <- workflow(preprocessor = recipe_basic,
                           spec = lightgbm_spec)

all_cores <- detectCores(logical = FALSE)
registerDoFuture()
cl <- makeCluster(all_cores)
plan(cluster, workers = cl)

training_grid_results <- lightgbm_wflow |>
  tune_grid(resamples = folds,
            grid = 5)

last_fit <- lightgbm_wflow |>
  finalize_workflow(select_best(training_grid_results, "roc_auc")) |>
  last_fit(split)

last_fit |>
  extract_workflow() |>
  predict(head(penguins_test))
#> Error in predictor$predict(data = data, start_iteration = start_iteration, : Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.

last_fit |>
  extract_fit_engine() |>
  lightgbm::lgb.importance() |>
  lightgbm::lgb.plot.importance()
#> Error in booster$dump_model(num_iteration = num_iteration): Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.

Created on 2022-09-07 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 19042)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Singapore.utf8
#>  ctype    English_Singapore.utf8
#>  tz       Asia/Kuala_Lumpur
#>  date     2022-09-07
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date (UTC) lib source
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  backports        1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  bonsai         * 0.2.0      2022-08-31 [1] CRAN (R 4.2.0)
#>  broom          * 1.0.1      2022-08-29 [1] CRAN (R 4.2.1)
#>  class            7.3-20     2022-01-16 [1] CRAN (R 4.2.0)
#>  cli              3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  codetools        0.2-18     2020-11-04 [1] CRAN (R 4.2.0)
#>  colorspace       2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon           1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>  data.table       1.14.2     2021-09-27 [1] CRAN (R 4.2.0)
#>  DBI              1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dials          * 1.0.0      2022-06-14 [1] CRAN (R 4.2.0)
#>  DiceDesign       1.9        2021-02-13 [1] CRAN (R 4.2.0)
#>  digest           0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  doFuture       * 0.12.2     2022-04-26 [1] CRAN (R 4.2.1)
#>  dplyr          * 1.0.9      2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate         0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi            1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap          1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  foreach        * 1.5.2      2022-02-02 [1] CRAN (R 4.2.0)
#>  fs               1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  furrr            0.3.1      2022-08-15 [1] CRAN (R 4.2.1)
#>  future         * 1.27.0     2022-07-22 [1] CRAN (R 4.2.1)
#>  future.apply     1.9.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  generics         0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
#>  ggplot2        * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
#>  globals          0.16.1     2022-08-28 [1] CRAN (R 4.2.0)
#>  glue             1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  gower            1.0.0      2022-02-03 [1] CRAN (R 4.2.0)
#>  GPfit            1.0-8      2019-02-08 [1] CRAN (R 4.2.0)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.2.0)
#>  hardhat          1.2.0      2022-06-30 [1] CRAN (R 4.2.1)
#>  highr            0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools        0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>  infer          * 1.0.2      2022-06-10 [1] CRAN (R 4.2.1)
#>  ipred            0.9-13     2022-06-02 [1] CRAN (R 4.2.1)
#>  iterators        1.0.14     2022-02-05 [1] CRAN (R 4.2.0)
#>  jsonlite         1.8.0      2022-02-22 [1] CRAN (R 4.2.0)
#>  knitr            1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lattice          0.20-45    2021-09-22 [1] CRAN (R 4.2.0)
#>  lava             1.6.10     2021-09-02 [1] CRAN (R 4.2.0)
#>  lhs              1.1.5      2022-03-22 [1] CRAN (R 4.2.0)
#>  lifecycle        1.0.1      2021-09-24 [1] CRAN (R 4.2.0)
#>  lightgbm         3.3.2      2022-01-14 [1] CRAN (R 4.2.1)
#>  listenv          0.8.0      2019-12-05 [1] CRAN (R 4.2.0)
#>  lubridate        1.8.0      2021-10-07 [1] CRAN (R 4.2.0)
#>  magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  MASS             7.3-57     2022-04-22 [1] CRAN (R 4.2.0)
#>  Matrix           1.4-1      2022-03-23 [1] CRAN (R 4.2.0)
#>  modeldata      * 1.0.0      2022-07-01 [1] CRAN (R 4.2.1)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  nnet             7.3-17     2022-01-16 [1] CRAN (R 4.2.0)
#>  palmerpenguins * 0.1.1      2022-08-15 [1] CRAN (R 4.2.1)
#>  parallelly       1.32.1     2022-07-21 [1] CRAN (R 4.2.1)
#>  parsnip        * 1.0.1      2022-08-18 [1] CRAN (R 4.2.1)
#>  pillar           1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  prodlim          2019.11.13 2019-11-17 [1] CRAN (R 4.2.0)
#>  purrr          * 0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp             1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
#>  recipes        * 1.0.1      2022-07-07 [1] CRAN (R 4.2.0)
#>  reprex           2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang            1.0.4      2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown        2.16       2022-08-24 [1] CRAN (R 4.2.1)
#>  rpart            4.1.16     2022-01-24 [1] CRAN (R 4.2.0)
#>  rsample        * 1.1.0      2022-08-08 [1] CRAN (R 4.2.1)
#>  rstudioapi       0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  scales         * 1.2.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi          1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr          1.4.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  survival         3.4-0      2022-08-09 [1] CRAN (R 4.2.1)
#>  tibble         * 3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
#>  tidymodels     * 1.0.0      2022-07-13 [1] CRAN (R 4.2.1)
#>  tidyr          * 1.2.0      2022-02-01 [1] CRAN (R 4.2.0)
#>  tidyselect       1.1.2      2022-02-21 [1] CRAN (R 4.2.0)
#>  timeDate         4021.104   2022-07-19 [1] CRAN (R 4.2.1)
#>  tune           * 1.0.0      2022-07-07 [1] CRAN (R 4.2.0)
#>  utf8             1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs            0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
#>  withr            2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  workflows      * 1.0.0      2022-07-05 [1] CRAN (R 4.2.1)
#>  workflowsets   * 1.0.0      2022-07-12 [1] CRAN (R 4.2.1)
#>  xfun             0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml             2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#>  yardstick      * 1.0.0      2022-06-06 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/dchoy/AppData/Local/Programs/R/R-4.2.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
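
A possible workaround, offered only as a sketch: assuming the Booster handle is lost because the final model is fitted on a worker process and sent back to the main session (an assumption, not something confirmed in this report), fitting the finalized workflow with fit() in the main session after tuning might avoid the error. The snippet assumes the package versions from the session info above (tune 1.0.0 with a foreach backend), a two-worker cluster, and the bivariate_test data from modeldata; the name final_fit is introduced here for illustration.

```r
library(tidymodels)
library(bonsai)

set.seed(1)
split <- initial_split(modeldata::bivariate_test)
data_train <- training(split)
data_test <- testing(split)
folds <- vfold_cv(data_train, v = 3)

lightgbm_wflow <- workflow(
  preprocessor = recipe(Class ~ ., data = data_train),
  spec = boost_tree(trees = tune()) |>
    set_engine("lightgbm") |>
    set_mode("classification")
)

# Tune in parallel (assumption: two PSOCK workers are enough here).
cl <- parallel::makePSOCKcluster(2)
doParallel::registerDoParallel(cl)
training_grid_results <- tune_grid(lightgbm_wflow, resamples = folds, grid = 5)

# Release the workers and fall back to sequential execution.
parallel::stopCluster(cl)
foreach::registerDoSEQ()

# Fit the finalized workflow in the main session instead of using last_fit().
final_fit <- lightgbm_wflow |>
  finalize_workflow(select_best(training_grid_results, metric = "roc_auc")) |>
  fit(data_train)

preds <- predict(final_fit, head(data_test))
```

Unlike last_fit(), fit() does not compute test-set metrics, so those would need to be calculated separately from predictions on data_test.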

Binary class tuning with parallel processing (registerDoFuture)

Unable to predict or extract feature importance.

library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.2.1
#> Warning: package 'broom' was built under R version 4.2.1
#> Warning: package 'scales' was built under R version 4.2.1
#> Warning: package 'infer' was built under R version 4.2.1
#> Warning: package 'modeldata' was built under R version 4.2.1
#> Warning: package 'parsnip' was built under R version 4.2.1
#> Warning: package 'rsample' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1
#> Warning: package 'workflows' was built under R version 4.2.1
#> Warning: package 'workflowsets' was built under R version 4.2.1
library(bonsai)
library(doFuture)
#> Warning: package 'doFuture' was built under R version 4.2.1
#> Loading required package: foreach
#> 
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#> 
#>     accumulate, when
#> Loading required package: future
#> Warning: package 'future' was built under R version 4.2.1
library(parallel)

split <- modeldata::bivariate_test |>
  initial_split()

data_train <- training(split)
data_test <- testing(split)
folds <- vfold_cv(data_train, 3)

recipe_basic <- data_train |>
  recipe(Class ~ .)

lightgbm_spec <- boost_tree(trees = tune()) |>
  set_engine("lightgbm") |>
  set_mode("classification")

lightgbm_wflow <- workflow(preprocessor = recipe_basic,
                           spec = lightgbm_spec)

all_cores <- detectCores(logical = FALSE)
registerDoFuture()
cl <- makeCluster(all_cores)
plan(cluster, workers = cl)

training_grid_results <- lightgbm_wflow |>
  tune_grid(resamples = folds,
            grid = 5)

last_fit <- lightgbm_wflow |>
  finalize_workflow(select_best(training_grid_results, "roc_auc")) |>
  last_fit(split)

last_fit |>
  extract_workflow() |>
  predict(head(data_test))
#> Error in predictor$predict(data = data, start_iteration = start_iteration, : Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.

last_fit |>
  extract_fit_engine() |>
  lightgbm::lgb.importance() |>
  lightgbm::lgb.plot.importance()
#> Error in booster$dump_model(num_iteration = num_iteration): Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.

Created on 2022-09-07 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 19042)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Singapore.utf8
#>  ctype    English_Singapore.utf8
#>  tz       Asia/Kuala_Lumpur
#>  date     2022-09-07
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  backports      1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  bonsai       * 0.2.0      2022-08-31 [1] CRAN (R 4.2.0)
#>  broom        * 1.0.1      2022-08-29 [1] CRAN (R 4.2.1)
#>  class          7.3-20     2022-01-16 [1] CRAN (R 4.2.0)
#>  cli            3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  codetools      0.2-18     2020-11-04 [1] CRAN (R 4.2.0)
#>  colorspace     2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon         1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>  data.table     1.14.2     2021-09-27 [1] CRAN (R 4.2.0)
#>  DBI            1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dials        * 1.0.0      2022-06-14 [1] CRAN (R 4.2.0)
#>  DiceDesign     1.9        2021-02-13 [1] CRAN (R 4.2.0)
#>  digest         0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  doFuture     * 0.12.2     2022-04-26 [1] CRAN (R 4.2.1)
#>  dplyr        * 1.0.9      2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate       0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi          1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  foreach      * 1.5.2      2022-02-02 [1] CRAN (R 4.2.0)
#>  fs             1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.2.1)
#>  future       * 1.27.0     2022-07-22 [1] CRAN (R 4.2.1)
#>  future.apply   1.9.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
#>  ggplot2      * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
#>  globals        0.16.1     2022-08-28 [1] CRAN (R 4.2.0)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  gower          1.0.0      2022-02-03 [1] CRAN (R 4.2.0)
#>  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.2.0)
#>  gtable         0.3.0      2019-03-25 [1] CRAN (R 4.2.0)
#>  hardhat        1.2.0      2022-06-30 [1] CRAN (R 4.2.1)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>  infer        * 1.0.2      2022-06-10 [1] CRAN (R 4.2.1)
#>  ipred          0.9-13     2022-06-02 [1] CRAN (R 4.2.1)
#>  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.2.0)
#>  jsonlite       1.8.0      2022-02-22 [1] CRAN (R 4.2.0)
#>  knitr          1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lattice        0.20-45    2021-09-22 [1] CRAN (R 4.2.0)
#>  lava           1.6.10     2021-09-02 [1] CRAN (R 4.2.0)
#>  lhs            1.1.5      2022-03-22 [1] CRAN (R 4.2.0)
#>  lifecycle      1.0.1      2021-09-24 [1] CRAN (R 4.2.0)
#>  lightgbm       3.3.2      2022-01-14 [1] CRAN (R 4.2.1)
#>  listenv        0.8.0      2019-12-05 [1] CRAN (R 4.2.0)
#>  lubridate      1.8.0      2021-10-07 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  MASS           7.3-57     2022-04-22 [1] CRAN (R 4.2.0)
#>  Matrix         1.4-1      2022-03-23 [1] CRAN (R 4.2.0)
#>  modeldata    * 1.0.0      2022-07-01 [1] CRAN (R 4.2.1)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  nnet           7.3-17     2022-01-16 [1] CRAN (R 4.2.0)
#>  parallelly     1.32.1     2022-07-21 [1] CRAN (R 4.2.1)
#>  parsnip      * 1.0.1      2022-08-18 [1] CRAN (R 4.2.1)
#>  pillar         1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  prodlim        2019.11.13 2019-11-17 [1] CRAN (R 4.2.0)
#>  purrr        * 0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp           1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
#>  recipes      * 1.0.1      2022-07-07 [1] CRAN (R 4.2.0)
#>  reprex         2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang          1.0.4      2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown      2.16       2022-08-24 [1] CRAN (R 4.2.1)
#>  rpart          4.1.16     2022-01-24 [1] CRAN (R 4.2.0)
#>  rsample      * 1.1.0      2022-08-08 [1] CRAN (R 4.2.1)
#>  rstudioapi     0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  scales       * 1.2.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi        1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr        1.4.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  survival       3.4-0      2022-08-09 [1] CRAN (R 4.2.1)
#>  tibble       * 3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
#>  tidymodels   * 1.0.0      2022-07-13 [1] CRAN (R 4.2.1)
#>  tidyr        * 1.2.0      2022-02-01 [1] CRAN (R 4.2.0)
#>  tidyselect     1.1.2      2022-02-21 [1] CRAN (R 4.2.0)
#>  timeDate       4021.104   2022-07-19 [1] CRAN (R 4.2.1)
#>  tune         * 1.0.0      2022-07-07 [1] CRAN (R 4.2.0)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs          0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
#>  withr          2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  workflows    * 1.0.0      2022-07-05 [1] CRAN (R 4.2.1)
#>  workflowsets * 1.0.0      2022-07-12 [1] CRAN (R 4.2.1)
#>  xfun           0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml           2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#>  yardstick    * 1.0.0      2022-06-06 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/dchoy/AppData/Local/Programs/R/R-4.2.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@simonpcouch
Contributor

Thanks for the issue!

Do you see this issue when using LightGBM without the bonsai wrapper? Does the issue persist if you use a different parallel backend?

Some notes on parallel processing with tune here, though you seem to have applied those notes. :)

@DesmondChoy
Author

I have tried registerDoParallel and registerDoFuture in my reprex above, and both resulted in the same error. Are there any other parallel backends I should try?

And no, I have not tried LightGBM without the bonsai wrapper. Looking through the open issues in the LightGBM GitHub repository, though, I don't see this problem being raised.

@simonpcouch
Contributor

Thanks for the additional info! This one was fun to track down.

Just spent a bit more time with this. It makes sense that these backends would lead to downstream failures with LightGBM: by default, they spin up a new R session on each core, fit the models there, and then pass the models back to the calling session. LightGBM objects can't be passed between R sessions without proper serialization (see our work on bundle and ?lightgbm::saveRDS.lgb.Booster()), and these parallel backends have no way of knowing how to serialize the models properly before passing them back. As a result, each resampled fit loses access to its Booster instance.
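
To illustrate what "proper serialization" means here, below is a minimal sketch using LightGBM directly, without bonsai or tune. It assumes the lightgbm package is installed; the toy data, parameter values, and the names x, y, path, and restored are illustrative and not taken from the reprex above. It writes the model with lgb.save() and reads it back with lgb.load(), rather than relying on R's default serialization, which is what drops the Booster handle.

```r
library(lightgbm)

# Toy regression data; purely illustrative, not from the reprex above.
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg

booster <- lgb.train(
  params = list(objective = "regression", min_data_in_leaf = 1, verbose = -1),
  data = lgb.Dataset(x, label = y),
  nrounds = 10
)

# Write the model with LightGBM's own writer instead of plain saveRDS().
path <- tempfile(fileext = ".txt")
lgb.save(booster, path)

# Rebuild a usable Booster from that file, as another R session would.
restored <- lgb.load(filename = path)

preds_original <- predict(booster, x)
preds_restored <- predict(restored, x)

# Compare predictions from the original and the restored Booster.
all.equal(preds_original, preds_restored)
```

A parallel worker that returns a fitted model to the calling session does none of this on its own, which is why the Booster handle arrives empty.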

Losing the Booster in the resampled fits is fine, as we don't need any of them to have access to that instance once we have training_grid_results. The tricky bit is that last_fit then uses the tune_grid machinery internally, and thus parallelizes the train/test fit when it can. If we disable parallelism there via control_last_fit(), the output of last_fit can predict/plot as expected:

library(tidymodels)
library(bonsai)
library(palmerpenguins)
#> 
#> Attaching package: 'palmerpenguins'
#> The following object is masked from 'package:modeldata':
#> 
#>     penguins
library(doParallel)
#> Loading required package: foreach
#> 
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#> 
#>     accumulate, when
#> Loading required package: iterators
#> Loading required package: parallel
library(parallel)

split <- penguins |>
  initial_split(strata = species)

penguins_train <- training(split)
penguins_test <- testing(split)
folds <- vfold_cv(penguins_train, strata = species, 3)

recipe_basic <- penguins_train |>
  recipe(species ~ .)

lightgbm_spec <- boost_tree(trees = tune()) |>
  set_engine(
    "lightgbm",
    objective = "multiclass",
    metric = "multi_error",
    num_class = !!length(unique(penguins_train$species))
  ) |>
  set_mode("classification")

lightgbm_wflow <- workflow(preprocessor = recipe_basic,
                           spec = lightgbm_spec)

all_cores <- detectCores(logical = FALSE)
cl <- makePSOCKcluster(all_cores - 1)
registerDoParallel(cl)

training_grid_results <- lightgbm_wflow |>
  tune_grid(resamples = folds,
            grid = 5)

# control_last_fit() has no allow_par argument, so set the slot by hand
ctrl <- control_last_fit()
ctrl$allow_par <- FALSE

last_fit <- lightgbm_wflow |>
  finalize_workflow(select_best(training_grid_results, "roc_auc")) |>
  last_fit(split, control = ctrl)

last_fit |>
  extract_workflow() |>
  predict(head(penguins_test))
#> # A tibble: 6 × 1
#>   .pred_class
#>   <fct>      
#> 1 Adelie     
#> 2 Adelie     
#> 3 Adelie     
#> 4 Adelie     
#> 5 Adelie     
#> 6 Adelie

last_fit |>
  extract_fit_engine() |>
  lightgbm::lgb.importance() |>
  lightgbm::lgb.plot.importance()

Created on 2022-09-14 by the reprex package (v2.0.1)

Unfortunately, control_last_fit() doesn't accept an allow_par argument (or set a non-default value for it), so we need to alter that slot manually.

So, that's a patch for now, though I think we ought to be able to handle this without that workaround. I'll open a linked issue here in a moment!
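
Another possible workaround, sketched here as an assumption rather than something verified in this thread: since tune dispatches through foreach in the versions listed above, stopping the cluster and registering the sequential backend before calling last_fit() should keep the final fit in the calling session. The snippet below only demonstrates the backend switch itself, using a hypothetical two-worker cluster.

```r
library(doParallel)

# Register a small PSOCK cluster, as during tuning (2 workers is arbitrary).
cl <- parallel::makePSOCKcluster(2)
registerDoParallel(cl)
workers_during_tuning <- foreach::getDoParWorkers()

# ... tune_grid() would run here ...

# Shut the cluster down and fall back to the sequential backend, so that
# later %dopar% calls run inside the current R session.
parallel::stopCluster(cl)
foreach::registerDoSEQ()
workers_after_switch <- foreach::getDoParWorkers()

# ... last_fit() would run here, without the ctrl$allow_par edit ...
```

Whether this fully avoids the Booster error depends on how the installed tune version chooses its backend, so the ctrl$allow_par approach above remains the one confirmed in this thread.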

@simonpcouch
Contributor

Going to close in favor of tidymodels/tune#539. Thanks for documenting this!

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 11, 2023