

Optuna: Automatic Hyperparameter Optimisation
Next — Today I Learnt About Data Science | Issue #82
Hi there!
🧹 Some housekeeping: I’m travelling to the KDD Conference in Long Beach, CA, to present our research on end-to-end inventory prediction and optimisation, so there will be no letter next week. The next Next will hit your inbox on August 16, 2023. Let’s dive in.
Last week, we learnt about hyperparameter tuning. To recap, there are three basic ways to optimise hyperparameters.
First, we can do a full grid search, i.e. try every combination of the hyperparameter values in the search space. Second, we can do a random search, i.e. evaluate only a random subset of all those combinations. Third, there is Bayesian search, which uses the results of past trials to decide which combinations to try next.
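To make the first two concrete, here is a minimal scikit-learn sketch (the toy data and parameter values are made up purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=0)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 10]}

# Grid search: evaluates all 3 x 3 = 9 combinations.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Random search: evaluates only 4 randomly chosen combinations.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0)
rand.fit(X, y)
```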
I first learnt about Optuna last summer, when we had to fine-tune our models during my internship. Optuna is a favourite tool of Kagglers; it is the most common tuning package (just as LightGBM is the most commonly used model). If so many Kagglers are using it, it must have a really efficient search algorithm. After all, Kaggle competition winners are almost always within 1% of each other.
Once I started using Optuna, I realised why everyone liked it so much. Apart from the efficient algorithm (detailed in their research paper), the API is surprisingly easy to use. Additionally, it comes with a dashboard that can visualise an experiment’s trials while they are still running!
The following figure gives a brief overview of Optuna’s API.
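In code, the basic workflow looks roughly like this (a minimal sketch with a toy objective standing in for a real model):

```python
import optuna

def objective(trial):
    # Each trial suggests hyperparameters from the declared ranges...
    x = trial.suggest_float("x", -10.0, 10.0)
    # ...and returns the score that Optuna should minimise.
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)  # e.g. {'x': 2.01}
```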
What makes Optuna special?
👨💻 Define-by-run API
Optuna uses a define-by-run API where the search space is constructed dynamically by calling suggest methods inside the objective function, rather than requiring full static specification upfront (define-and-run). This allows for very flexible and modular definition of search spaces, including conditional hyperparameters and hierarchies.
For example, the number of layers in a neural network can be sampled first, and the size of each layer can then be sampled dynamically in a loop based on that. Expressing such conditional spaces is awkward in traditional define-and-run frameworks like Hyperopt. In fact, the define-by-run/define-and-run terminology is borrowed from deep-learning frameworks, where a define-and-run network, once defined, couldn’t be altered during computation.
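Here is a sketch of that pattern (the layer-size ranges and the placeholder score are invented for illustration; a real objective would train a network and return its validation loss):

```python
import optuna

def objective(trial):
    # The search space is built while the function runs: the number of
    # layers is sampled first, then each layer's width is sampled
    # conditionally inside the loop.
    n_layers = trial.suggest_int("n_layers", 1, 4)
    units = [trial.suggest_int(f"units_l{i}", 16, 256, log=True)
             for i in range(n_layers)]
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Placeholder score; in practice, build and evaluate a model here.
    return abs(sum(units) / len(units) - 64) + dropout

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)
```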
🔬 Surrogate Model Construction
Like other Bayesian optimisers, Optuna builds a surrogate model that predicts the loss for hyperparameter settings it hasn’t tried yet (the literature often frames the goal as minimising “regret”); the classic choice of surrogate is a Gaussian Process (GP). Essentially, the surrogate approximates the unknown loss function with a probability distribution. This makes the optimisation process a lot smoother, and knowing when to terminate trials becomes a lot simpler.
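A toy illustration of the surrogate idea, using scikit-learn’s GP regressor (the numbers are invented, and this is not Optuna’s internal code):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Fit a GP to the few (hyperparameter, loss) pairs observed so far,
# e.g. four learning rates that have already been evaluated.
observed_x = np.array([[0.1], [0.4], [0.7], [0.9]])
observed_loss = np.array([0.80, 0.35, 0.42, 0.65])
gp = GaussianProcessRegressor().fit(observed_x, observed_loss)

# The surrogate can then be queried cheaply anywhere in the space,
# returning both a mean prediction and an uncertainty estimate.
candidates = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)
```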
🚂 Acquisition Function
Optuna implements smart algorithms to choose what to try next. This step involves an acquisition function, such as Expected Improvement (EI), which decides where to evaluate the objective function next. It strikes a balance between exploring new regions of the hyperparameter space and exploiting known good areas.
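For a minimisation problem, EI can be computed from the surrogate’s mean and standard deviation at each candidate; a self-contained toy sketch with made-up numbers:

```python
import numpy as np
from scipy.stats import norm

# Surrogate output at three candidate points: predicted mean loss and
# predictive standard deviation, plus the best loss observed so far.
mean = np.array([0.50, 0.38, 0.45])
std = np.array([0.05, 0.10, 0.20])
best_so_far = 0.40

# EI(x) = (best - mu) * Phi(z) + sigma * phi(z), where z = (best - mu) / sigma.
z = (best_so_far - mean) / std
ei = (best_so_far - mean) * norm.cdf(z) + std * norm.pdf(z)

# The uncertain candidate (index 2) wins here even though its mean is worse:
# that is the exploration/exploitation trade-off in action.
print(ei.argmax())
```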
🛝 Sampling Strategy
Optuna applies a Tree-structured Parzen Estimator (TPE) sampler, a form of Bayesian Optimisation. This method computes the next set of hyperparameters to sample based on the history of past evaluations. It's a sort of informed guess, guiding the search towards promising regions.
It also has the ability to use CMA-ES (Covariance Matrix Adaptation Evolution Strategy), which samples new configurations from a multivariate normal distribution fitted to the correlations between good configurations. The two can even be combined, again with very little input from the user.
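Switching samplers is a one-liner; a sketch (the seed values are arbitrary, and CmaEsSampler requires the cmaes package to be installed):

```python
import optuna

# TPE is the default sampler; CMA-ES is opt-in.
tpe_study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=42))
cma_study = optuna.create_study(sampler=optuna.samplers.CmaEsSampler(seed=42))
```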
🌲 Pruning
Optuna uses aggressive early stopping of unpromising trials to focus computational effort. The key algorithm is Asynchronous Successive Halving, where trials are terminated early based on provisional rankings. This scales well to parallel and distributed settings since workers don't have to synchronize at every step.
The combination of efficient sampling and pruning lets Optuna make highly efficient use of computational resources: sampling finds promising areas quickly, while pruning eliminates poor performers cheaply. Personally, I’ve only used MedianPruner, which stops a trial if its best intermediate result is worse than the median of the intermediate results of previous trials at the same step.
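A sketch of how pruning plugs into an objective (the fake “training loop” below just decays a loss; real code would report a validation metric per epoch):

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    loss = 1.0
    for step in range(100):
        loss *= (1.0 - lr)           # stand-in for one epoch of training
        trial.report(loss, step)     # report the intermediate value...
        if trial.should_prune():     # ...and bail out if it looks unpromising
            raise optuna.TrialPruned()
    return loss

study = optuna.create_study(
    direction="minimize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10),
)
study.optimize(objective, n_trials=30)
```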
🎬 Versatile Models and Flexible API
Optuna's standout feature is its intuitive and user-friendly API, which is designed with simplicity and flexibility in mind. Even newcomers to the field of machine learning can easily set up and run hyperparameter optimisation without being bogged down by complex configurations.
The API's integration with popular machine learning frameworks like TensorFlow, PyTorch, LightGBM, XGBoost, and Scikit-learn further streamlines the process, allowing me to focus more on modeling and less on the intricacies of tuning. It is seriously mind-boggling how many models it supports out of the box. Its well-designed architecture provides a smooth experience, whether you're aiming for a quick prototype or scaling up to production-level code.
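For instance, tuning a scikit-learn model needs nothing beyond a plain objective function (a small sketch; the parameter ranges are arbitrary):

```python
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Swapping in LightGBM or XGBoost only changes the estimator and the
    # parameter names; the Optuna side of the code stays the same.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 6),
    }
    model = GradientBoostingClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```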
Best part? If a run fails midway, say after 15 of a planned 100 trials, you can resume from the 16th trial instead of starting all over again, provided the study is backed by persistent storage (an SQLite file is enough). I haven’t seen this in other tuning packages.
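A sketch of that setup (the study name and file name are placeholders):

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

# Persisting trials to SQLite means an interrupted run picks up where it
# left off: re-running the script loads the existing study and continues.
study = optuna.create_study(
    study_name="my_experiment",
    storage="sqlite:///optuna_study.db",
    load_if_exists=True,
    direction="minimize",
)
study.optimize(objective, n_trials=100)
```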
💻 Optuna Dashboard
Another remarkable aspect of Optuna is its sophisticated dashboard, which offers a comprehensive visualization interface for the optimization process. This dashboard enables us to monitor the progress of the search in real-time, offering insights into the performance of different hyperparameters, convergence behavior, and more.
Such visualizations not only make the tuning process more transparent but also provide valuable insights that can guide further experimentation.
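If you would rather stay inside a notebook, Optuna also ships plotting helpers that cover much of the same ground as the dashboard (a sketch; these return plotly figures, so plotly must be installed):

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", 0, 5)
    return (x - 2) ** 2 + y

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)

# Interactive plots of the search; the standalone dashboard shows similar
# views live, while the study is still running.
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
```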
📑 Conclusion
With its ability to perform intelligent searching through hyperparameter space, coupled with the efficiency of its pruning techniques, Optuna has become an indispensable tool for many ML practitioners.
The landscape of hyperparameter tuning is rich and varied, offering methods that range from brute-force grid search to elegant optimisation algorithms like those employed by Optuna. When data is king and time is limited, tools like Optuna stand as a beacon for efficiency, creativity, and innovation.
🖍️ Miscellaneous
When Xerox with its photocopy machine almost killed the publishing industry
The People in The Food Apps: Who are the delivery guys? (🎩-tip to Dea)
Hope you enjoyed today’s letter! See you on August 16.