Particle Swarm Optimization
This story covers the fundamentals of Metaheuristics and introduces the basic concept of the Particle Swarm Optimization (PSO) algorithm.
What are Metaheuristics?¹
A metaheuristic is a high-level abstraction of established sets of mathematical rules with some randomness that leads a search process to find (near)-optimal solutions to a problem.¹
Metaheuristics are used to solve optimization problems as an alternative to classical methods like Newton’s Method or Gradient Descent. Their advantage over such methods is their low computational cost. The search process of metaheuristic algorithms like PSO is based on two main principles:
- Exploration (or Diversification)
- Exploitation (or Intensification)
Through Exploration the algorithm wants to make sure the whole search space is covered. The algorithm is “exploring” the search space and thereby increasing the chances of finding the optimal solution. In contrast to that is Exploitation, which describes the process of using already known information and directing the search towards promising solutions. The metaheuristic algorithm converges towards a minimum, but that minimum is not guaranteed to be the global optimum. One challenge in applying algorithms like PSO is to avoid Exploitation towards local minima. However, overly extensive Exploration is not the solution to that challenge either, because then the convergence speed drops. It is always a tradeoff between Exploration and Exploitation.
Another key element of Metaheuristics is their Hyperparameters. Different Metaheuristics have different Hyperparameters. Because of their impact on convergence speed and on the accuracy of the solution, it is usually useful (or even necessary) to perform a study on the Hyperparameters of your algorithm. If you are interested in a specific Hyperparameter Tuning method, have a look at one of my other stories, covering Deep Random Search.
Deep Random Search: A greedy alternative for your Hyperparameter Tuning (medium.com)
Particle Swarm Optimization²
Let’s start with the obvious analogy. PSO works like a swarm of birds searching for a food source. Every bird has its own memory and can remember the best solution it has found so far. This is also called the cognitive component of the update rule. But the birds can also communicate with each other and exchange their best found solutions. This is the social component of the update rule.
You can now define a weight, or a degree of trust, for each component. Either you want your particles to be more isolated from each other and increase the impact of the cognitive component, or you want the opposite and therefore increase the impact of the social component. In addition to the weights for each component there is another Hyperparameter: in each iteration of PSO, a random scaling factor (e.g. between 0 and 2) is multiplied with each component. Most Metaheuristics have this probabilistic aspect.
Figure 1 shows the position update rule of the algorithm. In analogy to classical mechanics, the next position depends on the current position and the velocity (and a timestep, but you can set it to 1 at the beginning; it is just another scaling factor). The position is a solution vector for your optimization problem (the arguments of the function you want to optimize).
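In symbols (the notation here is my own choice, consistent with the description above), the position update reads:

```latex
x_{t+1} = x_t + v_{t+1} \, \Delta t
```

where x_t is the current position, v_{t+1} the updated velocity, and Δt the timestep (often simply set to 1).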
Figure 2 shows the velocity update rule of the algorithm. The weight c and the random scaling factor r need to be defined for both the cognitive and the social component. The cognitive component is calculated via the difference between the current position and the particle’s individual best known solution X_P. The social component is calculated via the difference between the current position and the global (the swarm’s) best known position X_G.
There is a third component in the velocity update rule, called inertia. Again, in analogy to classical mechanics, this component affects whether a particle can change its velocity quickly or only slowly.
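Combining inertia, cognitive and social component gives the velocity update rule described above (again in the notation chosen here):

```latex
v_{t+1} = w \, v_t + c_1 r_1 \, (x_P - x_t) + c_2 r_2 \, (x_G - x_t)
```

Here w is the inertia weight, c_1 and c_2 are the cognitive and social weights, r_1 and r_2 are the random scaling factors drawn fresh in each iteration, x_P is the particle’s best known position and x_G the swarm’s best known position.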
And that’s basically it. You need the position and velocity update rules, define your Hyperparameters, and you are ready to go!
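To make this concrete, here is a minimal sketch of both update rules in Python. The function name, the vectorized NumPy implementation and the default hyperparameter values are my own illustrative choices, not the exact code behind the results shown later in this story:

```python
import numpy as np

def pso(f, bounds, n_particles=50, n_iters=1000, w=0.7, c1=1.5, c2=1.5, seed=0):
    # Minimal PSO sketch: f maps a vector to a scalar to be minimized,
    # bounds is a (lower, upper) pair of arrays spanning the search space.
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dim = low.size

    # Initialize positions uniformly in the search space, velocities at zero.
    x = rng.uniform(low, high, size=(n_particles, dim))
    v = np.zeros_like(x)

    # Personal bests (cognitive memory) and the swarm's best (social memory).
    xp = x.copy()
    fp = np.apply_along_axis(f, 1, x)
    xg = xp[np.argmin(fp)].copy()

    for _ in range(n_iters):
        # Fresh random scaling factors for the cognitive and social component.
        r1 = rng.uniform(size=(n_particles, dim))
        r2 = rng.uniform(size=(n_particles, dim))

        # Velocity update: inertia + cognitive + social component.
        v = w * v + c1 * r1 * (xp - x) + c2 * r2 * (xg - x)
        # Position update, with the timestep set to 1.
        x = x + v

        # Update personal bests and the global best.
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < fp
        xp[improved], fp[improved] = x[improved], fx[improved]
        xg = xp[np.argmin(fp)].copy()

    return xg, float(fp.min())
```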
What is this useful for?
Metaheuristic optimization can be used in many different fields of application, e.g. mathematics (optimization problems) or engineering (controller tuning). One advantage over gradient-based optimization algorithms is that the function of interest does not need to be differentiable at every point in the search space. You also save the computational cost of calculating gradients. In general, Metaheuristics are considered faster than classical approaches, but again, there is no guarantee that they find the global optimum.
Let’s go through an example and analyze a typical benchmark function, the sphere function. I won’t go into much detail here, but to compare different algorithms there are many so-called benchmark functions, which are used to test an optimization algorithm and measure its performance. The sphere function has a global minimum of 0.
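For reference, the sphere function is simply the sum of squared coordinates:

```latex
f(x) = \sum_{i=1}^{n} x_i^2
```

Its global minimum is f(0) = 0 at the origin.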
I defined a PSO algorithm with 50 particles. The sphere function has 10 dimensions in this example. After 5000 iterations (about 5 seconds in this case) the algorithm stopped, and the best solution found was 1e-150. Figure 5 shows the result of the search for the first 50 iterations and two selected dimensions. It clearly shows how the particles explore the search space and later converge towards 0 (Exploration vs. Exploitation).
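As a rough illustration of such an experiment, reusing the pso sketch from above (the search range of [-5.12, 5.12] and these exact hyperparameters are my own assumptions, not necessarily the setup behind the numbers reported above):

```python
def sphere(x):
    return float(np.sum(x ** 2))  # global minimum 0 at the origin

dim = 10
bounds = (np.full(dim, -5.12), np.full(dim, 5.12))  # assumed search range
best_x, best_f = pso(sphere, bounds, n_particles=50, n_iters=5000)
print(best_f)  # should approach the global minimum 0
```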
Summary
Particle Swarm Optimization is a metaheuristic optimization method. It uses the concepts of Exploration and Exploitation. This class of algorithms can be faster than classical approaches like Newton’s Method or Gradient Descent. But you have to be aware that Metaheuristics might not find the global optimum and can get stuck in a local minimum.
The way PSO works is similar to a swarm of birds searching for a food source. Each bird memorizes its own best spotted source so far. All birds can communicate within the swarm and can share their most promising spots with each other. There is a tradeoff between the impact of the social and the cognitive component, always depending on your problem.
PSO has many Hyperparameters which need to be fitted to your problem. One tuning method, for example, is Deep Random Search.
Thank you for reading this far! If you have any thoughts, annotations or ideas about this story feel free to leave a comment. Also, if you want me to write a story about a specific topic regarding this story (or any other topic) just send me a message or leave a comment.
References
- Blondin, M. J. (2020). Controller Tuning Optimization Methods for Multi-Constraints and Nonlinear Systems: A Metaheuristic Approach. Switzerland: Springer Switzerland. Retrieved on 15th January 2021 from https://link.springer.com/book/10.1007%2F978-3-030-64541-0
- Venter, G. & Sobieski, J. (2002). Particle Swarm Optimization. 43rd AIAA Structures, Structural Dynamics, and Materials Conference, Denver, Colorado.