Optimizing Simulated Annealing Using C # or Python – Visual Studio Magazine
The data science lab
Optimizing Simulated Annealing Using C # or Python
Dr. James McCaffrey of Microsoft Research shows how to implement simulated annealing for the traveling salesman problem (finding the best order of a set of discrete elements).
The goal of a combinatorial optimization problem is to find the best order of a set of discrete elements. A classic combinatorial optimization challenge is the traveling salesperson problem (TSP). The goal of TSP is to find the order in which to visit a set of cities so that the total distance traveled is minimized.
One of the oldest and simplest techniques for solving combinatorial optimization problems is called simulated annealing. This article shows how to implement simulated annealing for the traveling salesperson problem using C # or Python.
A good way to see where this article is going is to take a look at the screenshot of a demo program in Figure 1. The demo sets up a synthetic problem where there are 20 cities, labeled from 0 to 19. The distance between cities is designed so that the best route starts at city 0 and then visits each city in order. The total distance of the optimal route is 19.0.
The demo configures the simulated annealing parameters of max_iter = 2500, start_temperature = 10000.0 and alpha = 0.99. Simulated annealing is an iterative process and max_iter is the maximum number of processing loop executions. The start_temperature and alpha variables control how the annealing process explores possible solution paths.
The demo sets up a random initial estimate of the best route as [ 7, 11, 6 . . 17, 3 ]. After iterating 2500 times, the demo reports the best route found as [ 0, 1, 2, 3, 4, 5, 6, 7, 9, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ]. The total distance required to visit cities in that order is 21.5 and therefore the solution is close, but not as good as, the optimal solution (the order of cities 8 and 9 is reversed).
This article assumes you have intermediate or above knowledge of a C family programming language, preferably C # or Python, but does not assume that you know anything about simulated annealing. The full source code for the demo program is presented in this article, and the code is also available in the accompanying download file.
Understanding simulated annealing
Suppose you have a combinatorial optimization problem with only five elements, and where the best order / permutation is [B, A, D, C, E]. A primitive approach would be to start with a random permutation such that [C, A, E, B, D] then repeatedly trade pairs of randomly selected items until you find the best order. For example, if you exchange the items at  and , the new permutation is [A, C, E, B, D].
This approach works if there are only a few items in the permutation of the problem, but fails even for a moderate number of items. For the problem of the demo with n = 20 cities, there are 20! possible permutations = 20 * 19 * 18 *. . * 1 = 2,432,902,008 176,640,000. That’s a lot of permutations to look at.
Expressed as a pseudo-code, the simulated annealing is:
make a random initial guess set initial temperature loop many times swap two randomly selected elements of the guess compute error of proposed solution if proposed solution is better than curr solution then accept the proposed solution else if proposed solution is worse then accept the worse solution anyway with small probability else don't accept the proposed solution end-if reduce temperature slightly end-loop return best solution found
By swapping two randomly selected items from the current solution, an adjacent proposed solution is created. The adjacent proposed solution will be similar to the current guess, so the search process is not random. A good current solution will likely lead to a good new proposed solution.
By sometimes accepting a proposed solution that is worse than the current hypothesis, you avoid getting trapped in a good but not optimal solution. The probability of accepting a worse solution is given by:
accept_p = exp((err - adj_err) / curr_temperature)
where exp () is the mathematical function exp (), err is the error associated with the current estimate, adj_err is the error associated with the proposed adjacent solution, and curr_temperature is a value such as 1000.0. In simulated annealing, the temperature value starts with a high value, for example 10000.0, and then slowly decreases with each iteration. At the start of the algorithm, when the temperature is high, accept_p will be large (close to 1) and the algorithm will often go to a worse solution. This allows many permutations to be examined. Later in the algorithm, when the temperature is low, accept_p will be small (close to 0) and the algorithm will rarely go to a worse solution.
Calculating the probability of acceptance is surprisingly tricky. In the demo program, the acceptance probability is calculated when the adjacent error is greater (worse) than the current error, and therefore the numerator term (err – adj_err) is less than 0. When the value of the current temperature decreases, dividing by the the current temperature makes the fraction more negative, which is smaller. Taking exp () of smaller values ââresults in smaller values. In short, as the temperature value decreases, the probability of acceptance decreases. Here is an example :
err adj_err temp accept_p 50 60 10000 0.999000 50 60 1000 0.990050 50 60 100 0.904837 50 60 10 0.367879 50 60 1 0.000045
It is very easy to botch the calculation of the probability of acceptance. For example, if you have a combinatorial optimization problem where the goal is to maximize a value rather than minimize an error, you should reverse the order of the terms in the numerator.
The temperature reduction is controlled by the alpha value. Suppose temperature = 1000.0 and alpha = 0.90. After the first iteration of processing, the temperature becomes 1000.0 * 0.90 = 900.0. After the next iteration, the temperature becomes 900.0 * 0.90 = 810.0. After the third iteration, the temperature becomes 810.0 * 0.90 = 729.0. Etc. Smaller alpha values ââdecrease the temperature more quickly.
There are other ways to lower the temperature. Instead of using an alpha reduction factor, a common approach is to multiply a fixed starting temperature by the fraction of the remaining iterations. When there are many iterations left, the temperature will be high. When there are few iterations left, the temperature will be low.
The demonstration program
The complete Python code for the demo program, with some minor edits to save space, is shown in List 1. I draw using two spaces rather than the standard four spaces. The backslash character is used for line continuation to break up long statements.
List 1: The simulated annealing program for TSP Python
# tsp_annealing.py # traveling salesman problem # using classical simulated annealing # Python 3.7.6 (Anaconda3 2020.02) import numpy as np def total_dist(route): d = 0.0 # total distance between cities n = len(route) for i in range(n-1): if route[i] < route[i+1]: d += (route[i+1] - route[i]) * 1.0 else: d += (route[i] - route[i+1]) * 1.5 return d def error(route): n = len(route) d = total_dist(route) min_dist = n-1 return d - min_dist def adjacent(route, rnd): n = len(route) result = np.copy(route) i = rnd.randint(n); j = rnd.randint(n) tmp = result[i] result[i] = result[j]; result[j] = tmp return result def solve(n_cities, rnd, max_iter, start_temperature, alpha): # solve using simulated annealing curr_temperature = start_temperature soln = np.arange(n_cities, dtype=np.int64) rnd.shuffle(soln) print("Initial guess: ") print(soln) err = error(soln) iteration = 0 interval = (int)(max_iter / 10) while iteration < max_iter and err > 0.0: adj_route = adjacent(soln, rnd) adj_err = error(adj_route) if adj_err < err: # better route so accept soln = adj_route; err = adj_err else: # adjacent is worse accept_p = np.exp((err - adj_err) / curr_temperature) p = rnd.random() if p < accept_p: # accept anyway soln = adj_route; err = adj_err # else don't accept if iteration % interval == 0: print("iter = %6d | curr error = %7.2f | temperature = %10.4f " % (iteration, err, curr_temperature)) if curr_temperature < 0.00001: curr_temperature = 0.00001 else: curr_temperature *= alpha iteration += 1 return soln def main(): print("nBegin TSP simulated annealing demo ") num_cities = 20 print("nSetting num_cities = %d " % num_cities) print("nOptimal solution is 0, 1, 2, . . " + str(num_cities-1)) print("Optimal solution has total distance = %0.1f " % (num_cities-1)) rnd = np.random.RandomState(4) max_iter = 2500 start_temperature = 10000.0 alpha = 0.99 print("nSettings: ") print("max_iter = %d " % max_iter) print("start_temperature = %0.1f " % start_temperature) print("alpha = %0.2f " % alpha) print("nStarting solve() ") soln = solve(num_cities, rnd, max_iter, start_temperature, alpha) print("Finished solve() ") print("nBest solution found: ") print(soln) dist = total_dist(soln) print("nTotal distance = %0.1f " % dist) print("nEnd demo ") if __name__ == "__main__": main()
The demo program is structured so that all the control logic is in a main () function. The main () starts by configuring the parameters for the simulated annealing:
import numpy as np def main(): print("Begin TSP simulated annealing demo ") num_cities = 20 print("Setting num_cities = %d " % num_cities) print("Optimal solution is 0, 1, 2, . . " + str(num_cities-1)) print("Optimal solution has total distance = %0.1f " % (num_cities-1)) rnd = np.random.RandomState(4) max_iter = 2500 start_temperature = 10000.0 alpha = 0.99 . . .
The program uses a local RandomState object rather than the global numpy object. This makes it easier to reproduce the executions of the program. The starting value of 4 was used only because it gave representative results. Finding good values ââfor the max_iter, start_temperature, and alpha variables is a matter of trial and error. Simulated annealing is often very sensitive to these values ââwhich makes tuning difficult. The difficulty of setting parameters is the greatest weakness of simulated annealing.
Then the demo echoes the values ââof the parameters:
print("nSettings: ") print("max_iter = %d " % max_iter) print("start_temperature = %0.1f " % start_temperature) print("alpha = %0.2f " % alpha)
The demo passes the simulated annealing parameters to a solve () function defined by the program:
print("Starting solve() ") soln = solve(num_cities, rnd, max_iter, start_temperature, alpha) print("Finished solve() ")
The demo ends by displaying the result:
print("Best solution found: ") print(soln) dist = total_dist(soln) print("Total distance = %0.1f " % dist) print("End demo ") if __name__ == "__main__": main()
Calculating the distance traveled
The demo defines a function that calculates the total distance traveled for a specified route that visits all 20 cities:
def total_dist(route): d = 0.0 # total distance between cities n = len(route) for i in range(n-1): if route[i] < route[i+1]: d += (route[i+1] - route[i]) * 1.0 else: d += (route[i] - route[i+1]) * 1.5 return d
For any pair of departure cities (A, B), if A is less than B, the distance between the two cities is 1.0 * (B – A). For example, the distance between city 4 and city 11 is 7.0. If B is less than A, the distance between cities is 1.5 * (A – B). For example, the distance between city 8 and city 1 is 10.5.
The distance of a route is the sum of the distances between the towns in the route. For example, the sub-route [4, 11, 5, 6] at a distance 7.0 + 9.5 + 1.0 = 17.5. In a scenario without a demonstration, you would create a lookup table with actual distances.
If there are cities, the demonstration design produces a problem where the optimal solution is [0, 1, 2, . . (n-1)] and the optimal minimum distance is 1.0 * (n-1). Therefore, it is possible to define an error function:
def error(route): n = len(route) d = total_dist(route) min_dist = n-1 return d - min_dist
In a scenario without a demo, you will not know the optimal solution. In this situation, you can set the error of a route as the distance of the route. A large distance equals a large error because the goal is to minimize the total distance traveled.
Creating adjacent routes
The demo program defines a function that accepts a route and calculates an adjacent route:
def adjacent(route, rnd): n = len(route) result = np.copy(route) i = rnd.randint(n); j = rnd.randint(n) tmp = result[i] result[i] = result[j]; result[j] = tmp return result
The function selects two random indices, then swaps the values ââof those indices. For example, if n = 6 and route = [5, 1, 3, 2, 0, 4], and the two selected random indices are  and , then the adjacent route that is generated is [3, 1, 5, 2, 0, 4]. The adjacent () function accepts a random numpy object and uses it to generate random indices. It is possible to use the global random object numpy, but using a local random object makes the program more modular.
As implemented, the adjacent () function does not prevent random indices [i] and [j] to be equal. When this happens, the adjacent road will be the same as the entry road. This approach is simpler than trying to make sure that the randomly chosen clues are different. If you really want two different indices, you can use the numpy.random.choice () function with the replace parameter set to False.