# Optimizing Simulated Annealing Using C# or Python – Visual Studio Magazine

The Data Science Lab

### Optimizing Simulated Annealing Using C# or Python

Dr. James McCaffrey of Microsoft Research shows how to implement simulated annealing for the traveling salesman problem (finding the best order of a set of discrete elements).

The goal of a combinatorial optimization problem is to find the best order of a set of discrete elements. A classic combinatorial optimization challenge is the traveling salesperson problem (TSP). The goal of TSP is to find the order in which to visit a set of cities so that the total distance traveled is minimized.

One of the oldest and simplest techniques for solving combinatorial optimization problems is called simulated annealing. This article shows how to implement simulated annealing for the traveling salesperson problem using C# or Python.

A good way to see where this article is going is to take a look at the screenshot of a demo program in Figure 1. The demo sets up a synthetic problem where there are 20 cities, labeled from 0 to 19. The distance between cities is designed so that the best route starts at city 0 and then visits each city in order. The total distance of the optimal route is 19.0.

The demo configures the simulated annealing parameters of max_iter = 2500, start_temperature = 10000.0 and alpha = 0.99. Simulated annealing is an iterative process and max_iter is the maximum number of processing loop executions. The start_temperature and alpha variables control how the annealing process explores possible solution paths.

The demo sets up a random initial guess of the best route as [ 7, 11, 6 . . 17, 3 ]. After iterating 2,500 times, the demo reports the best route found as [ 0, 1, 2, 3, 4, 5, 6, 7, 9, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ]. The total distance required to visit the cities in that order is 21.5, so the solution is close to, but not quite as good as, the optimal solution (the order of cities 8 and 9 is reversed).

Figure 1: Simulated Annealing to Solve the Traveling Salesperson Problem

This article assumes you have intermediate or better skill with a C-family programming language, preferably C# or Python, but doesn't assume you know anything about simulated annealing. The complete source code for the demo program is presented in this article, and the code is also available in the accompanying download file.

Understanding Simulated Annealing
Suppose you have a combinatorial optimization problem with just five elements, where the best order/permutation is [B, A, D, C, E]. A primitive approach would be to start with a random permutation such as [C, A, E, B, D] and then repeatedly exchange pairs of randomly selected items until you find the best order. For example, if you exchange the items at [0] and [1], the new permutation is [A, C, E, B, D].
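The pair-exchange idea can be sketched in a few lines of Python. The function name adjacent_perm and the use of the standard-library random module are illustrative; they are not part of the demo program, which uses numpy:

```python
import random

def adjacent_perm(perm, rng):
  # swap the values at two randomly chosen positions
  # to produce a neighboring permutation
  result = list(perm)
  i = rng.randrange(len(result))
  j = rng.randrange(len(result))
  result[i], result[j] = result[j], result[i]
  return result

rng = random.Random(0)  # seeded for reproducibility
print(adjacent_perm(['C', 'A', 'E', 'B', 'D'], rng))
```

Each call produces a permutation that differs from its input in at most two positions.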

This approach works when there are only a few items in the problem permutation, but it fails for even a moderate number of items. For the demo problem with n = 20 cities, there are 20! = 20 * 19 * 18 * . . * 1 = 2,432,902,008,176,640,000 possible permutations. That's far too many permutations to examine.
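You can verify the permutation count with Python's standard library:

```python
import math

# number of possible orderings (permutations) of 20 cities
n_perms = math.factorial(20)
print(n_perms)  # 2432902008176640000
```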

Expressed in pseudo-code, simulated annealing is:

```
make a random initial guess
set initial temperature
loop many times
  swap two randomly selected elements of the guess
  compute error of proposed solution
  if proposed solution is better than curr solution then
    accept the proposed solution
  else if proposed solution is worse then
    accept the worse solution anyway with small probability
  else
    don't accept the proposed solution
  end-if
  reduce temperature slightly
end-loop
return best solution found
```

Swapping two randomly selected items in the current solution creates an adjacent proposed solution. The adjacent proposed solution will be similar to the current guess, so the search process is not purely random. A good current solution will likely lead to a good new proposed solution.

By sometimes accepting a proposed solution that is worse than the current solution, you avoid becoming trapped at a good but not optimal solution. The probability of accepting a worse solution is given by:

```
accept_p = exp((err - adj_err) / curr_temperature)
```

where exp() is the mathematical exponential function, err is the error associated with the current guess, adj_err is the error associated with the proposed adjacent solution, and curr_temperature is a value such as 1000.0. In simulated annealing, the temperature starts out at a large value, such as 10000.0, and then slowly decreases on each iteration. Early in the algorithm, when the temperature is large, accept_p will be large (close to 1) and the algorithm will often move to a worse solution. This allows many permutations to be examined. Late in the algorithm, when the temperature is small, accept_p will be small (close to 0) and the algorithm will rarely move to a worse solution.

Calculating the acceptance probability is surprisingly tricky. In the demo program, the acceptance probability is computed when the adjacent error is greater (worse) than the current error, so the numerator term (err - adj_err) is less than 0. As the current temperature decreases, dividing by it makes the fraction more negative, which is smaller. Taking exp() of smaller values gives smaller results. In short, as the temperature decreases, the acceptance probability decreases. Here is an example:

```
err   adj_err  temp     accept_p
50    60       10000    0.999000
50    60        1000    0.990050
50    60         100    0.904837
50    60          10    0.367879
50    60           1    0.000045
```

It is very easy to botch the acceptance probability calculation. For example, if you have a combinatorial optimization problem where the goal is to maximize some value rather than minimize an error, you must reverse the order of the terms in the numerator.
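The acceptance values in the table above can be reproduced with a few lines of Python. The helper name accept_prob is illustrative, not from the demo program:

```python
import math

def accept_prob(err, adj_err, temp):
  # probability of moving to a worse adjacent solution:
  # (err - adj_err) is negative when adj_err is worse,
  # so the result is in (0, 1) and shrinks as temp shrinks
  return math.exp((err - adj_err) / temp)

for temp in [10000, 1000, 100, 10, 1]:
  print("%8d  %f" % (temp, accept_prob(50, 60, temp)))
```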

The temperature reduction is controlled by the alpha value. Suppose temperature = 1000.0 and alpha = 0.90. After the first iteration of processing, the temperature becomes 1000.0 * 0.90 = 900.0. After the next iteration, the temperature becomes 900.0 * 0.90 = 810.0. After the third iteration, the temperature becomes 810.0 * 0.90 = 729.0. And so on. Smaller alpha values decrease the temperature more quickly.

There are other ways to lower the temperature. Instead of using an alpha reduction factor, a common approach is to multiply a fixed starting temperature by the fraction of the remaining iterations. When there are many iterations left, the temperature will be high. When there are few iterations left, the temperature will be low.
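Both cooling schedules can be sketched as follows. The function names are illustrative; the demo program uses only the alpha-based geometric schedule:

```python
def geometric_temp(start_temp, alpha, iteration):
  # multiply by alpha once per iteration:
  # start_temp * alpha^iteration
  return start_temp * (alpha ** iteration)

def linear_temp(start_temp, iteration, max_iter):
  # scale a fixed start temperature by the
  # fraction of iterations remaining
  return start_temp * (max_iter - iteration) / max_iter

# geometric: 1000.0 -> 900.0 -> 810.0 -> 729.0 with alpha = 0.90
print(geometric_temp(1000.0, 0.90, 3))
# linear: halfway through 2500 iterations, half the start temperature
print(linear_temp(1000.0, 1250, 2500))
```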

The Demo Program
The complete Python code for the demo program, with a few minor edits to save space, is shown in Listing 1. I indent using two spaces rather than the standard four spaces. The backslash character is used for line continuation to break up long statements.

Listing 1: The Simulated Annealing TSP Program in Python

```python
# tsp_annealing.py
# traveling salesman problem
# using classical simulated annealing
# Python 3.7.6 (Anaconda3 2020.02)

import numpy as np

def total_dist(route):
  d = 0.0  # total distance between cities
  n = len(route)
  for i in range(n-1):
    if route[i] < route[i+1]:
      d += (route[i+1] - route[i]) * 1.0
    else:
      d += (route[i] - route[i+1]) * 1.5
  return d

def error(route):
  n = len(route)
  d = total_dist(route)
  min_dist = n-1
  return d - min_dist

def adjacent(route, rnd):
  n = len(route)
  result = np.copy(route)
  i = rnd.randint(n); j = rnd.randint(n)
  tmp = result[i]
  result[i] = result[j]; result[j] = tmp
  return result

def solve(n_cities, rnd, max_iter,
  start_temperature, alpha):
  # solve using simulated annealing
  curr_temperature = start_temperature
  soln = np.arange(n_cities, dtype=np.int64)
  rnd.shuffle(soln)
  print("Initial guess: ")
  print(soln)

  err = error(soln)
  iteration = 0
  interval = (int)(max_iter / 10)
  while iteration < max_iter and err > 0.0:
    adj_route = adjacent(soln, rnd)
    adj_err = error(adj_route)

    if adj_err < err:  # better route so accept
      soln = adj_route; err = adj_err
    else:              # adjacent route is worse
      accept_p = \
        np.exp((err - adj_err) / curr_temperature)
      p = rnd.random()
      if p < accept_p:  # accept anyway
        soln = adj_route; err = adj_err
      # else don't accept

    if iteration % interval == 0:
      print("iter = %6d | curr error = %7.2f | \
temperature = %10.4f " % (iteration, err, curr_temperature))

    if curr_temperature < 0.00001:
      curr_temperature = 0.00001
    else:
      curr_temperature *= alpha
    iteration += 1

  return soln

def main():
  print("\nBegin TSP simulated annealing demo ")

  num_cities = 20
  print("\nSetting num_cities = %d " % num_cities)
  print("\nOptimal solution is 0, 1, 2, . . " + \
    str(num_cities-1))
  print("Optimal solution has total distance = %0.1f " \
    % (num_cities-1))
  rnd = np.random.RandomState(4)
  max_iter = 2500
  start_temperature = 10000.0
  alpha = 0.99

  print("\nSettings: ")
  print("max_iter = %d " % max_iter)
  print("start_temperature = %0.1f " \
    % start_temperature)
  print("alpha = %0.2f " % alpha)

  print("\nStarting solve() ")
  soln = solve(num_cities, rnd, max_iter,
    start_temperature, alpha)
  print("Finished solve() ")

  print("\nBest solution found: ")
  print(soln)
  dist = total_dist(soln)
  print("\nTotal distance = %0.1f " % dist)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()
```

The demo program is structured so that all the control logic is in a main() function. The main() function begins by setting up the simulated annealing parameters:

```python
import numpy as np

def main():
  print("\nBegin TSP simulated annealing demo ")

  num_cities = 20
  print("\nSetting num_cities = %d " % num_cities)
  print("\nOptimal solution is 0, 1, 2, . . " + \
    str(num_cities-1))
  print("Optimal solution has total distance = %0.1f " \
    % (num_cities-1))
  rnd = np.random.RandomState(4)
  max_iter = 2500
  start_temperature = 10000.0
  alpha = 0.99
. . .
```

The program uses a local RandomState object rather than the numpy global random object. This makes program runs easier to reproduce. The seed value of 4 was used only because it gave representative results. Finding good values for the max_iter, start_temperature, and alpha variables is a matter of trial and error. Simulated annealing is often extremely sensitive to these values, which makes tuning difficult. The difficulty of tuning parameters is the biggest weakness of simulated annealing.

Next, the demo echoes the parameter values:

```python
  print("\nSettings: ")
  print("max_iter = %d " % max_iter)
  print("start_temperature = %0.1f " % start_temperature)
  print("alpha = %0.2f " % alpha)
```

The demo passes the simulated annealing parameters to a program-defined solve() function:

```python
  print("\nStarting solve() ")
  soln = solve(num_cities, rnd, max_iter,
    start_temperature, alpha)
  print("Finished solve() ")
```

The demo ends by displaying the result:

```python
  print("\nBest solution found: ")
  print(soln)
  dist = total_dist(soln)
  print("\nTotal distance = %0.1f " % dist)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()
```

Calculating the Distance Traveled
The demo defines a function that computes the total distance traveled for a specified route that visits all 20 cities:

```python
def total_dist(route):
  d = 0.0  # total distance between cities
  n = len(route)
  for i in range(n-1):
    if route[i] < route[i+1]:
      d += (route[i+1] - route[i]) * 1.0
    else:
      d += (route[i] - route[i+1]) * 1.5
  return d
```

For any pair of cities (A, B), if A is less than B, the distance between the two cities is 1.0 * (B - A). For example, the distance from city 4 to city 11 is 7.0. If B is less than A, the distance between the cities is 1.5 * (A - B). For example, the distance from city 8 to city 1 is 10.5.

The distance of a route is the sum of the distances between consecutive cities in the route. For example, the sub-route [4, 11, 5, 6] has distance 7.0 + 9.0 + 1.0 = 17.0. In a non-demo scenario, you would create a lookup table of actual distances.
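The distance rule can be sketched as stand-alone Python. The helper names city_dist and route_dist are illustrative; the demo program computes distances directly inside total_dist():

```python
def city_dist(a, b):
  # synthetic distance: 1.0 per step going from a smaller
  # to a larger city ID, 1.5 per step going the other way
  if a < b:
    return (b - a) * 1.0
  return (a - b) * 1.5

def route_dist(route):
  # sum the distances between consecutive cities in the route
  return sum(city_dist(route[i], route[i+1])
             for i in range(len(route) - 1))

print(city_dist(4, 11))           # 7.0
print(city_dist(8, 1))            # 10.5
print(route_dist([4, 11, 5, 6]))  # 7.0 + 9.0 + 1.0 = 17.0
```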

If there are n cities, the demo design produces a problem where the optimal solution is [0, 1, 2, . . (n-1)] and the optimal minimum distance is 1.0 * (n-1). Therefore, it's possible to define an error function:

```python
def error(route):
  n = len(route)
  d = total_dist(route)
  min_dist = n-1
  return d - min_dist
```

In a non-demo scenario, you won't know the optimal solution. In that situation you can define the error of a route as the route's total distance. A large distance equates to a large error because the goal is to minimize the total distance traveled.

The demo program defines a function that accepts a route and generates an adjacent route:

```python
def adjacent(route, rnd):
  n = len(route)
  result = np.copy(route)
  i = rnd.randint(n); j = rnd.randint(n)
  tmp = result[i]
  result[i] = result[j]; result[j] = tmp
  return result
```

The function selects two random indices, then swaps the values at those indices. For example, if n = 6 and route = [5, 1, 3, 2, 0, 4], and the two randomly selected indices are [0] and [2], the generated adjacent route is [3, 1, 5, 2, 0, 4]. The adjacent() function accepts a numpy random object and uses it to generate the random indices. It's possible to use the numpy global random object, but using a local random object makes the program more modular.

As implemented, the adjacent() function doesn't prevent the random indices [i] and [j] from being equal. When that happens, the adjacent route will be the same as the input route. This approach is simpler than trying to ensure that the two randomly selected indices are different. If you really do want two different indices, you can use the numpy.random.choice() function with its replace parameter set to False.
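A minimal sketch of the replace=False approach, using a RandomState object as the demo does:

```python
import numpy as np

rnd = np.random.RandomState(4)
n = 6
# replace=False guarantees the two sampled indices differ,
# so the resulting swap always changes the route
i, j = rnd.choice(n, size=2, replace=False)
print(i, j)
```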