/High-performance combinatorial optimization based on classical mechanics (via Qpute.com)

## High-performance combinatorial optimization based on classical mechanics (via Qpute.com)

### bSB and dSB algorithms

The Ising problem is to find a spin configuration minimizing the Ising energy, defined as

$EIsing=−12∑i=1N∑j=1NJi,jsisj−∑i=1Nhisi$

(1)where si denotes the ith spin taking 1 or −1, N is the number of spins, Ji,j is the coupling coefficient between the ith and jth spins (Ji,j = Jj,i and Ji,i = 0), and hi is the local field on the ith spin. Since introducing an ancillary spin reduces the Ising problem to the one without local fields (see section S1), here, we focus on the Ising problem with no local fields (hi = 0).

To solve the Ising problem, QbM uses quantum-mechanical nonlinear oscillators called Kerr-nonlinear parametric oscillators (KPOs), each of which can generate a Schrödinger cat state, i.e., a quantum superposition of two oscillating states, via a quantum-mechanical bifurcation (31). Such a KPO has recently been realized experimentally using superconducting circuits (45, 46). However, the realization of a large-scale QbM will require a long time. On the other hand, it was found that a classical-mechanical model corresponding to QbM, referred to as a classical bifurcation machine (CbM), also works as an Ising machine (31, 33). In this case, we can efficiently simulate such a classical machine using present digital computers, instead of building real machines. [This is not the case for QbM, because QbM can also be used as a universal quantum computer (47, 48), which is classically unsimulatable.] This simulation approach paves the way for large-scale Ising machines with all-to-all connectivity.

By modifying the equations of motion for CbM such that computational costs are reduced and the symplectic Euler method (49) is applicable, we obtain the following Hamiltonian equations of motion for aSB (30)

$x·i=∂HaSB∂yi=a0yi$

(2)

$y·i=−∂HaSB∂xi=−[xi2+a0−a(t)]xi+c0∑j=1NJi,jxj$

(3)

$HaSB=a02∑i=1Nyi2+VaSB$

(4)

$VaSB=∑i=1N(xi44+a0−a(t)2xi2)−c02∑i=1N∑j=1NJi,jxixj$

(5)where xi and yi are, respectively, the position and momentum of a particle corresponding to the ith spin, dots denote time derivatives, a(t) is a control parameter increased from zero, a0 and c0 are positive constants, and VaSB is the potential energy in aSB.

To qualitatively explain the operation principle of aSB, we show an example of the dynamics in aSB in Fig. 1 (A and B), where the ferromagnetic two-spin Ising problem (J1,2 = J2,1 = 1) with solutions s1 = s2 = ± 1 is solved as the simplest problem. All positions and momenta are randomly set around zero at the initial time. The initial potential has a single local minimum at the origin (top and middle of Fig. 1B), so the particles circulate around the origin (Fig. 1A and middle of Fig. 1B). When a(t) becomes sufficiently large, bifurcations occur, that is, multiple local minima of the potential appear. Then, the particles adiabatically follow one of the minima. Consequently, each xi bifurcates to a positive or negative value (Fig. 1A and bottom of Fig. 1B). Since two local minima corresponding to the two solutions have lower energies and appear earlier than the other two local minima, the particles successfully find one of the solutions. Last, the sign of xi, sgn(xi), gives the ith spin, si, for the solution of the Ising problem. It has been empirically found that aSB works well for much larger-scale and more complex problems (30).

This aSB relies on the fact that the second term in VaSB is approximately proportional to the Ising energy at the final time (30). In this approximation, analog errors arise from the use of continuous variables (positions) instead of discrete variables (spins). These analog errors in aSB may degrade solution accuracy and result in approximate solutions. Such analog errors in different dynamical approaches have also been discussed (38, 40).

To suppress analog errors, we introduce perfectly inelastic walls at xi = ± 1. That is, at each time, we replace xi with its sign, sgn(xi) = ± 1, and set yi = 0 if ∣xi∣ > 1. These walls force positions to be exactly equal to 1 or −1 when a(t) becomes sufficiently large. Moreover, we drop the fourth-order terms in VaSB, because the inelastic walls can play a role similar to the nonlinear potential walls. We thus obtain the following equations

$x·i=∂HbSB∂yi=a0yi$

(6)

$y·i=−∂HbSB∂xi=−[a0−a(t)]xi+c0∑j=1NJi,jxj$

(7)

$HbSB=a02∑i=1Nyi2+VbSB$

(8)

(9)This artificial dynamical system is the basis of the bSB. In bSB, we solve Eqs. 6 and 7 by the symplectic Euler method, where time is discretized with a time step Δt, together with the updating for the inelastic walls (see Methods for a detailed algorithm). (If we solve these equations by the standard Euler method, instead of the symplectic Euler method, then solution accuracy becomes lower. See section S2.)

Similar modification to the above walls has been proposed for SimCIM (39), but this algorithm is based on the gradient method, like the Hopfield-Tank model (42), and also uses stochastic processes. In contrast, bSB is based on a classical-mechanical system conserving energy except for the inelastic walls and adiabatic changes of energy and also uses deterministic processes except for initial value setting. As a result, the performance of bSB is quite different from that of SimCIM (see section S3 for the comparison between bSB and SimCIM).

In bSB, it is sufficient to increase a(t) to a0. Then, the final potential has only the second term related to the Ising energy. Consequently, the following condition is satisfied for all i at the final time (see section S4)

$ΔEi=2si∑j=1NJi,jsj≥0$

(10)where si = sgn (xi) is the sign of xi and ΔEi represents the change in the Ising energy for a flip of si. Note that Eq. 10 is a sufficient condition to show that the spin configuration is a local minimum of the Ising problem. Hence, solutions obtained by bSB are at least local minima in the Ising problem. In contrast, this is not necessarily guaranteed in aSB because of its nonlinear potential terms. (This means that solutions obtained by aSB can sometimes be improved by a naïve local search based on sequential spin flips.) This is another reason why bSB should achieve higher accuracy than aSB. Throughout this work, we linearly increase a(t) from 0 to a0 and set a0 to 1.

Here, we show an example of the bSB dynamics using the same two-spin problem as above. The initial potential has a single local minimum at the origin (top of Fig. 1D) and particles circulate around the origin (Fig. 1C and middle of Fig. 1D), as in aSB. In bSB, however, stable points suddenly jump from the origin to the walls at xi = ± 1, which prevents adiabatic evolution. Instead, particles move toward walls in a ballistic manner (Fig. 1C and bottom of Fig. 1D). This ballistic (nonadiabatic) behavior in bSB leads to fast convergence to a local minimum of VbSB and, consequently, to fast optimization.

For further improvement, we introduce another variant of SB by discretizing xj to sgn(xj) in the second term in Eq. 7

$x·i=∂HdSB∂yi=a0yi$

(11)

$y·i=−∂HdSB∂xi=−[a0−a(t)]xi+c0∑j=1NJi,jsgn(xj)$

(12)

$HdSB=a02∑i=1Nyi2+VdSB$

(13)

(14)The discrete version of bSB, namely, dSB (see Methods for a detailed algorithm), is expected to achieve higher accuracy than that of bSB, because analog errors are further suppressed by discretization. We found that this discretization is effective for bSB but not for other algorithms such as SimCIM and aSB (see sections S3 and S5).

Note that the singularity on the boundaries between positive and negative regions has been intentionally neglected. This leads to a violation of conservation of energy across boundaries and, hence, to escape from local minima over potential barriers, as shown below. In this sense, dSB goes beyond naïve algorithms based on classical-mechanical systems conserving energy (except adiabatic change), such as aSB and bSB. We also increase a(t) to a0 for the convergence to a local minimum of the Ising problem at the final time, as in bSB (see section S4).

Figure 1 (E and F) shows an example of the dSB dynamics using the same two-spin problem as above. Unlike aSB and bSB, the particles go back and forth between two local minima through the potential barriers (Fig. 1E and middle of Fig. 1F). This is similar to quantum tunneling, as depicted by the inset in Fig. 1I. This tunneling-like behavior is possible due to the above-mentioned neglect of the singularity on the boundaries; otherwise, the potential walls on the boundaries prevent this tunneling. In contrast, conservation of energy prevents such tunneling in aSB and bSB, as suggested by the insets in Fig. 1 (G and H). Thus, it is expected that this tunneling-like behavior will help dSB to escape local minima of the potential, and hence, dSB will outperform aSB and bSB in terms of solution accuracy.

Note that both bSB and dSB maintain the advantage of aSB over SA, namely, high parallelizability. Therefore, they are expected to realize both high speed and high accuracy simultaneously.

### Performance for a 2000-spin Ising problem with all-to-all connectivity

To compare the performance of bSB and dSB with that of aSB, we solved a 2000-spin Ising problem with all-to-all connectivity. This problem was named K2000 and previously solved by aSB (30), a CIM (11), and a recently developed digital chip called STATICA (27), which is based on the above-mentioned SCA. This problem can be regarded as a 2000-node MAX-CUT problem (11, 30); so, here, we evaluate performance using cut values (see section S6 for the definition of the cut value and the relation between MAX-CUT and the Ising problem). The best cut value for K2000 is estimated to be 33,337 (see section S7).

The lines and symbols in Fig. 2A show average and maximum cut values, respectively, for 1000 trials as functions of the number of time steps, Nstep. (Throughout the paper, Nstep denotes the total number of time steps for each trial and the values of cost functions (cut values or Ising energies) as functions of Nstep are final values, not intermediate values, in each trail.) The results clearly show that both bSB and dSB outperform aSB in terms of both speed and accuracy. In addition, only dSB obtained the best value. On the other hand, best values obtained by bSB and aSB become lower for larger Nstep. This result suggests that the best values may be obtained accidentally by nonadiabatic processes in bSB and aSB. For large Nstep, dynamics becomes more adiabatic and the chance to obtain better solutions by nonadiabatic processes may be lost.

We implemented 2048-spin-size bSB and dSB machines (bSBM and dSBM) using single FPGAs (see section S8 for details) and solved K2000 by them. Figure 2B shows the comparison between our machines and the above three other machines (11, 27, 30), where the lines and the crosses show the average values of our machines and the others, respectively, for 100 trials, and the bars show the maximum and minimum values among the 100 trials. (The bars for our machines are shown at only typical computation times.) Only our dSBM obtained the best value in a short time (2 ms), thereby simultaneously realizing both high speed and high accuracy. Also, our bSBM is remarkably fast, about three times faster than STATICA (27), the previously fastest machine for K2000. Note that the results by STATICA for K2000 are predicted values by a simulator, and a real STATICA chip is still 512-spin size (27). On the other hand, in this work, we have implemented faster real machines.

### Benchmarking using time-to-solution and time-to-target

To evaluate the computation speed more quantitatively, here, we introduce two metrics: time-to-solution (TTS) and time-to-target (TTT). TTS is a standard metric for evaluating Ising machine speeds (14, 23, 28, 29), defined as the computation time for finding an optimal or best known value with 99% probability. TTT uses a target value, instead of an optimal value, as a good approximate solution. In this work, we define the target as 99% of the optimal or best known value. TTS and TTT are formulated as Tcom log (1 − 0.99)/ log (1 − PS) (14, 23), where Tcom is the computation time per trial and PS is the success probability for finding the optimal (TTS) or target (TTT) value. PS is estimated from experimental results with many trials. When PS > 0.99, TTS and TTT are defined as Tcom.

In the following, we compare TTS and TTT of our 2048-spin-size bSBM and dSBM with those of other recently developed machines shown in Fig. 3A. Since the bSBM can quickly find good approximate solutions and the dSBM can find optimal solutions of large-scale problems, we use the bSBM and dSBM for evaluations of TTT and TTS, respectively. TTS and TTT of other machines are cited or estimated from the data in the literature (see section S9 for details), because we could not use such machines for the present work. This limits the range of instances that can be used for this benchmarking. Also note that some machines are not the latest ones, as mentioned below.

Figure 3B shows the results of TTT. For K2000, the TTT of our bSBM (0.26 ms) is much shorter than those of STATICA (27) (1.50 ms, a predicted value by a simulator) and the CIM (11) (1.1 s). As a 2000-spin-size instance with sparse connectivity, we also solved G22, which is one of the well-known MAX-CUT benchmark instances called G-set and was solved by the CIM (11). For G22, the TTT of our bSBM is two orders of magnitude shorter than that of the CIM. These results demonstrate that our bSBM can find good approximate solutions faster than other recently developed machines of the same spin size. (TTTs of our machines for other G-set instances are provided in table S2.)

Next, we show the results of TTS in Fig. 3C. We start with the same two instances, namely, K2000 and G22. The TTS of our dSBM for them are 1.3 and 2.7 s, respectively. While TTS for K2000 has not been reported so far, TTS for G22 was evaluated with a SimCIM implemented on a FPGA (29), which is based on a recently proposed algorithm called chaotic amplitude control (CAC) (29, 40). The TTS of the SimCIM is estimated at 12 s (see section S9). Thus, our dSBM has achieved a shorter TTS for G22 than that of the state-of-the-art machine. (TTSs of our machines for other G-set instances and the comparison between them and those of the SimCIM are provided in table S2 and fig. S6, respectively.)

We also solved other various instances of the Ising problem (MAX-CUT) by dSBM to compare it with other machines shown in Fig. 3A. For small-scale problems, we can simultaneously perform multiple trials using the 2048-spin-size machine by a block-diagonal structure of the J matrix, as done using a CIM (14). This so-called batch processing improves the success probability PS by selecting the best result among multiple trials, while Tcom is defined as the computation time per batch. In the limit as the number of trials per batch Nbatch goes to infinity, PS may exceed 0.99, and then TTS and TTT become Tcom from the above definitions. In this sense, the TTS and TTT are well defined even for batch processing.

As shown in Fig. 3C, for two small-scale problems with 60 spins (all-to-all connectivity) and 200 spins (sparse connectivity), our dSBM achieved much shorter TTSs than those of a QA and a CIM (14). (This QA is not the latest version.) For 100-spin and 700-spin problems with all-to-all connectivity, the TTSs of the dSBM are much shorter than those of the SimCIM with CAC (29). These short TTSs of dSBM compared to the SimCIM come not from computation speed or implementations but the algorithmic advantage of dSB over CAC. That is, dSB needs fewer matrix-vector multiplications (MVMs), which is the most computation-intensive part in both algorithms, to obtain solutions than does the SimCIM with CAC. [The numbers of MVMs to solutions of the 100-spin and 700-spin problems are 9.4 × 10 and 8.1 × 104, respectively, for dSB but 5.6 × 103 and 7.8 × 105, respectively, for CAC (29).] For two 1024-spin problems (all-to-all connectivity) with different ranges of Ji,j, the dSBM achieved shorter TTSs than those of a Digital Annealer (DA) (23), which is based on an FPGA implementation of “Digital Annealer’s algorithm” developed from SA and outperformed CPU implementations of SA (25) and parallel tempering (50). (This DA is not the latest version.) Last, for 60-spin and 100-spin problems with all-to-all connectivity, the TTSs of the dSBM are comparable to those of another state-of-the-art machine based on an FPGA implementation of the above-mentioned RBM’s stochastic sampling (28), the size of which, however, is limited to 200 spins. Overall, we conclude that our bSBM and dSBM have achieved remarkably high performance for the present benchmark instances compared to the recently developed machines.

This is a syndicated post. Read the original post at Source link .