**Morten Hjorth-Jensen**[1, 2]

**Department of Physics and Astronomy and FRIB/NSCL Laboratory, Michigan State University, USA**

**Department of Physics and Center for Computing in Science Education, University of Oslo, Norway**

#### Xth Tastes of Nuclear Physics, November 30-December 3, 2020.

## What is this talk about?

The main aim is to give you a short and pedestrian introduction to how we can use Machine Learning methods to solve quantum mechanical many-body problems, how we can use such techniques in nuclear experiments, and why this could be of interest.

The hope is that after this talk you have gotten the basic ideas to get you started. Peeping into `https://github.com/mhjensenseminars/MachineLearningTalk`, you'll find a Jupyter notebook, slides, codes etc that will allow you to reproduce the simulations discussed here, and perhaps run your own very first calculations.

These slides are at https://mhjensenseminars.github.io/MachineLearningTalk/doc/pub/WorkshopTastesNP/html/WorkshopTastesNP-reveal.html.

## More material

More in-depth notebooks and lecture notes are at

- Making a professional Monte Carlo code for quantum mechanical simulations `https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/pub/notebook1/ipynb/notebook1.ipynb`
- From Variational Monte Carlo to Boltzmann Machines `https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/pub/notebook2/ipynb/notebook2.ipynb`
- Nuclear Talent course on Machine Learning in Nuclear Experiment and Theory, June 22 - July 3, 2020
- Machine Learning course

Feel free to try them out and please don't hesitate to ask if something is unclear.

## Why? Basic motivation

How can we avoid the dimensionality curse? Many possibilities

- smarter basis functions
- resummation of specific correlations
- stochastic sampling of high-lying states (stochastic FCI, CC and SRG/IMSRG)
- many more

Machine Learning and Quantum Computing also hold great promise in tackling the ever increasing dimensionalities. The hot field is now **Quantum Machine Learning**. Here we will focus on Machine Learning.

## Overview

- Short intro to Machine Learning
- Variational Monte Carlo (Markov Chain Monte Carlo, \( \mathrm{MC}^2 \)) and many-body problems, solving quantum mechanical problems in a stochastic way. It will serve as our motivation for switching to Machine Learning.
- From Variational Monte Carlo to Boltzmann Machines and Deep Learning
- Machine Learning and Experiment

## Machine Learning and Nuclear Physics

Machine learning is an extremely rich field, in spite of its young age. The increases we have seen during the last three decades in computational capabilities have been followed by developments of methods and techniques for analyzing and handling large data sets, relying heavily on statistics, computer science and mathematics. The field is rather new and developing rapidly.

Popular software packages written in Python for ML are

and more. These are all freely available at their respective GitHub sites. They encompass communities of developers in the thousands or more, and the number of code developers and contributors keeps increasing.

## Lots of room for creativity

Not all the algorithms and methods can be given a rigorous mathematical justification. This opens up for experimenting and trial and error, and thereby for exciting new developments.

A solid command of linear algebra, multivariate theory, probability theory, statistical data analysis, optimization algorithms, understanding errors and Monte Carlo methods is important in order to understand many of the various algorithms and methods.

**Job market, a personal statement**: A familiarity with ML is almost becoming a prerequisite for many of the most exciting employment opportunities. And add quantum computing and there you are!

## Types of Machine Learning

The approaches to machine learning are many, but are often split into two main categories. In *supervised learning* we know the answer to a problem, and let the computer deduce the logic behind it. On the other hand, *unsupervised learning* is a method for finding patterns and relationships in data sets without any prior knowledge of the system. Some authors also operate with a third category, namely *reinforcement learning*. This is a paradigm of learning inspired by behavioural psychology, where learning is achieved by trial-and-error, solely from rewards and punishment.

Another way to categorize machine learning tasks is to consider the desired output of a system. Some of the most common tasks are:

- Classification: Outputs are divided into two or more classes. The goal is to produce a model that assigns inputs into one of these classes. An example is to identify digits based on pictures of hand-written ones. Classification is typically supervised learning.
- Regression: Finding a functional relationship between an input data set and a reference data set. The goal is to construct a function that maps input data to continuous output values.
- Clustering: Data are divided into groups with certain common traits, without knowing the different groups beforehand. It is thus a form of unsupervised learning.

## A simple perspective on the interface between ML and Physics

## ML in Nuclear Physics, Examples

The large number of degrees of freedom pertains to both theory and experiment in nuclear physics. With increasingly complicated experiments that produce large amounts of data, automated classification of events becomes increasingly important. Here, deep learning methods offer a plethora of interesting research avenues.

- Reconstruction of particle trajectories or classification of events are typical examples where ML methods are being used. However, since these data can often be extremely noisy, the precision necessary for discovery in physics requires algorithmic improvements. Research along such directions, interfacing nuclear physics with AI/ML is expected to play a significant role in physics discoveries related to new facilities. The treatment of corrupted data in imaging and image processing is also a relevant topic.
- Design of detectors represents an important area of applications for ML/AI methods in nuclear physics.
- Many of the above classification problems also have direct application in theoretical nuclear physics (including Lattice QCD calculations).

## More examples

- An important application of AI/ML methods is to improve the estimation of bias or uncertainty due to the introduction of or lack of physical constraints in various theoretical models.
- In theory, we expect to use AI/ML algorithms and methods to improve our knowledge about correlations of physical model parameters in data for quantum many-body systems. Deep learning methods like Boltzmann machines and various types of recurrent neural networks show great promise in circumventing the exploding dimensionalities encountered in quantum mechanical many-body studies.
- Merging a frequentist approach (the standard path in ML theory) with a Bayesian approach has the potential to infer better probability distributions and error estimates. As an example, methods for fast Monte Carlo based Bayesian computation of nuclear density functionals show great promise in providing a better understanding.
- Machine Learning and Quantum Computing is a very interesting avenue to explore.

## Selected References

- Mehta et al., Physics Reports (2019).
- Machine Learning and the Physical Sciences by Carleo et al.
- Ab initio solution of the many-electron Schrödinger equation with deep neural networks by Pfau et al.
- Machine Learning and the Deuteron by Keeble and Rios
- Variational Monte Carlo calculations of \( A\le 4 \) nuclei with an artificial neural-network correlator ansatz by Adams et al.
- Unsupervised Learning for Identifying Events in Active Target Experiments by Solli et al.
- Report from the A.I. For Nuclear Physics Workshop by Bedaque et al.

## What are the basic ingredients?

Almost every problem in ML and data science starts with the same ingredients:

- The dataset \( \mathbf{x} \) (could be some observable quantity of the system we are studying)
- A model which is a function of a set of parameters \( \mathbf{\alpha} \) that relates to the dataset, say a likelihood function \( p(\mathbf{x}\vert \mathbf{\alpha}) \) or just a simple model \( f(\mathbf{\alpha}) \)
- A so-called **loss/cost/risk** function \( \mathcal{C} (\mathbf{x}, f(\mathbf{\alpha})) \) which allows us to decide how well our model represents the dataset.

We seek the parameter values \( \mathbf{\alpha} \) which minimize the function \( \mathcal{C} (\mathbf{x}, f(\mathbf{\alpha})) \). This leads to various minimization algorithms. It may surprise many, but at the heart of all machine learning algorithms there is an optimization problem.
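As a minimal sketch of these three ingredients, assuming a synthetic data set and a straight-line model (neither is taken from the talk), plain gradient descent on a mean-squared-error cost could look as follows. The names `model`, `cost`, `cost_gradient` and `gradient_descent`, as well as the learning rate, are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(2020)

# Synthetic data set (an assumption for illustration): y = 2 + 3x + noise
x = rng.uniform(0.0, 1.0, size=100)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=100)


def model(alpha, x):
    """Simple model f(alpha): a straight line with intercept alpha[0] and slope alpha[1]."""
    return alpha[0] + alpha[1] * x


def cost(alpha, x, y):
    """Cost function: mean squared error between data and model."""
    return np.mean((y - model(alpha, x)) ** 2)


def cost_gradient(alpha, x, y):
    """Analytical gradient of the mean squared error with respect to alpha."""
    residual = model(alpha, x) - y
    return np.array([2.0 * np.mean(residual), 2.0 * np.mean(residual * x)])


def gradient_descent(x, y, learning_rate=0.5, n_iterations=2000):
    """Plain gradient descent, starting from alpha = (0, 0)."""
    alpha = np.zeros(2)
    for _ in range(n_iterations):
        alpha = alpha - learning_rate * cost_gradient(alpha, x, y)
    return alpha


alpha_opt = gradient_descent(x, y)
print("optimal parameters:", alpha_opt, "cost:", cost(alpha_opt, x, y))
```

For this convex cost the result should agree with an ordinary least-squares fit. For the non-convex cost functions met later in the talk, the choice of optimizer and learning rate matters much more.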

## Neural network types

An artificial neural network (NN) is a computational model that consists of layers of connected neurons, or *nodes*. It is supposed to mimic a biological nervous system by letting each neuron interact with other neurons by sending signals in the form of mathematical functions between layers. A wide variety of different NNs have been developed, but most of them consist of an input layer, an output layer and possibly layers in-between, called *hidden layers*. All layers can contain an arbitrary number of nodes, and each connection between two nodes is associated with a weight variable.
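A forward pass through such a layered network can be sketched in a few lines. The layer sizes, the sigmoid activation and the random weights below are assumptions chosen for illustration, and no training is performed.

```python
import numpy as np


def sigmoid(z):
    """Activation function applied by every node."""
    return 1.0 / (1.0 + np.exp(-z))


def init_network(layer_sizes, rng):
    """One (weights, biases) pair per connection between consecutive layers."""
    return [
        (rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def feed_forward(network, x):
    """Propagate inputs x (shape: n_samples x n_inputs) through all layers."""
    activation = x
    for weights, biases in network:
        activation = sigmoid(activation @ weights + biases)
    return activation


rng = np.random.default_rng(0)
network = init_network([4, 8, 8, 1], rng)  # input layer, two hidden layers, output layer
inputs = rng.normal(size=(5, 4))           # five samples with four features each
outputs = feed_forward(network, inputs)
print(outputs.shape)
```

Each entry of `network` holds the weight matrix attached to the connections between two layers, which is the "weight variable per connection" described above.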

## Nuclear Physics Experiments Argon-46

Two- and three-dimensional representations of two events from the Argon-46 experiment. Each row is one event in two projections, where the color intensity of each point indicates higher charge values recorded by the detector. The bottom row illustrates a carbon event with a large fraction of noise, while the top row shows a proton event almost free of noise. See Unsupervised Learning for Identifying Events in Active Target Experiments by Solli et al. for more details.

## Why Machine Learning?

The traditional Monte Carlo event selection process does not have a well-defined method to quantify the effectiveness of the event selection.

In addition, the selection task normally produces a binary result only, either a **good** or **bad** fit to the event of interest. A **bad** fit is then assumed to be a different event type, and is removed from the analysis.

In a broader perspective, an unsupervised classification algorithm would offer the possibility to *discover* rare events which may not be expected or are overlooked. These events would likely be filtered out using the traditional methods. From a practical point of view, compared to supervised learning, it also avoids the necessary labeling task of the learning set events, which is error prone and time consuming.

## Why Machine Learning for Experimental Analysis?

The \( \chi^2 \) approach used in the traditional analysis performed on the \( ^{46} \)Ar data is extremely expensive from a computational standpoint because it involves the simulation of thousands of tracks for each recorded event.

These events are in turn simulated for each iteration of the Monte Carlo fitting sequence. Even though the reaction of interest in the above experiment had the largest cross section (elastic scattering), the time spent on Monte Carlo fitting of *all* of the events produced in the experiment was the largest computational bottleneck in the analysis. In the case of an experiment where the reaction of interest would represent less than a few percent of the total cross section, this procedure would become highly inefficient and prohibitive. Adding to this the large amount of data produced in this experiment (with even larger data sets expected in future experiments), the analysis simply begs for more efficient analysis tools.

## More arguments

The computationally expensive fitting procedure would be applied to every event, instead of the few percent of the events that are of interest for the analysis. An unsupervised ML algorithm able to separate the data without *a priori* knowledge of the different types of events increases the efficiency of the analysis tremendously, and allows the downstream analysis to concentrate the fitting efforts only on events of interest. In addition, the clustering allows for more exploration of the data, potentially enabling new discovery of unexpected reaction types.
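To illustrate the idea of separating events without labels, here is a hedged sketch of k-means clustering on synthetic two-dimensional features. It is not the method used in the Argon-46 analysis. The two Gaussian blobs standing in for two event types, the function `kmeans` and the farthest-point initialization are all assumptions made for this example.

```python
import numpy as np


def kmeans(data, n_clusters=2, n_iterations=50):
    """Basic k-means; returns a cluster label per row of data and the cluster centers."""
    # Deterministic start: the first point, then the point farthest from the chosen centers
    centers = [data[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iterations):
        # Assign every point to its nearest center
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Move every center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            data[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


rng = np.random.default_rng(46)
events_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))  # hypothetical event type A
events_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))  # hypothetical event type B
features = np.vstack([events_a, events_b])

labels, centers = kmeans(features)
print("cluster sizes:", np.bincount(labels))
```

In a real analysis the features would come from the detector data (or from a learned low-dimensional representation of it), and only the cluster of interest would be passed on to the expensive fitting step.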

## The first theoretical system: electrons in a harmonic oscillator trap in two dimensions

The Hamiltonian of the quantum dot is given by

$$ \hat{H} = \hat{H}_0 + \hat{V}, $$

where \( \hat{H}_0 \) is the many-body HO Hamiltonian, and \( \hat{V} \) is the inter-electron Coulomb interaction. In dimensionless units,

$$ \hat{V}= \sum_{i < j}^N \frac{1}{r_{ij}},$$

with \( r_{ij}=\vert \mathbf{r}_i - \mathbf{r}_j\vert \).

For two electrons this leads to a separable Hamiltonian, with the relative motion part given by (\( r_{ij}=r \))

$$ \hat{H}_r=-\nabla^2_r + \frac{1}{4}\omega^2r^2+ \frac{1}{r},$$

plus a standard Harmonic Oscillator problem for the center-of-mass motion. This system has analytical solutions in two and three dimensions (M. Taut 1993 and 1994).
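As a small numerical check of the relative-motion Hamiltonian above, assume two dimensions, \( \omega=1 \) and zero angular momentum. Inserting the function \( u(r)=(1+r)e^{-r^2/4} \) by hand gives \( \hat{H}_r u = 2u \), so the local energy \( (\hat{H}_r u)/u \) should be constant. The sketch below verifies this with finite differences. The function names and the step size `h` are illustrative choices.

```python
import numpy as np


def u(r):
    """Candidate relative-motion wave function for omega = 1 in two dimensions."""
    return (1.0 + r) * np.exp(-r**2 / 4.0)


def local_energy_relative(r, h=1e-4):
    """Local energy (H_r u)/u for omega = 1, with finite-difference derivatives."""
    first = (u(r + h) - u(r - h)) / (2.0 * h)
    second = (u(r + h) - 2.0 * u(r) + u(r - h)) / h**2
    laplacian = second + first / r  # radial Laplacian in 2D, zero angular momentum
    return -laplacian / u(r) + 0.25 * r**2 + 1.0 / r


r_values = np.array([0.5, 1.0, 2.0, 3.0])
print(local_energy_relative(r_values))  # the same value at every r, up to discretization error
```

A local energy that does not depend on \( r \) is the signature of an exact eigenfunction. This property is what the variational Monte Carlo approach in the following slides exploits.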

## Quantum Monte Carlo Motivation

Given a Hamiltonian \( H \) and a trial wave function \( \Psi_T \), the variational principle states that the expectation value of \( H \), defined through

$$ \langle E \rangle = \frac{\int d\boldsymbol{R}\Psi^{\ast}_T(\boldsymbol{R})H(\boldsymbol{R})\Psi_T(\boldsymbol{R})} {\int d\boldsymbol{R}\Psi^{\ast}_T(\boldsymbol{R})\Psi_T(\boldsymbol{R})},$$

is an upper bound to the ground state energy \( E_0 \) of the Hamiltonian \( H \), that is

$$ E_0 \le \langle E \rangle.$$

In general, the integrals involved in the calculation of various expectation values are multi-dimensional ones. Traditional integration methods such as Gauss-Legendre quadrature will not be adequate for, say, the computation of the energy of a many-body system.

## Quantum Monte Carlo Motivation

**Basic steps.**

Choose a trial wave function \( \psi_T(\boldsymbol{R}) \).

$$ P(\boldsymbol{R},\boldsymbol{\alpha})= \frac{\left|\psi_T(\boldsymbol{R},\boldsymbol{\alpha})\right|^2}{\int \left|\psi_T(\boldsymbol{R},\boldsymbol{\alpha})\right|^2d\boldsymbol{R}}.$$

This is our model, or likelihood/probability distribution function (PDF). It depends on some variational parameters \( \boldsymbol{\alpha} \). The approximation to the expectation value of the Hamiltonian is now

$$ \langle E[\boldsymbol{\alpha}] \rangle = \frac{\int d\boldsymbol{R}\Psi^{\ast}_T(\boldsymbol{R},\boldsymbol{\alpha})H(\boldsymbol{R})\Psi_T(\boldsymbol{R},\boldsymbol{\alpha})} {\int d\boldsymbol{R}\Psi^{\ast}_T(\boldsymbol{R},\boldsymbol{\alpha})\Psi_T(\boldsymbol{R},\boldsymbol{\alpha})}.$$

## Quantum Monte Carlo Motivation

**Define a new quantity.**

$$ E_L(\boldsymbol{R},\boldsymbol{\alpha})=\frac{1}{\psi_T(\boldsymbol{R},\boldsymbol{\alpha})}H\psi_T(\boldsymbol{R},\boldsymbol{\alpha}),$$

called the local energy, which, together with our trial PDF yields

$$ \langle E[\boldsymbol{\alpha}] \rangle=\int P(\boldsymbol{R})E_L(\boldsymbol{R},\boldsymbol{\alpha}) d\boldsymbol{R}\approx \frac{1}{N}\sum_{i=1}^NE_L(\boldsymbol{R_i},\boldsymbol{\alpha})$$

with \( N \) being the number of Monte Carlo samples.
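The sampling step can be sketched for a deliberately simple toy problem: a single particle in a one-dimensional harmonic oscillator (not the quantum dot system of this talk) with trial function \( \psi_T(x,\alpha)=e^{-\alpha x^2/2} \) and \( H=-\frac{1}{2}\frac{d^2}{dx^2}+\frac{1}{2}x^2 \). For this choice the local energy is \( E_L=\alpha/2+x^2(1-\alpha^2)/2 \). The Metropolis step length, the number of samples and the function names are assumptions for illustration.

```python
import numpy as np


def local_energy(x, alpha):
    """Local energy for psi_T = exp(-alpha x^2 / 2) in the 1D harmonic oscillator."""
    return 0.5 * alpha + 0.5 * x**2 * (1.0 - alpha**2)


def log_prob(x, alpha):
    """log |psi_T|^2, up to the normalization constant (which cancels in ratios)."""
    return -alpha * x**2


def vmc_energy(alpha, n_samples=50000, step=1.0, seed=1):
    """Metropolis sampling of |psi_T|^2; returns mean and variance of the local energy."""
    rng = np.random.default_rng(seed)
    x = 0.0
    energies = np.empty(n_samples)
    for i in range(n_samples):
        x_new = x + step * rng.uniform(-1.0, 1.0)
        # Accept the move with probability min(1, |psi_T(x_new)|^2 / |psi_T(x)|^2)
        if rng.uniform() < np.exp(log_prob(x_new, alpha) - log_prob(x, alpha)):
            x = x_new
        energies[i] = local_energy(x, alpha)
    return energies.mean(), energies.var()


for alpha in (0.8, 1.0):
    mean, variance = vmc_energy(alpha)
    print(f"alpha = {alpha}: <E_L> = {mean:.4f}, variance = {variance:.2e}")
```

At \( \alpha=1 \) the trial function is the exact ground state, so the local energy is constant and the variance vanishes. For other values of \( \alpha \) the estimate lies above the exact energy, up to statistical error, as the variational principle requires.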

## Energy derivatives

To find the derivatives of the local energy expectation value as function of the variational parameters, we can use the chain rule and the hermiticity of the Hamiltonian.

Let us define (with the notation \( \langle E[\boldsymbol{\alpha}]\rangle =\langle E_L\rangle \))

$$\bar{E}_{i}=\frac{d\langle E_L\rangle}{d\alpha_i},$$

as the derivative of the energy with respect to the variational parameter \( \alpha_i \). We also define the derivative of the trial function (skipping the subindex \( T \)) as

$$\bar{\Psi}_{i}=\frac{d\Psi}{d\alpha_i}.$$

## Derivatives of the local energy

The elements of the gradient of the expectation value of the local energy are then (using the chain rule and the hermiticity of the Hamiltonian)

$$\bar{E}_{i}= 2\left( \langle \frac{\bar{\Psi}_{i}}{\Psi}E_L\rangle -\langle \frac{\bar{\Psi}_{i}}{\Psi}\rangle\langle E_L \rangle\right).$$

From a computational point of view it means that you need to compute the expectation values of

$$\langle \frac{\bar{\Psi}_{i}}{\Psi}E_L\rangle,$$

and

$$\langle \frac{\bar{\Psi}_{i}}{\Psi}\rangle\langle E_L\rangle$$

These integrals are evaluated using MC integration (with all its possible error sources). We can then use methods like stochastic gradient descent or other minimization methods to find the optimal variational parameters (I don't discuss this topic here, but these methods are very important in ML).
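The gradient expression can be tried out on a toy problem: the one-dimensional harmonic oscillator with \( \Psi=e^{-\alpha x^2/2} \), for which \( \bar{\Psi}/\Psi=-x^2/2 \) and \( E_L=\alpha/2+x^2(1-\alpha^2)/2 \). To keep the sketch short, \( |\Psi|^2 \) is sampled directly as a Gaussian instead of with a Markov chain. That shortcut, the learning rate and the function names are assumptions made for this example only.

```python
import numpy as np


def energy_and_gradient(alpha, rng, n_samples=10000):
    """Monte Carlo estimates of <E_L> and its derivative with respect to alpha."""
    # |Psi|^2 is proportional to exp(-alpha x^2): a Gaussian with variance 1/(2 alpha)
    x = rng.normal(scale=np.sqrt(0.5 / alpha), size=n_samples)
    e_local = 0.5 * alpha + 0.5 * x**2 * (1.0 - alpha**2)
    dlnpsi = -0.5 * x**2  # (dPsi/dalpha) / Psi
    energy = e_local.mean()
    gradient = 2.0 * ((dlnpsi * e_local).mean() - dlnpsi.mean() * energy)
    return energy, gradient


def optimize(alpha=0.6, learning_rate=0.5, n_iterations=100, seed=3):
    """Plain gradient descent on the variational parameter alpha."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iterations):
        energy, gradient = energy_and_gradient(alpha, rng)
        alpha -= learning_rate * gradient
    return alpha, energy


alpha_opt, energy_opt = optimize()
print(f"alpha = {alpha_opt:.4f}, energy = {energy_opt:.4f}")
```

The two averages in the gradient are accumulated from the same samples as the energy, so the extra cost per iteration is small. For this toy problem the minimum is at \( \alpha=1 \) with energy \( 1/2 \).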

## Why Boltzmann machines?

What are known as restricted Boltzmann machines (RBMs) have received a lot of attention lately. One of the major reasons is that they can be stacked layer-wise to build deep neural networks that capture complicated statistics.

The original RBMs had just one visible layer and a hidden layer, but recently so-called Gaussian-binary RBMs have gained quite some popularity in imaging since they are capable of modeling continuous data that are common to natural images.

Furthermore, they have been used to solve complicated quantum mechanical many-particle problems or classical statistical physics problems like the Ising and Potts classes of models.

## A standard BM setup

A standard BM network is divided into a set of observable (visible) units \( \hat{x} \) and a set of unknown hidden units/nodes \( \hat{h} \).

Additionally there can be bias nodes for the hidden and visible layers. These biases are normally set to \( 1 \).

BMs are stackable, meaning we can train a BM which serves as input to another BM. We can construct deep networks for learning complex PDFs. The layers can be trained one after another, a feature which makes them popular in deep learning.

However, they are often hard to train. This leads to the introduction of so-called restricted BMs, or RBMs. Here we take away all lateral connections between nodes in the visible layer as well as connections between nodes in the hidden layer. The network is illustrated in the figure below.

## The structure of the RBM network

## The network

**The network layers**:

- A function \( \mathbf{x} \) that represents the visible layer, a vector of \( M \) elements (nodes). This layer represents both what the RBM might be given as training input, and what we want it to be able to reconstruct. This might for example be the pixels of an image, the spin values of the Ising model, or coefficients representing speech.
- The function \( \mathbf{h} \) represents the hidden, or latent, layer. A vector of \( N \) elements (nodes). Also called "feature detectors".

## Joint distribution

The restricted Boltzmann machine is described by a Boltzmann distribution

$$\begin{align}P_{rbm}(\mathbf{x},\mathbf{h}) = \frac{1}{Z} e^{-\frac{1}{T_0}E(\mathbf{x},\mathbf{h})},\tag{1}\end{align}$$

where \( Z \) is the normalization constant or partition function, defined as

$$\begin{align}Z = \int \int e^{-\frac{1}{T_0}E(\mathbf{x},\mathbf{h})} d\mathbf{x} d\mathbf{h}.\tag{2}\end{align}$$

It is common to ignore \( T_0 \) by setting it to one.

## Defining different types of RBMs

There are different variants of RBMs, and the differences lie in the types of visible and hidden units we choose as well as in the implementation of the energy function \( E(\mathbf{x},\mathbf{h}) \).

**Binary-Binary RBM:**

RBMs were first developed using binary units in both the visible and hidden layer. The corresponding energy function is defined as follows:

$$\begin{align}E(\mathbf{x}, \mathbf{h}) = - \sum_i^M x_i a_i- \sum_j^N b_j h_j - \sum_{i,j}^{M,N} x_i w_{ij} h_j,\tag{3}\end{align}$$

where the binary values taken on by the nodes are most commonly 0 and 1.
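For a very small binary-binary RBM the partition function can be computed by brute force, which makes the definitions above concrete. In this sketch the integrals in Eq. (2) become sums over the binary states, \( T_0=1 \), and the network size and random parameters are assumptions for illustration.

```python
import itertools
import numpy as np


def energy(x, h, a, b, w):
    """Binary-binary RBM energy, Eq. (3)."""
    return -(x @ a) - (b @ h) - (x @ w @ h)


rng = np.random.default_rng(7)
M, N = 3, 2                      # number of visible and hidden nodes
a = rng.normal(size=M)           # visible biases
b = rng.normal(size=N)           # hidden biases
w = rng.normal(size=(M, N))      # weights connecting the two layers

visible_states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=M)]
hidden_states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=N)]

# Partition function: sum over all 2^(M+N) configurations
Z = sum(np.exp(-energy(x, h, a, b, w)) for x in visible_states for h in hidden_states)


def joint_probability(x, h):
    """Boltzmann distribution of Eq. (1) with T_0 = 1."""
    return np.exp(-energy(x, h, a, b, w)) / Z


def marginal_visible(x):
    """Marginal of x with the sum over hidden units done analytically."""
    return np.exp(x @ a) * np.prod(1.0 + np.exp(b + x @ w)) / Z


total = sum(joint_probability(x, h) for x in visible_states for h in hidden_states)
print("sum of all probabilities:", total)
```

Because there are no connections within a layer, the sum over hidden states factorizes into one factor per hidden node. This is what `marginal_visible` uses, and it is the reason the restricted architecture is tractable. The brute-force sum for \( Z \) is only feasible for a handful of nodes.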

**Gaussian-Binary RBM:**

Another variant is the RBM where the visible units are Gaussian while the hidden units remain binary:

$$\begin{align}E(\mathbf{x}, \mathbf{h}) = \sum_i^M \frac{(x_i - a_i)^2}{2\sigma_i^2} - \sum_j^N b_j h_j - \sum_{i,j}^{M,N} \frac{x_i w_{ij} h_j}{\sigma_i^2}. \tag{4}\end{align}$$

## Representing the wave function

The wavefunction should be a probability amplitude depending on \( \boldsymbol{x} \). The RBM model is given by the joint distribution of \( \boldsymbol{x} \) and \( \boldsymbol{h} \)

$$\begin{align}F_{rbm}(\mathbf{x},\mathbf{h}) = \frac{1}{Z} e^{-\frac{1}{T_0}E(\mathbf{x},\mathbf{h})}.\tag{5}\end{align}$$

To find the marginal distribution of \( \boldsymbol{x} \) we set:

$$\begin{align}F_{rbm}(\mathbf{x}) &= \sum_\mathbf{h} F_{rbm}(\mathbf{x}, \mathbf{h}) \tag{6}\\&= \frac{1}{Z}\sum_\mathbf{h} e^{-E(\mathbf{x}, \mathbf{h})}.\tag{7}\end{align}$$

Now this is what we use to represent the wave function, calling it a neural-network quantum state (NQS)

$$\begin{align}\Psi (\mathbf{x}) &= F_{rbm}(\mathbf{x}) \tag{8}\\&= \frac{1}{Z}\sum_{\boldsymbol{h}} e^{-E(\mathbf{x}, \mathbf{h})} \tag{9}\\&= \frac{1}{Z} \sum_{\{h_j\}} e^{-\sum_i^M \frac{(x_i - a_i)^2}{2\sigma^2} + \sum_j^N b_j h_j + \sum_{i,j}^{M,N} \frac{x_i w_{ij} h_j}{\sigma^2}} \tag{10}\\&= \frac{1}{Z} e^{-\sum_i^M \frac{(x_i - a_i)^2}{2\sigma^2}} \prod_j^N (1 + e^{b_j + \sum_i^M \frac{x_i w_{ij}}{\sigma^2}}). \tag{11}\end{align}$$
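The step from the explicit sum over hidden configurations to the closed product form can be checked numerically. The sketch below evaluates the unnormalized wave function both ways (the constant \( 1/Z \) is dropped, since it cancels in Metropolis ratios and in the local energy). The network size, the value of \( \sigma \) and the random parameters are assumptions.

```python
import itertools
import numpy as np


def psi_explicit_sum(x, a, b, w, sigma=1.0):
    """Unnormalized NQS: explicit sum over all binary hidden configurations."""
    total = 0.0
    for h in itertools.product([0, 1], repeat=len(b)):
        h = np.array(h, dtype=float)
        exponent = (-np.sum((x - a) ** 2) / (2.0 * sigma**2)
                    + b @ h + (x @ w @ h) / sigma**2)
        total += np.exp(exponent)
    return total


def psi_closed_form(x, a, b, w, sigma=1.0):
    """Unnormalized NQS: Gaussian factor times one factor per hidden node."""
    gaussian = np.exp(-np.sum((x - a) ** 2) / (2.0 * sigma**2))
    return gaussian * np.prod(1.0 + np.exp(b + (x @ w) / sigma**2))


rng = np.random.default_rng(5)
M, N = 4, 3                      # e.g. two particles in two dimensions, three hidden nodes
a = rng.normal(size=M)
b = rng.normal(size=N)
w = rng.normal(size=(M, N))
x = rng.normal(size=M)           # one configuration of the visible nodes (coordinates)

print(psi_explicit_sum(x, a, b, w, sigma=1.2), psi_closed_form(x, a, b, w, sigma=1.2))
```

The explicit sum costs \( 2^N \) terms while the closed form costs \( N \) factors, so only the latter is usable in practice.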

## Choose the cost/loss function

Now we don't necessarily have training data (unless we generate it by using some other method). However, what we do have is the variational principle, which allows us to obtain the ground state wave function by minimizing the expectation value of the energy of a trial wave function (corresponding to the untrained NQS). As in the traditional variational Monte Carlo method, it is the expectation value of the local energy we wish to minimize. The gradient to use for the stochastic gradient descent procedure is

$$\begin{align}\frac{\partial \langle E_L \rangle}{\partial \theta_i}= 2(\langle E_L \frac{1}{\Psi}\frac{\partial \Psi}{\partial \theta_i} \rangle - \langle E_L \rangle \langle \frac{1}{\Psi}\frac{\partial \Psi}{\partial \theta_i} \rangle ),\tag{13}\end{align}$$

where the local energy is given by

$$\begin{align}E_L = \frac{1}{\Psi} \hat{\mathbf{H}} \Psi.\tag{14}\end{align}$$
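The quantities \( \frac{1}{\Psi}\frac{\partial \Psi}{\partial \theta_i} \) needed in the gradient follow from the closed form of the Gaussian-binary NQS, with \( \theta \) running over \( a_i \), \( b_j \) and \( w_{ij} \). The sketch below writes them out and can be compared against finite differences. It uses the unnormalized \( \Psi \): the parameter dependence of \( Z \) only adds a constant to each derivative, and a constant cancels in the gradient formula. Function names and parameter values are assumptions.

```python
import numpy as np


def log_psi(x, a, b, w, sigma=1.0):
    """Logarithm of the unnormalized Gaussian-binary NQS."""
    v = b + (x @ w) / sigma**2
    return -np.sum((x - a) ** 2) / (2.0 * sigma**2) + np.sum(np.log1p(np.exp(v)))


def log_psi_gradients(x, a, b, w, sigma=1.0):
    """Derivatives of ln(Psi) with respect to a, b and w."""
    v = b + (x @ w) / sigma**2
    s = 1.0 / (1.0 + np.exp(-v))            # sigmoid of the hidden-node input
    grad_a = (x - a) / sigma**2
    grad_b = s
    grad_w = np.outer(x, s) / sigma**2
    return grad_a, grad_b, grad_w


rng = np.random.default_rng(11)
x = rng.normal(size=2)
a = rng.normal(size=2)
b = rng.normal(size=3)
w = rng.normal(size=(2, 3))
sigma = 1.3

grad_a, grad_b, grad_w = log_psi_gradients(x, a, b, w, sigma)
print(grad_a.shape, grad_b.shape, grad_w.shape)
```

Averaging these derivatives, and their products with \( E_L \), over the Monte Carlo samples gives the two expectation values entering the gradient.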

## Running the codes

You can find the codes for the simple two-electron case at the Github repository `https://github.com/mhjensenseminars/MachineLearningTalk/tree/master/doc/Programs/MLcpp/src`.

The trial wave functions are based on the product of a Slater determinant with either only Hermite polynomials or Gaussian orbitals, with and without a Padé-Jastrow factor (PJ).

## Energy as function of iterations, \( N=2 \) electrons

## Energy as function of iterations, no Physics info \( N=2 \) electrons

## Onebody densities \( N=6 \), \( \hbar\omega=1.0 \) a.u.

## Onebody densities \( N=6 \), \( \hbar\omega=0.1 \) a.u.

## Onebody densities \( N=30 \), \( \hbar\omega=1.0 \) a.u.

## Onebody densities \( N=30 \), \( \hbar\omega=0.1 \) a.u.

## Or using Deep Learning Neural Networks

See Machine Learning and the Deuteron by Keeble and Rios and Variational Monte Carlo calculations of \( A\le 4 \) nuclei with an artificial neural-network correlator ansatz by Adams et al.

**Adams et al**:

$$\begin{align}H_{LO} &=-\sum_i \frac{{\vec{\nabla}_i^2}}{2m_N}+\sum_{i < j} {\left(C_1 + C_2\, \vec{\sigma_i}\cdot\vec{\sigma_j}\right)e^{-r_{ij}^2\Lambda^2 / 4 }}\nonumber\\&+D_0 \sum_{i < j < k} \sum_{\text{cyc}}{e^{-\left(r_{ik}^2+r_{ij}^2\right)\Lambda^2/4}}\,,\tag{15}\end{align}$$

where \( m_N \) is the mass of the nucleon, \( \vec{\sigma_i} \) is the Pauli matrix acting on nucleon \( i \), and \( \sum_{\text{cyc}} \) stands for the cyclic permutation of \( i \), \( j \), and \( k \). The low-energy constants \( C_1 \) and \( C_2 \) are fit to the deuteron binding energy and to the neutron-neutron scattering length.

## Replacing the Jastrow factor with Neural Networks

An appealing feature of the ANN ansatz is that it is more general than the more conventional product of two- and three-body spin-independent Jastrow functions

$$\begin{align}|\Psi_V^J \rangle = \prod_{i < j < k} \Big( 1-\sum_{\text{cyc}} u(r_{ij}) u(r_{jk})\Big) \prod_{i < j} f(r_{ij}) | \Phi\rangle\,,\tag{16}\end{align}$$

which is commonly used for nuclear Hamiltonians that do not contain tensor and spin-orbit terms. The above function is replaced by a four-layer Neural Network.

## Conclusions and where do we stand

- Lots of experimental analysis coming, see for example Unsupervised Learning for Identifying Events in Active Target Experiments by Solli et al. as well as references and examples in Report from the A.I. For Nuclear Physics Workshop by Bedaque et al.
- Extension of the work of G. Carleo and M. Troyer, Science **355**, Issue 6325, pp. 602-606 (2017) gives excellent results for two-electron systems as well as good agreement with standard VMC calculations for many electrons.
- Promising results with neural networks as well. Next step is to use the trial wave function in final Green's function Monte Carlo calculations.
- Minimization problem can be tricky.
- Antisymmetry is dealt with by multiplying the trial wave function with either a simple or an optimized Slater determinant.
- Extend to more fermions. How do we deal with the antisymmetry of the multi-fermion wave function?
- Here we also used standard Hartree-Fock theory to define an optimal Slater determinant. Takes care of the antisymmetry. What about constructing an anti-symmetrized network function?
- Use thereafter ML to determine the correlated part of the wave function (including a standard Jastrow factor).
- Can we use ML to find out which correlations are relevant and thereby diminish the dimensionality problem in say CC or SRG theories?
- And many more exciting research avenues

## What are the Machine Learning calculations here based on?

This work is inspired by the idea of representing the wave function with a restricted Boltzmann machine (RBM), presented recently by G. Carleo and M. Troyer, Science **355**, Issue 6325, pp. 602-606 (2017). They named such a wave function/network a *neural network quantum state* (NQS). In their article they apply it to the quantum mechanical spin lattice systems of the Ising model and Heisenberg model, with encouraging results. See also the recent work by Adams et al.

Thanks to Daniel Bazin (MSU), Jane Kim (MSU), Julie Butler (MSU), Dean Lee (MSU), Sean Liddick (MSU), Michelle Kuchera (Davidson College), Vilde Flugsrud (UiO), Even Nordhagen (UiO), Bendik Samseth (UiO) and Robert Solli (UiO) for many discussions and interpretations.