Google DeepMind recently figured out a way to use an LLM to break new ground in mathematics.
FunSearch: Making new discoveries in mathematical sciences using Large Language Models
The actual journal article is worth a read. They worked on two mathematics problems: the “cap set problem” and the “online bin packing problem.” I’m fuzzy on the description of the first one, but the second problem is relatively easy to describe.
Imagine you are a shipping company with a bunch of shipping containers. You are handed packages one at a time and have to pack each one into a container, but you never know the size of the next package; all you can do is decide where the current package fits best. Eventually you have to ship the containers, and any empty space in them is lost money. If you aren't careful about where you place each package, you'll end up with a bunch of awkward gaps that never get filled.
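To make the setup concrete, here's a tiny sketch of the online constraint (my own toy code and numbers, not from the paper): each package arrives one at a time, and some rule has to commit it to a container immediately, without knowing what comes next.

import numpy as np

CAPACITY = 100.0  # toy container size, just for illustration

def pack_online(items, num_bins, choose_bin):
    """Packs items one at a time; `choose_bin` never sees future items."""
    remaining = np.full(num_bins, CAPACITY)
    for item in items:
        valid = np.flatnonzero(remaining >= item)  # containers the item still fits in
        chosen = valid[choose_bin(item, remaining[valid])]
        remaining[chosen] -= item
    return remaining

# "Best fit": put the item in the valid container that would have the least space left.
def best_fit(item, bins):
    return int(np.argmin(bins - item))

print(pack_online([40, 70, 30, 60], num_bins=4, choose_bin=best_fit))
# -> [0. 0. 100. 100.]: two containers packed solid, two never opened

The whole game is in how clever that choosing rule is.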
So what these researchers did was create a skeleton program:
"""Finds good assignment for online 1d bin
packing."""
import numpy as np
import utils_packing
# Function to be executed by FunSearch.
def main(problem):
"""Runs `solve` on online 1d bin packing instance,
and evaluates the output."""
bins = problem.bins
# Packs `problem.items` into `bins` online.
for item in problem.items:
# Extract bins that have space to fit item.
valid_bin_indices
=vutils_packing.get_valid_bin_indices(item,bins)
best_index = solve(item,bins[valid_bin_indices])
# Add item to the selected bin.
bins[valid_bin_indices[best_index]] -= item
return evaluate(bins, problem)
def evaluate(bins, problem):
"""Returns the negative of the number of bins
required to pack items in `problem`."""
if utils_packing.is_valid_packing(bins, problem):
return -utils_packing.count_used_bins(bins,problem)
else:
return None
def solve(item, bins):
"""Selects the bin with the highest value according
to `heuristic`."""
scores = heuristic(item, bins)
return np.argmax(scores)
# Function to be evolved by FunSearch.
def heuristic(item, bins):
"""Returns priority with which we want to add
`item` to each bin."""
return -(bins - item)
They then took the part marked "# Function to be evolved by FunSearch." (the `heuristic` above), showed it to the LLM, and asked for alternative versions of that function. Each suggestion that was valid code got run on the benchmark, scored, and stored in a database of programs.
They repeated that process, with a couple of extra (very important) techniques, until they eventually found a solution that beat the best previously known ones.
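In pseudocode, the core loop looks something like this. This is a heavily simplified sketch of my own, not the paper's implementation: the real system uses an island-based evolutionary database with many samplers and evaluators running in parallel, and `evaluate_program` and `llm_propose` here are stand-in names, not the paper's API.

import heapq

def funsearch_loop(initial_program, evaluate_program, llm_propose, iterations=1000):
    """Caricature of the FunSearch loop: propose, run, score, store, repeat.

    `evaluate_program` runs a candidate `heuristic` on the benchmark and returns
    a score, or None if the program crashes or is invalid; `llm_propose` asks the
    LLM for a new version given a few high-scoring examples. Both are stand-ins.
    """
    # Assumes the hand-written starting program is valid and scoreable.
    database = [(evaluate_program(initial_program), initial_program)]
    for _ in range(iterations):
        # Prompt the LLM with a couple of the best programs found so far.
        exemplars = [src for _, src in heapq.nlargest(2, database)]
        candidate = llm_propose(exemplars)
        score = evaluate_program(candidate)
        if score is not None:  # invalid or failing programs are simply dropped
            database.append((score, candidate))
    return max(database)  # best (score, program) pair found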
This is so fucking cool. Let me list the ways:
- Graduate students would spend years of their lives trying to do something like this. This is so much cheaper and more efficient.
- There have been similar advances using neural networks, but the problem with neural networks is that all we can usually say is, "It's better, but we have no idea what the network is doing." Here, because the output is a program, the researchers can read it and try to work out the principles behind why it's better. For example, here's the winning program:
def heuristic(item: float, bins: np.ndarray) -> np.ndarray:
    """Online bin packing heuristic discovered with FunSearch."""
    score = 1000 * np.ones(bins.shape)
    # Penalize bins with large capacities.
    score -= bins * (bins - item)
    # Extract index of bin with best fit.
    index = np.argmin(bins)
    # Scale score of best fit bin by item size.
    score[index] *= item
    # Penalize best fit bin if fit is not tight.
    score[index] -= (bins[index] - item)**4
    return score
Which the authors described as:
Instead of packing items into bins with the least capacity (such as best fit), the FunSearch heuristics assign items to least capacity bins only if the fit is very tight after placing the item. Otherwise, the item is typically placed in another bin, which would leave more space after the item is placed. This strategy avoids leaving small gaps in bins that are unlikely to ever be filled.
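You can watch that behaviour with a couple of toy inputs (numbers mine, calling the `heuristic` above; recall that it only ever sees bins the item actually fits in):

import numpy as np

bins = np.array([10.0, 12.0])  # remaining capacities of two candidate bins

# Tight fit: an item of 9 nearly fills the least-capacity bin, so take it.
print(np.argmax(heuristic(9.0, bins)))  # -> 0 (the bin with 10 left)

# Loose fit: an item of 2 would strand 8 units in the small bin,
# so the heuristic prefers the roomier bin instead.
print(np.argmax(heuristic(2.0, bins)))  # -> 1 (the bin with 12 left)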
- If we can give AI a framework for solving a scientific problem, what else could it solve for us?
I think one of the most important aspects of the paper is their description of the kind of problem this generative-AI approach is good for:
We note that FunSearch, at present, works best for problems having the following characteristics: (1) availability of an efficient evaluator; (2) a 'rich' scoring feedback quantifying the improvements (as opposed to a binary signal) and (3) ability to provide a skeleton with an isolated part to be evolved. For example, the problem of generating proofs for theorems falls outside this scope, because it is unclear how to provide a rich enough scoring signal.
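Point (2) is exactly what the skeleton's `evaluate` function provides: the negative of the number of bins used, so saving even a single bin nudges the score and gives the search something to climb, where a binary "valid/invalid" signal wouldn't. A toy illustration (my own numbers and helper names, not from the paper):

import numpy as np

CAPACITY = 100.0  # assumed toy bin capacity

def rich_score(remaining):
    """Negative count of bins actually used: every bin saved shows up in the score."""
    return -int(np.sum(remaining < CAPACITY))

def binary_score(remaining):
    """Only says whether the packing is valid: no notion of 'slightly better'."""
    return bool(np.all(remaining >= 0))

# Two packings of the same two 40-unit items into bins of capacity 100:
spread_out = np.array([60.0, 60.0, CAPACITY, CAPACITY])        # one item per bin
consolidated = np.array([20.0, CAPACITY, CAPACITY, CAPACITY])  # both in one bin

print(rich_score(spread_out), rich_score(consolidated))      # -2 vs -1: measurably better
print(binary_score(spread_out), binary_score(consolidated))  # True vs True: indistinguishable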
I’m intensely interested in how to use AI to tackle scientific problems, and I would add a few other criteria from my own experience:
- The method of evaluating the code must be runnable in a “reasonable” amount of time.
- The problem space has to be confined enough that the number of free parameters doesn't overwhelm the algorithm.
I'm specifically thinking of how these algorithms compare to the photonic-crystal modelling I did during my Ph.D.
To characterize a photonic crystal design, the simulations took days to run on high-performance computing clusters, and there were so many things you could tweak: each hole in the photonic crystal could be individually tuned, along with the properties of the light and how it couples into the material. It was possible to get a single evaluation of how good a design was at doing its job, but an LLM generating code that produces a better design still seems a long way off.
References:
Romera-Paredes, B., Barekatain, M., Novikov, A. et al. Mathematical discoveries from program search with large language models. Nature (2023). https://doi.org/10.1038/s41586-023-06924-6