Slow-Fast Auditory Streams For Audio Recognition
Language agents, which use a large language model (LLM) capable of in-context
learning to interact with an external environment, have recently emerged as a
promising approach to control tasks. We present the first language-agent
approach to formal theorem-proving. Our method, COPRA, uses a high-capacity,
black-box LLM (GPT-4) as part of a policy for a stateful backtracking search.
During the search, the policy can select proof tactics and retrieve lemmas and
definitions from an external database. Each selected tactic is executed in the
underlying proof framework, and the execution feedback is used to build the
prompt for the next policy invocation. The search also tracks selected
information from its history and uses it to reduce hallucinations and
unnecessary LLM queries.
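
The search loop described above can be illustrated with a minimal sketch. It is not COPRA's implementation: the proof framework is replaced by a hypothetical lookup table (`TRANSITIONS`), the LLM policy by a hypothetical deterministic stub (`stub_policy`), and lemma retrieval is omitted. All names here are assumptions made for illustration. The sketch only shows how execution feedback and a record of failed tactics can be fed back to the policy, and how the search backtracks when the policy has nothing left to try.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical toy "proof framework": (state, tactic) -> next state.
# Any pair not listed here is treated as a tactic failure.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("goal", "simp"): "dead_end",          # succeeds, but leads nowhere
    ("goal", "intro"): "subgoal_a",
    ("subgoal_a", "apply lemma_x"): "QED",
}

def run_tactic(state: str, tactic: str) -> Tuple[Optional[str], str]:
    """Execute a tactic; return (next_state or None, feedback message)."""
    nxt = TRANSITIONS.get((state, tactic))
    if nxt is None:
        return None, f"error: '{tactic}' failed on '{state}'"
    return nxt, f"ok: '{state}' -> '{nxt}'"

# Hypothetical stand-in for the LLM policy. It sees the current state, the
# last execution feedback, and the tactics already known to be bad here.
CANDIDATES = ["simp", "intro", "apply lemma_x"]

def stub_policy(state: str, feedback: str, avoid: Set[str]) -> Optional[str]:
    for tactic in CANDIDATES:
        if tactic not in avoid:
            return tactic
    return None  # nothing left to try: ask the search to backtrack

Policy = Callable[[str, str, Set[str]], Optional[str]]

def search(policy: Policy, start: str, max_queries: int = 20):
    """Stateful backtracking search; returns (proof or None, queries used)."""
    stack: List[str] = [start]         # states along the current path
    proof: List[str] = []              # tactics along the current path
    failed: Dict[str, Set[str]] = {}   # per-state history of bad tactics
    feedback = "start"
    queries = 0
    while stack and queries < max_queries:
        state = stack[-1]
        tried = failed.setdefault(state, set())
        tactic = policy(state, feedback, tried)
        queries += 1
        if tactic is None:
            # Backtrack: drop this state and blame the tactic that led here.
            stack.pop()
            if proof:
                failed[stack[-1]].add(proof.pop())
            feedback = f"backtracked from '{state}'"
            continue
        nxt, feedback = run_tactic(state, tactic)
        if nxt is None:
            tried.add(tactic)          # remember the failure, do not re-ask
        else:
            stack.append(nxt)
            proof.append(tactic)
            if nxt == "QED":
                return proof, queries
    return None, queries

if __name__ == "__main__":
    print(search(stub_policy, "goal"))
```

In this toy setting the search first follows `simp` into a dead end, records that choice as bad for the initial state, and then recovers the proof `intro`, `apply lemma_x`. The `failed` map plays the role of the tracked history: because it is passed back to the policy on every call, a tactic that has already failed in a given state is never proposed again.
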
We evaluate COPRA on the miniF2F benchmark for Lean and a set of Coq tasks
from the CompCert project. On these benchmarks, COPRA is significantly better
than one-shot invocations of GPT-4, as well as state-of-the-art models
fine-tuned on proof data, at finding correct proofs quickly.