AI as a Statistical Process
The Policy-Relevant History of Artificial Intelligence
Artificial Intelligence (AI) is a broad family of statistical techniques for processing data. Media coverage of AI has largely focused on applications related to text and image models, such as ChatGPT and DALL-E respectively. A common misconception is that these are the only applications of AI. In fact, these techniques are applied across diverse industries, including manufacturing, product design, construction, medicine, and agriculture.[1][2][3]
A useful policy lens for AI is to examine similar but less complex statistical processes throughout history. Each of the most popular AI tools follows directly from techniques in calculus, linear algebra, and statistics. These academic predecessors of AI are directly relevant to policy: they offer an intuitive demonstration of both the limitations and the potential of machine learning, and they provide historical examples of the pitfalls and successes of regulating statistical processes.
Following the historical approach to statistical processes, we recommend a targeted, consequence-based approach to AI policy.
Statistical Inference
In modern usage, "AI" almost always refers to a statistical technique called a machine learning model, which belongs to a family of techniques called statistical inference models. These techniques use pre-existing data to predict some future occurrence. Statistical inference models are used to predict the outcome of elections[4], inflation and unemployment[5], the future price of individual stocks[6], and the spread of viruses[7]. They are ubiquitous in science wherever the goal is to predict future outcomes.
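To make the core idea concrete, here is a minimal sketch of statistical inference in Python: fit a straight line to past observations and extrapolate to the next period. The data are hypothetical monthly sales figures invented for illustration; real models differ mainly in scale and complexity, not in kind.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b to past data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Past observations: month index -> units sold (hypothetical data)
months = [1, 2, 3, 4, 5]
sales = [100, 110, 121, 128, 140]

slope, intercept = fit_line(months, sales)
# "Inference": predict the unseen sixth month from the first five
prediction = slope * 6 + intercept
print(round(prediction, 1))  # -> 149.2
```

A machine learning model follows the same pattern — parameters are fit to existing data, then used to predict unseen cases — only with far more parameters and far more data.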
There are many commonalities between these simpler models and more complex machine learning models.
- Data: All statistical models rely on some form of data. A model built on mostly incorrect data is likely to produce incorrect predictions. A model trained on illegally obtained data will almost always have that data influence its predictions to some degree.
- Compute constraints: More complex models, including physics-based models that predate the widespread use of machine learning, can require computing infrastructure costing hundreds of millions of dollars. For example, the Global Forecast System, used by the National Oceanic and Atmospheric Administration (NOAA) to generate global weather forecasts, received a 2020 upgrade[8] costing roughly half a billion dollars and capable of 24 quadrillion calculations per second.
- Responsibility: The results of statistical models can be incorrect due to flawed data, methodology, or assumptions. These errors can influence decisions made by individual people. In the case of algorithmic trading systems, robotics, or other automated processes, they can directly cause financial or physical damage.
Moreover, there is a direct academic lineage from these broader statistical inference techniques to modern machine learning models, such as transformers[9], the type of machine learning model used for ChatGPT, Google Gemini, Anthropic’s Claude, and other language models. These similarities make the regulatory approach to other statistical models a useful guide to AI regulation.
Regulatory Approach to Statistical Inference
Broadly, the regulatory approach to the error or misuse of statistical inference is to focus on the point of harm. This is already an effective approach to regulating machine learning in areas where existing laws account for high degrees of automated decision-making, such as finance or cybersecurity. Firms using algorithmic trading take responsibility for both the legal compliance of their algorithms and the economic losses those algorithms may cause. Laws assign responsibility to both the suppliers and the users of cybersecurity software. Whether that software utilizes machine learning is no more relevant than whether it uses any other statistical or automated technique.
This philosophy can be described as outcome-focused, rather than technique-focused. It is historically successful for several reasons:
- Applications of statistical models such as machine learning models are too varied for a “one size fits all” approach. There is insufficient commonality between these applications for unified legislation or regulatory action to achieve its goals effectively, while these variations significantly increase the likelihood that regulatory action causes unexpected damage.
- Many regulations in specific industries already affect the use of AI. Regulators within those industries are best equipped to write complementary regulations that benefit the field as a whole, rather than clashing unnecessarily in ways that harm workers or users.
- Restricting the distribution of statistical models such as machine learning models means restricting the distribution of software, which is protected by the First Amendment.[10]
- Specific outcomes require domain knowledge to accurately assess. The potential outcomes of the same statistical process in medicine, robotics, finance, or other areas have wildly different costs and benefits that require expertise to evaluate. Due to this knowledge problem, constraints on the process level are likely to harm specific fields while not effectively addressing the underlying issue.
This approach asks three important questions before considering a policy:
- Does it address a specific, empirically measurable problem?
- Does it draw from existing expertise within the field it is regulating?
- Is it integrating existing regulations and practical constraints within this field to create a complementary regulatory framework?
This procedure avoids placing anti-competitive barriers to entry in the research and development phases, while still allowing more than enough regulatory tools at the deployment endpoint to address misuses of machine learning processes.
1. https://www.grandviewresearch.com/industry-analysis/machine-learning-market
2. https://www.fortunebusinessinsights.com/machine-learning-market-102226
3. https://www.statista.com/outlook/tmo/artificial-intelligence/machine-learning/worldwide#market-size
4. https://fivethirtyeight.com/methodology/how-fivethirtyeights-house-and-senate-models-work/
6. https://corporatefinanceinstitute.com/resources/valuation/stock-valuation/
8. https://www.mlive.com/weather/2020/02/us-weather-modeling-computers-to-get-505-million-dollar-upgrade-will-triple-speed.html
10. https://www.eff.org/deeplinks/2015/04/remembering-case-established-code-speech