Representations and abstractions

2025-06-18 / 14 min read

This is a continuation of ideas I've previously written about relating to the Abstraction and Reasoning Corpus (ARC). If you are interested in discussing or working on ARC, please reach out!

The terms "representation" and "abstraction" are used a lot in AI research. They are not particularly well-defined, and sometimes they are used interchangeably. I'm going to try to define these in a more formal way, and then explore how they connect with various research and ideas in the context of ARC.

What is a representation? #

Assume we have some data (information, bits, bytes). We want to store it in some format. Given the context, let's use an ARC grid as an example.

A reasonable starting representation of this data is a 2D array or matrix of numbers representing the color values (and this is the format that ARC puzzles are provided in).

There are two things we care about in a representation: its structure (schema) and the specific data of a given instance:

  • Representation schema: The blueprint or formal definition of the data structure, including its components, types, and constraints.
  • Representation instance: The actual data that conforms to that schema.

In the above case, the schema is something like int[][] and the instance is this specific instantiation of it along with all its actual values.
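
As a minimal sketch in Python (the grid values here are made up purely for illustration), the distinction looks like this:

# Schema: a grid is a 2D array of integer color values (int[][]).
Grid = list[list[int]]

# Instance: one specific 3x3 grid that conforms to that schema.
# (Values are illustrative; say 0 = black, 1 = blue, 2 = red.)
example_grid: Grid = [
    [0, 0, 1],
    [0, 2, 1],
    [0, 2, 1],
]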

We can come up with many (probably an infinite number of) possible representation schemas for storing the above data.

Transforming representations #

Generally, we start with some raw source representation. In the natural case this might be the pixels from a digital camera or spectral samples from a microphone; in the digital case, arrays of numbers or plain text.

We can transform our source representation into other representations, and this is common in CS. For example, we could represent our above grid with a data structure like:

grid(10, 10, [                 
  rect(blue,   [2, 3], [0, 7]),
  rect(red,    [2, 2], [3, 8]),
  rect(green,  [2, 2], [6, 8]),
  rect(yellow, [1, 4], [9, 6]),
])                             

This is a perfectly lossless representation of the same information. In general, we can just think of this as functions that encode source representations into new representations:

r_target = encode(r_source)
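
As a rough sketch of one such encode/decode pair (the sparse-dictionary target representation and the function names are my own choices for illustration, not anything ARC-specific):

Grid = list[list[int]]

def encode(grid: Grid) -> tuple[int, int, dict[tuple[int, int], int]]:
    """Re-represent a grid as (height, width, {(row, col): color}) for non-black cells."""
    height, width = len(grid), len(grid[0])
    cells = {(r, c): v for r, row in enumerate(grid) for c, v in enumerate(row) if v != 0}
    return height, width, cells

def decode(rep: tuple[int, int, dict[tuple[int, int], int]]) -> Grid:
    """Reconstruct the original grid, so this particular encoding is lossless."""
    height, width, cells = rep
    return [[cells.get((r, c), 0) for c in range(width)] for r in range(height)]

grid = [[0, 1, 1], [0, 0, 2]]
assert decode(encode(grid)) == grid  # full reconstruction: lossless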

So there is an effectively infinite world of representations that we can map to. What can we say about them, and what does this have to do with abstractions?

The bases of abstractive power #

Lossiness #

Lossiness is clearly a property of a representation, and is defined in relation to some other (source) representation. A representation is lossless if:

For some source representation r and a transformed representation r' = encode(r), a decoder can be defined such that decode(r') = r.

If the new transformed representation allows full reconstruction of the source representation, it is lossless. If it only allows partial reconstruction, it is lossy. We could formalize the extent of the loss as the amount of information, in bits, that is preserved or discarded.

A lossy representation of our above grid might be:

[blue, red, green, yellow]

We have preserved some important information: the ordering (left to right) and the color of the shapes, but not the actual shapes, positions, or the overall grid dimensions.
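
A minimal sketch of such a lossy encoder, assuming objects occupy disjoint column ranges and using made-up numeric color codes (1 = blue, 2 = red):

Grid = list[list[int]]

def lossy_encode(grid: Grid) -> list[int]:
    """Keep only the distinct non-black colors, in left-to-right order of first appearance."""
    colors: list[int] = []
    for c in range(len(grid[0])):      # scan columns left to right
        for row in grid:
            v = row[c]
            if v != 0 and v not in colors:
                colors.append(v)
    return colors

grid = [
    [1, 1, 0, 0, 2],
    [1, 1, 0, 0, 2],
]
assert lossy_encode(grid) == [1, 2]  # shapes, positions, and grid size are gone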

Selectivity #

We can say something qualitative about which information is kept, and which is discarded, in a representation.

We can define this nicely in terms of invariances. What features or properties of the source representation is our target representation invariant to? A representation is invariant to some feature of the source representation if, when we change that feature, we end up with the same target representation.

In the example above, we have discarded a lot of information. The representation is invariant to the position and shape of the objects, but we retain some information about the relation (order) between the objects, as well as their color (and the very concept of objects themselves).
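
To make the invariance concrete, here is an even simpler hypothetical encoder (just the set of colors present), which is invariant to both position and shape:

def colors_present(grid):
    """The set of non-black colors in the grid, ignoring where or how they appear."""
    return {v for row in grid for v in row if v != 0}

grid_a = [[0, 1, 0], [0, 1, 2]]
grid_b = [[1, 0, 0], [1, 0, 2]]   # the blue object has moved; the representation is identical

assert colors_present(grid_a) == colors_present(grid_b) == {1, 2}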

This gives us our first hint of the idea of an abstraction. Rather than calling this an abstraction, we will still call it a representation, but by being selective about the information, we have imbued the representation with abstractive power.

Utility #

The main point of data structures in CS is to make it easier to process that data in some specific way. We can formalize this in terms of computational complexity for representations:

We can define the utility of a representation for a task T as the degree to which it reduces the computational cost (e.g., time, memory, energy) of performing T.

Our lossy representation of the grid above has high utility for some tasks, for example, if we need to calculate:

  • The number of distinct objects
  • Whether a blue object exists
  • The color of the second object from the left

And it has low utility for others:

  • The number of square objects
  • The area of the 3rd object

In this case, our selectivity has created utility for some tasks and removed it for others. This generally seems to be the case: by discarding some information and highlighting the rest, we can reduce the computational cost of certain tasks.
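
A small sketch of this task dependence, comparing the lossy ordered-colors list against the full grid (the color codes and function names are illustrative):

BLUE = 1

def blue_exists_from_colors(colors):
    return BLUE in colors                      # O(number of objects)

def blue_exists_from_grid(grid):
    return any(BLUE in row for row in grid)    # O(height * width)

grid = [[0, 1, 0, 2],
        [0, 1, 0, 2]]
colors = [1, 2]   # the lossy representation of the same grid

assert blue_exists_from_colors(colors) and blue_exists_from_grid(grid)

# But "the area of the 3rd object" cannot be answered from `colors` at all:
# the shapes were discarded, so the lossy representation has no utility there.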

Structural redundancy #

There are two sides to a representation that I separated out earlier: there is the schema (structure), and there is the actual data (the instance).

For many representational structures, multiple distinct instances (specific sequences of bits) under the same schema can be produced that all map back to the same source representation.

Consider a representation of our ARC grids where we store a list of coordinate color pairs, such as:

((0, 0), black)
((0, 1), black)
((0, 9), blue)
...

With 100 elements in that list, there are 100! possible orderings of the elements. We have effectively added extra information that is either random (if the list is randomly ordered) or redundant (if it follows some particular order, such as row-major).

We could use a set instead, which ignores the ordering, although we can't write a set down in text without exposing some notion of order! We can solve this by defining a canonical representation: for example, we might say that the canonical representation of a set is always written in ascending order.

Multiple non-canonical representations:

{0, 2, 1}
{2, 0, 1}
{1, 2, 0}

All map to a single canonical representation:

{0, 1, 2}
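
Applied to the coordinate-color pair representation above, canonicalization can be as simple as a deterministic sort (a minimal sketch):

def canonical(cells):
    # Canonical form of a list of ((row, col), color) pairs: sort by coordinate.
    return sorted(cells)

ordering_1 = [((0, 1), "black"), ((0, 0), "black"), ((0, 9), "blue")]
ordering_2 = [((0, 9), "blue"), ((0, 0), "black"), ((0, 1), "black")]

assert canonical(ordering_1) == canonical(ordering_2)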

This concept can be important when thinking about things like analogical reasoning. In the case of ARC and program synthesis, it is also practically very important: finding structural similarities across multiple source representations can simplify the learning process itself.

Computability & cost #

Does there actually exist an encode function that can get from our source representation to some new target representation?

Such a function does not always exist. For example, I could define a representation of some data as the shortest program (e.g., in Python) that reproduces the source data. This amounts to computing the Kolmogorov complexity of the data, which is uncomputable in general.

Let's assume it is computable. How much does it cost (in terms of compute time/memory) to encode our source representation into some new representation? This is something we should care about.

There is a trade-off here: we can expect our transformation function to do some of the heavy lifting, but probably not all of it.
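
As a toy illustration of both points (this is not real Kolmogorov complexity, which is defined over a universal language; it is just a depth-bounded search over a made-up three-primitive DSL), the cost of finding the shortest encoding grows exponentially with program length:

from itertools import product

PRIMITIVES = {
    "reverse":   lambda xs: xs[::-1],
    "double":    lambda xs: xs + xs,
    "drop_last": lambda xs: xs[:-1],
}

def shortest_program(source, target, max_depth=4):
    """Find the shortest composition of primitives mapping source to target.

    Iterative deepening: at each depth there are len(PRIMITIVES) ** depth
    candidate programs, which is the cost we pay at encode time.
    """
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            value = source
            for name in names:
                value = PRIMITIVES[name](value)
            if value == target:
                return names
    return None

# e.g. [1, 2, 3] -> [3, 2, 1, 3, 2, 1] is found as ("reverse", "double")
print(shortest_program([1, 2, 3], [3, 2, 1, 3, 2, 1]))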

Multi-task and future task utility #

So far, we have considered a single task, where we are considering the abstractive power of a representation in terms of its utility for that particular task. This leads to a somewhat meaningless conclusion: if we are trying to perform some task T, and we have an abstraction that is created by some encode function over a source representation r, we are effectively trying to do:

T(encode(r))

But there is no reason for the encode function to not just do all the work, and T could simply be the identity function such that T(encode(r)) = encode(r). By the definitions so far, this new representation has almost perfect utility for our task: the representation is the answer, and the utility is maximized.
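
The degenerate case, sketched with a hypothetical single task (counting filled cells):

def solve_task(grid):
    # Stand-in for whatever end-to-end function answers task T.
    return sum(v != 0 for row in grid for v in row)

def encode(grid):
    return solve_task(grid)   # the representation *is* the answer

def task(representation):
    return representation     # identity: all the work was pushed into encode

grid = [[0, 1], [2, 0]]
assert task(encode(grid)) == solve_task(grid) == 2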

We expect good abstractions to be reusable. I mean this in a general sense, although programmers reach a very similar conclusion through experience: "bad" programming abstractions that do too much (or sometimes too little) quickly become a problem in a codebase.

Perhaps the conclusion here is that you don't need abstractions if you only need to perform one task. And perhaps that is one of the reasons end-to-end deep learning in a supervised (single-task) setting is so useful, while the same technique struggles with an extremely broad problem set like ARC.

So, a powerful representation is one that:

  • Is not too expensive to compute.
  • Has high utility across multiple tasks: it makes computation easier not just for a single, known task T, but for a whole family of tasks T_1, ..., T_n.

But what about future tasks we don't yet know about? This is exactly the setup in ARC again, where the test-time problems have never been seen before. It is harder to define, but it gets at the broad idea of generalization in machine learning, particularly out-of-domain generalization: representations and abstractions built this way have a better chance of being generally useful across multiple tasks, and of improving the efficiency with which we can learn and perform new, unknown tasks.

What is an abstraction? #

An abstraction is a representation that has abstractive power.

It's a fundamentally relative concept: relative to the tasks at hand. Given this, it doesn't seem to make a lot of sense to talk about abstractions as "things." Representations are things; abstractive-ness (or abstractive power) is a task-dependent property of a representation that primarily captures its utility in performing a set of tasks.

But if we have to actually define what an abstraction is, we could say the following:

An abstraction is a new, simpler representation created by a purposeful, lossy transformation that discards irrelevant details to reveal a more general structure, thereby increasing its utility for a specific scope of tasks.

How does this relate to ARC? #

This framing of abstractions ties together a few pieces of research and connects well with some of the key ideas in ARC.

It integrates nicely with Francois Chollet's definition of intelligence. His focus on "skill-acquisition efficiency over a scope of tasks" captures our multi/future-task utility objective nicely. An intelligent agent, in this view, is one that excels at building a set of generalized representations that minimizes the cumulative cost of solving a long and varied sequence of future tasks.

While above I described utility in terms of how much easier it is, computationally, to actually perform a task, in practice any inductive approach to ARC needs to reduce the computational cost of finding (searching for) the function that performs a task; the task is itself a learning problem. This is very much at the core of efforts such as DreamCoder, where the primary motivation is to build up a bank of functions (framed as transformations, though they might sometimes be considered representations) that are useful for some tasks and therefore might be useful for others, thereby reducing the search.

It highlights the motivations behind some of the other "representational" approaches too, such as Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus, where there are clearly better representations of the ARC grids that ultimately reduce the depth of search required over transformation functions. In this paper, the authors also note that "defining multiple abstraction processes improves ARGA's ability to correctly capture object information from the input images," underlining the value of having multiple different representations.

There is some connection to compression here too: small representations (that are well compressed) are likely, but not guaranteed, to have improved utility in solving the actual tasks. This is a motivation behind the paper Object-centric Models and the MDL Principle with the intuition that "the model that best describes the data is the model that compresses them the most."

A lot of the representation work in ARC has been hand-coded. The obvious question is, how do you learn good representations (or, representations with high abstractive power across future tasks)? Representation learning is an established field, and for example, LLMs clearly learn some form of representations and abstractions. In Omni-ARC, a motivating idea behind the work is to train a model across multiple tasks in the hope of extracting better, more general representations.

In Towards Efficient Neurally-Guided Program Induction for ARC-AGI, I took note of the following statement from the authors, which I believe highlights that there is an important problem in terms of understanding the data (possibly through abstractions) as well as through program search:

ARC-AGI is fundamentally not a grid-to-grid problem... most of the steps involved in solving an ARC-AGI task are not strictly grid-to-grid transformations... [but] instead can be thought of as a sequence of intermediate program states which themselves are lists of various kinds of intermediate objects and values.

One other challenge with ARC, and specifically ARC-2, is the need for composability: for example, abstractions of abstractions, or tasks that require applying and then combining the result of multiple abstractions.

Some of the ARC problems highlight the importance of having the right abstraction and the fact that there are many possible interpretations of the grids. While many representation approaches have been object-oriented, sometimes that is not helpful. In 28a6681f the transformation is to treat the blue cells like water and let them flow down with gravity.


Any kind of object-centric representation of this grid is actually probably unhelpful! The transformation is simpler in pixel space. Water is a liquid, and liquids are best simulated by particles, so the useful representation for solving this task is probably just the cells themselves.
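
A minimal sketch of that gravity-style transformation written directly in pixel space (assuming blue cells simply fall within their own column and stack at the bottom; the actual task may involve more structure):

BLUE, EMPTY = 1, 0

def apply_gravity(grid):
    """Let blue cells fall to the bottom of their column, stacking as they land."""
    height, width = len(grid), len(grid[0])
    out = [[EMPTY] * width for _ in range(height)]
    for c in range(width):
        count = sum(1 for r in range(height) if grid[r][c] == BLUE)
        for r in range(height - count, height):
            out[r][c] = BLUE
    return out

grid = [[1, 0, 1],
        [0, 0, 0],
        [0, 1, 0]]
assert apply_gravity(grid) == [[0, 0, 0],
                               [0, 0, 0],
                               [1, 1, 1]]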

Another example is 2d0172a1, where a lossy abstraction over the input grid is extremely useful. The problem is relatively simple: I have some objects and what I care about is their topology and relations to other objects. That is all the information I need to know in order to produce the output grid; I don't care about the exact shape of the paths or objects in the input, as this is not useful for the task.


The topological abstraction here is easy to spot for a human; you just see it, and that makes solving the task relatively easy.
