A non-final version (with minor things missing) of the paper published as: Chater, N. (1999). The search for
simplicity: A fundamental cognitive principle? Quarterly Journal of Experimental Psychology, 52A, 273-302.
The search for simplicity: A fundamental cognitive principle?
Nick Chater*
Department of Psychology,
University of Warwick.
E-mail: nick.chater@warwick.ac.uk
*Please address correspondence concerning this article to Nick Chater, Department of Psychology, University of Warwick, Coventry CV4 7AL, UK.
It is proposed
that the cognitive system imposes patterns on the world according to a
simplicity principle: choose the pattern which provides the briefest
representation of the available information. The simplicity principle is
normatively justified--patterns which support simple representations provide
good explanations and predictions, on the basis of which the agent can make
decisions and actions. Moreover, the
simplicity principle appears to be consistent with empirical data from many psychological
domains, from perceptual organization, to similarity, reasoning, memory and
scientific thinking. Thus, the simplicity principle promises to serve as the
starting point for the rational analysis of a wide range of cognitive
processes, in Anderson’s (1990, 1991a) sense. The simplicity principle also
provides a framework for integrating a wide range of existing psychological
proposals.
The cognitive system must cope with a
world which is immensely complex but which is, nonetheless, highly patterned.
The patterns are crucial. In a completely random world, prediction, explanation
and understanding would be impossible--there would be no patterns on which
prediction could be based, to which explanations could refer, or the
comprehension of which could amount to understanding. Even more fundamentally,
there would be no basis to choose one action rather than another, without any
patterns relating actions to consequences.
The
ability to find patterns in the world is therefore of central importance throughout
cognition. Without the ability to find such patterns an agent might as well be
in a random world: it would be able to predict, explain and understand nothing;
and it would have no basis on which to choose its actions. By contrast, the
cognitive systems of people and animals appear to be conspicuously successful
in coping with the world. Somehow, cognitive processes are able to find
patterns successfully.
How
is this success achieved? Any proposal must meet two adequacy criteria. (1) It
must be normatively justified--without such normative justification, the
success of the method of finding patterns is mysterious; (2) It must be descriptively
correct--it must accord with empirical data--at least to some
approximation. A theory which is both normatively justified and descriptively
correct provides a rational analysis of a cognitive process (for
discussion of this concept see, for example, Anderson, 1990, 1991a, 1991b;
Anderson & Schooler, 1991; Oaksford & Chater, 1994, 1995a; Oaksford
& Chater, in press). So explaining how the cognitive system successfully
finds patterns requires providing a rational analysis of the cognitive system's
pattern-finding capabilities.
I
propose that patterns are found by following a fundamental principle: choose
the pattern that provides the simplest explanation of the available
data. Moreover, I suggest that this principle applies at all levels of
cognition, from the organization of perceptual input, to scientific inquiry.
Thus, the simplicity principle can be used as a starting point for detailed
rational analyses of a wide range of cognitive processes.
The
idea that cognition involves a search for simplicity has a long lineage, in
the discussion of both normative and descriptive issues. On the normative side,
the injunction to favor simple scientific theories can be traced to William of
Ockham1 (1290?-1349?) and was endorsed by Newton (see Li &
Vitányi, 1993, p. 277). Simplicity was also assigned fundamental
importance in early positivist epistemology (e.g., Mach, 1960/1883), and it
remains a standard principle in modern philosophy of science (e.g., Sober,
1975). Simplicity is also recognized as important in statistics. If a straight
line and a cubic fit the same data equally well, then the straight line is the
preferred model because it is simpler--it contains fewer adjustable parameters.2 Moreover, a
preference for simple explanations is a standard methodological principle in
informal scientific discourse--for a prominent psychological example, see
Pylyshyn’s (1984) discussion of the importance of having fewer model parameters
than data points in cognitive modelling. But although the preference for simple
patterns has been widely recognized, simplicity has typically remained a
largely intuitive notion. Over the last thirty years, however, a rich and
important theory of simplicity, Kolmogorov complexity, has been developed and widely applied by
mathematicians (Chaitin, 1966; Kolmogorov, 1965; Solomonoff, 1964; for an
overview see the excellent textbook by Li & Vitányi, 1993),
statisticians (Rissanen, 1987, 1989; Wallace & Freeman, 1987) and computer
scientists (Quinlan & Rivest, 1989; Wallace & Boulton, 1968). This
theory allows rigorous normative justifications to be given for why choosing
the simplest pattern leads to the best explanations and predictions; and also
allows the more concrete formulation of the psychological proposal that
cognition seeks to find the simplest pattern. I will outline this account of
simplicity and its potential application to cognition below.
Simplicity has also been
frequently viewed as important from the point of view of describing,
rather than justifying, cognitive processes. Mach (1959/1886), one of the
strongest advocates of simplicity as a normative principle in science, also proposed
that the perceptual system seeks to find the simplest representations of sensory
input. This viewpoint is echoed in the proposal in the Gestalt tradition that
perceptual organization is chosen to maximize “prägnanz” (Koffka,
1962/1935), a notion closely related to simplicity which aims to integrate the
range of specific Gestalt principles of perceptual organization (good form,
good continuation, and so on). Moreover, Hochberg and McAllister (1953)
explicitly identified the goal of perceptual organization as maximizing
simplicity, and this work was followed by a variety of related proposals, where
simplicity is measured in different ways (Buffart, Leeuwenberg & Restle,
1981; Garner, 1962, 1974; Leeuwenberg, 1969, 1971). Moving from perception to
the psychological processes involved in scientific inference, simplicity has
also frequently been invoked as an important guiding principle. For example,
scientists frequently report strong aesthetic preferences in theory
construction and evaluation, using terms such as “simplicity,” “elegance,”
“parsimony” and so on to describe desirable properties of theoretical
proposals. The remark that “Everything should be made as simple as possible, but not
simpler” has been attributed to Einstein (COVER/tHOMAS or EYSENCK/KEANE). This preference for simplicity is
sometimes expressed so strongly that it even overrides concern for the fit with the
data (DIRAC in STEWART/GOLUBITSKY). Thus simplicity has been implicated as a
guiding principle in finding patterns, from perceptual processing to scientific
reasoning. I propose that simplicity may have an even more general role in cognition, extending to reasoning,
memory, learning and similarity.
This
paper has three parts. The first introduces the problem of finding patterns in
data, and explains why it is normatively and descriptively puzzling: there are an
infinite number of patterns consistent with any finite set of data. The second
part considers the normative question of how patterns should be found. I
outline how simplicity can be quantified in terms of the mathematical theory of
Kolmogorov complexity and how this theory explains why searching for simple
patterns is normatively justified as a strategy for predicting and explaining
the world, and as a partial basis for deciding how to act. The third part
considers the descriptive problem of how various cognitive processes actually do
find patterns. The approach is programmatic--I aim to provide an integrated
framework for apparently diverse cognitive problems, and suggest directions for
future research, rather than attempting a definitive account in any one area.
Overall, I hope to show that simplicity is both a normatively justified and
descriptively plausible account of how the cognitive system finds patterns in a
range of domains.
Part I: The problem of finding patterns
Consider the problem of finding
patterns in a finite portion of an infinite sequence. In the portion of the
sequence that we observe, just two states are found. Let us call the binary
values “Black” and “White” to allow the visual representation shown in Figure
1(a). In this finite sequence, an intuitively evident pattern is that there is
an alternation of the two states. If this pattern is correct, then the sequence
should continue as shown in Figure 1(b). But another pattern, equally
consistent with the observed data, is that there is an infinite sequence of
“white”, followed by an alternating sequence of white and black, and then an
infinite sequence of “black.” The observed data is assumed to correspond to the
middle part of this sequence (Figure 1(c)). Moreover, a further pattern
consistent with the data consists of a jumble of states (many not occurring in
the observed part of the sequence at all--represented by patterned squares in
the figure) to the left and right of the alternating white and black items that
are observed. Again this kind of pattern is precisely consistent with the
observed data. More generally, it is clear that an infinite number of patterns
are consistent with any finite set of data.
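This underdetermination can be made concrete with a minimal Python sketch. The eight-square observed window and the two rules `alternating` and `bounded_alternation` are illustrative assumptions that loosely mirror the patterns of Figures 1(b) and 1(c); they are not taken from the figures themselves. Both rules reproduce the observed data exactly, yet they disagree outside the observed window.

```python
# Hypothetical observed window: positions 0..7, B = black, W = white.
OBSERVED = "BWBWBWBW"

def alternating(i: int) -> str:
    """Strict alternation at every position, in both directions."""
    return "BW"[i % 2]

def bounded_alternation(i: int) -> str:
    """All white before the window, alternating inside it, all black after."""
    if i < 0:
        return "W"
    if i >= len(OBSERVED):
        return "B"
    return "BW"[i % 2]

def consistent(rule) -> bool:
    """True if the rule reproduces every observed square."""
    return all(rule(i) == state for i, state in enumerate(OBSERVED))

for rule in (alternating, bounded_alternation):
    print(rule.__name__, consistent(rule), "prediction at position 9:", rule(9))
```

Any number of further rules could be added in the same way, which is why consistency with the data cannot by itself select a pattern.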
INSERT FIGURE 1 ABOUT HERE
A similar example, of traditional
psychological interest (e.g., Dinnerstein & Wertheimer, 1957; Kanizsa &
Gerbino, 1982), concerns the completion of occluded figures (Figure 2). The
intuitively natural completion of the occluded region in Figure 2(a) interprets
the figure as a square partially occluded by another square (Figure 2(b)). This
completion is predicted by two Gestalt principles: good continuation, which
states that lines should be assumed to continue as smoothly as possible, and
good form, which states that completions should prefer regular underlying
figures3. But, again, an infinite number of alternative completions
are possible (Figure 2(c)).
INSERT FIGURE 2 ABOUT HERE
The hard-headed psychologist may feel
tempted to dismiss the rather bizarre patterns shown in Figures 1 and 2 as
“silly.” Of course, such a psychologist might say, the cognitive system is only
concerned with “sensible” patterns, and bases its explanations, predictions and
decisions on these. The psychologist might go on to point out that the really
interesting issue is how the cognitive system copes with cases where there
are two “sensible” patterns which can be imposed on the data, and some choice
must be made between them. But this impatient response misses the point. The
psychologist must explain our intuitions about which patterns are
“silly” and which are “sensible” patterns, and cannot take them for granted,
because these intuitions are themselves the outcome of psychological processes.
Indeed, these intuitions must be explained in two ways. First, some normative
justification must be given for assuming that the cognitive system is justified
in favoring “sensible” patterns, and basing its predictions, explanations and
decisions on these. This issue is the focus of Part II. Second, the descriptive
question of how the cognitive system differentiates between “silly” and
“sensible” patterns must also be addressed--we leave descriptive issues to Part
III.
Part II: Finding patterns: The normative problem
Despite our strong intuitions that not all
patterns consistent with a finite set of data are equal (i.e., that some
are plausible and others are absurd), there has been a long sceptical tradition
in philosophy arguing that no normative justification can be given for such
preferences (e.g., Goodman, 1965; Hume, 1739-1740/1965; Popper4, 1959/1934).
But this scepticism is unattractive, because it makes utterly mysterious the
remarkable and consistent success that cognitive systems enjoy on the basis of
favoring some patterns over others.
Fortunately,
the sceptical challenge can be addressed by applying the mathematical theory of
Kolmogorov complexity. This theory quantifies simplicity and shows that a
preference for simpler patterns is justified, because describing the world in
terms of simple patterns consistently leads to better predictions, explanations
and decisions.
Before
considering how this theory measures simplicity, however, we must first ask:
what should be measured for simplicity?
The simplicity of what?
In choosing patterns on the basis of
simplicity, the most obvious suggestion is that the simplest available
pattern should be preferred. This principle correctly favours an
indefinitely long sequence of alternating black and white squares in Figure 1;
and the “square” completion in Figure 2. But, taken at face value, it also has
a paradoxical consequence: a very simple pattern, such as that the pattern in
Figure 1 is an infinitely long sequence of black squares or that the pattern in
Figure 2 is a simple uniform field, will always be preferred. Such
possibilities are, of course, ruled out by the constraint that the pattern has
to be consistent with the available data--thus, these “null” patterns
are just too simple. But this point itself raises difficult questions: What
does it mean for a pattern to be consistent with the available data? Can consistency with the input be traded
against simplicity of interpretation? If so, how are simplicity and consistency
with the input to be jointly optimized?
We shall see that the theoretical account of simplicity presented below
answers these questions.
There
is, however, a further, and more subtle difficulty: what rules out the simplest
possible “null” pattern--such a “pattern” could be interpreted as saying that
“anything goes”? The null pattern will
be consistent with the available data; indeed it would be consistent with any
data, because it rules nothing out. Mere consistency or compatibility with the
data is plainly not enough; the pattern must also, in some sense, capture
regularities in the data (Harman, 1965). But this appears to imply that
choosing a pattern involves the joint optimization of two factors; and the
relative influence of these two factors is unspecified. Moreover, this conclusion is unattractive
because two notions, simplicity and explanatory power, must be explicated
rather than just one.
Fortunately,
there is an alternative way to proceed.
This is to view a pattern as a way of encoding the data; and to
propose the pattern chosen is that which allows the simplest encoding of the
stimulus. This view disallows null
or nearly null patterns, which
bear little or no relation to the data, because these patterns do not help
encode the stimulus simply. It also
provides an operational definition of the “explanatory power” of a pattern--as
the degree to which that pattern helps provide a simple encoding of the
stimulus. If a pattern captures the
regularities in the data (i.e., if it “explains” those regularities), then
it will provide the basis for a brief description of the data; if a
pattern fails to capture regularities in the data, then it will be of no
value in providing a brief description. Explanatory power is therefore not an
additional constraint that must be traded off against simplicity; maximizing
explanatory power is the same as maximizing the simplicity of the encoding of
the stimulus.
Quantifying simplicity
To apply the injunction to choose the
pattern which provides the simplest encoding of the data, we need a measure of
simplicity. There is a long tradition in philosophy of equating simplicity with
brevity in some coding language (REF Phil of Sci-see anti-Occam). In
psychology, this general approach has been applied in a variety of contexts,5 from the
organization of simple sequences, such as the example we have just considered
(Leeuwenberg, 1969; Restle, 1970; Simon, 1972; Simon & Kotovsky, 1963; Vitz
& Todd, 1969), to judgments of “figural goodness” (Hochberg &
McAllister, 1953), the analysis of Johansson’s (1950) experiments on the
perception of motion configurations (Restle, 1979), and figural completion
(Buffart, Leeuwenberg & Restle, 1981).
It has also been advanced as a general framework for understanding
perceptual organization (e.g., Attneave & Frost, 1969; Leeuwenberg, 1971;
Leeuwenberg & Boselie, 1988). I shall discuss some of these topics in Part
III.
Approaches
based on brevity of encoding in some description language appear to be dogged
by two problems: 1) that a fresh description language must be constructed for
each fresh kind of pattern; 2) that the predictions of the theory depend on the
description language chosen, and there is no (direct) empirical means of
deciding between putative languages6.
Kolmogorov
complexity theory addresses these problems. The first problem is avoided by
choosing a much more general language for encoding. Specifically, the language
chosen is a universal programming language. A universal programming
language is a general purpose language for programming a computer. The familiar
programming languages such as PROLOG, LISP and PASCAL are all universal
programming languages. How can an object, such as a perceptual stimulus, be
encoded in a universal programming language such as, for example, LISP? The
idea is that a program in LISP encodes an object if the object is generated as
the output or final result of running the program. By the definition of a
universal programming language, if an object has a description from which it
can be reconstructed in any language, then it will have a description
from which it can be reconstructed in the universal programming language. It is
this that makes the programming language universal.
Moreover,
in solving the first problem, the second problem, that different pattern languages
give different code lengths, is solved automatically. A central result of Kolmogorov complexity theory, the Invariance
Theorem (Li & Vitányi, 1993), states that the length of the
shortest description of an object, x, is invariant (up to an additive constant)
between different universal languages (though, as we discuss below, the choice
of language may be important in developing theories of particular cognitive
processes). This quantity, K(x), is defined as the Kolmogorov
complexity of an object. Similarly, we can define the conditional
Kolmogorov complexity, K(y|x), between two objects x
and y. This is the length of the shortest program that transforms x
into y.
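K(x) itself cannot be computed in general, but the length of the output of any fixed compressor gives a computable upper bound that conveys the flavour of the measure. The following Python sketch uses the standard `zlib` module as such a stand-in. The function names `compressed_length` and `conditional_length`, and the two test strings, are illustrative assumptions; a general-purpose compressor is a crude proxy for Kolmogorov complexity, not a definition of it.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of a zlib encoding of data: a crude upper-bound proxy for K(x)."""
    return len(zlib.compress(data, 9))

def conditional_length(y: bytes, x: bytes) -> int:
    """Extra bytes needed to encode y after x has been encoded: a rough proxy for K(y|x)."""
    return compressed_length(x + y) - compressed_length(x)

rng = random.Random(0)  # fixed seed so the example is repeatable
patterned = b"BW" * 500                                   # highly regular sequence
jumbled = bytes(rng.getrandbits(8) for _ in range(1000))  # pseudo-random sequence

print("patterned:", compressed_length(patterned), "bytes")
print("jumbled:  ", compressed_length(jumbled), "bytes")
print("jumbled given itself:", conditional_length(jumbled, jumbled), "bytes")
```

The regular sequence should receive a much shorter code than the jumbled one, and a sequence should be cheap to encode once an identical copy is already available, in line with the intuition behind K(y|x).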
This
talk of universal programming languages may appear rather
unpsychological--after all, the cognitive system presumably does not represent
information in PROLOG, LISP or PASCAL!
But the notion of a universal programming language is actually very
broad--almost any reasonably rich system of representation, including most
proposals concerning mental representation, is universal, and hence the
Kolmogorov complexity measure can be applied.
So
we now have a definite interpretation of the claim that patterns are chosen on
the basis of simplicity: the pattern chosen is that with which the data can be
encoded as briefly as possible. We now consider why this preference for
simplicity is justified.
The justification of simplicity
There are various criteria by which a
particular choice of pattern in a set of data might be justified. The best
pattern might be the pattern that is the most likely explanation of how the
data was generated; the pattern that gives rise to the best predictions;
or the pattern that provides the best basis for decision making.
Fortunately, simplicity can be justified in each of these ways.
Simplicity and the most likely explanation7
Suppose that we have data, D,
and a set of hypotheses concerning the pattern in the data. The most likely
hypothesis is the hypothesis, H, that has the greatest probability,
given the data. In symbols, this is the H that maximizes P(H|D).
Bayes’ theorem, a standard theorem of probability theory, states that:
$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} \tag{1}$$
That is, the probability of the
hypothesis given the data is proportional to the product of the probability of
the data given the hypothesis and the prior probability of the hypothesis. By
elementary mathematics, choosing the H that maximizes (1) is equivalent
to choosing the H that minimizes (2):
$$-\log_2 P(D \mid H) - \log_2 P(H) \tag{2}$$
Under very general conditions, -log_2 P(x) is
approximated by the Kolmogorov complexity of x, K(x), and
-log_2 P(y|x) is approximated by the conditional
Kolmogorov complexity of y given x, K(y|x)
(see Li & Vitányi, 1995; Vitányi & Li, 1996 for a
rigorous analysis, and Chater (1996) for a more informal discussion). This
duality between probabilities and code lengths is of great importance, and has
been widely used in statistics (e.g., Rissanen, 1987, 1989) and artificial
intelligence (e.g., Cheeseman, 1995) and computer vision (e.g., Mumford, 1992),
as well as having direct psychological implications8 (Chater,
1996).
Thus,
choosing the H that maximizes (1) is equivalent to choosing the H
that minimizes (3):
$$K(H) + K(D \mid H) \tag{3}$$
But (3) has the following
interpretation: K(H) is the length of the shortest description that
specifies the hypothesized pattern, H; and K(D|H) is
the length of the shortest description that specifies the data, D, given H.
The sum of these quantities is therefore the description length of the data,
given the hypothesized pattern--the description consists of two parts: first,
the pattern must be specified, and then the specific data must be specified in
terms of the pattern. Therefore, (3) can be informally glossed as follows:
Shortest description of D, using H (4)
According to Bayes’ theorem, H should
be chosen to be as probable as possible (i.e., to maximize (1)). But we have
seen that this is equivalent to choosing H to minimize (4): i.e., the pattern
should be chosen in order to provide the simplest specification of the data.
Therefore, choosing the simplest hypothesized pattern is justified
because it amounts to choosing the pattern which is the most likely
explanation of the data.
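The equivalence between maximizing probability and minimizing code length can be checked numerically on a toy case. In the Python sketch below, the three hypotheses and their priors and likelihoods are invented purely for illustration, and code lengths are taken to be exactly -log_2 of the corresponding probabilities (the idealization that Kolmogorov complexity approximates).

```python
import math

# Hypothetical priors P(H) and likelihoods P(D|H) for three candidate patterns.
HYPOTHESES = {
    "H1": {"prior": 0.50, "likelihood": 0.200},
    "H2": {"prior": 0.45, "likelihood": 0.001},
    "H3": {"prior": 0.05, "likelihood": 0.300},
}

def posterior_score(h: dict) -> float:
    """P(D|H) * P(H), which is proportional to P(H|D) because P(D) is constant."""
    return h["likelihood"] * h["prior"]

def code_length(h: dict) -> float:
    """Two-part code length in bits: -log2 P(H) for the pattern, -log2 P(D|H) for the data."""
    return -math.log2(h["prior"]) - math.log2(h["likelihood"])

most_probable = max(HYPOTHESES, key=lambda name: posterior_score(HYPOTHESES[name]))
shortest_code = min(HYPOTHESES, key=lambda name: code_length(HYPOTHESES[name]))

print("most probable hypothesis:", most_probable)
print("shortest two-part code:  ", shortest_code)
```

Because -log_2 is strictly decreasing, the two selections agree for any choice of positive priors and likelihoods, not just the values assumed here.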
Simplicity and prediction
Let us consider prediction in the
simple setting where the environment consists of a string of 0s and 1s. A
continuous portion, x_1...x_n, of the
sequence is observed--the task is to predict the next item, x_{n+1}, in the
sequence. By elementary probability theory
$$P(x_{n+1} \mid x_1 \ldots x_n) = \frac{P(x_1 \ldots x_n x_{n+1})}{P(x_1 \ldots x_n)} \tag{5}$$
The best prediction x_{n+1} is the one
that has the highest probability of being true--i.e., that maximizes (5).
Because the denominator does not contain x_{n+1}, the
best prediction will also maximize
$$P(x_1 \ldots x_n x_{n+1}) \tag{6}$$
and will minimize:
$$-\log_2 P(x_1 \ldots x_n x_{n+1}) \tag{7}$$
Using the equivalence between
Kolmogorov complexity and probability, as above, the best prediction x_{n+1}
therefore minimizes:
$$K(x_1 \ldots x_n x_{n+1}) \tag{8}$$
Thus prediction is achieved by finding
the pattern that is the basis for the shortest code for x_1...x_n
and then choosing the next item x_{n+1} that
follows according to that pattern.9 We can
therefore conclude that the predictions of the pattern chosen by the simplicity
principle are most likely to be true.
This
heuristic argument has been made rigorous in Li & Vitányi (1997; see
also Li & Vitányi, 1993 for related discussion). Moreover,
mathematical justifications for prediction based on patterns which are chosen
to minimize description length have been provided in other mathematical
contexts (e.g., Rissanen, 1987, 1989; Vapnik, 1995). In addition, predictions
based on this principle have been successful in a range of practical applications
(e.g., Gao & Li, 1989; Quinlan & Rivest, 1989). Thus, choosing patterns
on the basis of simplicity appears justified as a basis for prediction.
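A computable toy version of this prediction rule can be sketched in Python. Since Kolmogorov complexity is uncomputable, the sketch substitutes a deliberately restricted description language: a sequence is described as a repeating block plus a list of exceptions. This language, the cost formula in `description_length`, and the function names are all assumptions made for illustration; they are not the coding scheme used in the work cited above.

```python
import math

def description_length(bits: str, period: int) -> float:
    """Two-part code length (in bits) under a 'repeat a block' pattern.

    Part 1 (the pattern): the block itself plus the choice of period.
    Part 2 (the data given the pattern): the position of each exception.
    """
    block = bits[:period]
    exceptions = sum(1 for i, b in enumerate(bits) if b != block[i % period])
    index_cost = math.log2(len(bits))  # bits needed to name a position or a period
    return (period + index_cost) + exceptions * index_cost

def shortest_code(bits: str) -> float:
    """Shortest description of bits available in this restricted language."""
    return min(description_length(bits, p) for p in range(1, len(bits) + 1))

def predict_next(bits: str) -> str:
    """Choose the continuation whose extended sequence has the shortest code."""
    return min("01", key=lambda candidate: shortest_code(bits + candidate))

print(predict_next("0101010101"))
print(predict_next("0000000"))
```

In this restricted language the alternating sequence is continued by alternation and the constant sequence by another zero, because the rival continuation must be paid for as an exception or as a longer block.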
Simplicity and decision making
Finding patterns by simplicity allows
an agent to predict and explain the world. These are abstract goals, but
nonetheless goals which are of fundamental importance to guiding decisions
about practical action. The standard normative theory of how decisions should
be made, decision theory, requires associating each possible outcome with a number
representing its utility, and assessing the probabilities of outcomes if a
particular action is taken (Berger, 1985). Decision theory recommends choosing
the action which maximizes expected utility.10
This
account of how decisions should be made has a clear role for the simplicity
principle--simplicity determines the probability of possible events, which are
then combined with utilities to determine what action should be taken. Thus,
the simplicity principle relates not merely to the abstract normative goals of
inferring the most probable pattern, or predicting what will happen, but to the
concrete problems of deciding how to act.
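The role of simplicity in decision making can be illustrated with a small Python sketch. The outcomes, their code lengths, the actions and the utilities are all hypothetical; the only substantive assumption carried over from the argument above is that an outcome with code length L bits is assigned probability proportional to 2^(-L).

```python
# Hypothetical code lengths (in bits) of two possible outcomes.
CODE_LENGTHS = {"rain": 2.0, "dry": 1.0}

# Hypothetical utilities for each (action, outcome) pair.
UTILITY = {
    ("carry_umbrella", "rain"): 5.0,
    ("carry_umbrella", "dry"): -1.0,
    ("leave_umbrella", "rain"): -10.0,
    ("leave_umbrella", "dry"): 2.0,
}

def probabilities(code_lengths: dict) -> dict:
    """Turn code lengths into probabilities: P(outcome) proportional to 2 ** -length."""
    weights = {outcome: 2.0 ** -length for outcome, length in code_lengths.items()}
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}

def expected_utility(action: str, probs: dict) -> float:
    """Probability-weighted sum of the utilities of an action's outcomes."""
    return sum(p * UTILITY[(action, outcome)] for outcome, p in probs.items())

PROBS = probabilities(CODE_LENGTHS)
ACTIONS = ("carry_umbrella", "leave_umbrella")
best_action = max(ACTIONS, key=lambda a: expected_utility(a, PROBS))

for action in ACTIONS:
    print(action, round(expected_utility(action, PROBS), 3))
print("chosen action:", best_action)
```

For simplicity the sketch treats outcome probabilities as independent of the action taken; in general, decision theory allows them to depend on the action.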
Part III: Finding patterns: Describing cognitive function
I have argued that finding patterns
should proceed by choosing patterns which support the most economical
encoding of the relevant data. This suggests a possible (partial) account of
the remarkable success of the cognitive system in predicting, understanding and
acting in an uncertain and complex environment: that cognitive processes search
for simplicity. I now consider whether this proposal provides a basis for
plausible descriptive psychological theories. I begin by giving a broad outline
of how the proposal that cognition is guided by simplicity should be
understood, and outlining in general terms its potential implications for core
aspects of cognition: reasoning, learning and memory. I then consider two case
studies, taken from the study of perception and of similarity, which show how
this approach can lead to specific theoretical proposals.
Simplicity and cognition: The broad picture
A psychological simplicity principle
The normative
discussion suggests that the cognitive system should aim to find the simplest
possible interpretation of the information available to it, whether that
information be perceptual input or scientific data. But the proposal that the
cognitive system invariably finds the very simplest interpretation is
unrealistic, for two reasons. First,
the computational problem of finding the shortest encoding of a set of data is
generally very difficult. If the coding language is rich enough to express an
arbitrary computable function (i.e., if it can be viewed as a general purpose
programming language--which is true of a surprisingly large class of
languages), then the problem of finding the simplest interpretation is provably
uncomputable (see, e.g., the discussion in Chater (1996) and the formal results
in Li & Vitányi (1993)). For such languages, the strong form of the
simplicity principle is therefore ruled out, on pain of violating the Church-Turing
thesis. But even if the representation language is not rich enough to express
an arbitrary computable function, the search for the simplest interpretation is
still typically combinatorially explosive (although see Helm & Leeuwenberg,
1986, for a rare counterexample).
Second, empirical considerations
also suggest that the cognitive system does not, in general, find the very
simplest perceptual organisation. A well-known perceptual example is that Glass
patterns (Glass, 1969; Glass & Pérez, 1973) where there is a wide
separation between the two “copies” of the pattern may appear to be entirely
unstructured. Chater (1996) considers a more extreme case: A binary expansion
of π represented as a pattern of black or white squares would appear
completely random; its simple description (as an expansion of π) cannot be
discovered by the cognitive system11.
A psychological form of the
simplicity principle, therefore, cannot specify that the cognitive system succeeds in finding
the shortest description of the information available to it. Rather, simplicity
should be viewed as a goal of cognitive processing: the cognitive system
chooses the simplest interpretation of this information that it can find.
A
further important issue in the psychological interpretation of the simplicity
principle concerns mental representation. I noted above that Kolmogorov
complexity theory abstracts over representation languages, so that the theory
can be used as a general framework for theorizing about cognition without a
detailed understanding of the nature of mental representation. Nonetheless, the specific
representations used by the cognitive system will be of crucial importance in
detailed psychological explanation. Indeed, note that, according to the
relationship between simplicity and probability above, the coding language can
be viewed as encoding a set of prior probabilities concerning possible
patterns. Evidence concerning mental representation from any source may thereby
be useful in providing constraints on the predictions of simplicity-based
accounts of cognition12. For example,
evidence from linguistics or psycholinguistics concerning the nature of the
mental representations involved in understanding natural language must be taken
into account in any simplicity/likelihood account of how the
perceptual/cognitive system finds structure in speech13.
Having
considered how the simplicity principle can be interpreted as a psychological
proposal, I now consider how it can be applied to understanding cognition.
Below, I outline how the simplicity principle can be applied to two specific
areas: perception and similarity. First, I sketch, in very broad terms, how it
can be related to some of the major topics in cognitive psychology.
Reasoning
In a series of papers (Chater &
Oaksford, 1990, 1993; Oaksford & Chater, 1991, 1992, 1993, 1995b), Mike
Oaksford and I have argued that almost all everyday reasoning is uncertain:
people draw conclusions that are plausible, but not certain, given the
premises. We have argued that probability theory, the calculus of uncertainty,
is therefore a more appropriate starting point for understanding human
reasoning than logic, the calculus of certainty. Moreover, we have argued that
people interpret classic psychological reasoning tasks, which are typically
assumed to be deductive, in probabilistic terms, and solve them using
strategies which can be understood in probabilistic terms. Thus, we argue that
people are not logical, but that they are rational; logic is simply the wrong
standard against which to assess most human reasoning. This viewpoint has
proved useful in providing detailed models of a range of standard reasoning
tasks, including Wason’s selection task (Wason, 1966; 1968; Oaksford &
Chater, 1994, 1995a; see Almor & Sloman, 1996; Evans & Over, 1996;
Laming, 1996 for critical discussion and Oaksford & Chater, 1996 for a
response; see also Oaksford, Chater, Grainger & Larkin, 1997 for empirical
evidence), syllogistic reasoning (Chater & Oaksford, ms) and conditional
inference (Oaksford & Chater, ms). Because of the duality between
simplicity and probability, a probabilistic interpretation of human inference
is immediately compatible with the simplicity principle outlined here.
Note
that this viewpoint proposes that simplicity/probability is a goal of
reasoning--but that this goal will only be approximated. Theorists differ on
how good such an approximation might be. Kahneman and Tversky have argued that
their experimental results showed strong departures from the norms of
probability theory under certain conditions (e.g., Kahneman & Tversky,
1973; Kahneman, Slovic & Tversky, 1982), although the reasoning heuristics
(availability and representativeness) that they proposed people use are
usually reasonably reliable in normal circumstances. Gigerenzer and his
colleagues (e.g., Gigerenzer, Hell & Blank, 1988; Gigerenzer & Murray,
1987) have argued that Kahneman and Tversky may have substantially
underestimated the normative correctness of human probabilistic reasoning, and
have shown that experimental manipulations which clarify the task for the experimental
participant can dramatically improve the fit between reasoning performance and
probabilistic norms.
More
recently, however, Gigerenzer and Goldstein (1996) and Evans and Over
(1996?REF) have proposed that human performance should not be compared with
normative theories, such as logic or probability theory (or, in the present
context, the simplicity principle). They argue that such normative accounts are
entirely unnecessary for understanding human reasoning. Specifically,
Gigerenzer and Goldstein (1996) argue that reasoning should be understood as
consisting of “fast and frugal” heuristics which are adaptively successful, but
not normatively justified; and Evans and Over (1996) argue that much human
reasoning is “rational1,” i.e.,
successful with respect to achieving a person’s goals, but not “rational2,” i.e.,
conforming to a normative analysis. Both these viewpoints suggest that
reasoning may consistently succeed without conforming, even approximately, to
any normative standard. This seems unsatisfactory, because it leaves this
success unexplained (see Chater, Oaksford, Nakisa & Redington, ms.). By
contrast, the simplicity principle has both a normative justification, and also
is intended to describe cognitive performance14.
Learning from Experience PUT IN REFS
Learning from experience is a problem
of finding patterns in what are typically large amounts of complex and often
noisy data15. It therefore falls naturally
within the domain of application of the normative theory of finding patterns by
searching for simplicity. Moreover,
theorists have directly proposed that certain aspects of language
acquisition may proceed by finding the shortest possible encoding of the input
linguistic data. For example, Brent & Cartwright (199? CHECK) show how
morphological structure can be found within isolated words, and Wolff (??; see also
Atick, ??) considers how higher level structure can be found automatically in
text. Less directly, connectionist networks, perhaps the most popular
computational models of human learning (Elman et al) can be interpreted as
implementing Bayesian probabilistic inference (MacKay, 1992; Neal, 19??; see
also Chater, ??), and thus, by the connection between probability and
simplicity, as maximizing simplicity. Indeed, much recent interest in the study
of connectionist networks has focussed on directly viewing networks as
minimizing description length, and therefore as maximizing simplicity (Hinton
& Zemel, 19??; Zemel, 199? see Neural Computation). Thus, many current
psychological models of learning are compatible with the thesis that the
cognitive system maximizes simplicity.
Memory
Finally, note that the claim that the
cognitive system searches for patterns which provide the briefest encoding of
available information has a natural interpretation in terms of memory: that the
cognitive system seeks to minimize memory load. This leads to the prediction
that the richer the patterns that the cognitive system can find in a stimulus,
the better it will be remembered. This is a ubiquitous finding in all areas of
memory research, from the advantage of memory for words over nonsense strings,
to the memory for meaningful over non-meaningful pictures, to comprehensible vs
non-comprehensible stories (REFS). This viewpoint was also taken by theorists
working within an information-theoretic framework (Attneave, 1959; Garner,
1962, 1974).
It
is important to note that this account does not depend on the assumption that
the memories are stored as briefly as possible--i.e., with no redundancy.
Indeed, it has frequently been observed that this kind of storage would be
inappropriate, because it would not be robust to noise. Information theory
specifies that constructing an optimal redundant code is achieved by first
finding the simplest encoding, and then introducing redundancy so that each part
of this code is equally protected from corruption (Cover & Thomas, 1992).
Thus, for a given stimulus, finding a brief encoding will allow the
construction of a better redundant representation, which will thereby be noise
resistant and hence better remembered.
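A rough illustration of this point can be sketched as follows. The sketch assumes zlib compression as a stand-in for the "simplest encoding" and plain repetition as a stand-in for an optimal redundant code; both are simplifying assumptions made here for illustration, as are the function names and the storage budget. Under a fixed budget, a stimulus with a shorter code can be stored with more redundancy.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of a zlib encoding: a crude proxy for the briefest code."""
    return len(zlib.compress(data, 9))

def copies_within_budget(data: bytes, budget_bytes: int) -> int:
    """How many repeated copies of the compressed code fit in a fixed store."""
    return budget_bytes // compressed_length(data)

rng = random.Random(0)
patterned = b"ab" * 500                                        # highly regular stimulus
unpatterned = bytes(rng.randrange(256) for _ in range(1000))   # no structure to exploit

budget = 4000  # hypothetical storage budget, in bytes
print(copies_within_budget(patterned, budget))
print(copies_within_budget(unpatterned, budget))
```

The patterned stimulus admits many more redundant copies than the unpatterned one of the same raw length, which is the sense in which a briefer encoding supports a more noise-resistant representation.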
Case study 1:
Perception
Perception is, from an abstract point
of view, a process of finding patterns in sensory input. Thus, a simplicity
criterion for choosing between patterns may potentially be applied across a
wide range of aspects of perceptual analysis. For example, in low-level
perception, it has been conjectured that the compression of the sensory
signal is a central goal (Atick & Redlich, 1990; Barlow, 1989; Blakemore,
199015). The goal of
compression is frequently viewed as stemming from limitations in the
information-carrying capacity of the sensory pathways. However, the viewpoint
outlined here suggests a complementary interpretation. It could be that
compressed (i.e., simplest) perceptual representations will tend to involve the
extraction of features likely to have generated the sensory input (because
maximizing simplicity automatically maximizes likelihood). From this viewpoint,
perceptual inference occurs in the very earliest stages of perception
(e.g., as implemented in mechanisms such as lateral inhibition in the retina),
where neural coding serves to compress the sensory input. Thus, the search for
simplicity may operate in low-level perception.
Moreover,
the same principles might equally well be at work in high level perceptual
processing--the simplicity principle seems equally valuable in attempting to
understand the causal structure of a sequence of observed actions or events.
The key goal is to find patterns which are a reliable basis for explanation and
prediction; we have seen that following a simplicity principle is a way of
achieving goals of this kind. The simplicity principle therefore finds
potential applications in understanding perception at many scales. Which areas
of applications prove to be theoretically fruitful remains for future research--below, I discuss some areas where
the notion of simplicity has already been usefully applied.
Perceptual organization
How does the perceptual system derive a
complex and structured description of the perceptual world from sensory input?
Two apparently competing theories of perceptual organization have been
influential. The first, initiated by
Helmholtz (1910/1962), advocates the likelihood principle: that sensory
input will be organized into the most probable distal object or event consistent
with that input. The second, which has
been mentioned already, advocates what Pomerantz and Kubovy (1986) call the simplicity
principle: The perceptual system is viewed as finding the simplest, rather
than the most likely, perceptual organization consistent with the sensory
input.
Both
the likelihood and simplicity principles explain, at least at an intuitive
level, a wide range of phenomena of perceptual organization. Consider, for
example, the Gestalt law of good continuation, that perceptual interpretations
which involve continuous lines or contours are favored. The likelihood
explanation is based on the observation that continuous lines and contours are
very frequent in the environment (e.g., Brunswik, 1956). Although it is possible that the input was
generated by discontinuous lines or contours which happen, by coincidence, to
be arranged so that they are in alignment from the perspective of the viewer,
this possibility is rejected because it is highly improbable. The simplicity explanation, by contrast,
suggests that continuous lines or contours are imposed on the stimulus when
they allow that stimulus to be described more simply.
Another
example is the tendency to perceptually interpret ambiguous 2D projections as
generated by 3D shapes containing only right angles (Attneave, 1972; Perkins,
1972, 1982; Shepard, 1981). The likelihood explanation is that right angled
structures are more frequent in the environment (at least in the “carpentered”
environment of the typical experimental subject (Segall, Campbell &
Herskovits, 1966)). The simplicity
explanation is that right angled structures are simpler--e.g., they have fewer
degrees of freedom--than trapezoidal structures.
There
has been considerable theoretical and empirical controversy concerning whether
likelihood or simplicity governs perceptual organization (e.g., Hatfield
& Epstein, 1985; Leeuwenberg & Boselie, 1988; Pomerantz and Kubovy,
1986; Rock, 1983). The controversy has been difficult to settle because neither
of the key principles, likelihood and simplicity, is clearly defined. Moreover,
there have been suspicions that the two principles are not in fact separate,
but are two sides of the same coin.
Pomerantz and Kubovy (1986) cite Mach (1886/1959): “The visual sense
acts therefore in conformity with the principle of economy [i.e., simplicity],
and at the same time, in conformity with the principle of probability [i.e.,
likelihood]” (p. 215), and themselves suggest that some resolution between the
two approaches might be possible--particularly in view of the fact that both
likelihood and simplicity explanations typically appear to be available for
most phenomena in perceptual organization.
Chater
(1996) notes that the simplicity and likelihood principles are indeed, under natural
interpretations, equivalent. Specifically, if simplicity is interpreted as
length in a coding language, and likelihood is interpreted as subjective
probability, then any problem of maximizing simplicity can be reinterpreted as
a problem of maximizing likelihood17.
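The equivalence can be sketched in one step, under the standard coding-theoretic assumption that an optimal code assigns an outcome of probability P a code of length -log2 P. The symbols H (a candidate perceptual organization), D (the sensory data) and L (code length in bits) are introduced here for illustration only.

```latex
\begin{align*}
\arg\max_{H}\; P(H)\,P(D \mid H)
  &= \arg\min_{H}\; \bigl[-\log_2 P(H) - \log_2 P(D \mid H)\bigr] \\
  &= \arg\min_{H}\; \bigl[L(H) + L(D \mid H)\bigr],
\qquad \text{where } L(\cdot) = -\log_2 P(\cdot).
\end{align*}
```

Because the logarithm is monotonic, the organization with the highest probability is the one with the shortest total code, and vice versa.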
The
unification of the simplicity and likelihood views appears to be challenged,
however, by the apparent existence of phenomena which have been interpreted as
providing distinctive evidence for likelihood and against simplicity, or vice
versa. If the two principles are identical, empirical evidence distinguishing
between them should not be possible. Chater (1996) argues, however, that such
evidence can be interpreted within both the simplicity and likelihood
frameworks. I briefly consider such evidence, and its interpretation.
Likelihood
is widely assumed to be favored by evidence that shows that preferred
perceptual organization is influenced by factors concerning the structure of
the everyday environment. For example,
consider 2D projections of a shaded pattern, which can be seen either as a bump
or an indentation (see, e.g., Rock, 1975).
The preferred interpretation is consistent with a light source from
above, as in natural light. Thus, the
perceptual system appears to choose the interpretation that is most likely;
but there is no intuitive difference between the simplicity of the two
interpretations. But such phenomena also have a simplicity-based explanation,
which can be intuitively understood as follows.
Consider the simplest description not of a single stimulus, but of a
typical sample of natural scenes. Any
regularity which is consistent across those scenes need not be encoded afresh
for each scene--rather, it can be treated as a “default”. That is, unless there is a specific additional
part of the code for a stimulus that indicates that the scene violates the
regularity (and in what way), it can be assumed that the regularity
applies. Therefore, other things being
equal, scenes which respect the regularity can be encoded more briefly than
those which do not. Moreover,
perceptual organizations of ambiguous scenes which respect the regularity will
be encoded more briefly than those which violate it. In particular, then, the perceptual organization of an ambiguous
stimulus obeying the natural regularity of illumination from above will be
briefer than the alternative organization with illumination from below. In general, preferences for likely
interpretations also give rise to preferences for simple interpretations: if
the code for perceptual stimuli and organizations is to be optimal when
considered over all (or a typical sample of) natural scenes, it will reflect
regularities across those scenes.
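As a small numerical sketch of the “default” idea, suppose, purely for illustration, that 95% of natural scenes are lit from above. The figure is an assumption, not an empirical estimate, and the function name is hypothetical. In a code that is optimal over such scenes, each outcome costs -log2 of its probability.

```python
import math

def code_length_bits(probability: float) -> float:
    """Optimal code length, in bits, for an outcome with the given probability."""
    return -math.log2(probability)

# Hypothetical frequency of overhead illumination in natural scenes (assumed figure).
p_light_from_above = 0.95

cost_default = code_length_bits(p_light_from_above)        # regularity respected
cost_exception = code_length_bits(1 - p_light_from_above)  # regularity violated

print(f"light from above: {cost_default:.2f} bits")
print(f"light from below: {cost_exception:.2f} bits")
```

Under this assumed frequency, the interpretation that respects the regularity costs well under one bit, while the exception costs more than four bits, so the likely interpretation is also the briefer one.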
Simplicity
is assumed to be favored by cases of perceptual organizations which violate,
rather than conform to, environmental constraints. Leeuwenberg and Boselie
(1988) show a schematic drawing of a symmetrical two-headed horse. The more likely interpretation, also
consistent with the drawing, is that there are two horses, one occluding the
other. But the perceptual system appears to reject likelihood. Instead, the
drawing is interpreted as a single, two-headed animal. But we can also provide
a likelihood explanation of this phenomenon, where likelihood applies locally
rather than globally. That is, the
perceptual system may determine the interpretation of particular parts of the
stimulus according to likelihood (e.g., the fact that there are no local depth
or boundary cues may locally suggest a continuous object). These local processes
may not always be guaranteed to arrive at the globally most likely
interpretation (see Hochberg, 1982).
Thus,
the evidence that distinguishes between the simplicity and likelihood
principles is actually compatible with both, and therefore does not challenge
the unification between them18.
Figural goodness
Some perceptual patterns are
intuitively judged to be more “regular” or “better” than others. These
intuitive judgements of “figural goodness” appear to reliably correlate with
the resistance of such patterns to noise, and the speed with which such
patterns are detected.
Hochberg
and McAllister (1953) argued for a direct connection between judgments of
figural goodness and choice of perceptual organisation. They identified figural
goodness with simplicity, and adopted the simplicity principle, as discussed
above: that perceptual organisations are chosen to maximize simplicity19. According to
this viewpoint, the intuitive notion of goodness can be viewed as a measure of
the degree to which the perceptual system succeeds in finding a simple pattern
in the perceptual stimulus.
One
line of argument in favor of Hochberg and McAllister’s viewpoint is that there
is an interesting connection between the proposal that the simplicity principle
determines the choice of perceptual organizations, as outlined above, and noise
resistance. Specifically, if the simplicity principle is right, then it
follows, as we shall see below, that simple patterns will be the most noise
resistant. Moreover, given that noise resistance is a litmus test for figural
goodness, this suggests the further implication that simple patterns will be
particularly good. Thus, the simplicity principle in perceptual organization
appears to imply that simplicity also governs goodness, as Hochberg and
McAllister propose.
The crucial step in the argument
above is that which shows that, if the simplicity principle is correct, the
simplicity of a pattern correlates directly with its resistance to noise. The
intuitive idea is that the noise resistance of a pattern depends on a
comparison between a “null” organization, in which the pattern is not imposed
and the stimulus is viewed purely as noise, and a “pattern + noise”
interpretation, in which the stimulus is viewed as arising from a pattern which
has been corrupted by noise. According to the simplicity principle, the simpler
of the two interpretations will be perceived: that is, the pattern will be
perceived so long as the “pattern +
noise” interpretation is shorter than the “null” interpretation. This implies
that very simple patterns (with short codes) will be the most noise resistant,
because more noise can be added, and the pattern + noise interpretation will
still be the shortest. If we assume noise resistance to be a litmus test for
figural goodness, this means that simple patterns will have the greatest
goodness. Accordingly, the simplicity principle as a principle concerning choice
between alternative perceptual organizations implies that simplicity determines
figural goodness. This provides an argument for Hochberg and McAllister’s
(1953) view that simplicity governs not only choice of perceptual organization
but also figural goodness.
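The comparison between the “null” and “pattern + noise” interpretations can be sketched numerically. The sketch makes several assumptions that are not in the original argument: the stimulus consists of n binary elements, the null interpretation lists them literally (n bits), and noise is described by stating how many elements were corrupted and which ones. All names and numbers are hypothetical.

```python
import math

def noise_bits(n: int, k: int) -> float:
    """Bits to state how many of n elements were corrupted (k) and which ones."""
    return math.log2(n + 1) + math.log2(math.comb(n, k))

def max_tolerated_noise(n: int, pattern_bits: float) -> int:
    """Largest k for which "pattern + noise" is still shorter than the null code.

    The null interpretation lists all n binary elements literally (n bits).
    Returns -1 if the pattern is not preferred even with no noise at all.
    """
    tolerated = -1
    for k in range(n // 2 + 1):  # comb(n, k) grows with k on this range
        if pattern_bits + noise_bits(n, k) < n:
            tolerated = k
        else:
            break
    return tolerated

n = 64
simple = max_tolerated_noise(n, pattern_bits=5)     # pattern with a short code
complex_ = max_tolerated_noise(n, pattern_bits=40)  # pattern with a long code
print(simple, complex_)
```

With these numbers, the pattern with the 5-bit code remains the preferred interpretation under more corrupted elements than the pattern with the 40-bit code, in line with the claim that simpler patterns are more noise resistant.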
Randomness
If simplicity
determines judgements of “goodness” or “regularity,” then this suggests that
complexity might determine judgements of “randomness” or “irregularity.” That
is, perhaps judgments of randomness can be viewed as the inverse of
goodness judgments (see, e.g., Alberoni, 1962). If perceived goodness is
determined by the degree to which the cognitive system succeeds in finding
structure in the stimulus, then this suggests that perceived randomness may be
determined by the degree to which the cognitive system fails to find
such structure. Interestingly, Falk and Konold (in press) have recently
provided support for this view. They give a persuasive theoretical analysis as
well as empirical confirmation of the suggestion that subjective judgments of
the randomness of a stimulus are inversely related to the success of their
attempts to find a brief code for that stimulus (see also Kahneman &
Tversky, 1972). Indeed, Falk and Konold’s (in press) analysis proposes an
algorithmic definition of randomness drawn from Kolmogorov complexity theory
(Li & Vitányi, 1993), thus using the same tools as the current
analysis of simplicity at a technical level. This is a straightforward
inversion of the SL account of goodness: A stimulus is perceived as random to
the extent that no simple/likely organisation can be found for it. Thus,
the SL approach promises to unify the literature on goodness with that on
judgments of randomness (e.g., Bar-Hillel & Wagenaar, 1991; Budescu, 1987;
Lopes & Oden, 1987).
Case study 2:
Similarity
Consider the problem of finding
patterns in a stimulus consisting of two distinct objects. Each object may
contain internal patterns; but in addition, there may be patterns which
interrelate the two objects. For example, a short description of the stimulus
shown in Figure 3(a) would exploit the common patterns between the left and
right object; specifically by noting that one is the mirror image of the other
in a vertical axis of symmetry. The pattern interrelating the two parts of the
stimulus is very strong; once one half of the stimulus is described, the other
can be generated very simply, by describing the axis of symmetry. Figure 3(b)
shows a pair of objects which share somewhat less structure--specifying one in
terms of the other requires a reflection, and the interchange of black and
white. Figure 3(c) shows a case where there is less structure still; to specify
one object in terms of the other requires an additional translation of the
inner figure.
INSERT FIGURE 3 ABOUT HERE
Suppose that we ask: how similar are
the pairs of objects in Figure 3? Intuitively, similarity appears to decrease
from (a) to (c). Thus, the more shared patterns between two stimuli, and
therefore the more simply one can be specified in terms of the other, the more
similar they are. Generalizing this observation leads to the proposal that the
judged similarity between two objects depends on the simplicity of the transformation
from the representation of one object to the representation of the other.
Ulrike Hahn and I (Chater & Hahn, 1996, in press) have called this the representational
distortion theory of similarity--the simpler the transformation between the
representations of a pair of objects, the more similar those objects are
assumed to be. In terms of Kolmogorov complexity, representational distortion
is expressed in terms of the conditional Kolmogorov complexity, K(y|x),
introduced above--the length of the shortest program that transforms x
into y21.
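Kolmogorov complexity is uncomputable, so K(y|x) cannot be calculated directly. One rough, computable stand-in is to ask how many extra bytes a general-purpose compressor needs to encode y once it has already seen x. This is an assumption of the sketch below, not the method of Chater and Hahn, and the function names are hypothetical.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Bytes needed by zlib: a crude, computable proxy for the complexity of data."""
    return len(zlib.compress(data, 9))

def approx_distortion(y: bytes, given: bytes) -> int:
    """Rough proxy for K(y|x): extra bytes needed to encode y after x (= given)."""
    return compressed_size(given + y) - compressed_size(given)

rng = random.Random(1)
x = bytes(rng.randrange(256) for _ in range(300))

altered = bytearray(x)
altered[150] ^= 0xFF            # x with a single element altered
near_copy = bytes(altered)

rng2 = random.Random(2)
unrelated = bytes(rng2.randrange(256) for _ in range(300))

print(approx_distortion(near_copy, given=x))
print(approx_distortion(unrelated, given=x))
```

The near copy needs far fewer extra bytes than the unrelated sequence, because most of it can be specified by pointing back to x. A compressor such as zlib only detects repeated substrings, so it will miss many transformations (reflections, arithmetic relations) that a richer coding language would capture.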
Representational
distortion provides an interesting generalization of current psychological
theories of similarity. The two leading accounts, the geometric and featural
views, also treat similarity as a relation between mental representations. But
whereas representational distortion applies to any kind of representation, and
allows arbitrary computable transformations between them, these theories are
committed to very specific types of representations, and very specific
relations between them.
The
geometric view (Shepard, 1987) assumes
that objects are represented as points in an internal space. The similarity
between two objects is inversely related to the distance between their representations
in this space. By contrast, the set-theoretic view (Tversky, 1977) assumes that objects are represented as sets of
features. The similarity between two objects depends on the amount of overlap
between their sets of features. The representational limitations of both
accounts are severe. It does not seem possible to represent perceptual
organizations, parsed sentences, schemas for world knowledge, or sequences of
motor commands either as points in an internal space, or as sets of features. Rather,
they appear to require structured
representations which are able to capture relations between parts and
wholes and capture systems of relations between parts (Chomsky, 1965; Fodor,
1975; Fodor & Pylyshyn, 1988; Marr, 1982; Minsky, 1977). In short, structured representations
appear to be required to represent almost all cognitively significant stimuli;
and judgements of similarity between such stimuli thereby fall outside the
scope of both geometric and set-theoretic accounts of similarity.
I stress that representational
distortion, like the geometric and set-theoretic views, is defined over mental
representations of objects--not over
the objects themselves. To see why this is crucial, consider the
psychological similarity of two unrelated bursts of white noise. At an acoustic
level of description, where the bursts are considered as amplitudes varying
over time, a very long set of instructions would be required to transform one
of these bursts into the other. But the two noises may, nonetheless, be judged
to be similar, even to the point that the auditory system cannot distinguish
the two. According to this account, this is because the mental representations
of the two bursts do not include minute detail of each aspect of the noise.
Instead, they are concerned with a more general description, perhaps concerning
the duration, loudness, location and so on of the burst. These properties may be largely or
completely matched between stimuli, so that the mental representations of the
two sounds are identical, or differ only slightly, and hence the
representational distortion between them is small.
I stress also that the representational
distortion found by the cognitive system will not correspond exactly to
information distance. Discovering a short transformation between one
representation and another may require arbitrary amounts of computation. For
example, the sequences 1 5 3 7 2 3 9 0 6 and
3 0 7 4 4 7 8 1 2 are very simply related—if they are interpreted as
base 10 numbers, the second is double the first. Hence the representational
distortion between the two sequences is small; however, the cognitive system
may not find this short transformation; and the similarity between the two
representations may be judged to be low. We assume therefore only that the
cognitive system can approximate representational distortion to some degree22.
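The arithmetic relation in this example is easy to verify mechanically once the right interpretation (a base 10 number) is supplied, even though a surface comparison gives no hint of it. In the sketch below, the digit-by-digit comparison is only a hypothetical stand-in for a shallow search, and the function names are illustrative.

```python
first = "1 5 3 7 2 3 9 0 6"
second = "3 0 7 4 4 7 8 1 2"

def as_base10(digits: str) -> int:
    """Read a space-separated digit sequence as a single base 10 number."""
    return int(digits.replace(" ", ""))

def matching_positions(a: str, b: str) -> int:
    """Count positions at which the two sequences carry the same digit."""
    return sum(p == q for p, q in zip(a.split(), b.split()))

is_double = as_base10(second) == 2 * as_base10(first)
overlap = matching_positions(first, second)
print(is_double, overlap)
```

The doubling relation holds, yet no position carries the same digit in both sequences, which illustrates how a short transformation can exist without being found by a limited search.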
Geometric and set-theoretic theories
are special cases of representational distortion
I now note that geometric and
set-theoretic models can be seen as special cases of representational
distortion. The mathematical details have been omitted for brevity (see Chater
& Hahn, in preparation).
The Spatial Account
Representations are limited to vectors
of numbers. Transformations are limited to sequences of “nudges” of unit length
(this length can be thought of as a limit of resolution in the space) and a
“program” consists of a sequence of such nudges. If nudges can be in any
direction, then the simplest transformation between two points is given by the
distance of the straight line path between the points (this is the length of
the “program” of concatenated nudges, ignoring the cost of specifying the direction
of a nudge). This gives the Euclidean version of the spatial model.
Restricting nudge direction to the axes gives a city-block version;
allowing non-orthogonal axes derives the general Euclidean scaling model (Ashby
& Townsend, 1986).
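A minimal sketch of the nudge construction follows. It assumes two-dimensional points and a unit nudge length of 1, and it ignores the cost of specifying nudge direction, as in the text. The function names are hypothetical.

```python
import math

def euclidean_nudges(a, b, unit=1.0):
    """Nudges allowed in any direction: program length tracks straight-line distance."""
    return math.ceil(math.dist(a, b) / unit)

def city_block_nudges(a, b, unit=1.0):
    """Nudges restricted to the axes: program length tracks city-block distance."""
    return sum(math.ceil(abs(p - q) / unit) for p, q in zip(a, b))

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean_nudges(a, b))   # 5 nudges along the straight line
print(city_block_nudges(a, b))  # 7 nudges: 3 along one axis, 4 along the other
```

The only difference between the two versions is the set of transformations the “program” may use, which is the sense in which both metrics are special cases of representational distortion.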
The Set-theoretic Account
Representations are limited to sets of
features. Transformations are limited to the deletion and addition of features
one by one. Thus a program consists of a sequence of deletions and additions.
Assuming differential length for deletion and addition (specifically, deletion
has the shorter code, because additions require specifying what is to be
added), program length is then determined by a weighted sum of the number of
features that object A has and object B does not (which must be deleted) and
that B has but A does not (which must be added). The length of this program is
a close variant of Tversky’s (1977) theory of similarity.
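The weighted sum can be sketched directly. The feature sets and the two cost values are hypothetical; the only constraint taken from the text is that a deletion has a shorter code than an addition.

```python
def feature_program_length(a: set, b: set,
                           delete_cost: float = 1.0,
                           add_cost: float = 2.0) -> float:
    """Length of a program that turns feature set a into feature set b."""
    to_delete = a - b   # features a has and b lacks
    to_add = b - a      # features b has and a lacks
    return delete_cost * len(to_delete) + add_cost * len(to_add)

# Illustrative feature sets, not drawn from any experiment.
a = {"wings", "feathers", "flies", "sings"}
b = {"wings", "feathers", "swims"}

print(feature_program_length(a, b))  # 2 deletions + 1 addition = 4.0
print(feature_program_length(b, a))  # 1 deletion + 2 additions = 5.0
```

Because deletions and additions are costed differently, the program length from a to b need not equal the length from b to a.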
Properties of the representational
distortion theory of similarity
We now briefly consider some basic
properties of representational distortion that imply that it is a promising
starting point for a psychological theory of similarity.
Flexibility
The fact that similarity is defined
over general representations takes account of the great flexibility of human
similarity judgements (e.g., Medin, Goldstone & Gentner, 1993), because
similarity is defined over representations of objects, and the goals and
knowledge of the subject may affect the representations which are formed. As with the set-theoretic models (Tversky,
1977), this flexibility has both advantages, in terms of accounting for the
flexibility of people's similarity judgements, and disadvantages, from the
point of view of deriving testable empirical predictions.
Similarity and identity
According to representational
distortion, any object is more similar to itself than to any other object. This
is because the shortest possible program is the “empty” program, which,
clearly, leaves any representation unchanged (in symbols, K(x|x) = 0). Thus,
the representational distortion viewpoint automatically captures the
fundamental intuition that identity is the most extreme form of similarity. This
property of representational distortion seems attractive, but it appears to run
counter to data obtained by Tversky (1977) which appears to show that the
similarity between distinct objects can sometimes exceed the similarity between
identical objects. However, the interpretation of this data is sufficiently
controversial (e.g., Nosofsky, ???) that it may be too early to take the
drastic conceptual step of rejecting the intuition that identity is the most
extreme form of similarity.
Asymmetry
Representational distortion allows for
asymmetry in similarity judgements: K(x|y) is not in
general equal to K(y|x).
This asymmetry is particularly apparent when the representations being
transformed differ substantially in complexity. Suppose that a subject knows a
reasonable amount about China, but rather little about Korea, except that it is
“rather like” China in certain ways.
Then transforming the representation of China into the representation of
Korea will require a reasonably short program (which simply deletes large
amounts of information concerning China which is not relevant to Korea), while
the program transforming in the reverse direction will be complex, since the
minimal information known about Korea will be almost no help in constructing the
complex representation of China. Thus,
we would predict that K(China|Korea) should be greater than
K(Korea|China). This is observed
experimentally (Tversky, 1977).
Background knowledge
Similarity judgements are influenced by
background knowledge. For example, if the Arabic number system is part of your
background knowledge, then you may perceive similarities between otherwise
dissimilar patterns (i.e., dissimilar as mere patterns of dots), because
numerical transformations will be available. It is difficult for any theory of
similarity to explain the role of background knowledge. In the spatial view,
the natural role of knowledge is in specifying the dimensions of the space in
which the comparison takes place, as well as assigning weights, which determine
the relative importance of each dimension (effectively by stretching or
squashing the space along the relevant dimension). In the set-theoretic view,
background knowledge can play a role in determining the features that are taken
into account in the comparison. In both cases, the role of knowledge is to
affect the representations that are the input to the comparison
process. Similarly, background knowledge may affect the representations which
are compared, according to the view that similarity is representational
distortion. But, moreover, the representational distortion account allows an
additional way in which background knowledge can affect similarity comparisons: by
assuming that background knowledge forms an additional input to the program
which must transform one object into another. Thus, background knowledge
affects what operations are available in transforming one representation into
another--for example, a knowledge of the number system might suggest all manner
of numerical transformations which might relate two numbers (e.g., having the
concept of a prime might increase the degree to which people judge 43 and 47 to
be similar--partly because one can be generated from the other by the instruction
“next prime” or “previous prime”). People with different mathematical knowledge
might thereby have different judgements about which numbers are similar. More
drastically, people who use different notations will thereby have dramatically
different judgements concerning the similarities between patterns corresponding
to formulae expressed in various notations.
Thus,
representational distortion provides a rich framework for understanding how
background knowledge influences similarity judgements--knowledge can readily
influence the nature of the similarity comparison itself, as well as changing
the representations that are inputs to the comparison (see Chater & Hahn,
in press, for discussion of related issues). It remains for future work to
determine to what extent this account can capture in detail the way in which
people’s similarity judgements are influenced by their background knowledge.
Summary
To sum up, the simplicity approach to
similarity arises as follows. If the cognitive system searches for the simplest
interpretation of the information available, then it will aim to exploit
regularities between different representations. The strength of the shared
regularities between two objects can be measured by the conditional
Kolmogorov complexity between them: call this the “representational distortion”
between them. Representational distortion can be viewed as a generalization of the
two standard psychological models of similarity: the spatial and
set-theoretical models. Moreover, it has a number of intuitively attractive
properties. An interesting project for future research is to attempt to develop
this theoretical account in more detail, and to provide experimental tests for
this approach (see Chater & Hahn, 1996, in preparation).
Scope and limits
I have proposed that the search for
simplicity is a fundamental principle of cognition. I have argued the principle
has potentially broad application. In this section, I list a number of
important limitations for this approach.
PUT IN REFERENCES FOR THIS SECTION
Representation
If simplicity is defined in terms of
brevity in a coding language, then simplicity will depend crucially on that
representation language--thus obtaining detailed psychological predictions from
the simplicity principle requires making specific assumptions about mental
representation. I noted above that Kolmogorov complexity theory is able to
abstract away from the specific coding language being used, because code
lengths in any two languages are equal up to a constant--but this constant may
be large in relation to the amount of data available in specific psychological
applications.
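The invariance result referred to here can be stated compactly. The notation below is an assumption introduced for illustration: K_{L_1} and K_{L_2} denote code lengths in two universal coding languages L_1 and L_2, and c_{L_1,L_2} is a constant that depends on the two languages but not on the object x being encoded.

```latex
\[
  \bigl\lvert K_{L_1}(x) - K_{L_2}(x) \bigr\rvert \;\le\; c_{L_1, L_2}
  \qquad \text{for all } x .
\]
```

The bound says nothing useful when c_{L_1,L_2} is of the same order as the code lengths themselves, which is the situation that may arise when only small amounts of psychological data are available.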
The
viability of simplicity-based accounts of cognition can be assessed by
constraining the coding language used to determine simplicity by independent
theoretical and empirical evidence concerning the relevant aspect of mental
representation. Leeuwenberg and colleagues (Buffart, Leeuwenberg & Restle,
1981; Leeuwenberg, 1969, 1971) have pursued this approach in assessing the
viability of their coding language for certain classes of perceptual stimuli,
structural information theory (e.g., van der Helm, van Lier & Leeuwenberg,
1992; van Lier, van der Helm & Leeuwenberg, 1994a, 1994b). Similar programs
of research may be possible with respect to other applications of the
simplicity principle.
Search PUT IN REFS
I have argued that simplicity is
the criterion with which the cognitive system chooses between alternative
patterns that may be imposed on the environment. But I have also noted that the
cognitive system cannot, in general, maximize simplicity--finding
the shortest code for a set of data is uncomputable; and even restricted
versions of the problem are generally combinatorially explosive. This means
that the sub-optimal solutions found by the cognitive system will depend on the
nature of the search process. The extent to which the search is conducted
serially or in parallel, whether it must be represented discretely, or can
be embedded into a continuous search problem (Durbin & Willshaw, ??;
Smolensky, ??), whether it uses some version of gradient descent (REF?), stochastic
gradient descent (G&G; Hinton & Sejnowski??; Kirkpatrick??), techniques
such as simulated annealing (Geman & Geman, ??), and so on, are crucial,
and unresolved, issues. (CHECK MDL TRICKS) RELAXATION SEARCHES.
Speed PUT IN REFS
The discussion so far has also ignored
cognitive limitations concerning speed of processing. In perception, for
example, many complex patterns are found within fractions of a second (e.g.,
Hadyn fast faces; M-W shadowing). In view of the slowness of neural hardware,
this suggests that the search process must take only a small number of
steps--this places very strong constraints on the nature of the search process
(Feldman & Ballard, 1982?; R&McC??; Chater & Oaksford, 1990).
Speed
also plays an important role in another way: that the representations used by
the cognitive system must not only be brief, but must also be easy to use quickly. In
some contexts, there may be a trade-off between brevity and speed. Consider an
example from computer science: arithmetical operations may be rapidly computed
by consulting a large look-up table in which the answers to particular
arithmetic operations are prestored (particularly if this table can be searched
in parallel); by contrast, a more compact representation of arithmetic, for
example, in terms of axioms in some logical language, may be much briefer, but
require much more computation to use. This tension between cognitive goals of
speed and brevity may also be important in psychological contexts24.
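The trade-off can be made concrete with a toy sketch. It is an illustration under stated assumptions rather than a claim about mental arithmetic: the table, the function names, and the step counter are all hypothetical, and the “axioms” are the two familiar Peano-style rules for addition (a + 0 = a, and a + succ(b) = succ(a + b)).

```python
from itertools import product

# Large but fast: every single-digit sum is prestored, one entry per pair.
ADD_TABLE = {(a, b): a + b for a, b in product(range(10), repeat=2)}


def add_by_lookup(a: int, b: int) -> int:
    """Answer in a single retrieval, at the cost of storing 100 entries."""
    return ADD_TABLE[(a, b)]


def add_by_axioms(a: int, b: int) -> tuple[int, int]:
    """Answer from two rules only, at the cost of b successor steps.

    Returns the sum together with the number of rule applications used.
    """
    result, steps = a, 0
    while b > 0:          # a + succ(b) = succ(a + b)
        result += 1
        b -= 1
        steps += 1
    return result, steps  # a + 0 = a


table_size = len(ADD_TABLE)
fast_answer = add_by_lookup(7, 8)
slow_answer, slow_steps = add_by_axioms(7, 8)
```

Both routes give the same answer, but the compact representation pays for its brevity with a number of steps that grows with the size of the operands, whereas the table pays for its speed with storage.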
Innate constraints
PUT IN REFS
The simplicity viewpoint outlined here
may appear to be tied to a strong empiricist view of cognitive development. The
emphasis has been on the criteria that the cognitive system can use to find
patterns; this assumes that the patterns have to be found from experience,
rather than being innately specified. Nonetheless, the simplicity viewpoint is
equally compatible with empiricist and nativist viewpoints. Even strong
nativists require that the cognitive system searches for patterns--but they
claim that this search is subject to strong innate constraints. For example,
strong nativist viewpoints regarding language acquisition typically involve the
claim that the child can entertain only a restricted set of grammars (e.g.,
Chomsky, 1980; Pinker, ??). But the problem of finding the correct grammar most
compatible with linguistic experience is still immensely difficult (e.g., see
Redington, Chater & Finch, in press, for discussion)--and this
pattern-finding problem may still be guided by simplicity. The restriction to a small set
of grammars amounts to having a constrained internal representation in terms of
which linguistic hypotheses can be stated. But the simplicity principle may
nonetheless apply: the grammar chosen may be that which provides the briefest
encoding of linguistic input (see, e.g., Brent, ??; Grünwald, ??; Wolff,
??). More generally, the simplicity principle applies to problems of finding
patterns in the world from experience, whether or not finding such patterns is
guided by innate constraints.
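A toy sketch may help to show how simplicity can still do work inside an innately restricted hypothesis space. Everything in it is an assumption made for illustration: the two candidate “grammars”, the four-word vocabulary, the two-part code (bits to state the grammar plus bits to encode the input given the grammar), and in particular the grammar costs of 10 and 20 bits, which are invented numbers.

```python
import math

DETERMINERS = ["the", "a"]
NOUNS = ["dog", "cat"]
WORDS = DETERMINERS + NOUNS


def free_order(sentence):
    """Permissive grammar: any two words, 16 sentences, equally probable."""
    w1, w2 = sentence
    return 1 / 16 if w1 in WORDS and w2 in WORDS else 0.0


def det_noun(sentence):
    """Restrictive grammar: determiner then noun, 4 sentences, equally probable."""
    w1, w2 = sentence
    return 1 / 4 if w1 in DETERMINERS and w2 in NOUNS else 0.0


# The innately given hypothesis space: (hypothetical bits to state it, model).
GRAMMARS = {"free_order": (10.0, free_order), "det_noun": (20.0, det_noun)}


def total_code_length(grammar_bits, prob, corpus):
    """Two-part code: bits for the grammar plus bits for the corpus given it."""
    data_bits = 0.0
    for sentence in corpus:
        p = prob(sentence)
        if p == 0.0:
            return math.inf  # the grammar cannot encode this sentence at all
        data_bits += -math.log2(p)
    return grammar_bits + data_bits


def choose_grammar(corpus):
    """Pick the grammar giving the briefest overall encoding of the corpus."""
    return min(GRAMMARS, key=lambda name: total_code_length(*GRAMMARS[name], corpus))


small_corpus = [("the", "dog"), ("a", "cat")]
large_corpus = small_corpus * 6

choice_small = choose_grammar(small_corpus)  # 10 + 8 = 18 bits vs 20 + 4 = 24 bits
choice_large = choose_grammar(large_corpus)  # 10 + 48 = 58 bits vs 20 + 24 = 44 bits
```

With little input the cheaper, more permissive grammar gives the shorter total code; with more input the restrictive grammar repays its higher stating cost. The innate constraint fixes which grammars are candidates, and brevity of encoding chooses among them.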
The importance of interests
I have so far considered the cognitive
system as engaged in a disinterested search for patterns. This leaves
out the fact that some patterns are relevant and others irrelevant to the
interests of the agent. Clearly, people are more concerned with each other’s
faces than with patterns of shadow; and they are more concerned with each
other’s voices than with the sounds of footsteps or distant traffic. Faces and
voices are interesting not merely because they contain rich patterns, but
because they are of fundamental importance in relating to other people, and
thereby are relevant to achieving almost any goal a person may have. Equally,
it seems plausible that the perception of the physical world is to some degree
geared towards the detection of affordances (Gibson, 1979)--properties
of objects that are potentially relevant to the actions of the agent (e.g.,
whether an object can be eaten, lifted, thrown, and so on).
The
role of interests is beyond the scope of the simplicity principle, but compatible
with it. Interests affect how much cognitive effort is directed towards finding
different kinds of patterns; but the pattern finding process may, nonetheless,
proceed without reference to interests, and may be guided only by simplicity.
Scientific research provides an appropriate analogy--various practical
interests may determine the level of resources devoted to different areas of
research, but the research itself should use disinterested scientific criteria,
without reference to those interests. Indeed, it is generally assumed that
interests must not be allowed to influence scientific research directly
(e.g., the conclusions reached should be based purely on evidence, rather
than chosen on the basis of political or social convenience), for scientific
research to be valuable to society. Similarly, I assume that a separation
between interests and the criteria for finding patterns is cognitively
desirable, and that the remarkable success of the cognitive system indicates
that, to a large degree at least, this separation is respected25.
Conclusions
Many cognitive processes find patterns
in experience--from perceptual processing to scientific thinking. I suggest
that the cognitive system searches for patterns according to simplicity--where
simple patterns are those which allow a brief specification of the
available data. This is normatively justified as providing a sound basis for
prediction and explanation; and provides an attractive framework for
descriptive psychological theories in a range of cognitive domains. I propose
that it is worth exploring further the hypothesis that the search for
simplicity is a fundamental cognitive principle.
References
Alberoni,
F. (1962). Contribution to the study of subjective probability: I. Journal
of General Psychology, 66, 241-264.
Almor, A.
& Sloman, S. A. (1996). Is deontic reasoning special? Psychological
Review, 103, 374-380.
Anderson, J.
R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Anderson,
J. R. (1991a). Is human cognition adaptive? Behavioral and Brain Sciences,
14, 471-517.
Anderson,
J. R. (1991b). The adaptive nature of human categorization. Psychological
Review, 98, 409-429.
Anderson,
J. R. & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological
Science, 2, 396-408.
Ashby,
F. G. & Townsend, ??? (1986) ON SIMILARITY
Attneave,
F. (1959). Applications of information theory to psychology. New York:
Holt, Rinehart & Winston.
Attneave,
F. (1972). Representation of physical space. In A. W. Melton & E. J. Martin
(Eds.), Coding processes in human memory (pp. 283-306). Washington,
D.C.: Winston.
Attneave,
F. & Frost, R. (1969). The determination of perceived tridimensional
orientation by minimum criteria. Perception & Psychophysics, 6,
391-396.
Bar-Hillel,
M., & Wagenaar, W. A. (1991). The perception of randomness. Advances in
Applied Mathematics, 12, 428-454.
Berger,
J. O. (1985). Statistical decision theory and Bayesian analysis. New
York: Springer-Verlag.
Brunswik,
E. (1956). Perception and the representative design of psychological
experiments. Berkeley, CA: University of California Press.
Budescu,
D. V. (1987). A Markov model for generation of random binary sequences. Journal
of Experimental Psychology: Human Perception and Performance, 13,
25-39.
Buffart,
H., Leeuwenberg, E. & Restle, F. (1981). Coding theory of visual pattern
completion. Journal of Experimental Psychology: Human Perception and
Performance, 7, 241-274.
Chaitin,
G. J. (1966). On the length of programs for computing finite binary sequences. Journal
of the Association for Computing Machinery, 13, 547-569.
Chater,
N. & Oaksford, M. (1990). Autonomy, implementation and cognitive
architecture: A reply to Fodor and Pylyshyn. Cognition, 34,
93-107.
Chater,
N. & Oaksford, M. (1993). Logicism, mental models and everyday reasoning:
Reply to Garnham. Mind & Language, 8, 72-89.
Chater,
N. (1996). Reconciling simplicity and likelihood principles in perceptual
organisation. Psychological Review, 103, 566-581.
Chater,
N. (in press). Simplicity and the mind. The Psychologist.
Chater,
N. (submitted). Perceptual Organisation and Figural Goodness: One Principle or
Two?
Chater,
N., Crocker, M., & Pickering, M. (in press). The Rational Analysis of
Inquiry: The Case of Parsing. In M. Oaksford & N. Chater (Eds.) Rational
Models of Cognition. Oxford: Oxford University Press.
Chater,
N. & Hahn, U. (1996). ???. Proceedings of the ???
Chater,
N. & Hahn, U. (in press).???. In K. Lamberts & D. Shanks (Eds.) ???
Chater,
N. & Hahn, U. (in preparation). Representational distortion as a theory of
similarity.
Chater,
N. & Oaksford, M. (submitted). Rational analysis and heuristic processes
for syllogistic reasoning.
Chater,
N., Oaksford, M., Nakisa, R., & Redington, M. (submitted) Fast, frugal and
rational: Rational analysis and cognitive algorithms in human reasoning.
Cheeseman, P. (1995). On Bayesian model selection. In
Wolpert, D. (Ed.), The mathematics of generalization (pp. 315-330).
Redwood City, CA: Addison-Wesley.
Chomsky, N. (1965). Aspects of the theory of syntax.
Cambridge: MIT Press.
Cover,
T. M. & Thomas, J. A. (1992). Elements of information theory. New
York: John Wiley.
Dinnerstein,
D. & Wertheimer, M. (1957). Some determinants of phenomenal overlapping. American
Journal of Psychology, 70, 21-37.
Evans, J. St.
B. T., & Over, D. E. (1996). Rationality in the selection task: Epistemic
utility versus uncertainty reduction. Psychological Review, 103,
356-363.
EVANS
AND OVER 96? BOOK
Falk,
R., & Konold, C. (in press). Making sense of randomness: Implicit encoding
as a basis for judgment. Psychological Review.
Fodor,
J. A. (1983). Modularity of mind. Cambridge, MA: MIT Press.
Fodor,
J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture:
A critical analysis. Cognition, 28, 3-71.
Garner,
W. R. (1962). Uncertainty and structure as psychological concepts. New
York: John Wiley.
Garner,
W. R. (1974). The processing of information and structure. Potomac, Md:
LEA.
Gibson, J. J. (1979). The ecological approach to visual
perception. Boston:
Houghton-Mifflin.
Gigerenzer,
G. & Goldstein, D. (1996). Reasoning the fast and frugal way: Models of
bounded rationality. Psychological Review, 103, 650-669.
Gigerenzer,
G., Hell, W. & Blank, H. (1988). Presentation and content: The use of
base-rates as a continuous variable. Journal of Experimental Psychology:
Human Perception and Performance, 14, 513-525.
Gigerenzer,
G. & Murray, D. J. (1987). Cognition as intuitive statistics.
Hillsdale, NJ: Erlbaum.
Glass,
L. (1969). Moiré effect from random dots. Nature, 223,
578-580.
Glass,
L., & Pérez, R. (1973). Perception of random dot interference
patterns. Nature, 246, 360-362.
Gao,
Q., Li, M., & Vitányi, P. (1989). Learning on-line handwritten
characters. In 11th International Joint Conference on Artificial
Intelligence, pp. 843-848, San Mateo, CA: Morgan Kaufmann.
Harman,
G. (1965). The inference to the best explanation. Philosophical Review, 74,
88-95.
Hatfield,
G. & Epstein, W. (1985). The status of the minimum principle in the
theoretical analysis of visual perception. Psychological Bulletin, 97,
155-186.
Helm,
P. A. van der, & Leeuwenberg, E. L. J. (1986). Avoiding explosive search in
automatic selection of simplest pattern codes. Pattern Recognition, 19,
181-191.
Helm,
P. A. van der, & Leeuwenberg, E. L. J. (1996). Goodness of visual
regularities: A non-transformational approach. Psychological Review, 103,
429-456.
Helm,
P. A. van der, & Leeuwenberg, E. L. J. (1997?).
Helm,
P. A. van der, Lier, R. van & Leeuwenberg, E. L. J. (1992). Serial pattern
complexity: irregularity and hierarchy. Perception, 21, 517-544.
Helmholtz,
H. von (1910/1962). Treatise on physiological optics. (Vol. 3) (J. P.
Southall, Ed. and translation), New York: Dover.
Hochberg,
J. (1982). How big is a stimulus? In J. Beck (Ed.), Organization and
representation in perception. Hillsdale, NJ: LEA, pp. 191-218.
Hochberg,
J. & McAlister, E. (1953). A
quantitative approach to figure “goodness.” Journal of Experimental
Psychology, 46, 361-364.
Hume,
D. (1965). A treatise on human nature. L. A Selby-Bigge (Ed.), Oxford:
Clarendon Press (Original work published 1739-1740).
Johansson,
G. (1950). Configurations in event perception. Stockholm: Almqvist &
Wiksell.
Kahneman,
D., Slovic, P. & Tversky, A. (Eds.). (1982). Judgment under uncertainty:
Heuristics and biases. Cambridge: Cambridge University Press.
Kahneman,
D. & Tversky, A. (1973). On the psychology of prediction. Psychological
Review, 80, 237-251.
Kanizsa,
G. & Gerbino, W. (1982). Amodal completion: Seeing or thinking? In J. Beck
(Ed.) Organization in representation and perception (pp. 167-190).
Hillsdale, NJ: Erlbaum.
Koffka,
K. (1962). Principles of Gestalt psychology (5th ed.). London: Routledge
and Kegan Paul. (Original work published in 1935).
Kolmogorov,
A. N. (1965). Three approaches to the quantitative definition of information. Problems
in Information Transmission, 1, 1-7.
Laming,
D. (1996). On the analysis of irrational data selection: A critique of Oaksford
and Chater (1994). Psychological Review, 103, 364-373.
Leeuwenberg,
E. (1969). Quantitative specification of information in sequential patterns. Psychological
Review, 76, 216-220.
Leeuwenberg,
E. (1971). A perceptual coding language for perceptual and auditory patterns. American
Journal of Psychology, 84, 307-349.
Leeuwenberg, E. & Boselie, F. (1988). Against the likelihood principle in visual
form perception. Psychological Review, 95, 485-491.
Li,
M., & Vitányi, P. (1993). An introduction to Kolmogorov
complexity and its applications. New York: Springer-Verlag.
Li,
M. & Vitányi, P. (1995). Computational machine learning in theory
and praxis. In J. van Leeuwen (Ed.) Computer Science Today (pp.
518-535). Heidelberg: Springer-Verlag.
Li,
M. & Vitányi, P. (1997). On prediction by data compression. ms.
Lier,
R. van, Helm, P. A. van der, & Leeuwenberg, E. L. J. (1994a). Integrating
global and local aspects of visual occlusion. Perception, 23,
883-903.
Lier,
R. van, Helm, P. A. van der, & Leeuwenberg, E. L. J. (1994b). Competing
global and local aspects of visual occlusion. Journal of Experimental
Psychology: Human Perception and Performance, 21, 571-583.
Lopes,
L. L., & Oden, G. C. (1987). Distinguishing between random and nonrandom
events. Journal of Experimental Psychology: Learning, Memory, and Cognition,
13, 392-400.
Mach,
E. (1959). The analysis of sensations and the relation of the physical to
the psychical. New York: Dover Publications. (Original work published
1886).
Mach,
E. (1960). The science of mechanics. La Salle, IL: Open Court (Original
work published 1883).
Marr,
D. (1982). Vision. New York: Freeman.
Medin,
D. L., Goldstone, R. & Gentner, D. (1993). Respects for similarity. Psychological
Review, 100, 254-278.
Minsky,
M. (1977). Frame system theory. In P. N. Johnson-Laird, & P. C. Wason
(eds.), Thinking: Readings in cognitive science, (pp. 355-376).
Cambridge: Cambridge University Press.
Mumford,
D. (1992). Pattern theory: A unifying perspective. In Joseph, A., Mignot, F.,
Murat, F., Prum, B. & Rentschler, R. (Eds.). Proceedings of the First
European Congress of Mathematics (pp. 187-224). Basel: Birkhäuser
Verlag.
Oaksford, M.,
& Chater, N. (1991). Against logicist cognitive science. Mind &
Language, 6, 1-38.
Oaksford, M.,
& Chater, N. (1992). Bounded rationality in taking risks and drawing
inferences. Theory & Psychology, 2, 225-230.
Oaksford, M., & Chater, N. (1993). Reasoning theories and bounded rationality. In K. I. Manktelow, & D. E. Over (Eds.), Rationality, (pp. 31-60). London: Routledge.
Oaksford,
M., & Chater, N. (1994). A rational analysis of the selection task as
optimal data selection. Psychological Review, 101, 608-631.
Oaksford,
M., & Chater, N. (1995a). Information gain explains relevance which
explains the selection task. Cognition, 57, 97-108.
Oaksford,
M., & Chater, N. (1995b). Theories of reasoning and the computational
explanation of everyday inference. Thinking and Reasoning, 1,
121-152.
Oaksford,
M., & Chater, N. (1996). Rational explanation of the selection task. Psychological
Review, 103, 381-391.
Oaksford,
M., Chater, N., Grainger, R. & Larkin, J. (in press). Optimal data
selection in the reduced array selection task (RAST). Journal of
Experimental Psychology: Learning, Memory and Cognition.
Oaksford,
M., & Chater, N. (in press) (Eds.). Rational models of cognition
CHECK. Oxford: Oxford University Press.
Oaksford,
M., & Chater, N. (in preparation?). ??CONDITIONAL INFERENCE
Paris,
J. (1992) The uncertain reasoner’s companion. Cambridge: Cambridge
University Press.
Pearl,
J. (1988). Probabilistic reasoning in intelligent systems. San Mateo,
CA: Morgan Kaufmann.
Perkins,
D. N. (1972). Visual discrimination between rectangular and nonrectangular
parallelepipeds. Perception and Psychophysics, 12, 396-400.
Perkins,
D. N. (1982). The perceiver as organizer and geometer. In J. Beck (Ed.), Organization
and representation in perception (73-93). Hillsdale, NJ: LEA.
Pomerantz,
J. R. & Kubovy, M. (1986). Theoretical approaches to perceptual
organization: Simplicity and likelihood principles. In: K. R. Boff, L. Kaufman
& J. P. Thomas (Eds.), Handbook of perception and human performance,
Volume II: Cognitive processes and performance. (pp. 36:1-45) New York:
Wiley.
Popper,
K. (1959). The logic of scientific discovery. New York: Basic Books (1st
edition, Logik der Forschung, 1934).
Pylyshyn,
Z. W. (1984). Computation and cognition. Cambridge, MA: MIT Press.
Quinlan,
J. & Rivest, R. (1989). Inferring decision trees using the minimum
description length principle. Information and Computation, 80,
227-248.
Restle,
F. (1970). Theory of serial pattern learning: Structural trees. Psychological
Review, 77, 481-495.
Restle,
F. (1979). Coding theory of the perception of motion configurations. Psychological
Review, 86, 1-24.
Rissanen,
J. (1987). Stochastic complexity. Journal of the Royal Statistical Society,
Series B, 49, 223-239.
Rissanen,
J. (1989). Stochastic complexity and statistical inquiry. Singapore:
World Scientific.
Rock,
I. (1975). An introduction to perception. New York: Macmillan.
Rock,
I. (1983). The logic of perception. Cambridge, MA: MIT Press.
Segall,
M. H., Campbell, D. T. & Herskovits, M. J. (1966). The influence of
culture on visual perception.
Indianapolis, Ind.: Bobbs-Merrill.
Shannon, C. E. (1948). The mathematical theory of
communication. Bell System Technical Journal, 27, 379-423,
623-656.
Shepard,
R. N. (1981). Psychophysical complementarity. In M. Kubovy & J. R.
Pomerantz (Eds.), Perceptual organization (pp. 279-342). Hillsdale, NJ:
LEA.
Shepard,
R. N. (1987). Toward a universal law of generalization for
psychological science. Science, 237, 1317-1323.
Simon,
H. A. (1972). Complexity and the representation of patterned sequences of
symbols. Psychological Review, 79, 369-382.
Simon,
H. A. & Kotovsky, K. (1963). Human acquisition of concepts for sequential
patterns. Psychological Review, 70, 534-546.
Sober,
E. (1975). Simplicity. Oxford: Clarendon Press.
Solomonoff,
R. J. (1964). A formal theory of inductive inference, Parts 1 and 2. Information
and Control, 7, 1-22, 224-254.
Tversky,
A. (1977). Features of similarity. Psychological Review, 84,
327-352.
Vapnik,
V. (1995). The nature of statistical learning theory. New York:
Springer.
Vitányi,
P. & Li, M. (1996). Minimum description length induction, Bayesianism and
Kolmogorov complexity. Manuscript.
Vitz,
P. C. & Todd, T. C. (1969). A coded element of the perceptual processing of
sequential stimuli. Psychological Review, 76, 433-449.
Wallace,
C. S. & Boulton, D. M. (1968). An information measure for classification. Computer
Journal, 11, 185-195.
Wallace,
C. S. & Freeman, P. R. (1987). Estimation and inference by compact coding. Journal
of the Royal Statistical Society, Series B, 49, 240-251.
Wason, P. C. (1966). Reasoning. In B. Foss (ed.), New
horizons in psychology (pp. 135-151), Harmondsworth, Middlesex: Penguin.
Wason,
P. C. (1968). Reasoning about a rule. Quarterly Journal of Experimental
Psychology, 20, 273-281.
Zvonkin,
A. K. & Levin, L. A. (1970). The complexity of finite objects and the
development of the concepts of information and randomness by means of the
theory of algorithms. Russian Mathematical Surveys, 25, 83-124.
Notes
Author’s Note
I would like to thank Mark Ellison,
Steven Finch, Ulrike Hahn, Peter van der Helm, Emmanuel Leeuwenberg, James
McClelland, Mike Oaksford, Martin Pickering, Emmanuel Pothos, Martin Redington,
Jerry Seligman, Julian Smith, Paul Vitányi and Johan Wagemans for
valuable discussions of these ideas at various stages in their development. A
brief and informal outline of some of the material here is given in an article
for The Psychologist (Chater, in press).
Footnotes
1.
The original formulation of “Occam’s razor” is that explanations postulating
the smallest number of entities should be preferred; thus it embodies a specific
measure of simplicity in terms of number of objects involved in an explanation.
Occam’s razor has since been interpreted more broadly as expressing a
preference for simple explanations.
2.
Indeed, without some implicit adherence to a simplicity principle, classical
statistical approaches to modelling data would be incoherent, because
increasing the generality of a model (e.g., switching from a straight line to a
cubic) can only improve the fit with the data.
3.
Van Lier, van der Helm & Leeuwenberg (1994b) consider cases where these
principles conflict, and a simplicity-based approach to resolving them.
4.
Popper allows that different hypotheses might be differentially favored for
investigation on the basis of their falsifiability--but this does not
bear on the examples given here, because they are all equally specific, and
therefore equally falsifiable.
5.
Some approaches have explicitly defined coding languages (e.g., Buffart,
Leeuwenberg & Restle, 1981; Leeuwenberg, 1969); others have measured code
lengths by appeal to probability and information theory (e.g., Attneave, 1959;
Garner, 1962).
6.
As Simon (1972) notes, the second problem is often not severe in
practice, because description lengths in different proposed description
languages tend to be highly correlated.
7. I follow the analysis in
Vitányi & Li (1996) in this subsection.
8.
This identification between probability and Kolmogorov complexity may seem
bizarre. After all, no constraints have been placed on the probability values,
so how can they be approximated by any function? This concern is legitimate--a
fundamental assumption is that all probability measures are computable.
Assuming the Church-Turing hypothesis, these distributions are, in any case,
the only ones that the cognitive system can entertain, so this is a mild
restriction in this context. It then turns out, remarkably, that there
exists a “universal” probability distribution, m(x), which, within a
multiplicative constant, assigns at least as high a probability to each
object, x, as does any computable probability distribution, P(x).
Moreover, it turns out that -log2 m(x) = K(x), to within an additive constant. This
justifies the step from probabilities to code lengths in moving from equation
(2) to equation (3). The underlying mathematical theory is outlined in Li &
Vitányi (1993, chapter 4), and was pioneered by Zvonkin & Levin
(1970).
9.
There are “pathological” cases where this procedure does not minimize (8), but
these can be ignored in practice.
10.
Other decision rules are also possible, such as choosing the action for which
the worst possible outcome is least bad.
11.
Indeed, this expansion appears to “pass” all known statistical tests for
randomness (Li & Vitányi, 1993)--so even intensive statistical
analysis would also fail to reveal any structure.
12.
I thank Johan Wagemans (personal communication) for stressing the importance of
this point.
13.
Although neither the simplicity nor the likelihood principle is currently
explicitly advocated in the context of human language processing, many proposals
in this area can be viewed in these terms, as Martin Pickering (personal
communication) has pointed out. For example, Frazier ??? can be viewed as
arguing that the parser prefers structures which are syntactically simple; and
Tanenhaus and colleagues (??) can be viewed as arguing for a likelihood
approach to parsing, in which probabilistic information is drawn from a wide range
of sources.
14.
Note also that in artificial intelligence there has been an increasing tendency
to employ probabilistic methods (e.g., Pearl, 1988; Paris, 1992).
15.
A different analysis may be appropriate for understanding learning from
instruction, and learning from both instruction and experience. The question of
how or whether the search for simplicity, as outlined here, can be applied in
such contexts is an important topic for future research.
16.
For a fuller discussion of the issues raised in this section, see Chater
(1996).
17.
The converse is also true, aside from certain technical restrictions concerning
computability, which are not of psychological relevance (Chater, 1996). This
follows directly from the analysis in
the subsection Simplicity and the most likely explanation above,
relating the simplicity principle to probability theory.
18.
Chater (1996) notes that there are, nonetheless, senses in which a dispute
between the two principles may remain, for example if each principle is
interpreted at what Marr (1982) calls the “algorithmic,” rather than the
“computational” level. Moreover, van der Helm & Leeuwenberg (1997) argue
for an alternative reconciliation between the two principles.
19.
Alternative approaches have been proposed by theorists who argue that the
goodness of a figure relates to the number of symmetries that it possesses with
respect to transformations (e.g., reflection, translation, and so on).
20.
Chater (1997) presents further theoretical and empirical arguments for a
simplicity account of figural goodness, in distinction to van der Helm and
Leeuwenberg’s weight of evidence.
21.
Li & Vitányi (1993) have used this idea to develop the theory of
information distance, which inspired representational distortion as a
psychological proposal. Paul Vitányi (personal communication) notes that
the mathematical notion of information distance was originally developed with
cognitive considerations in mind.
22.
Indeed, finding the representational distortion between arbitrary
representations is known to be an uncomputable function, and hence must
necessarily be approximated (Li & Vitányi, 1993).
23.
Essentially, the negative logarithms of the probabilities in (12) can be identified
with their Kolmogorov complexities--this reflects the general relation between
code length and probability discussed in Part II.
24. For yet another sense in which speed can
conflict with simplicity/likelihood, see Chater, Crocker and Pickering (in
press).
25.
Clearly this is not always the case. It is anecdotally clear that people do
tend to believe what they want to believe SOC PSY REFS. But in general we do
not see, hear and infer only what we want to see, hear and infer; and were this
the case, the consequences would presumably be disastrous (see Fodor, 1983 for
related discussion).
Figures

Figure 1. An infinite number of
incompatible patterns are compatible with any finite sequence of data.

Figure 2. Occluded figures are
consistent with an infinite number of continuations.
(a)
(b)
(c)
Figure 3. Pairs of objects which
contain shared patterns.