A non-final version (with minor things missing) of the paper published as: Chater, N. (1999). The search for simplicity: A fundamental cognitive principle? Quarterly Journal of Experimental Psychology, 52A, 273-302.

 

The search for simplicity:

A fundamental cognitive principle?

 

 

Nick Chater*

Department of Psychology,

University of Warwick.

 

 

 

 

 

 

 

 

E-mail: nick.chater@warwick.ac.uk

 

*Please address correspondence concerning this article to Nick Chater, Department of Psychology, University of Warwick, Coventry CV4 7AL, UK.


 

 

 

 

 

It is proposed that the cognitive system imposes patterns on the world according to a simplicity principle: choose the pattern which provides the briefest representation of the available information. The simplicity principle is normatively justified--patterns which support simple representations provide good explanations and predictions, on the basis of which the agent can make decisions and actions. Moreover,  the simplicity principle appears to be consistent with empirical data from many psychological domains, from perceptual organization, to similarity, reasoning, memory and scientific thinking. Thus, the simplicity principle promises to serve as the starting point for the rational analysis of a wide range of cognitive processes, in Anderson’s (1990, 1991a) sense. The simplicity principle also provides a framework for integrating a wide range of existing psychological proposals.

 

 


 

The cognitive system must cope with a world which is immensely complex but which is, nonetheless, highly patterned. The patterns are crucial. In a completely random world, prediction, explanation and understanding would be impossible--there would be no patterns on which prediction could be based, to which explanations could refer, or the comprehension of which could amount to understanding. Even more fundamentally, there would be no basis to choose one action rather than another, without any patterns relating actions to consequences.

            The ability to find patterns in the world is therefore of central importance throughout cognition. Without the ability to find such patterns an agent might as well be in a random world: it would be able to predict, explain and understand nothing; and it would have no basis on which to choose its actions. By contrast, the cognitive systems of people and animals appear to be conspicuously successful in coping with the world. Somehow, cognitive processes are able to find patterns successfully.

            How is this success achieved? Any proposal must meet two adequacy criteria. (1) It must be normatively justified--without such normative justification, the success of the method of finding patterns is mysterious; (2) It must be descriptively correct--it must accord with empirical data--at least to some approximation. A theory which is both normatively justified and descriptively correct provides a rational analysis of a cognitive process (for discussion of this concept see, for example, Anderson, 1990, 1991a, 1991b; Anderson & Schooler, 1991; Oaksford & Chater, 1994, 1995a; Oaksford & Chater, in press). So explaining how the cognitive system successfully finds patterns requires providing a rational analysis of the cognitive system’s pattern-finding capabilities.

            I propose that patterns are found by following a fundamental principle: choose the pattern that provides the simplest explanation of the available data. Moreover, I suggest that this principle applies at all levels of cognition, from the organization of perceptual input, to scientific inquiry. Thus, the simplicity principle can be used as a starting point for detailed rational analyses of a wide range of cognitive processes.

            The idea that cognition involves a search for simplicity has a long lineage, in the discussion of both normative and descriptive issues. On the normative side, the injunction to favor simple scientific theories can be traced to William of Ockham1 (1290?-1349?) and is endorsed by Newton (see Li & Vitányi, 1993: p. 277). Simplicity was also assigned fundamental importance in early positivist epistemology (e.g., Mach, 1960/1883), and it remains a standard principle in modern philosophy of science (e.g., Sober, 1975). Simplicity is also recognized as important in statistics. If a straight line and a cubic fit the same data equally well, then the straight line is the preferred model because it is simpler--it contains fewer adjustable parameters.2 Moreover, a preference for simple explanations is a standard methodological principle in informal scientific discourse--for a prominent psychological example, see Pylyshyn’s (1984) discussion of the importance of having fewer model parameters than data points in cognitive modelling. But although the preference for simple patterns has been widely recognized, simplicity has typically remained a largely intuitive notion. Over the last thirty years, however, a rich and important theory of simplicity, Kolmogorov complexity, has been developed and widely applied by mathematicians (Chaitin, 1966; Kolmogorov, 1965; Solomonoff, 1964; for an overview see the excellent textbook by Li & Vitányi, 1993), statisticians (Rissanen, 1987, 1989; Wallace & Freeman, 1987) and computer scientists (Quinlan & Rivest, 1989; Wallace & Boulton, 1968). This theory allows rigorous normative justifications to be given for why choosing the simplest pattern leads to the best explanations and predictions; and also allows a more concrete formulation of the psychological proposal that cognition seeks to find the simplest pattern. I will outline this account of simplicity and its potential application to cognition below.

            Simplicity has also frequently been viewed as important from the point of view of describing, rather than justifying, cognitive processes. Mach (1959/1886), one of the strongest advocates of simplicity as a normative principle in science, also proposed that the perceptual system seeks to find the simplest representations of sensory input. This viewpoint is echoed in the proposal in the Gestalt tradition that perceptual organization is chosen to maximize “prägnanz” (Koffka, 1962/1935), a notion closely related to simplicity which aims to integrate the range of specific Gestalt principles of perceptual organization (good form, good continuation, and so on). Moreover, Hochberg and McAllister (1953) explicitly identified the goal of perceptual organization as maximizing simplicity, and this work was followed by a variety of related proposals, where simplicity is measured in different ways (Buffart, Leeuwenberg & Restle, 1981; Garner, 1962, 1974; Leeuwenberg, 1969, 1971). Moving from perception to the psychological processes involved in scientific inference, simplicity has also frequently been invoked as an important guiding principle. For example, scientists frequently report strong aesthetic preferences in theory construction and evaluation, using terms such as “simplicity,” “elegance,” “parsimony” and so on to describe desirable properties of theoretical proposals. Einstein is credited with the remark that “Everything should be made as simple as possible, but not simpler” (COVER/tHOMAS or EYSENCK/KEANE). This preference for simplicity is sometimes expressed so strongly that it even overrides concern for the fit with the data (DIRAC in STEWART/GOLUBITSKY). Thus simplicity has been implicated as a guiding principle in finding patterns, from perceptual processing to scientific reasoning. I propose that simplicity may have an even more general role in cognition: ranging from reasoning and memory, to learning and similarity.

            This paper has three parts. The first introduces the problem of finding patterns in data, and explains why it is normatively and descriptively puzzling: there are an infinite number of patterns consistent with any finite set of data. The second part considers the normative question of how patterns should be found. I outline how simplicity can be quantified in terms of the mathematical theory of Kolmogorov complexity and how this theory explains why searching for simple patterns is normatively justified as a strategy for predicting and explaining the world, and as a partial basis for deciding how to act. The third part considers the descriptive problem of how various cognitive processes actually do find patterns. The approach is programmatic--I aim to provide an integrated framework for apparently diverse cognitive problems, and suggest directions for future research, rather than attempting a definitive account in any one area. Overall, I hope to show that simplicity is both a normatively justified and descriptively plausible account of how the cognitive system finds patterns in a range of domains.

 

Part I: The problem of finding patterns

Consider the problem of finding patterns in a finite portion of an infinite sequence. In the portion of the sequence that we observe, just two states are found. Let us call the binary values “black” and “white” to allow the visual representation shown in Figure 1(a). In this finite sequence, an intuitively evident pattern is that there is an alternation of the two states. If this pattern is correct, then the sequence should continue as shown in Figure 1(b). But another pattern, equally consistent with the observed data, is that there is an infinite sequence of “white”, followed by an alternating sequence of white and black, and then an infinite sequence of “black.” The observed data is assumed to correspond to the middle part of this sequence (Figure 1(c)). Moreover, a further pattern consistent with the data consists of a jumble of states (many not occurring in the observed part of the sequence at all--represented by patterned squares in the figure) to the left and right of the alternating white and black items that are observed. Again this kind of pattern is precisely consistent with the observed data. More generally, it is clear that an infinite number of patterns are consistent with any finite set of data.

 

INSERT FIGURE 1 ABOUT HERE

 

A similar example, of traditional psychological interest (e.g., Dinnerstein & Wertheimer, 1957; Kanizsa & Gerbino, 1982), concerns the completion of occluded figures (Figure 2). The intuitively natural completion of the occluded region in Figure 2(a) interprets the figure as a square partially occluded by another square (Figure 2(b)). This completion is predicted by two Gestalt principles: good continuation, which states that lines should be assumed to continue as smoothly as possible, and good form, which states that completions should prefer regular underlying figures3. But, again, an infinite number of alternative completions are possible (Figure 2(c)).

 

INSERT FIGURE 2 ABOUT HERE

 

The hard-headed psychologist may feel tempted to dismiss the rather bizarre patterns shown in Figures 1 and 2 as “silly.” Of course, such a psychologist might say, the cognitive system is only concerned with “sensible” patterns, and bases its explanations, predictions and decisions on these. The psychologist might go on to point out that the really interesting issue is how the cognitive system copes with cases where there are two “sensible” patterns which can be imposed on the data, and some choice must be made between them. But this impatient response misses the point. The psychologist must explain our intuitions about which patterns are “silly” and which are “sensible,” and cannot take them for granted, because these intuitions are themselves the outcome of psychological processes. Indeed, these intuitions must be explained in two ways. First, some normative justification must be given for assuming that the cognitive system is justified in favoring “sensible” patterns, and basing its predictions, explanations and decisions on these. This issue is the focus of Part II. Second, the descriptive question of how the cognitive system differentiates between “silly” and “sensible” patterns must also be addressed--we leave descriptive issues to Part III.

 

Part II: Finding patterns: The normative problem

Despite our strong intuitions that not all patterns consistent with a finite set of data are equal (i.e., that some are plausible and others are absurd), there has been a long sceptical tradition in philosophy arguing that no normative justification can be given for such preferences (e.g., Goodman, 1965; Hume, 1739-1740/1965; Popper4, 1959/1934). But this scepticism is unattractive, because it makes utterly mysterious the remarkable and consistent success that cognitive systems enjoy on the basis of favoring some patterns over others.

            Fortunately, the sceptical challenge can be addressed by applying the mathematical theory of Kolmogorov complexity. This theory quantifies simplicity and shows that a preference for simpler patterns is justified, because describing the world in terms of simple patterns consistently leads to better predictions, explanations and decisions.

            Before considering how this theory measures simplicity, however, we must first ask: what should be measured for simplicity?

 

The simplicity of what?

In choosing patterns on the basis of simplicity, the most obvious suggestion is that the simplest available pattern should be preferred. This principle correctly favors an indefinitely long sequence of alternating black and white squares in Figure 1; and the “square” completion in Figure 2. But, taken at face value, it also has a paradoxical consequence: a very simple pattern, such as that the pattern in Figure 1 is an infinitely long sequence of black squares or that the pattern in Figure 2 is a simple uniform field, will always be preferred. Such possibilities are, of course, ruled out by the constraint that the pattern has to be consistent with the available data--thus, these “null” patterns are just too simple. But this point itself raises difficult questions: What does it mean for a pattern to be consistent with the available data? Can consistency with the input be traded against simplicity of interpretation? If so, how are simplicity and consistency with the input to be jointly optimized? We shall see that the theoretical account of simplicity presented below answers these questions.

            There is, however, a further, and more subtle difficulty: what rules out the simplest possible “null” pattern--such a “pattern” could be interpreted as saying that “anything goes”?  The null pattern will be consistent with the available data; indeed it would be consistent with any data, because it rules nothing out. Mere consistency or compatibility with the data is plainly not enough; the pattern must also, in some sense, capture regularities in the data (Harman, 1965). But this appears to imply that choosing a pattern involves the joint optimization of two factors; and the relative influence of these two factors is unspecified.  Moreover, this conclusion is unattractive because two notions, simplicity and explanatory power, must be explicated rather than just one.

            Fortunately, there is an alternative way to proceed. This is to view a pattern as a way of encoding the data; and to propose that the pattern chosen is that which allows the simplest encoding of the stimulus. This view disallows null or nearly null patterns, which bear little or no relation to the data, because these organizations do not help encode the stimulus simply. It also provides an operational definition of the “explanatory power” of a pattern--as the degree to which that organization helps provide a simple encoding of the stimulus. If a pattern captures the regularities in the data (i.e., if it “explains” those regularities), then it will provide the basis for a brief description of the data; if an organization fails to capture regularities in the data, then it will be of no value in providing a brief description. Explanatory power is therefore not an additional constraint that must be traded off against simplicity; maximizing explanatory power is the same as maximizing the simplicity of the encoding of the stimulus.

 

Quantifying simplicity

To apply the injunction to choose the pattern which provides the simplest encoding of the data, we need a measure of simplicity. There is a long tradition in philosophy of equating simplicity with brevity in some coding language (REF Phil of Sci-see anti-Occam). In psychology, this general approach has been applied in a variety of contexts,5 from the organization of simple sequences, such as the example we have just considered (Leeuwenberg, 1969; Restle, 1970; Simon, 1972; Simon & Kotovsky, 1963; Vitz & Todd, 1969), to judgments of “figural goodness” (Hochberg & McAllister, 1953), the analysis of Johansson’s (1950) experiments on the perception of motion configurations (Restle, 1979), and figural completion (Buffart, Leeuwenberg & Restle, 1981).  It has also been advanced as a general framework for understanding perceptual organization (e.g., Attneave & Frost, 1969; Leeuwenberg, 1971; Leeuwenberg & Boselie, 1988). I shall discuss some of these topics in Part III.

            Approaches based on brevity of encoding in some description language appear to be dogged by two problems: 1) that a fresh description language must be constructed for each fresh kind of pattern; 2) that the predictions of the theory depend on the description language chosen, and there is no (direct) empirical means of deciding between putative languages6.

            Kolmogorov complexity theory addresses these problems. The first problem is avoided by choosing a much more general language for encoding. Specifically, the language chosen is a universal programming language. A universal programming language is a general purpose language for programming a computer. The familiar programming languages such as PROLOG, LISP and PASCAL are all universal programming languages. How can an object, such as a perceptual stimulus, be encoded in a universal programming language such as, for example, LISP? The idea is that a program in LISP encodes an object if the object is generated as the output or final result of running the program. By the definition of a universal programming language, if an object has a description from which it can be reconstructed in any language, then it will have a description from which it can be reconstructed in the universal programming language. It is this that makes the programming language universal.
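            As a hedged illustration of what it means for a program to encode an object, the following sketch uses Python as a stand-in for LISP. The names PROGRAM and run are hypothetical, introduced only for this example. A short program text serves as a description of a long alternating sequence, and it is the length of the program, not of the sequence, that a description-length measure counts.

```python
import contextlib
import io

# A candidate "description": program text whose output is the object.
# Here the object is 1000 alternating symbols (B = black, W = white).
PROGRAM = 'print("BW" * 500, end="")'


def run(program: str) -> str:
    """Run program text and return whatever it prints (the encoded object)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue()


obj = run(PROGRAM)
# The program is far shorter than the object it generates.
print(len(PROGRAM), len(obj))
```

            A sequence with no regularities would admit no such shortcut: its shortest program would be roughly as long as the sequence itself.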

            Moreover, in solving the first problem, the second problem, that different pattern languages give different code lengths, is solved automatically. A central result of Kolmogorov complexity theory, the Invariance Theorem (Li & Vitányi, 1993), states that the length of the shortest description of an object, x, is invariant (up to a constant) between different universal languages (though, as we discuss below, the choice of language may be important in developing theories of particular cognitive processes). This quantity, K(x), is defined as the Kolmogorov complexity of an object. Similarly, we can define the conditional Kolmogorov complexity, K(y|x), between two objects x and y. This is the length of the shortest program that transforms x into y.
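            Kolmogorov complexity itself cannot be computed, but the flavor of K(x) and K(y|x) can be conveyed with an off-the-shelf compressor. The sketch below is an assumption-laden stand-in rather than the theory's own definition: it uses zlib-compressed size (in bytes) as a crude upper-bound proxy for K(x), and the difference between two compressed sizes as a proxy for K(y|x). The function names are hypothetical.

```python
import random
import zlib


def code_length(data: bytes) -> int:
    """Compressed size in bytes: a crude, computable upper-bound proxy for K(x)."""
    return len(zlib.compress(data, 9))


def conditional_code_length(y: bytes, x: bytes) -> int:
    """Proxy for K(y|x): extra bytes needed to encode y once x is already encoded."""
    return code_length(x + y) - code_length(x)


random.seed(0)
patterned = b"BW" * 500  # highly regular sequence
jumbled = bytes(random.choice(b"BW") for _ in range(1000))  # pseudo-random sequence

print(code_length(patterned), code_length(jumbled))
print(conditional_code_length(patterned, patterned),
      conditional_code_length(jumbled, patterned))
```

            The patterned sequence compresses to far fewer bytes than the jumbled one, and continuing the pattern costs much less than appending a jumble. A general-purpose compressor captures only some regularities, so these numbers bound the true complexities from above and may overestimate them substantially.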

            This talk of universal programming languages may appear rather unpsychological--after all, the cognitive system presumably does not represent information in PROLOG, LISP or PASCAL! But the notion of a universal programming language is actually very broad--almost any reasonably rich system of representation, including most proposals concerning mental representation, is universal, and hence the Kolmogorov complexity measure can be applied.

            So we now have a definite interpretation of the claim that patterns are chosen on the basis of simplicity: the pattern chosen is that with which the data can be encoded as briefly as possible. We now consider why this preference for simplicity is justified.

 

The justification of simplicity

There are various criteria by which a particular choice of pattern in a set of data might be justified. The best pattern might be the pattern that is the most likely explanation of how the data was generated; the pattern that gives rise to the best predictions; or the pattern that provides the best basis for decision making. Fortunately, simplicity can be justified in each of these ways.

 

Simplicity and the most likely explanation7

Suppose that we have data, D, and a set of hypotheses concerning the pattern in the data. The most likely hypothesis is the hypothesis, H, that has the greatest probability, given the data. In symbols, this is the H that maximizes P(H|D). Bayes’ theorem, a standard theorem of probability theory, states that:

 

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} \qquad (1)$$

 

That is, the probability of the hypothesis given the data is proportional to the product of the probability of the data given the hypothesis and the prior probability of the hypothesis. By elementary mathematics, choosing the H that maximizes (1) is equivalent to choosing the H that minimizes (2):

 

$$-\log_2 P(H) - \log_2 P(D \mid H) \qquad (2)$$

 

Under very general conditions, -log2 P(x) is approximated by the Kolmogorov complexity of x, K(x), and -log2 P(y|x) is approximated by the conditional Kolmogorov complexity of y given x, K(y|x) (see Li & Vitányi, 1995; Vitányi & Li, 1996 for a rigorous analysis, and Chater (1996) for a more informal discussion). This duality between probabilities and code lengths is of great importance, and has been widely used in statistics (e.g., Rissanen, 1987, 1989), artificial intelligence (e.g., Cheeseman, 1995) and computer vision (e.g., Mumford, 1992), as well as having direct psychological implications8 (Chater, 1996).

            Thus, choosing the H that maximizes (1) is equivalent to choosing the H that minimizes (3):

 

$$K(H) + K(D \mid H) \qquad (3)$$

 

But (3) has the following interpretation: K(H) is the length of the shortest description that specifies the hypothesized pattern, H; and K(D|H) is the length of the shortest description that specifies the data, D, given H. The sum of these quantities is therefore the description length of the data, given the hypothesized pattern--the description consists of two parts: first, the pattern must be specified, and then the specific data must be specified in terms of the pattern. Therefore, (3) can be informally glossed as follows:

 

            Shortest description of D, using H                                                         (4)

 

According to Bayes’ theorem, H should be chosen to be as probable as possible (i.e., to maximize (1)). But we have seen that this is equivalent to choosing H to minimize (4): i.e., the pattern should be chosen in order to provide the simplest specification of the data. Therefore, choosing the simplest hypothesized pattern is justified because it amounts to choosing the pattern which is the most likely explanation of the data.
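            The equivalence between maximizing probability and minimizing code length can be checked numerically. The sketch below uses invented priors and likelihoods for three candidate patterns (all numbers and names are hypothetical, chosen only for illustration), and it uses idealized code lengths of -log2 P in place of true Kolmogorov complexities, which cannot be computed.

```python
import math

# Hypothetical priors P(H) and likelihoods P(D|H) for three candidate patterns.
hypotheses = {
    "alternation": {"prior": 0.25, "likelihood": 0.5},
    "all_black": {"prior": 0.5, "likelihood": 0.001},
    "jumble": {"prior": 0.25, "likelihood": 0.01},
}


def posterior_score(h: dict) -> float:
    """Numerator of Bayes' theorem, P(D|H) * P(H); P(D) is the same for every H."""
    return h["likelihood"] * h["prior"]


def code_length_bits(h: dict) -> float:
    """Two-part code length in bits: -log2 P(H) - log2 P(D|H)."""
    return -math.log2(h["prior"]) - math.log2(h["likelihood"])


most_probable = max(hypotheses, key=lambda name: posterior_score(hypotheses[name]))
shortest_code = min(hypotheses, key=lambda name: code_length_bits(hypotheses[name]))
print(most_probable, shortest_code)  # prints: alternation alternation
```

            Because the negative logarithm is strictly decreasing, the two selections coincide for any choice of numbers. Note also that the "all_black" pattern is cheap to state but pays heavily in the second term, which is how the two-part code penalizes patterns that are too simple to fit the data.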

 

Simplicity and prediction

Let us consider prediction in the simple setting where the environment consists of a string of 0s and 1s. A continuous portion, x1...xn, of the sequence is observed--the task is to predict the next item, xn+1, in the sequence. By elementary probability theory:

 

$$P(x_{n+1} \mid x_1 \ldots x_n) = \frac{P(x_1 \ldots x_n x_{n+1})}{P(x_1 \ldots x_n)} \qquad (5)$$

 

The best prediction xn+1 is the one that has the highest probability of being true--i.e., that maximizes (5). Because the denominator does not contain xn+1 the best prediction will also maximize

 

$$P(x_1 \ldots x_n x_{n+1}) \qquad (6)$$

 

and will minimize:

 

$$-\log_2 P(x_1 \ldots x_n x_{n+1}) \qquad (7)$$

 

Using the equivalence between Kolmogorov complexity and probability, as above, the best prediction xn+1 therefore minimizes:

 

$$K(x_1 \ldots x_n x_{n+1}) \qquad (8)$$

 

Thus prediction is achieved by finding the pattern that is the basis for the shortest code for x1...xn and then choosing the next item xn+1 that follows according to that pattern.9 We can therefore conclude that the predictions of the pattern chosen by the simplicity principle are most likely to be true.
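            A toy version of this procedure can be sketched as follows. It makes a strong simplifying assumption that is not part of the argument above: the only candidate patterns are repetitions of a fixed unit, with shorter units counted as simpler descriptions. The function names are hypothetical.

```python
def shortest_periodic_pattern(observed: str) -> str:
    """Return the shortest repeating unit consistent with the observed sequence.

    Toy hypothesis class (an assumption): the sequence repeats a unit of
    length p forever; shorter units count as simpler descriptions.
    """
    n = len(observed)
    for p in range(1, n + 1):
        if all(observed[i] == observed[i - p] for i in range(p, n)):
            return observed[:p]
    return observed  # not reached for non-empty input: p = n always fits


def predict_next(observed: str) -> str:
    """Predict the next symbol by continuing the simplest consistent pattern."""
    unit = shortest_periodic_pattern(observed)
    return unit[len(observed) % len(unit)]


print(predict_next("010101"))   # the alternation continues with "0"
print(predict_next("0010010"))  # the unit "001" continues with "0"
```

            The search is restricted to non-empty input and to one narrow family of patterns. The general principle ranges over all computable patterns, which is what makes it powerful and also what makes it impossible to implement exactly.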

            This heuristic argument has been made rigorous in Li & Vitányi (1997; see also Li & Vitányi, 1993 for related discussion). Moreover, mathematical justifications for prediction based on patterns which are chosen to minimize description length have been provided in other mathematical contexts (e.g., Rissanen, 1987, 1989; Vapnik, 1995). In addition, predictions based on this principle have been successful in a range of practical applications (e.g., Gao & Li, 1989; Quinlan & Rivest, 1989). Thus, choosing patterns on the basis of simplicity appears justified as a basis for prediction.

 

Simplicity and decision making

Finding patterns by simplicity allows an agent to predict and explain the world. These are abstract goals, but nonetheless goals which are of fundamental importance in guiding decisions about practical action. The standard normative theory of how decisions should be made, decision theory, requires associating each possible event with a number representing its utility; and assessing the probabilities of outcomes if a particular action is taken (Berger, 1985). Decision theory recommends choosing the action which maximizes expected utility.10

            This account of how decisions should be made has a clear role for the simplicity principle--simplicity determines the probability of possible events, which are then combined with utilities to determine what action should be taken. Thus, the simplicity principle relates not merely to the abstract normative goals of inferring the most probable pattern, or predicting what will happen, but to the concrete problems of deciding how to act.
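            The division of labor described here can be made concrete with a small sketch. All quantities are invented for illustration: the code lengths, outcomes, actions and utilities are hypothetical, and probabilities are obtained from code lengths by assuming P is proportional to 2 raised to the power of minus the code length, then normalizing.

```python
# Hypothetical code lengths (in bits) for two possible outcomes, converted to
# probabilities via P proportional to 2 ** (-code length), then normalized.
code_lengths = {"rain": 1.0, "dry": 3.0}
weights = {outcome: 2.0 ** -bits for outcome, bits in code_lengths.items()}
total = sum(weights.values())
probabilities = {outcome: w / total for outcome, w in weights.items()}

# Hypothetical utilities for each (action, outcome) pair.
utilities = {
    "take_umbrella": {"rain": 5.0, "dry": 3.0},
    "leave_umbrella": {"rain": -10.0, "dry": 6.0},
}


def expected_utility(action: str) -> float:
    """Probability-weighted average utility of an action over outcomes."""
    return sum(probabilities[o] * utilities[action][o] for o in probabilities)


best_action = max(utilities, key=expected_utility)
print(best_action)  # prints: take_umbrella
```

            Simplicity enters only through the probabilities; the utilities are supplied separately, which is why simplicity is described above as a partial basis for deciding how to act.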

           

Part III: Finding patterns: Describing cognitive function

I have argued that finding patterns should proceed by choosing patterns which support the most economical encoding of the relevant data. This suggests a possible (partial) account of the remarkable success of the cognitive system in predicting, understanding and acting in an uncertain and complex environment: that cognitive processes search for simplicity. I now consider whether this proposal provides a basis for plausible descriptive psychological theories. I begin by giving a broad outline of how the proposal that cognition is guided by simplicity should be understood, and outlining in general terms its potential implications for core aspects of cognition: reasoning, learning and memory. I then consider two case studies, taken from the study of perception and of similarity, which show how this approach can lead to specific theoretical proposals.

 

Simplicity and cognition: The broad picture

           

A psychological simplicity principle

The normative discussion suggests that the cognitive system should aim to find the simplest possible interpretation of the information available to it, whether that information be perceptual input or scientific data. But the proposal that the cognitive system invariably finds the very simplest interpretation is unrealistic, for two reasons. First, the computational problem of finding the shortest encoding of a set of data is generally very difficult. If the coding language is rich enough to express an arbitrary computable function (i.e., if it can be viewed as a general purpose programming language--which is true of a surprisingly large class of languages), then the problem of finding the simplest interpretation is provably uncomputable (see, e.g., the discussion in Chater (1996) and the formal results in Li & Vitányi (1993)). For such languages, the strong form of the simplicity principle is therefore ruled out, on pain of violating the Church-Turing thesis. But even if the representation language is not rich enough to express an arbitrary computable function, the search for the simplest interpretation is still typically combinatorially explosive (although see Helm & Leeuwenberg, 1986, for a rare counterexample).

            Second, empirical considerations also suggest that the cognitive system does not, in general, find the very simplest perceptual organisation. A well-known perceptual example is that Glass patterns (Glass, 1969; Glass & Pérez, 1973) in which there is a wide separation between the two “copies” of the pattern may appear to be entirely unstructured. Chater (1996) considers a more extreme case: A binary expansion of π represented as a pattern of black or white squares would appear completely random; its simple description (as an expansion of π) cannot be discovered by the cognitive system11.

            A psychological form of the simplicity principle, therefore, cannot specify that the cognitive system succeeds in finding the shortest description of the information available to it. Rather, simplicity should be viewed as a goal of cognitive processing: the cognitive system chooses the simplest interpretation of this information that it can find.

            A further important issue in the psychological interpretation of the simplicity principle concerns mental representation. I noted above that Kolmogorov complexity theory abstracts over representation languages, so that the theory can be used as a general framework for theorizing about cognition without a detailed understanding of the nature of mental representation. Nonetheless, the specific representations used by the cognitive system will be of crucial importance in detailed psychological explanation. Indeed, note that, according to the relationship between simplicity and probability above, the coding language can be viewed as encoding a set of prior probabilities concerning possible patterns. Evidence concerning mental representation from any source may thereby be useful in providing constraints on the predictions of simplicity-based accounts of cognition12. For example, evidence from linguistics or psycholinguistics concerning the nature of the mental representations involved in understanding natural language must be taken into account in any simplicity/likelihood account of how the perceptual/cognitive system finds structure in speech13.

            Having considered how the simplicity principle can be interpreted as a psychological proposal, I now consider how it can be applied to understanding cognition. Below, I outline how the simplicity principle can be applied to two specific areas: perception and similarity. First, I sketch, in very broad terms, how it can be related to some of the major topics in cognitive psychology.

 

Reasoning

In a series of papers (Chater & Oaksford, 1990, 1993; Oaksford & Chater, 1991, 1992, 1993, 1995b), Mike Oaksford and I have argued that almost all everyday reasoning is uncertain: people draw conclusions that are plausible, but not certain, given the premises. We have argued that probability theory, the calculus of uncertainty, is therefore a more appropriate starting point for understanding human reasoning than logic, the calculus of certainty. Moreover, we have argued that people interpret classic psychological reasoning tasks, which are typically assumed to be deductive, in probabilistic terms, and solve them using strategies which can be understood in probabilistic terms. Thus, we argue that people are not logical, but that they are rational; logic is simply the wrong standard against which to assess most human reasoning. This viewpoint has proved useful in providing detailed models of a range of standard reasoning tasks, including Wason’s selection task (Wason, 1966; 1968; Oaksford & Chater, 1994, 1995a; see Almor & Sloman, 1996; Evans & Over, 1996; Laming, 1996 for critical discussion and Oaksford & Chater, 1996 for a response; see also Oaksford, Chater, Grainger & Larkin, 1997 for empirical evidence), syllogistic reasoning (Chater & Oaksford, ms) and conditional inference (Oaksford & Chater, ms). Because of the duality between simplicity and probability, a probabilistic interpretation of human inference is immediately compatible with the simplicity principle outlined here.

            Note that this viewpoint proposes that simplicity/probability is a goal of reasoning--but that this goal will only be approximated. Theorists differ on how good such an approximation might be. Kahneman and Tversky have argued that their experimental results showed strong departures from the norms of probability theory under certain conditions (e.g., Kahneman & Tversky, 1973; Kahneman, Slovic & Tversky, 1982), although the reasoning heuristics (availability and representativeness) that they proposed people use are usually reasonably reliable in normal circumstances. Gigerenzer and his colleagues (e.g., Gigerenzer, Hell & Blank, 1988; Gigerenzer & Murray, 1987) have argued that Kahneman and Tversky may have substantially underestimated the normative correctness of human probabilistic reasoning, and have shown that experimental manipulations which clarify the task for the experimental participant can dramatically improve the fit between reasoning performance and probabilistic norms.

            More recently, however, Gigerenzer and Goldstein (1996) and Evans and Over (1996?REF) have proposed that human performance should not be compared with normative theories, such as logic or probability theory (or, in the present context, the simplicity principle). They argue that such normative accounts are entirely unnecessary for understanding human reasoning. Specifically, Gigerenzer and Goldstein (1996) argue that reasoning should be understood as consisting of “fast and frugal” heuristics which are adaptively successful, but not normatively justified; and Evans and Over (1996) argue that much human reasoning is “rational1,” i.e., successful with respect to achieving a person’s goals, but not “rational2,” i.e., conforming to a normative analysis. Both these viewpoints suggest that reasoning may consistently succeed without conforming, even approximately, to any normative standard. This seems unsatisfactory, because it leaves this success unexplained (see Chater, Oaksford, Nakisa & Redington, ms.). By contrast, the simplicity principle both has a normative justification and is intended to describe cognitive performance14.

 

Learning from Experience PUT IN REFS

Learning from experience is a problem of finding patterns in what are typically large amounts of complex and often noisy data15. It therefore falls naturally within the domain of application of the normative theory of finding patterns by searching for simplicity. Moreover, theorists have directly proposed that certain aspects of language acquisition may proceed by finding the shortest possible encoding of the input linguistic data. For example, Brent & Cartwright (199? CHECK) show how morphological structure can be found within isolated words, and Wolff (??; see also Atick, ??) considers how higher level structure can be found automatically in text. Less directly, connectionist networks, perhaps the most popular computational models of human learning (Elman et al), can be interpreted as implementing Bayesian probabilistic inference (MacKay, 1992; Neal, 19??; see also Chater, ??), and thus, by the connection between probability and simplicity, as maximizing simplicity. Indeed, much recent interest in the study of connectionist networks has focussed on directly viewing networks as minimizing description length, and therefore as maximizing simplicity (Hinton & Zemel, 19??; Zemel, 199? see Neural Computation). Thus, many current psychological models of learning are compatible with the thesis that the cognitive system maximizes simplicity.

 

Memory

Finally, note that the claim that the cognitive system searches for patterns which provide the briefest encoding of available information has a natural interpretation in terms of memory: that the cognitive system seeks to minimize memory load. This leads to the prediction that the richer the patterns that the cognitive system can find in a stimulus, the better it will be remembered. This is a ubiquitous finding in all areas of memory research, from the advantage of memory for words over nonsense strings, to the memory for meaningful over non-meaningful pictures, to comprehensible vs non-comprehensible stories (REFS). This viewpoint was also taken by theorists working within an information-theoretic framework (Attneave, 1959; Garner, 1962, 1974).

            It is important to note that this account does not depend on the assumption that the memories are stored as briefly as possible--i.e., with no redundancy. Indeed, it has frequently been observed that this kind of storage would be inappropriate, because it would not be robust to noise. Information theory specifies that constructing an optimal redundant code is achieved by first finding the simplest encoding, and then introducing redundancy so that each part of this code is equally protected from corruption (Cover & Thomas, 1992). Thus, for a given stimulus, finding a brief encoding will allow the construction of a better redundant representation, which will thereby be noise resistant and hence better remembered.

 

Case study 1: Perception

Perception is, from an abstract point of view, a process of finding patterns in sensory input. Thus, a simplicity criterion for choosing between patterns may potentially be applied across a wide range of aspects of perceptual analysis. For example, in low-level perception, it has been conjectured that the compression of the sensory signal is a central goal (Atick & Redlich, 1990; Barlow, 1989; Blakemore, 199015). The goal of compression is frequently viewed as stemming from limitations in the information-carrying capacity of the sensory pathways. However, the viewpoint outlined here suggests a complementary interpretation. It could be that compressed (i.e., simplest) perceptual representations will tend to involve the extraction of features likely to have generated the sensory input (because maximizing simplicity automatically maximizes likelihood). From this viewpoint, perceptual inference occurs in the very earliest stages of perception (e.g., as implemented in mechanisms such as lateral inhibition in the retina), where neural coding serves to compress the sensory input. Thus, the search for simplicity may operate in low-level perception.

            Moreover, the same principles might equally well be at work in high level perceptual processing--the simplicity principle seems equally valuable in attempting to understand the causal structure of a sequence of observed actions or events. The key goal is to find patterns which are a reliable basis for explanation and prediction; we have seen that following a simplicity principle is a way of achieving goals of this kind. The simplicity principle therefore finds potential applications in understanding perception at many scales. Which areas of application prove to be theoretically fruitful remains for future research--below, I discuss some areas where the notion of simplicity has already been usefully applied.

 

Perceptual organization

How does the perceptual system derive a complex and structured description of the perceptual world from sensory input? Two apparently competing theories of perceptual organization have been influential.  The first, initiated by Helmholtz (1910/1962), advocates the likelihood principle: that sensory input will be organized into the most probable distal object or event consistent with that input.  The second, which has been mentioned already, advocates what Pomerantz and Kubovy (1986) call the simplicity principle: The perceptual system is viewed as finding the simplest, rather than the most likely, perceptual organization consistent with the sensory input.

            Both the likelihood and simplicity principles explain, at least at an intuitive level, a wide range of phenomena of perceptual organization. Consider, for example, the Gestalt law of good continuation, that perceptual interpretations which involve continuous lines or contours are favored. The likelihood explanation is based on the observation that continuous lines and contours are very frequent in the environment (e.g., Brunswik, 1956).  Although it is possible that the input was generated by discontinuous lines or contours which happen, by coincidence, to be arranged so that they are in alignment from the perspective of the viewer, this possibility is rejected because it is highly improbable.  The simplicity explanation, by contrast, suggests that continuous lines or contours are imposed on the stimulus when they allow that stimulus to be described more simply.

            Another example is the tendency to perceptually interpret ambiguous 2D projections as generated by 3D shapes containing only right angles (Attneave, 1972; Perkins, 1972, 1982; Shepard, 1981). The likelihood explanation is that right angled structures are more frequent in the environment (at least in the “carpentered” environment of the typical experimental subject (Segall, Campbell & Herskovits, 1966)).  The simplicity explanation is that right angled structures are simpler--e.g., they have fewer degrees of freedom--than trapezoidal structures.

            There has been considerable theoretical and empirical controversy concerning whether likelihood or simplicity governs perceptual organization (e.g., Hatfield & Epstein, 1985; Leeuwenberg & Boselie, 1988; Pomerantz and Kubovy, 1986; Rock, 1983). The controversy has been difficult to settle because neither of the key principles, likelihood and simplicity, is clearly defined. Moreover, there have been suspicions that the two principles are not in fact separate, but are two sides of the same coin.  Pomerantz and Kubovy (1986) cite Mach (1886/1959): “The visual sense acts therefore in conformity with the principle of economy [i.e., simplicity], and at the same time, in conformity with the principle of probability [i.e., likelihood]” (p. 215), and themselves suggest that some resolution between the two approaches might be possible--particularly in view of the fact that both likelihood and simplicity explanations typically appear to be available for most phenomena in perceptual organization.

            Chater (1996) notes that the simplicity and likelihood principles are indeed, under natural interpretations, equivalent. Specifically, if simplicity is interpreted as length in a coding language, and likelihood is interpreted as subjective probability, then any problem of maximizing simplicity can be reinterpreted as a problem of maximizing likelihood17.
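The equivalence can be illustrated with a minimal sketch. It assumes the standard Shannon correspondence between probability and optimal code length (length = -log2 probability); the three candidate organizations and their probabilities are hypothetical values chosen purely for illustration, and the formal argument in Chater (1996) is not reproduced here.

```python
import math

def code_length_bits(probability: float) -> float:
    """Length in bits of an optimal (Shannon) code for an event of this probability."""
    return -math.log2(probability)

# Hypothetical subjective probabilities for three perceptual organizations.
probabilities = {
    "continuous contour": 0.70,
    "aligned fragments": 0.25,
    "coincidence": 0.05,
}

# The coding language induced by those probabilities.
lengths = {name: code_length_bits(p) for name, p in probabilities.items()}

# Maximizing likelihood and minimizing code length pick out the same organization,
# because -log2 is a strictly decreasing function of probability.
most_likely = max(probabilities, key=probabilities.get)
simplest = min(lengths, key=lengths.get)
print(most_likely, simplest)
```

Because the mapping from probability to code length is monotonic, the two rankings of the candidate organizations are exact mirror images of one another under these assumptions.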

            The unification of the simplicity and likelihood views appears to be challenged, however, by the apparent existence of phenomena which have been interpreted as providing distinctive evidence for likelihood and against simplicity, or vice versa. If the two principles are identical, empirical evidence distinguishing between them should not be possible. Chater (1996) argues, however, that such evidence can be interpreted within both the simplicity and likelihood frameworks. I briefly consider such evidence, and its interpretation.

            Likelihood is widely assumed to be favored by evidence that shows that preferred perceptual organization is influenced by factors concerning the structure of the everyday environment.  For example, consider 2D projections of a shaded pattern, which can be seen either as a bump or an indentation (see, e.g., Rock, 1975).  The preferred interpretation is consistent with a light source from above, as in natural light.  Thus, the perceptual system appears to choose the interpretation that is most likely; but there is no intuitive difference between the simplicity of the two interpretations. But such phenomena also have a simplicity-based explanation, which can be intuitively understood as follows.  Consider the simplest description not of a single stimulus, but of a typical sample of natural scenes.  Any regularity which is consistent across those scenes need not be encoded afresh for each scene--rather, it can be treated as a “default”.  That is, unless there is a specific additional part of the code for a stimulus that indicates that the scene violates the regularity (and in what way), it can be assumed that the regularity applies.  Therefore, other things being equal, scenes which respect the regularity can be encoded more briefly than those which do not.  Moreover, perceptual organizations of ambiguous scenes which respect the regularity will be encoded more briefly than those which violate it.  In particular, then, the perceptual organization of an ambiguous stimulus obeying the natural regularity of illumination from above will be briefer than the alternative organization with illumination from below.  In general, preferences for likely interpretations also give rise to preferences for simple interpretations: if the code for perceptual stimuli and organizations is to be optimal when considered over all (or a typical sample of) natural scenes, it will reflect regularities across those scenes.

            Simplicity is assumed to be favored by cases of perceptual organizations which violate, rather than conform to, environmental constraints. Leeuwenberg and Boselie (1988) show a schematic drawing of a symmetrical two headed horse.  The more likely interpretation, also consistent with the drawing, is that there are two horses, one occluding the other. But the perceptual system appears to reject likelihood. Instead, the drawing is interpreted as a single, two-headed animal. But we can also provide a likelihood explanation of this phenomenon, where likelihood applies locally rather than globally.  That is, the perceptual system may determine the interpretation of particular parts of the stimulus according to likelihood (e.g., the fact that there are no local depth or boundary cues may locally suggest a continuous object). These local processes may not always be guaranteed to arrive at the globally most likely interpretation (see Hochberg, 1982).

            Thus, the evidence that distinguishes between the simplicity and likelihood principles is actually compatible with both, and therefore does not challenge the unification between them18.

 

Figural goodness

Some perceptual patterns are intuitively judged to be more “regular” or “better” than others. These intuitive judgements of “figural goodness” appear to reliably correlate with the resistance of such patterns to noise, and the speed with which such patterns are detected.

            Hochberg and McAllister (1953) argued for a direct connection between judgments of figural goodness and choice of perceptual organisation. They identified figural goodness with simplicity, and adopted the simplicity principle, as discussed above: that perceptual organisations are chosen to maximize simplicity19. According to this viewpoint, the intuitive notion of goodness can be viewed as a measure of the degree to which the perceptual system succeeds in finding a simple pattern in the perceptual stimulus.

            One line of argument in favor of Hochberg and McAllister’s viewpoint is that there is an interesting connection between the proposal that the simplicity principle determines the choice of perceptual organizations, as outlined above, and noise resistance. Specifically, if the simplicity principle is right, then it follows, as we shall see below, that simple patterns will be the most noise resistant. Moreover, given that noise resistance is a litmus test for figural goodness, this suggests the further implication that simple patterns will be particularly good. Thus, the simplicity principle in perceptual organization appears to imply that simplicity also governs goodness, as Hochberg and McAllister propose.

            The crucial step in the argument above is that which shows that, if the simplicity principle is correct, the simplicity of a pattern correlates directly with its resistance to noise. The intuitive idea is that the noise resistance of a pattern depends on a comparison between a “null” organization, in which the pattern is not imposed and the stimulus is viewed purely as noise, and a “pattern + noise” interpretation, in which the stimulus is viewed as arising from a pattern which has been corrupted by noise. According to the simplicity principle, the simpler of the two interpretations will be perceived: that is, the pattern will be perceived so long as the “pattern + noise” interpretation is shorter than the “null” interpretation. This implies that very simple patterns (with short codes) will be the most noise resistant, because more noise can be added, and the pattern + noise interpretation will still be the shortest. If we assume noise resistance to be a litmus test for figural goodness, this means that simple patterns will have the greatest goodness. Accordingly, the simplicity principle as a principle concerning choice between alternative perceptual organizations implies that simplicity determines figural goodness. This provides an argument for Hochberg and McAllister’s (1953) view that simplicity governs not only choice of perceptual organization but also figural goodness.
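The comparison between the “null” and “pattern + noise” codes can be made concrete with a toy calculation. The sketch assumes a stimulus of n binary cells, a null code that spells out each cell literally (n bits), and a noise code that states how many cells were flipped and which ones. These coding choices, and the pattern code lengths used, are illustrative assumptions rather than part of the account above.

```python
import math

def pattern_plus_noise_bits(pattern_bits: float, n: int, flipped: int) -> float:
    """Code = the pattern itself + how many cells were flipped + which ones."""
    return pattern_bits + math.log2(n + 1) + math.log2(math.comb(n, flipped))

def max_tolerated_noise(pattern_bits: float, n: int) -> int:
    """Largest number of flipped cells for which pattern + noise still beats the null code.

    Returns -1 if the pattern is never preferred, even without noise.
    """
    null_bits = n  # null interpretation: every binary cell is encoded literally
    tolerated = -1
    # The cost of naming the flipped cells grows with their number up to n // 2.
    for flipped in range(n // 2 + 1):
        if pattern_plus_noise_bits(pattern_bits, n, flipped) < null_bits:
            tolerated = flipped
        else:
            break
    return tolerated

# Hypothetical pattern code lengths (in bits) for a 100-cell stimulus.
simple_tolerance = max_tolerated_noise(pattern_bits=10, n=100)
complex_tolerance = max_tolerated_noise(pattern_bits=60, n=100)
print(simple_tolerance, complex_tolerance)
```

Under these assumptions the pattern with the shorter code survives more corrupted cells before the null interpretation becomes the shorter description, which is the sense of noise resistance at issue.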

 

Randomness

If simplicity determines judgements of “goodness” or “regularity,” then this suggests that complexity might determine judgements of “randomness” or “irregularity.” That is, perhaps judgments of randomness can be viewed as the inverse of goodness judgments (see, e.g., Alberoni, 1962). If perceived goodness is determined by the degree to which the cognitive system succeeds in finding structure in the stimulus, then this suggests that perceived randomness may be determined by the degree to which the cognitive system fails to find such structure. Interestingly, Falk and Konold (in press) have recently provided support for this view. They give a persuasive theoretical analysis as well as empirical confirmation of the suggestion that subjective judgments of the randomness of a stimulus are inversely related to the success of people’s attempts to find a brief code for that stimulus (see also Kahneman & Tversky, 1972). Indeed, Falk and Konold’s (in press) analysis proposes an algorithmic definition of randomness drawn from Kolmogorov complexity theory (Li & Vitányi, 1993), thus using the same tools as the current analysis of simplicity at a technical level. This is a straightforward inversion of the SL account of goodness: A stimulus is perceived as random to the extent that no simple/likely organisation can be found for it. Thus, the SL approach promises to unify the literature on goodness with that on judgments of randomness (e.g., Bar-Hillel & Wagenaar, 1991; Budescu, 1987; Lopes & Oden, 1987).
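As a rough illustration of randomness as a failure to find a brief code, the following sketch uses a general-purpose compressor as a crude, computable stand-in for code length. This is an assumption made for illustration only: compressed size merely bounds Kolmogorov complexity from above, and it is not the measure used by Falk and Konold.

```python
import random
import zlib

def code_length(sequence: str) -> int:
    """Compressed size in bytes: a crude, computable proxy for the brevity of a code."""
    return len(zlib.compress(sequence.encode("ascii"), 9))

rng = random.Random(1)  # fixed seed so the illustration is repeatable
patterned = "01" * 500                                       # strict alternation
irregular = "".join(rng.choice("01") for _ in range(1000))   # pseudo-random sequence

# Higher ratio = less structure found = "more random" on this proxy.
randomness_patterned = code_length(patterned) / len(patterned)
randomness_irregular = code_length(irregular) / len(irregular)
print(randomness_patterned, randomness_irregular)
```

On this proxy the alternating sequence receives a much lower score than the pseudo-random one, mirroring the proposed inverse relation between goodness and judged randomness.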

 

 

Case study 2: Similarity

Consider the problem of finding patterns in a stimulus consisting of two distinct objects. Each object may contain internal patterns; but in addition, there may be patterns which interrelate the two objects. For example, a short description of the stimulus shown in Figure 3(a) would exploit the common patterns between the left and right object; specifically by noting that one is the mirror image of the other in a vertical axis of symmetry. The pattern interrelating the two parts of the stimulus is very strong; once one half of the stimulus is described, the other can be generated very simply, by describing the axis of symmetry. Figure 3(b) shows a pair of objects which share somewhat less structure--specifying one in terms of the other requires a reflection, and the interchange of black and white. Figure 3(c) shows a case where there is less structure still; to specify one object in terms of the other requires an additional translation of the inner figure.

 

INSERT FIGURE 3 ABOUT HERE

 

Suppose that we ask: how similar are the pairs of objects in Figure 3? Intuitively, similarity appears to decrease from (a) to (c). Thus, the more shared patterns between two stimuli, and therefore the more simply one can be specified in terms of the other, the more similar they are. Generalizing this observation leads to the proposal that the judged similarity between two objects depends on the simplicity of the transformation from the representation of one object to the representation of the other. Ulrike Hahn and I (Chater & Hahn, 1996, in press) have called this the representational distortion theory of similarity--the simpler the transformation between the representations of a pair of objects, the more similar those objects are assumed to be. In terms of Kolmogorov complexity, representational distortion is expressed in terms of the conditional Kolmogorov complexity, K(y|x), introduced above--the length of the shortest program that transforms x into y21.

            Representational distortion provides an interesting generalization of current psychological theories of similarity. The two leading accounts, the geometric and featural views, also treat similarity as a relation between mental representations. But whereas representational distortion applies to any kind of representation, and allows arbitrary computable transformations between them, these theories are committed to very specific types of representations, and very specific relations between them.

            The geometric view  (Shepard, 1987) assumes that objects are represented as points in an internal space. The similarity between two objects is inversely related to the distance between their representations in this space. By contrast, the set-theoretic view  (Tversky, 1977) assumes that objects are represented as sets of features. The similarity between two objects depends on the amount of overlap between their sets of features. The representational limitations of both accounts are severe. It does not seem possible to represent perceptual organizations, parsed sentences, schemas for world knowledge, or sequences of motor commands either as points in an internal space, or as sets of features. Rather, they appear to require  structured representations which are able to capture relations between parts and wholes and capture systems of relations between parts (Chomsky, 1965; Fodor, 1975; Fodor & Pylyshyn, 1988; Marr, 1982; Minsky, 1977). In short, structured representations appear to be required to represent almost all cognitively significant stimuli; and judgements of similarity between such stimuli thereby fall outside the scope of both geometric and set-theoretic accounts of similarity.

         I stress that representational distortion, like the geometric and set-theoretic views, is defined over mental representations of objects--not over the objects themselves. To see why this is crucial, consider the psychological similarity of two unrelated bursts of white noise. At an acoustic level of description, where the bursts are considered as amplitudes varying over time, a very long set of instructions would be required to transform one of these bursts into the other. But the two noises may, nonetheless, be judged to be similar, even to the point that the auditory system cannot distinguish the two. According to this account, this is because the mental representations of the two bursts do not include minute detail of each aspect of the noise. Instead, they are concerned with a more general description, perhaps concerning the duration, loudness, location and so on of the burst.  These properties may be largely or completely matched between stimuli, so that the mental representations of the two sounds are identical, or differ only slightly, and hence the representational distortion between them is small.

         I stress also that the representational distortion found by the cognitive system will not correspond exactly to information distance. Discovering a short transformation between one representation and another may require arbitrary amounts of computation. For example, the sequences 1 5 3 7 2 3 9 0 6 and  3 0 7 4 4 7 8 1 2 are very simply related—if they are interpreted as base 10 numbers, the second is double the first. Hence the representational distortion between the two sequences is small; however, the cognitive system may not find this short transformation; and the similarity between the two representations may be judged to be low. We assume therefore only that the cognitive system can approximate representational distortion to some degree22.
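The doubling relation between the two sequences can be checked directly, alongside a surface comparison of the kind that might make them look unrelated. The position-by-position mismatch count is used here only as an illustrative stand-in for a superficial comparison; it is an assumption of this sketch, not a claim about how the cognitive system compares digit strings.

```python
first = "1 5 3 7 2 3 9 0 6"
second = "3 0 7 4 4 7 8 1 2"

def as_number(digits: str) -> int:
    """Read a space-separated digit sequence as a single base 10 number."""
    return int(digits.replace(" ", ""))

def mismatched_positions(a: str, b: str) -> int:
    """Count the positions at which two digit sequences differ."""
    return sum(x != y for x, y in zip(a.split(), b.split()))

# The short transformation: "multiply by two".
is_double = as_number(second) == 2 * as_number(first)

# The surface view: how many digits differ position by position.
mismatches = mismatched_positions(first, second)
print(is_double, mismatches)
```

The sequences differ at every position, yet a one-step arithmetic program relates them, which is why a short transformation may exist without being found.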

 

Geometric and set-theoretic theories are special cases of representational distortion

 

I now note that geometric and set-theoretic models can be seen as special cases of representational distortion. The mathematical details have been omitted for brevity (see Chater & Hahn, in preparation).

 

The Spatial Account

Representations are limited to vectors of numbers. Transformations are limited to sequences of “nudges” of unit length (this length can be thought of as a limit of resolution in the space) and a “program” consists of a sequence of such nudges. If nudges can be in any direction, then the simplest transformation between two points is given by the distance of the straight line path between the points (this is the length of the “program” of concatenated nudges, ignoring the cost of specifying the direction of a nudge). This gives the Euclidean version of the spatial model. Restricting nudge direction to the axes gives a city-block version; allowing non-orthogonal axes derives the general Euclidean scaling model (Ashby & Townsend, 1986).
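A minimal sketch of the two nudge regimes follows. It ignores the cost of specifying nudge direction, as in the text; the function names and the example points are hypothetical.

```python
import math

def euclidean_nudges(a, b, step=1.0):
    """Program length when nudges may point in any direction: straight-line distance / step."""
    return math.dist(a, b) / step

def city_block_nudges(a, b, step=1.0):
    """Program length when nudges must run parallel to the axes."""
    return sum(abs(x - y) for x, y in zip(a, b)) / step

origin, target = (0.0, 0.0), (3.0, 4.0)
free_direction = euclidean_nudges(origin, target)
axis_only = city_block_nudges(origin, target)
print(free_direction, axis_only)
```

For these points the unrestricted program needs 5 unit nudges and the axis-restricted program needs 7, which correspond to the Euclidean and city-block distances respectively.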

 

The Set-theoretic Account

Representations are limited to sets of features. Transformations are limited to the deletion and addition of features one by one. Thus a program consists of a sequence of deletions and additions. Assuming differential length for deletion and addition (specifically, deletion has the shorter code, because additions require specifying what is to be added), program length is then determined by a weighted sum of the number of features that object A has and object B does not (which must be deleted) and that B has but A does not (which must be added). The length of this program is a close variant of Tversky’s (1977) theory of similarity.
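The weighted sum can be sketched as follows. The particular costs (1 unit per deletion, 2 per addition) and the feature sets are hypothetical values chosen only to show the shape of the computation; they are not estimates from the literature.

```python
def set_program_length(a: set, b: set, delete_cost: float = 1.0, add_cost: float = 2.0) -> float:
    """Length of the program turning feature set a into feature set b.

    Features in a but not in b are deleted; features in b but not in a are added.
    Additions cost more because the added feature has to be specified.
    """
    deletions = a - b
    additions = b - a
    return delete_cost * len(deletions) + add_cost * len(additions)

# Hypothetical feature sets.
object_a = {"wings", "feathers", "flies"}
object_b = {"wings", "feathers", "swims", "black-and-white"}

a_to_b = set_program_length(object_a, object_b)  # 1 deletion, 2 additions
b_to_a = set_program_length(object_b, object_a)  # 2 deletions, 1 addition
print(a_to_b, b_to_a)
```

Because deletions and additions are weighted differently, the program length depends on the direction of the transformation, as in Tversky’s contrast between the distinctive features of each object.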

 

Properties of the representational distortion theory of similarity

 

We now briefly consider some basic properties of representational distortion that imply that it is a promising starting point for a psychological theory of similarity.

 

Flexibility

The fact that similarity is defined over general representations takes account of the great flexibility of human similarity judgements (e.g., Medin, Goldstone & Gentner, 1993), because similarity is defined over representations of objects, and the goals and knowledge of the subject may affect the representations which are formed.  As with the set-theoretic models (Tversky, 1977), this flexibility has both advantages, in terms of accounting for the flexibility of people's similarity judgements, and disadvantages, from the point of view of deriving testable empirical predictions.

 

Similarity and identity

According to representational distortion, any object is more similar to itself than to any other object. This is because the shortest possible program is the “empty” program, which, clearly, leaves any representation unchanged (in symbols, K(x|x) = 0). Thus, the representational distortion viewpoint automatically captures the fundamental intuition that identity is the most extreme form of similarity. This property of representational distortion seems attractive, but it appears to run counter to data obtained by Tversky (1977) which appears to show that the similarity between distinct objects can sometimes exceed the similarity between identical objects. However, the interpretation of this data is sufficiently controversial (e.g., Nosofsky, ???) that it may be too early to take the drastic conceptual step of rejecting the intuition that identity is the most extreme form of similarity.

 

Asymmetry

Representational distortion allows for asymmetry in similarity judgements: K(x|y) is not in general equal to K(y|x).  This asymmetry is particularly apparent when the representations being transformed differ substantially in complexity. Suppose that a subject knows a reasonable amount about China, but rather little about Korea, except that it is “rather like” China in certain ways.  Then transforming the representation of China into the representation of Korea will require a reasonably short program (which simply deletes large amounts of information concerning China which is not relevant to Korea), while the program transforming in the reverse direction will be complex, since the minimal information known about Korea will be almost no help in constructing the complex representation of China.  Thus, we would predict that K(China|Korea) should be greater than K(Korea|China).  This is observed experimentally (Tversky, 1977).
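The asymmetry can be mimicked with a compression-based approximation. The sketch assumes that K(y|x) can be roughly approximated by the extra compressed bytes needed for y once x has been seen; this is a crude, computable stand-in, since Kolmogorov complexity itself is uncomputable. The “rich” and “sparse” byte strings are hypothetical placeholders for the two representations.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def approx_conditional(y: bytes, x: bytes) -> int:
    """Rough stand-in for K(y|x): extra compressed bytes needed for y once x is known."""
    return compressed_size(x + y) - compressed_size(x)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
rich = bytes(rng.randrange(256) for _ in range(2000))  # detailed representation ("China")
sparse = rich[:40]                                     # a small part of it ("Korea")

sparse_given_rich = approx_conditional(sparse, rich)   # analogue of K(Korea|China)
rich_given_sparse = approx_conditional(rich, sparse)   # analogue of K(China|Korea)
print(sparse_given_rich, rich_given_sparse)
```

Going from the rich representation to the sparse one costs little, because the sparse content is already present; going the other way requires almost the whole of the rich representation to be specified afresh.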

 

Background knowledge

Similarity judgements are influenced by background knowledge. For example, if the Arabic number system is part of your background knowledge, then you may perceive similarities between otherwise dissimilar patterns (i.e., dissimilar as mere patterns of dots), because numerical transformations will be available. It is difficult for any theory of similarity to explain the role of background knowledge. In the spatial view, the natural role of knowledge is in specifying the dimensions of the space in which the comparison takes place, as well as assigning weights, which determine the relative importance of each dimension (effectively by stretching or squashing the space along the relevant dimension). In the set-theoretic view, background knowledge can play a role in determining the features that are taken into account in the comparison. In both cases, the role of knowledge is to affect the representations that are the input to the comparison process. Similarly, background knowledge may affect the representations which are compared, according to the view that similarity is representational distortion. But, moreover, the representational distortion account allows an additional way in which background knowledge can affect similarity comparisons: by assuming that background knowledge forms an additional input to the program which must transform one object into another. Thus, background knowledge affects what operations are available in transforming one representation into another--for example, a knowledge of the number system might suggest all manner of numerical transformations which might relate two numbers (e.g., having the concept of a prime might increase the degree to which people judge 43 and 47 to be similar--partly because one can be generated from the other by the instruction “next prime” or “previous prime”). People with different mathematical knowledge might thereby have different judgements about which numbers are similar.
More drastically, people who use different notations will thereby have dramatically different judgements concerning the similarities between patterns corresponding to formulae expressed in various notations.

            Thus, representational distortion provides a rich framework for understanding how background knowledge influences similarity judgements--knowledge can readily influence the nature of the similarity comparison itself, as well as changing the representations that are inputs to the comparison (see Chater & Hahn, in press, for discussion of related issues). It remains for future work to determine to what extent this account can capture in detail the way in which people’s similarity judgements are influenced by their background knowledge.

 

Summary

To sum up, the simplicity approach to similarity arises as follows. If the cognitive system searches for the simplest interpretation of the information available, then it will aim to exploit regularities between different representations. The strength of the shared regularities between two objects can be measured by the conditional Kolmogorov complexity between them: call this the “representational distortion” between them. Representational distortion can be viewed as a generalization of the two standard psychological models of similarity: the spatial and set-theoretical models. Moreover, it has a number of intuitively attractive properties. An interesting project for future research is to attempt to develop this theoretical account in more detail, and to provide experimental tests for this approach (see Chater & Hahn, 1996, in preparation).

 

Scope and limits

I have proposed that the search for simplicity is a fundamental principle of cognition. I have argued that the principle has potentially broad application. In this section, I list a number of important limitations of this approach.

 


 

Representation

If simplicity is defined in terms of brevity in a coding language, then simplicity will depend crucially on that representation language--thus obtaining detailed psychological predictions from the simplicity principle requires making specific assumptions about mental representation. I noted above that Kolmogorov complexity theory is able to abstract away from the specific coding language being used, because code lengths in any two languages are equal up to a constant--but this constant may be large in relation to the amount of data available in specific psychological applications.
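
The abstraction from the coding language mentioned above is usually stated as an invariance property. A sketch in LaTeX follows; the symbols K_{L_1}, K_{L_2} and c_{L_1,L_2} are notation assumed here for illustration, and the statement is restricted to universal coding languages.

```latex
% For two universal coding languages L_1 and L_2 there is a constant
% c_{L_1,L_2}, independent of the data x, such that
\[
  \bigl| K_{L_1}(x) - K_{L_2}(x) \bigr| \;\le\; c_{L_1,L_2}
  \qquad \text{for all } x .
\]
% The bound is informative only when K(x) is large relative to c_{L_1,L_2};
% for the small amounts of data typical of psychological applications
% the constant may dominate.
```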

            The viability of simplicity-based accounts of cognition can be assessed by constraining the coding language used to determine simplicity by independent theoretical and empirical evidence concerning the relevant aspect of mental representation. Leeuwenberg and colleagues (Buffart, Leeuwenberg & Restle, 1981; Leeuwenberg, 1969, 1971) have pursued this approach in assessing the viability of their coding language, structural information theory, for certain classes of perceptual stimuli (e.g., van der Helm, van Lier & Leeuwenberg, 1992; van Lier, van der Helm & Leeuwenberg, 1994a, 1994b). Similar programs of research may be possible with respect to other applications of the simplicity principle.

 

Search

I have argued that simplicity is the criterion with which the cognitive system chooses between alternative patterns that may be imposed on the environment. But I have also noted that the cognitive system cannot, in general, maximize simplicity--finding the shortest code for a set of data is, in general, uncomputable; and even restricted versions of the problem are generally combinatorially explosive. This means that the sub-optimal solutions found by the cognitive system will depend on the nature of the search process. The extent to which the search is conducted serially or in parallel, whether it must be represented discretely, or can be embedded into a continuous search problem (Durbin & Willshaw, ??; Smolensky, ??), whether it uses some version of gradient descent (REF?), stochastic gradient descent (G&G; Hinton & Sejnowski??; Kirkpatrick??), techniques such as simulated annealing (Geman & Geman, ??), and so on, are crucial, and unresolved, issues. (CHECK MDL TRICKS) RELAXATION SEARCHES.

 

Speed

The discussion so far has also ignored cognitive limitations concerning speed of processing. In perception, for example, many complex patterns are found within fractions of a second (e.g., Hadyn fast faces; M-W shadowing). In view of the slowness of neural hardware, this suggests that the search process must take only a small number of steps--this places very strong constraints on the nature of the search process (Feldman & Ballard, 1982?; R&McC??; Chater & Oaksford, 1990).

            Speed also plays an important role in another way: the representations used by the cognitive system must not only be brief, but must also be easy to use quickly. In some contexts, there may be a trade-off between brevity and speed. Consider an example from computer science: arithmetical operations may be rapidly computed by consulting a large look-up table in which the answers to particular arithmetic operations are prestored (particularly if this table can be searched in parallel); by contrast, a more compact representation of arithmetic, for example, in terms of axioms in some logical language, may be much briefer, but require much more computation to use. This tension between cognitive goals of speed and brevity may also be important in psychological contexts24.
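
The trade-off can be made concrete with a toy sketch. It assumes, purely for illustration, single-digit multiplication, with a prestored table on one side and a rule based on repeated addition on the other; the function names are hypothetical, and the step count is only a stand-in for processing time.

```python
def build_table(n: int = 10) -> dict:
    """Prestore every product a*b for 0 <= a, b < n: large to store, one lookup to use."""
    return {(a, b): a * b for a in range(n) for b in range(n)}


def multiply_by_rule(a: int, b: int) -> tuple:
    """Compute a*b by repeated addition: a brief rule, but b steps of computation."""
    total, steps = 0, 0
    for _ in range(b):
        total += a
        steps += 1
    return total, steps


table = build_table()
product_fast = table[(7, 8)]                  # a single retrieval from a large store
product_slow, steps = multiply_by_rule(7, 8)  # same answer, reached in several steps
print(len(table), product_fast, product_slow, steps)
```

The table grows with the square of the range of operands but answers in one step; the rule stays the same size however large the operands, at the price of more computation per answer.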

 

Innate constraints

The simplicity viewpoint outlined here may appear to be tied to a strong empiricist view of cognitive development. The emphasis has been on the criteria that the cognitive system can use to find patterns; this assumes that the patterns have to be found from experience, rather than being innately specified. Nonetheless, the simplicity viewpoint is equally compatible with empiricist and nativist viewpoints. Even strong nativists require that the cognitive system searches for patterns--but they claim that this search is subject to strong innate constraints. For example, strong nativist viewpoints regarding language acquisition typically involve the claim that the child can entertain only a restricted set of grammars (e.g., Chomsky, 1980; Pinker, ??). But the problem of finding the grammar most compatible with linguistic experience is still immensely difficult (e.g., see Redington, Chater & Finch, in press, for discussion)--and this pattern finding problem may still be guided by simplicity. The restriction to a small set of grammars amounts to having a constrained internal representation in terms of which linguistic hypotheses can be stated. But the simplicity principle may nonetheless apply: the grammar chosen may be that which provides the briefest encoding of linguistic input (see, e.g., Brent, ??; Grünwald, ??; Wolff, ??). More generally, the simplicity principle applies to problems of finding patterns in the world from experience, whether or not finding such patterns is guided by innate constraints.
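
The compatibility of innate constraints with simplicity can be illustrated with a two-part code. In the toy sketch below, the permitted hypotheses are just two symbol distributions rather than grammars, the model costs in bits are made-up numbers, and all names are hypothetical. Within the permitted set, the hypothesis chosen is the one giving the briefest total encoding of the input.

```python
import math


def data_bits(data: str, probs: dict) -> float:
    """Code length of the data under a hypothesis: sum of -log2 p(symbol)."""
    return sum(-math.log2(probs[symbol]) for symbol in data)


def total_bits(hypothesis: dict, data: str) -> float:
    """Two-part code: bits to state the hypothesis plus bits for the data given it."""
    return hypothesis["model_bits"] + data_bits(data, hypothesis["probs"])


def simplest(hypotheses: list, data: str) -> dict:
    """Pick the permitted hypothesis with the shortest total code."""
    return min(hypotheses, key=lambda h: total_bits(h, data))


# The restricted hypothesis space (assumed, for illustration only).
uniform = {"name": "uniform", "model_bits": 0.0, "probs": {"a": 0.5, "b": 0.5}}
skewed = {"name": "skewed", "model_bits": 8.0, "probs": {"a": 0.875, "b": 0.125}}
permitted = [uniform, skewed]

little_input = "aab"
much_input = "a" * 56 + "b" * 8

choice_small = simplest(permitted, little_input)["name"]
choice_large = simplest(permitted, much_input)["name"]
```

With little input the cost of stating the more elaborate hypothesis is not repaid, so the cheaper one is chosen; with more input the saving on the data outweighs that cost. The constraint fixes which hypotheses may be considered, while simplicity decides among them.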

 

The importance of interests

I have so far considered the cognitive system as engaged in a disinterested search for patterns. This leaves out the fact that some patterns are relevant and others irrelevant to the interests of the agent. Clearly, people are more concerned with each other’s faces than with patterns of shadow; and they are more concerned with each other’s voices than with the sounds of footsteps or distant traffic. Faces and voices are interesting not merely because they contain rich patterns, but because they are of fundamental importance in relating to other people, and thereby are relevant to achieving almost any goal a person may have. Equally, it seems plausible that the perception of the physical world is to some degree geared towards the detection of affordances (Gibson, 1979)--properties of objects that are potentially relevant to the actions of the agent (e.g., whether an object can be eaten, lifted, thrown, and so on).

            The role of interests is beyond the scope of the simplicity principle, but compatible with it. Interests affect how much cognitive effort is directed towards finding different kinds of patterns; but the pattern finding process may, nonetheless, proceed without reference to interests, and may be guided only by simplicity. Scientific research provides an appropriate analogy--various practical interests may determine the level of resources devoted to different areas of research, but the research itself should use disinterested scientific criteria, without reference to those interests. Indeed, it is generally assumed that interests must not be allowed to influence scientific research directly (e.g., the conclusions reached should be based purely on evidence, rather than chosen on the basis of political or social convenience), for scientific research to be valuable to society. Similarly, I assume that a separation between interests and the criteria for finding patterns is cognitively desirable, and that the remarkable success of the cognitive system indicates that, to a large degree at least, this separation is respected25.

 

Conclusions

Many cognitive processes find patterns in experience--from perceptual processing to scientific thinking. I suggest that the cognitive system searches for patterns according to simplicity--where simple patterns are those which allow a brief specification of the available data. This is normatively justified as providing a sound basis for prediction and explanation; and provides an attractive framework for descriptive psychological theories in a range of cognitive domains. I propose that it is worth exploring further the hypothesis that the search for simplicity is a fundamental cognitive principle.

 


References

 

Alberoni, F. (1962). Contribution to the study of subjective probability: I. Journal of General Psychology, 66, 241-264.

Almor, A. & Sloman, S. A. (1996). Is deontic reasoning special? Psychological Review, 103, 374-380.

Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.

Anderson, J. R. (1991a). Is human cognition adaptive? Behavioral and Brain Sciences, 14, 471-517.

Anderson, J. R. (1991b). The adaptive nature of human categorization. Psychological Review, 98, 409-429.

Anderson, J. R. & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396-408.

Ashby, F. G. & Townsend, ??? (1986) ON SIMILARITY

Attneave, F. (1959). Applications of information theory to psychology. New York: Holt, Rinehart & Winston.

Attneave, F. (1972). Representation of physical space. In A. W. Melton & E. J. Martin (Eds.), Coding processes in human memory (pp. 283-306). Washington, D.C.: Winston. 

Attneave, F. & Frost, R. (1969). The determination of perceived tridimensional orientation by minimum criteria. Perception & Psychophysics, 6, 391-396.

Bar-Hillel, M., & Wagenaar, W. A. (1991). The perception of randomness. Advances in Applied Mathematics, 12, 428-454.

Berger, J. O. (1985). Statistical decision theory and Bayesian analysis. New York: Springer-Verlag.

Brunswik, E. (1956). Perception and the representative design of psychological experiments. Berkeley, CA: University of California Press.

Budescu, D. V. (1987). A Markov model for generation of random binary sequences. Journal of Experimental Psychology: Human Perception and Performance, 13, 25-39.

Buffart, H., Leeuwenberg, E. & Restle, F. (1981). Coding theory of visual pattern completion. Journal of Experimental Psychology: Human Perception and Performance, 7, 241-274.

Chaitin, G. J. (1966). On the length of programs for computing finite binary sequences. Journal of the Association for Computing Machinery, 13, 547-569.

Chater, N. & Oaksford, M. (1990). Autonomy, implementation and cognitive architecture: A reply to Fodor and Pylyshyn. Cognition, 34, 93-107.

Chater, N. & Oaksford, M. (1993). Logicism, mental models and everyday reasoning: Reply to Garnham. Mind & Language, 8, 72-89.

Chater, N. (1996). Reconciling simplicity and likelihood principles in perceptual organisation. Psychological Review, 103, 566-581.

Chater, N. (in press). Simplicity and the mind. The Psychologist.

Chater, N. (submitted). Perceptual Organisation and Figural Goodness: One Principle or Two?

Chater, N., Crocker, M., & Pickering, M. (in press). The Rational Analysis of Inquiry: The Case of Parsing. In M. Oaksford & N. Chater (Eds.) Rational Models of Cognition. Oxford: Oxford University Press.

Chater, N. & Hahn, U. (1996). ???. Proceedings of the ???

Chater, N. & Hahn, U. (in press).???. In K. Lamberts & D. Shanks (Eds.) ???

Chater, N. & Hahn, U. (in preparation). Representational distortion as a theory of similarity.

Chater, N. & Oaksford, M. (submitted). Rational analysis and heuristic processes for syllogistic reasoning.

Chater, N., Oaksford, M., Nakisa, R., & Redington, M. (submitted) Fast, frugal and rational: Rational analysis and cognitive algorithms in human reasoning.

Cheeseman, P. (1995). On Bayesian model selection. In Wolpert, D. (Ed.), The mathematics of generalization (pp. 315-330). Redwood City, CA: Addison-Wesley.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge: MIT Press.

Cover, T. M. & Thomas, J. A. (1992). Elements of information theory. New York:  John Wiley.

Dinnerstein, D. & Wertheimer, M. (1957). Some determinants of phenomenal overlapping. American Journal of Psychology, 70, 21-37. 

Evans, J. St. B. T., & Over, D. E. (1996). Rationality in the selection task: Epistemic utility versus uncertainty reduction. Psychological Review, 103, 356-363.

EVANS AND OVER 96? BOOK

Falk, R., & Konold, C. (in press). Making sense of randomness: Implicit encoding as a basis for judgment. Psychological Review.

Fodor, J. A. (1983). Modularity of mind. Cambridge, MA: MIT Press.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.

Garner, W. R. (1962). Uncertainty and structure as psychological concepts. New York: John Wiley.

Garner, W. R. (1974). The processing of information and structure. Potomac, Md: LEA.

Gibson, J. J. (1979). The ecological approach to visual perception.  Boston: Houghton-Mifflin.

Gigerenzer, G. & Goldstein, D. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650-669.

Gigerenzer, G., Hell, W. & Blank, H. (1988). Presentation and content: The use of base-rates as a continuous variable. Journal of Experimental Psychology: Human Perception and Performance, 14, 513-525.

Gigerenzer, G. & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.

Glass, L. (1969). Moiré effect from random dots. Nature, 223, 578-580.

Glass, L., & Pérez, R. (1973). Perception of random dot interference patterns. Nature, 246, 360-362.

Gao, Q., Li, M., & Vitányi, P. (1989). Learning on-line handwritten characters. In 11th International Joint Conference on Artificial Intelligence, pp. 843-848, San Mateo, CA: Morgan Kaufmann.

Harman, G. (1965). The inference to the best explanation. Philosophical Review, 74, 88-95.

Hatfield, G. & Epstein, W. (1985). The status of the minimum principle in the theoretical analysis of visual perception. Psychological Bulletin, 97, 155-186.

Helm, P. A. van der, & Leeuwenberg, E. L. J. (1986). Avoiding explosive search in automatic selection of simplest pattern codes. Pattern Recognition, 19, 181-191.

Helm, P. A. van der, & Leeuwenberg, E. L. J. (1996). Goodness of visual regularities: A non-transformational approach. Psychological Review, 103, 429-456.

Helm, P. A. van der, & Leeuwenberg, E. L. J. (1997?).

Helm, P. A. van der, Lier, R. van & Leeuwenberg, E. L. J. (1992). Serial pattern complexity: irregularity and hierarchy. Perception, 21, 517-544.

Helmholtz, H. von (1910/1962). Treatise on physiological optics. (Vol. 3) (J. P. Southall, Ed. and translation), New York: Dover.

Hochberg, J. (1982). How big is a stimulus? In J. Beck (Ed.), Organization and representation in perception. Hillsdale, NJ: LEA, pp. 191-218.

Hochberg, J.  & McAlister, E. (1953). A quantitative approach to figure “goodness.” Journal of Experimental Psychology, 46, 361-364.

Hume, D. (1965). A treatise of human nature. L. A. Selby-Bigge (Ed.), Oxford: Clarendon Press (Original work published 1739-1740).

Johansson, G. (1950). Configurations in event perception. Stockholm: Almqvist & Wiksell.

Kahneman, D., Slovic, P. & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.

Kahneman, D. & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237-251.

Kanizsa, G. & Gerbino, W. (1982). Amodal completion: Seeing or thinking? In J. Beck (Ed.) Organization in representation and perception (pp. 167-190). Hillsdale, NJ: Erlbaum.

Koffka, K. (1962). Principles of Gestalt psychology (5th ed.). London: Routledge and Kegan Paul. (Original work published in 1935).

Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems in Information Transmission, 1, 1-7. 

Laming, D. (1996). On the analysis of irrational data selection: A critique of Oaksford and Chater (1994). Psychological Review, 103, 364-373.

Leeuwenberg, E. (1969). Quantitative specification of information in sequential patterns. Psychological Review, 76, 216-220.

Leeuwenberg, E. (1971). A perceptual coding language for visual and auditory patterns. American Journal of Psychology, 84, 307-349.

Leeuwenberg, E. & Boselie, F. (1988).  Against the likelihood principle in visual form perception. Psychological Review, 95, 485-491.

Li, M., & Vitányi, P. (1993). An introduction to Kolmogorov complexity and its applications. New York: Springer-Verlag.

Li, M. & Vitányi, P. (1995). Computational machine learning in theory and praxis. In J. van Leeuwen (Ed.) Computer Science Today (pp. 518-535). Heidelberg: Springer-Verlag.

Li, M. & Vitányi, P. (1997). On prediction by data compression. ms.

Lier, R. van, Helm, P. A. van der, & Leeuwenberg, E. L. J. (1994a). Integrating global and local aspects of visual occlusion. Perception, 23, 883-903.

Lier, R. van, Helm, P. A. van der, & Leeuwenberg, E. L. J. (1994b). Competing global and local aspects of visual occlusion. Journal of Experimental Psychology: Human Perception and Performance, 21, 571-583.

Lopes, L. L., & Oden, G. C. (1987). Distinguishing between random and nonrandom events. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 392-400.

Mach, E. (1959). The analysis of sensations and the relation of the physical to the psychical. New York: Dover Publications. (Original work published 1886).

Mach, E. (1960). The science of mechanics. La Salle, IL: Open Court (Original work published 1883).

Marr, D. (1982). Vision. New York: Freeman.

Medin, D. L., Goldstone, R. & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254-278.

Minsky, M. (1977). Frame system theory. In P. N. Johnson-Laird, & P. C. Wason (eds.), Thinking: Readings in cognitive science, (pp. 355-376). Cambridge: Cambridge University Press.

Mumford, D. (1992). Pattern theory: A unifying perspective. In Joseph, A., Mignot, F., Murat, F., Prum, B. & Rentschler, R. (Eds.). Proceedings of the First European Congress of Mathematics (pp. 187-224). Basel: Birkhäuser Verlag.

Oaksford, M., & Chater, N. (1991). Against logicist cognitive science. Mind & Language, 6, 1-38.

Oaksford, M., & Chater, N. (1992). Bounded rationality in taking risks and drawing inferences. Theory & Psychology, 2, 225-230.

Oaksford, M., & Chater, N. (1993). Reasoning theories and bounded rationality. In K. I. Manktelow, & D. E. Over (Eds.), Rationality, (pp. 31-60). London: Routledge.

Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608-631.

Oaksford, M., & Chater, N. (1995a). Information gain explains relevance which explains the selection task. Cognition, 57, 97-108.

Oaksford, M., & Chater, N. (1995b). Theories of reasoning and the computational explanation of everyday inference. Thinking and Reasoning, 1, 121-152.

Oaksford, M., & Chater, N. (1996). Rational explanation of the selection task. Psychological Review, 103, 581-591.

Oaksford, M., Chater, N., Grainger, R. & Larkin, J. (in press). Optimal data selection in the reduced array selection task (RAST). Journal of Experimental Psychology: Learning, Memory and Cognition.

Oaksford, M., & Chater, N. (in press) (Eds.). Rational models of cognition CHECK. Oxford: Oxford University Press.

Oaksford, M., & Chater, N. (in preparation?). ??CONDITIONAL INFERENCE

Paris, J. (1992) The uncertain reasoner’s companion. Cambridge: Cambridge University Press.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo, CA: Morgan Kaufmann.

Perkins, D. N. (1972). Visual discrimination between rectangular and nonrectangular parallelepipeds. Perception and Psychophysics, 12, 396-400.

Perkins, D. N. (1982). The perceiver as organizer and geometer. In J. Beck (Ed.), Organization and representation in perception (pp. 73-93). Hillsdale, NJ: LEA.

Pomerantz, J. R. & Kubovy, M. (1986). Theoretical approaches to perceptual organization: Simplicity and likelihood principles. In: K. R. Boff, L. Kaufman & J. P. Thomas (Eds.), Handbook of perception and human performance, Volume II: Cognitive processes and performance. (pp. 36:1-45) New York: Wiley.

Popper, K. (1959). The logic of scientific discovery. New York: Basic Books (1st edition, Logik der Forschung, 1934).

Pylyshyn, Z. W. (1984). Computation and cognition. Cambridge, MA: MIT Press.

Quinlan, J. & Rivest, R. (1989). Inferring decision trees using the minimum description length principle. Information and computation, 80, 227-248.

Restle, F. (1970). Theory of serial pattern learning: Structural trees. Psychological Review, 77, 481-495. 

Restle, F. (1979). Coding theory of the perception of motion configurations. Psychological Review, 86, 1-24.

Rissanen, J. (1987). Stochastic complexity. Journal of the Royal Statistical Society, Series B, 49, 223-239.

Rissanen, J. (1989). Stochastic complexity and statistical inquiry. Singapore: World Scientific.

Rock, I. (1975). An introduction to perception. New York: Macmillan.

Rock, I. (1983). The logic of perception. Cambridge, MA: MIT Press.

Segall, M. H., Campbell, D. T. & Herskovits, M. J. (1966). The influence of culture on visual perception.  Indianapolis, Ind.: Bobbs-Merrill.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423, 623-656.

Shepard, R. N. (1981). Psychophysical complementarity. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 279-342). Hillsdale, NJ: LEA.

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.

Simon, H. A. (1972). Complexity and the representation of patterned sequences of symbols. Psychological Review, 79, 369-382.

Simon, H. A. & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70, 534-546.

Sober, E. (1975). Simplicity. Oxford: Clarendon Press.

Solomonoff, R. J. (1964). A formal theory of inductive inference, Parts 1 and 2. Information and Control, 7, 1-22, 224-254.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.

Vitányi, P. & Li, M. (1996). Minimum description length induction, Bayesianism and Kolmogorov complexity. Manuscript. 

Vitz, P. C. & Todd, T. C. (1969). A coded element of the perceptual processing of sequential stimuli. Psychological Review, 76, 433-449.

Wallace, C. S. & Boulton, D. M. (1968). An information measure for classification. Computer Journal, 11, 185-195.

Wallace, C. S. & Freeman, P. R. (1987). Estimation and inference by compact coding. Journal of the Royal Statistical Society, Series B, 49, 240-251.

Wason, P. C. (1966). Reasoning. In B. Foss (ed.), New horizons in psychology (pp. 135-151), Harmondsworth, Middlesex: Penguin.

Wason, P. C. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20, 273-281.

Zvonkin, A. K. & Levin, L. A. (1970). The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25, 83-124.


Notes

Author’s Note

I would like to thank Mark Ellison, Steven Finch, Ulrike Hahn, Peter van der Helm, Emmanuel Leeuwenberg, James McClelland, Mike Oaksford, Martin Pickering, Emmanuel Pothos, Martin Redington, Jerry Seligman, Julian Smith, Paul Vitányi and Johan Wagemans for valuable discussions of these ideas at various stages in their development. A brief and informal outline of some of the material here is given in an article for The Psychologist (Chater, in press).

 

Footnotes

1. The original formulation of “Occam’s razor” is that explanations postulating the smallest number of entities should be preferred; thus it embodies a specific measure of simplicity in terms of the number of objects involved in an explanation. Occam’s razor has since been interpreted more broadly as expressing a preference for simple explanations.

2. Indeed, without some implicit adherence to a simplicity principle, classical statistical approaches to modelling data would be incoherent, because increasing the generality of a model (e.g., switching from a straight line to a cubic) can only improve the fit with the data.

3. Van Lier, van der Helm & Leeuwenberg (1994b) consider cases where these principles conflict, and a simplicity-based approach to resolving them.

4. Popper allows that different hypotheses might be differentially favored for investigation on the basis of their falsifiability--but this does not bear on the examples given here, because they are all equally specific, and therefore equally falsifiable.

5. Some approaches have explicitly defined coding languages (e.g., Buffart, Leeuwenberg & Restle, 1981; Leeuwenberg, 1969); others have measured code lengths by appeal to probability and information theory (e.g., Attneave, 1959; Garner, 1962).

6. As Simon (1972) notes, the second problem is often not severe in practice, because description lengths in different proposed description languages tend to be highly correlated.

7. I follow the analysis in Vitányi & Li (1996) in this subsection.

8. This identification between probability and Kolmogorov complexity may seem bizarre. After all, no constraints have been placed on the probability values, so how can they be approximated by any function? This concern is legitimate--a fundamental assumption is that all probability measures are computable. Assuming the Church-Turing hypothesis, these distributions are, in any case, the only ones that the cognitive system can entertain, so this is a mild restriction in this context. It then turns out, remarkably, that there exists a “universal” probability distribution, m(x), which, within a multiplicative constant, assigns at least as high a probability to each object, x, as does any computable probability distribution, P(x). Moreover, it turns out that -log2 m(x) = K(x), to within an additive constant. This justifies the step from probabilities to code lengths in moving from equation (2) to equation (3). The underlying mathematical theory is outlined in Li & Vitányi (1993, chapter 4), and was pioneered by Zvonkin & Levin (1970).

9. There are “pathological” cases where this procedure does not minimize (8), but these can be ignored in practice.

10. Other decision rules are possible, such as choosing the action for which the worst possible outcome is least bad.

11. Indeed, this expansion appears to “pass” all known statistical tests for randomness (Li & Vitányi, 1993)--so even intensive statistical analysis would fail to reveal any structure.

12. I thank Johan Wagemans (personal communication) for stressing the importance of this point.

13. Although neither the simplicity nor the likelihood principle is currently explicitly advocated in the context of human language processing, many proposals in this area can be viewed in these terms, as Martin Pickering (personal communication) has pointed out. For example, Frazier ??? can be viewed as arguing that the parser prefers structures which are syntactically simple; and Tanenhaus and colleagues (??) can be viewed as arguing for a likelihood approach to parsing, in which probabilistic information is drawn from a wide range of sources.

14. Note also that in artificial intelligence there has been an increasing tendency to employ probabilistic methods (e.g., Pearl, 1988; Paris, 1992).

15. A different analysis may be appropriate for understanding learning from instruction, and learning from both instruction and experience. The question of how or whether the search for simplicity, as outlined here, can be applied in such contexts is an important topic for future research.

16. For a fuller discussion of the issues raised in this section, see Chater (1996).

17. The converse is also true, aside from certain technical restrictions concerning computability, which are not of psychological relevance (Chater, 1996). This follows directly from the analysis in the subsection Simplicity and the most likely explanation above, relating the simplicity principle to probability theory.

18. Chater (1996) notes that there are, nonetheless, senses in which a dispute between the two principles may remain, for example if each principle is interpreted at what Marr (1982) calls the “algorithmic,” rather than the “computational” level. Moreover, van der Helm & Leeuwenberg (1997) argue for an alternative reconciliation between the two principles.

19. Alternative approaches have been proposed by theorists who argue that the goodness of a figure relates to the number of symmetries that it possesses with respect to transformations (e.g., reflection, translation, and so on).

20. Chater (1997) presents further theoretical and empirical arguments for a simplicity account of figural goodness, in contrast to van der Helm and Leeuwenberg’s weight of evidence.

21. Li & Vitányi (1993) have used this idea to develop the theory of information distance, which inspired representational distortion as a psychological proposal. Paul Vitányi (personal communication) notes that the mathematical notion of information distance was originally developed with cognitive considerations in mind.

22. Indeed, finding the representational distortion between arbitrary representations is known to be an uncomputable function, and hence must necessarily be approximated (Li & Vitányi, 1993).

23. Essentially, the logarithm of the probabilities in (12) can be identified with their Kolmogorov complexities--this reflects the general relation between code length and probability discussed in Part II.

24. For yet another sense in which speed can conflict with simplicity/likelihood, see Chater, Crocker and Pickering (in press).

25. Clearly this is not always the case. It is anecdotally clear that people do tend to believe what they want to believe SOC PSY REFS. But in general we do not see, hear and infer only what we want to see, hear and infer; and were this the case, the consequences would presumably be disastrous (see Fodor, 1983 for related discussion).

 


Figures

Figure 1. An infinite number of incompatible patterns are compatible with any finite sequence of data.

Figure 2. Occluded figures are consistent with an infinite number of continuations.

(a)   (b)   (c)

Figure 3. Pairs of objects which contain shared patterns.