Selective Sampling with Information-Storage Constraints

A decision-maker acquires payoff-relevant information until she reaches her storage capacity, at which point she either terminates the decision-making and chooses an action, or discards some information. By conditioning the probability of termination on the information collected, she controls the correlation between the payoff state and her terminal action. We provide an optimality condition for the emerging stochastic choice. The condition highlights the benefits of selective memory applied to the extracted signals. The constrained-optimal choice rule exhibits (i) confirmation bias, (ii) speed-accuracy complementarity, (iii) overweighting of rare events, and (iv) a salience effect.


Introduction
Economic agents often acquire information about the state of the economy before making their decisions. The information is typically modelled as a signal that helps the agent refine the distribution of the state and improve the decision-making. Often, signals arrive over time and agents can absorb only a small number of them. We capture this information-processing friction by assuming that agents can receive as many signals as they wish but can remember only a finite number of them when making their choices. In the simplest setting we analyze, the agent can remember only one signal. The key strategic option that we consider allows the agent to ignore some signals with positive probability and restart the signal extraction process. Upon observing the first signal, the agent can either make a choice based on this observation or dispose of the signal and rerun the very same stochastic signal extraction process. Upon observing the second signal, she can either make a final decision (according to the same strategy that maps instantaneous observations to choices) or rerun the signal extraction again, and so on. Specifically, we allow agents to employ an arbitrary stationary decision process that specifies, for each possible signal realization, the probability with which the agent restarts the process as well as the action chosen in case of termination. We impose neither time constraints nor time costs in the basic formulation, so that the friction comes solely from the limited information-storing capacity of the agent.
We ask: Should the agent make her choice as soon as she receives the first signal, whatever its realization, or could she be better off by rerunning the very same information-acquisition process? Can hesitation, that is, selective repetition of a fixed stochastic decision procedure in which the repetition conditions on the procedure's outcome, be welfare-enhancing?
A general insight is that selective rerunning of the primitive decision procedure is typically optimal. To document this most generally, we provide a simple necessary condition satisfied by the optimal rerunning strategy. The result is an interim indifference condition imposed on the agent who has concluded her decision-making with a plan to choose a particular action. Given the recommended action, the agent's posterior expected payoff from implementing this action must be the same as the posterior expected payoff from rerunning the whole decision-making (that is, the whole sequence of selective repetitions of the primitive signal extraction) and implementing whichever action the second run of the decision-making recommends. We refer to this as the second-thought-free condition.
For illustration, consider a binary decision of whether to make an investment of a fixed size.
The agent receives payoff 1 if she invests in the good state of the economy, payoff −2 if she invests in the bad state, and receives 0 when she does not invest whatever the state. Both states are a priori equally likely and give rise to a population of good and bad signals, with the share of the good signals at 90% in the good state and 10% in the bad state. The agent draws possibly several signal realizations in sequence but remembers only the last one when making her investment choice. We assume she invests if the last drawn signal was good (and does not invest upon the last bad signal).
Observe that the decision rule generated by the immediate termination upon the first signal that comes in does not generate a second-thought-free choice rule: An agent whose first observed signal was good prefers to rerun the decision process, since the new run will either lead to investing again or will give rise to the signal realization that conflicts with the first observation and will lead to not investing. Since, conditional on two conflicting signals, not investing is preferred (because the false-positive investment error is relatively costly), the agent benefits from having second thoughts when the first observed signal is good.
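The arithmetic behind this comparison can be checked directly. The following minimal Python sketch uses only the numbers stated in the example; the function and variable names are illustrative and are not notation from the paper.

```python
# Investment example: prior, signal shares, and payoffs as stated in the text.
# All identifiers below are illustrative names, not notation from the paper.
PRIOR = {"good": 0.5, "bad": 0.5}
P_GOOD_SIGNAL = {"good": 0.9, "bad": 0.1}  # probability of a good signal in each state
U_INVEST = {"good": 1.0, "bad": -2.0}      # not investing pays 0 in both states


def posterior_after_good_signal():
    """Bayesian posterior over the states after a single good signal."""
    joint = {s: PRIOR[s] * P_GOOD_SIGNAL[s] for s in PRIOR}
    total = sum(joint.values())
    return {s: joint[s] / total for s in joint}


def payoff_if_terminate():
    """Expected payoff from investing right after the first good signal."""
    post = posterior_after_good_signal()
    return sum(post[s] * U_INVEST[s] for s in post)


def payoff_if_rerun():
    """Expected payoff from forgetting the good signal and acting on a fresh draw.

    The agent invests only if the fresh draw is good again; otherwise she earns 0.
    """
    post = posterior_after_good_signal()
    return sum(post[s] * P_GOOD_SIGNAL[s] * U_INVEST[s] for s in post)


if __name__ == "__main__":
    print(payoff_if_terminate(), payoff_if_rerun())
```

With these numbers, terminating yields 0.9 · 1 + 0.1 · (−2) = 0.7, whereas rerunning yields 0.81 · 1 + 0.01 · (−2) = 0.79, so the second thought is strictly beneficial after a good first signal.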
In the sequel, we interpret the probability of terminating the decision process after receiving a particular signal as a search intensity for the signal. A higher probability of termination at a given information set inflates the likelihood that the agent makes the terminal choice at the set.
We show that the failure of the second-thought-free condition under uniform search intensity in the above investment example indicates that, relative to uniform search, the agent benefits from increasing the search intensity for the bad signal.
More generally, the second-thought-free condition follows from the first-order condition imposed on the optimal search intensities. In the above example, a marginal increase in the relative search intensity in favor of the bad signal is welfare-enhancing when starting from the immediate termination strategy. Consider a deviation from the immediate termination strategy that consists of repeating the signal draw with a small probability whenever the observed signal is good. This new decision procedure effectively replaces a marginal measure of contingencies in which the agent terminates after observing the good signal with new draws of the signal. The indifference to marginal changes in the search intensities at optimum implies the second-thought-free condition. Given that a typical signal structure combined with the immediate termination strategy would not result in a second-thought-free rule for a generic payoff function, we conclude that some asymmetric termination strategy that differentiates the termination decision according to the last observed signal is generically optimal.
The model provides microfoundations to a range of behavioral stylized facts. The unifying principle of our behavioral insights is the intuition that the agent targets her search towards the type of evidence that would provide her with more informed posteriors under the uniform search. This principle generates confirmation bias, since evidence that confirms the agent's prior leads to more informed posterior than does evidence that contradicts the prior. An optimally targeted information search also generates speed-accuracy complementarity; that is, accuracy of choice declines with the response time. The effect is generated by the confirmation bias: The agent encountering evidence contradicting her prior is likely to disregard the evidence and to have a second thought. Hence, long response times indicate a surprising state of the world, and the constrained-optimal choice rule commits errors in the surprising state relatively often. Overweighting of rare events occurs when the agent's task is to form a probability belief about an event that is known to be rare, such as a flight accident, by observing a random flight outcome. Since observing a flight accident is far more informative about the probability of future accidents than observing an uneventful flight, the agent optimally biases her search towards eventful flights. In the last behavioral application, we show that distinct states of the world are salient in the sense that they attract the agent's attention (i.e., trigger higher termination rates in our framework). The effect arises because an indistinct perception stimulus that can be generated by several similar states is less informative than a distinct perception stimulus likely generated by a specific distinct state. Hence, the optimal information search targets stimuli indicating distinct states.

Related literature
When the decision-maker can choose from feasible error distributions, choice data partially reveal the decision-maker's objective, since the constrained-optimal choice rule exhibits costly types of error relatively rarely. At various levels of formalization, this error-management idea has appeared repeatedly in biology, psychology and economics; see Johnson et al. (2013) for an interdisciplinary review. A range of economic models make the error-management idea precise by specifying particular sets of feasible error distributions. These models differ greatly in the assumed constraints imposed on decision-making, and hence in the predicted constrained-optimal stochastic choice rules. In what follows we review the variety of modeling approaches to frictional decision-making. Sims (2003), Matějka and McKay (2015) and Steiner et al. (2017) constrain the expected reduction of the entropy of the decision-maker's belief within the decision process. The entropy-based models generate constrained-optimal choice rules akin to the logit rules used in structural estimation. The details of the predictions over the stochastic choice are, however, sensitive to the assumed entropy-based constraint. A different information-processing constraint is assumed in sequential-sampling and drift-diffusion models, e.g. Wald (1945), Arrow et al. (1949), Ratcliff (1978), Hébert and Woodford (2016), and Morris and Strack (2017). By choosing the regions of the stopping beliefs, the decision-maker can trade off the accuracy against the speed of her decision procedure. The class of drift-diffusion models restricts the agents to learning procedures with continuously evolving Bayesian beliefs. In contrast to this assumption, Zhong (2017) argues that learning processes with discontinuous belief evolution are optimal under a broad class of information acquisition costs, and Che and Mierendorff (2016) study one such discontinuous learning process in which signals arrive according to a Poisson process.
Yet another modeling approach to limited cognition conceptualizes decision-making and information processing as finite algorithms with adjustable parameters, e.g. Hellmann and Cover (1970), Compte and Postlewaite (2012), and Wilson (2014).
Compared to the above diverse models that fully specify the cognitive friction, our model delivers a partial characterization of the constrained-optimal stochastic choice rule without a full specification of the cognitive constraint. Another research program delivering robust predictions about constrained-optimal stochastic behavior without full specification of the cognitive friction is that of Caplin et al. (2017), who provide a full behavioral characterization of all posterior-separable information cost models. Yet another alternative to our attempt to deliver predictions robust to the details of the friction is to estimate the information processing cost from the choice data. The methodologies of information cost identification proposed in Caplin and Dean (2015) and Oliveira et al. (2017) make this approach theoretically feasible when rich data, such as choice data over menus or many decision problems, are available.
This paper belongs to a growing economic literature that explains established behavioral phenomena as the constrained-optimal behavior of decision-makers facing information processing frictions. For instance, Robson (2001), Rayo and Becker (2007), Netzer (2009), and Khaw et al. (2017) provide microfoundations for risk attitudes; Gabaix and Laibson (2017) endogenize discounting; and Wilson (2014), Compte and Postlewaite (2012), and Leung (2017) establish constrained-optimal ignorance of weakly informative news. The partitional model of Dow (1991) focuses on the problem of coarsening of rich information available to the decision-maker.[1]
A related model of extreme-events oversampling was developed by psychologists Lieder et al. (2014). Their decision-maker has access to the objective distribution of a payoff state but faces a friction in the computation of the payoff expectations. She forms the expected payoffs for considered actions in a Monte Carlo simulation as a utility average over a finite sample of simulated payoff states. Instead of sampling uniformly from the true belief distribution, she optimally oversamples states in which stakes are high (and corrects for the oversampling by lowering the respective weights), as in the so-called importance-sampling Monte Carlo method, e.g. Glynn and Iglehart (1989). Their project and ours examine opposite sides of information processing. While we assume non-representative sampling in the signal collection and allow for frictionless formation of the posterior beliefs, Lieder et al. take available information as given and impose the friction on its interpretation.
Brain imaging technologies have recently advanced to the level that documenting the process of hesitation in real time for non-human subjects is possible. Studies on rats (Redish, 2016) and monkeys (Rich and Wallis, 2016) document that animals in some decision problems deliberate in a stochastic series of action plans, in which their minds move back and forth between the available options. The researchers are able to link distinct hippocampal activities to particular parts of a maze, and to link in real time those brain patterns to the sequence of deliberation of a rat that has paused and is choosing how to proceed in the maze. Unlike the neural processes represented by the drift-diffusion model of Ratcliff (1978), which assumes a continuous evolution of a mental state, these deliberation processes, like our model, exhibit discrete transitions between the deliberated actions.[2]

General model
An agent faces a decision under uncertainty. She chooses an action a ∈ A in a process specified below and receives a payoff u(a, θ) in the fixed payoff state θ ∈ Θ drawn from an interior prior distribution π ∈ Δ(Θ). The state space Θ and the action set A are finite. The agent chooses a signal structure (or, equivalently, a Blackwell experiment) p, where p is characterized by a family of conditional signal distributions p(x | θ), θ ∈ Θ. The experiment generates a signal x from a finite signal space X. The conditional signal distributions are fully mixed: p(x | θ) > 0 for all x, θ. We allow the agent to choose among possibly several such Blackwell experiments and we let P denote the set of experiments from which she chooses. We impose no restrictions on the set P of the feasible experiments (other than the full support of each p).
[1] Somewhat less related is a literature that explores how mistakes in learning can lead to behavioral biases (see in particular the modeling of coarse learning in Jehiel (2005) that can give rise to overoptimism as shown in Jehiel (2017)). By contrast, in our approach, the agent is constrained in the information she can store but is otherwise behaving optimally given the constraint.
[2] The study of Rich and Wallis (2016) suggests that, in contrast with our baseline model but in accord with our extensions in Section 6, information is aggregated across the rounds of deliberation.
One interpretation is that each experiment p ∈ P is a particular reasoning approach available to the agent. The agent can repeat the selected experiment arbitrarily many times, but she is unable to aggregate the information across the repetitions. Each run of the experiment is a cognition that exhausts the agent's capacity dedicated to the problem being solved. Once the agent hits the constraint at the end of the experiment, she can continue only after she unclogs her capacity by amnesia.
The agent can condition the repetition of the experiment on the last observed signal. She chooses a vector β = (β_x)_{x∈X} ∈ B := [0, 1]^X \ {(0, . . . , 0)}, where β_x is the probability that she terminates the reasoning after observing the signal realization x; we call β a termination strategy. The agent runs the experiment p for the first time, receives signal realization x_1 with probability p(x_1 | θ) and terminates the reasoning with probability β_{x_1}. She restarts her reasoning with the complementary probability 1 − β_{x_1}, and receives a signal realization x_2 from a new run of the process p with probability p(x_2 | θ), terminates with probability β_{x_2} or restarts with probability 1 − β_{x_2}, and continues to rerun the primitive process p until she terminates after a random number of repetitions of p; see Figure 1. When the agent chooses distinct β_x for different x, she implements the familiar idea of selective memory: some facts and observations are easily forgotten whereas others are remembered and trigger choice. After the agent terminates the reasoning with a terminal signal x, she selects an action a = σ(x) according to an action strategy σ : X → A.[3] Let S be the set of all mappings from X to A.
By excluding the termination strategy (0, . . . , 0) we prohibit the agent from avoiding the decision a ∈ A. Since β ≠ (0, . . . , 0) and each feasible experiment p generates all signals with a positive probability in each state, the decision process almost surely eventually terminates. Note that we can accommodate agents who have an outside option by enlarging the action space.
The outcomes of distinct runs of p are conditionally independent. Thus, the probability that the agent terminates after t repetitions of the experiment p resulting in the signal history (x_1, . . . , x_t) in state θ is
$$\beta_{x_t}\, p(x_t \mid \theta) \prod_{\tau=1}^{t-1} \bigl(1 - \beta_{x_\tau}\bigr)\, p(x_\tau \mid \theta). \tag{1}$$
We let
$$r(a \mid \theta; p, \beta, \sigma) = \sum_{t=1}^{\infty} \; \sum_{(x_1, \dots, x_t):\, \sigma(x_t) = a} \beta_{x_t}\, p(x_t \mid \theta) \prod_{\tau=1}^{t-1} \bigl(1 - \beta_{x_\tau}\bigr)\, p(x_\tau \mid \theta) \tag{2}$$
denote the probability that the agent who employs the experiment p, the termination strategy β, and the action strategy σ terminates with action a. We call r(p, β, σ) := (r(a | θ; p, β, σ))_{a∈A, θ∈Θ} the choice rule. The set of feasible choice rules is R(P) = {r(p, β, σ) : p ∈ P, β ∈ B, σ ∈ S}. Sometimes we abuse notation, omit p, β, σ and write r(a | θ) for the probability of a in state θ under the rule constructed by some p, β, σ.
Figure 1: For each (p, β, σ), the decision process is a Markov chain evolving on the agent's states of mind, with transition probabilities that depend on the payoff state θ. The chain begins in the state of mind O and transits to states x ∈ X with probabilities p(x | θ). The process returns to O with probability 1 − β_x, or terminates with the choice of a = σ(x) with probability β_x.
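The decision process just described can be simulated directly. The sketch below is a hedged illustration rather than the authors' procedure: it draws signals in a fixed state, terminates with the signal-specific probability, and estimates the resulting choice rule by Monte Carlo. The function names, the parameter values in EXAMPLE_BETA, and the action labels are assumptions chosen for illustration; the experiment in EXAMPLE_P is the one from the investment example in the Introduction.

```python
import random


def simulate_choice(p_theta, beta, sigma, rng):
    """Run the chain in one fixed state: draw a signal, then terminate or forget and redraw."""
    signals = list(p_theta)
    weights = [p_theta[x] for x in signals]
    while True:
        x = rng.choices(signals, weights=weights)[0]
        if rng.random() < beta[x]:   # terminate with probability beta[x]
            return sigma[x]          # action taken at the terminal signal
        # otherwise the signal is discarded and the experiment is rerun


def estimate_choice_rule(p, beta, sigma, n_runs=20000, seed=0):
    """Monte Carlo estimate of r(a | theta) for every state theta."""
    rng = random.Random(seed)
    rule = {}
    for theta, p_theta in p.items():
        counts = {}
        for _ in range(n_runs):
            a = simulate_choice(p_theta, beta, sigma, rng)
            counts[a] = counts.get(a, 0) + 1
        rule[theta] = {a: c / n_runs for a, c in counts.items()}
    return rule


# Hypothetical illustration: signal "g" is acted upon only half of the time.
EXAMPLE_P = {"good": {"g": 0.9, "b": 0.1}, "bad": {"g": 0.1, "b": 0.9}}
EXAMPLE_BETA = {"g": 0.5, "b": 1.0}          # assumed values, not from the paper
EXAMPLE_SIGMA = {"g": "invest", "b": "pass"}

if __name__ == "__main__":
    print(estimate_choice_rule(EXAMPLE_P, EXAMPLE_BETA, EXAMPLE_SIGMA))
```

Lowering the termination probability for one signal makes the agent less likely to end her reasoning at that signal in every state, which is the sense in which the termination strategy reshapes the error distribution.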
The repeated-cognition problem is to select a feasible choice rule r that maximizes the expected payoff:
$$\max_{r \in R(P)} \sum_{\theta \in \Theta,\, a \in A} \pi_\theta\, r(a \mid \theta)\, u(a, \theta). \tag{3}$$
The agent solving the repeated-cognition problem knows the prior π, payoff function u, and the set P of the feasible processes p. The optimization in (3) can be an outcome of selective pressures that favor successful decision procedures via cultural or biological evolution, or via competition of firms differing in their internal procedures.
We focus on the benefit of the repeated cognition in that our baseline model abstracts from the cost of time and is therefore applicable to agents who can repeat the basic cognition process p quickly. The model extends to agents who exponentially discount future payoffs, and thus face non-trivial cost of repeated cognition. We report such an extension in Section 6.2.

Examples
The first example is the simplest specification of the above baseline model. The reader may focus on this elementary setting during the first reading of the paper.
Example 1 (elementary setting). The agent has access only to one statistical experiment: P = {p}, where p is an exogenous information structure that specifies conditional signal distributions p(x | θ).
In this case, the agent chooses only the termination probabilities β_x and the action strategy σ that determines the action a = σ(x) chosen at each terminal signal x.
Our baseline model, however, accommodates more complex cognitive processes that involve remembering multiple signals, or partial aggregation of information over time. The accommodation of these complex examples within our baseline model is nontrivial and postponed to Section 6.1.
Example 2 (imperfect information aggregation). This setting relaxes the agent's inability to aggregate information across the repetitions of her reasoning by endowing her with a finite set of memory states that she can use to represent the signal histories. One interpretation of this example is that the agent keeps track of the number of elapsed rounds, can count up to a finite number, and her count of the rounds is allowed to be stochastic. The setting of this example builds on Hellmann and Cover (1970) and Wilson (2014).
The agent is endowed with one Blackwell experiment μ(x | θ) with a finite signal space X and, additionally, with a finite set M of the memory states m. After each run of the experiment μ, the agent either terminates or continues with decision-making. If the agent continues, then she transitions from the current memory state to a new memory state and reruns the statistical experiment μ(x | θ). That is, the agent selects a (generalization of the) termination strategy γ, where γ(m′ | m, x) is the probability that the agent in the memory state m who has observed signal x in the last run of the experiment μ continues with the decision-making and transitions to the memory state m′, and γ(t | m, x) is the probability that such an agent terminates. We restrict γ(t | m, x) to be positive for all pairs m, x to ensure that the decision-making almost surely eventually terminates. The terminating agent chooses action σ(m, x) that depends both on the current memory state and on the signal acquired in the last run of μ. The agent starts the decision-making in the memory state m_0. A pair γ, σ induces a θ-dependent Markov chain over the memory states that eventually terminates with choice σ(m, x), where m is the last memory state and x is the last signal received. Let p(a | θ; γ, σ) be the probability that the agent terminates with the choice a in state θ, and let P_iia be the set of all stochastic choice rules p that this agent can construct. She selects the choice rule from P_iia that maximizes her ex ante expected payoff.
Although, unlike in our baseline model, the agent of this example is endowed with nontrivial memory, we show in Section 6.1 that the example can be accommodated within our baseline model by a suitable choice of the set P of the experiments and of the signal space X.
Example 3 (partial forgetting). In this example, the agent comprehends up to N > 1 signals sequentially drawn from the experiment μ(x | θ) with finite signal space X. She can discard some of the accumulated data at any interim stage of the decision process. Unlike in the baseline model, she is not restricted to discarding all her information.
unless h′ is a truncation of h, or h′ = hx for some x ∈ X and |hx| ≤ N. Constraints 1 and 2 require the agent to condition the decision to discard information or to terminate only on her current history, independently of the state. Constraint 3 allows the agent to expand her information set only by running the experiment μ(x | θ). Constraint 4 restricts each step of information acquisition to one draw from μ(x | θ) or to a partial discarding of the accumulated information. Let p(a | θ; γ, σ) be the probability that the agent who employs (γ, σ) terminates with action a in the state θ. The agent chooses γ and σ to maximize her ex ante expected payoff.
Again, we show in Section 6.1 that this example can be embedded within the baseline model once the set P of the experiments and the signal space X are suitably specified.

Optimal cognition biases
We now derive a necessary optimality condition that the choice rule solving the repeated-cognition problem must satisfy. We argue in Section 3.3 that, generically, the condition requires the agent to engage in selective memory-that is, to ignore some signals more often than others.

Second-thought-free choice rules
We start with a definition of second-thought-free choice rules. If the agent's decision process generates such a rule, then she has no incentive to rerun the process regardless of the action recommendation with which the process terminates. Our main result below states that an optimal rule that solves the repeated-cognition problem is second-thought-free.
Let A and Θ be finite action and state sets with generic elements a and θ. Let r be a generic stochastic choice rule that specifies conditional probabilities r(a | θ) of each action a ∈ A in each state θ ∈ Θ.
Definition 1. The choice rule r is second-thought-free with respect to the utility u if the agent prefers each action recommended by the rule to a new run of the rule r. That is, for each action a chosen with positive probability,
$$E_\alpha\bigl[u(a, \theta) \mid a_1 = a\bigr] \ge E_\alpha\bigl[u(a_2, \theta) \mid a_1 = a\bigr], \tag{4}$$
where the expectations are with respect to the random variables θ and a_2, and α(θ, a_1, a_2) = π_θ r(a_1 | θ) r(a_2 | θ) is the joint probability distribution of the state and two actions consecutively recommended by the rule r.
Consider an agent whose decision process generates choice a with probability r(a | θ) in state θ. The definition requires the agent who terminates her decision process with an action plan a to weakly prefer the action a to forgetting a and choosing whichever action a new run of the decision process will recommend. Although the definition allows for the strict preference against having a second thought, the next lemma shows that (4) is always met with equality: If a choice rule is second-thought-free, then the agent is indifferent between terminating and the second thought.
The lemma is a simple consequence of the law of iterated expectations.

Lemma 1. If a choice rule r is second-thought-free, then the condition (4) is met with equality for each action a chosen with positive probability:
$$E_\alpha\bigl[u(a, \theta) \mid a_1 = a\bigr] = E_\alpha\bigl[u(a_2, \theta) \mid a_1 = a\bigr]. \tag{5}$$
All proofs are relegated to the Appendix. We refer to (5) as the second-thought-free condition.
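To make the condition concrete, the following Python sketch computes, for each action a, the difference between the posterior payoff from keeping a and the posterior payoff from a fresh run of the same rule. It is an illustration under stated assumptions: the helper name `second_thought_gaps` and the dictionary layout are hypothetical, and the numbers are those of the investment example from the Introduction under immediate termination.

```python
def second_thought_gaps(prior, rule, u):
    """For each action a, return E[u(a, theta) | a1 = a] - E[u(a2, theta) | a1 = a].

    prior[theta] is the prior probability of theta, rule[theta][a] is r(a | theta),
    and u[(a, theta)] is the payoff. Actions that are never chosen are skipped.
    """
    actions = sorted({a for theta in rule for a in rule[theta]})
    gaps = {}
    for a in actions:
        # Unnormalized posterior over states given that the first run recommends a.
        weight = {th: prior[th] * rule[th].get(a, 0.0) for th in prior}
        total = sum(weight.values())
        if total == 0.0:
            continue
        keep = sum(weight[th] * u[(a, th)] for th in prior) / total
        rerun = sum(
            weight[th] * sum(rule[th].get(a2, 0.0) * u[(a2, th)] for a2 in actions)
            for th in prior
        ) / total
        gaps[a] = keep - rerun
    return gaps


# Investment example under immediate termination (invest iff the signal is good).
PRIOR = {"good": 0.5, "bad": 0.5}
RULE = {"good": {"invest": 0.9, "pass": 0.1}, "bad": {"invest": 0.1, "pass": 0.9}}
U = {("invest", "good"): 1.0, ("invest", "bad"): -2.0,
     ("pass", "good"): 0.0, ("pass", "bad"): 0.0}
GAPS = second_thought_gaps(PRIOR, RULE, U)

if __name__ == "__main__":
    print(GAPS)
```

In this illustration the gap is negative after "invest" and positive after "pass", and the two gaps cancel once they are weighted by the probabilities of the two recommendations. This cancellation is the law of iterated expectations at work, and it is why a rule with no strictly negative gap must have all gaps equal to zero.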

Optimality condition
Our main result follows. Note that the existence of a solution to the repeated-cognition problem is not guaranteed, since we do not impose any restrictions on the set P of the feasible experiments. We prove existence for a binary setting in Section 4, where we impose more structure on the model.

Proposition 1. If a choice rule solves the repeated-cognition problem (3), then it is second-thought-free, and satisfies (5).
To understand the statement, consider the optimal choice rule r* generated by a process that consists of a random number of repetitions of a primitive cognition p. Once these repetitions of p terminate with a terminal signal x and the agent is about to take an action a = σ(x), then, according to the proposition, she must be indifferent between a and running the process associated with r* from scratch, where the new run of r* would involve new repetitions of p.
To prove Proposition 1, we first introduce an effective experiment s(p, β) and distinguish it from the primitive experiment p. While p(x | θ) specifies the probability that one run of the experiment p results in signal x, s(x | θ; p, β) is the probability that selective repetitions of p according to the termination strategy β terminate with the signal x. Relative to the primitive probabilities p(x | θ), the effective probabilities s(x | θ; p, β) are inflated for those signals x at which the agent terminates with a high probability β_x:
Lemma 2. The probability that the agent who employs a primitive experiment p and a termination strategy β terminates with signal x in state θ is
$$s(x \mid \theta; p, \beta) = \frac{\beta_x\, p(x \mid \theta)}{\sum_{x' \in X} \beta_{x'}\, p(x' \mid \theta)}. \tag{6}$$
The simple proof in the appendix exploits the stationarity of the decision process. The lemma implies that s(p, β) and hence also r(p, β, σ) are homogeneous of degree zero with respect to β.
Thus, since we abstract from the delay costs, for any optimal termination strategy β*, the strategy αβ* with α ∈ (0, 1) is optimal too, and it generates the same choice rule as β*.
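The effective experiment and its scale invariance are easy to compute. The sketch below assumes the stationary closed form in which the effective probability of a terminal signal x is proportional to β_x p(x | θ), normalized by the per-round termination probability; the function names and the numerical values are illustrative assumptions.

```python
def effective_experiment(p_theta, beta):
    """Distribution of the terminal signal in one state.

    p_theta[x] is p(x | theta); beta[x] is the termination probability at x.
    """
    mass = {x: beta[x] * p_theta[x] for x in p_theta}
    total = sum(mass.values())  # probability of terminating in any given round
    return {x: m / total for x, m in mass.items()}


def choice_rule(p, beta, sigma):
    """Aggregate terminal-signal probabilities into action probabilities r(a | theta)."""
    rule = {}
    for theta, p_theta in p.items():
        probs = {}
        for x, prob in effective_experiment(p_theta, beta).items():
            probs[sigma[x]] = probs.get(sigma[x], 0.0) + prob
        rule[theta] = probs
    return rule


# Hypothetical numbers for illustration only.
P = {"good": {"g": 0.9, "b": 0.1}, "bad": {"g": 0.1, "b": 0.9}}
BETA = {"g": 0.5, "b": 1.0}
SIGMA = {"g": "invest", "b": "pass"}

if __name__ == "__main__":
    scaled = {x: 0.3 * b for x, b in BETA.items()}
    print(choice_rule(P, BETA, SIGMA))
    print(choice_rule(P, scaled, SIGMA))  # same rule: only the ratios of beta matter
```

Scaling every termination probability by the same factor lengthens the expected number of repetitions but leaves the distribution of the terminal signal unchanged, so only the ratios of the termination probabilities matter for the choice rule.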
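Lemma 2 can be used to rewrite the agent's objective as an explicit function of the termination strategy: The repeated-cognition problem is equivalent to
$$\max_{p \in P,\, \beta \in B,\, \sigma \in S} \; \sum_{\theta \in \Theta} \pi_\theta \sum_{x \in X} \frac{\beta_x\, p(x \mid \theta)}{\sum_{x' \in X} \beta_{x'}\, p(x' \mid \theta)}\, u(\sigma(x), \theta). \tag{7}$$
Proposition 1 follows from the first-order condition with respect to the termination strategy β.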
Somewhat counterintuitively, the second-thought-free condition imposes a non-profitability requirement on a non-stationary deviation that is not feasible to the agent. The second-thought-free condition imposed on the rule r(p*, β*, σ*) induced by the optimal (p*, β*, σ*) considers the agent who has repeated the primitive experiment p* a stochastic number of times according to the strategy β*, and is about to terminate for the first time. The condition asks whether the agent can gain by not terminating and by repeating the whole decision process r(p*, β*, σ*) once and only once.
Since our agent is allowed to construct only rules based on stationary termination strategies, this deviation is infeasible to the agent. The second-thought-free condition nevertheless follows from a marginal argument. Consider a marginal decrease of the termination probabilities β_x for all signals x that lead to the action a, say from β_x to (1 − ε)β_x for a small ε > 0. This replaces a small measure of contingencies in which the agent terminates the process r(p, β, σ) with action a with a longer process that consists of two or more repetitions of r(p, β, σ). The probability of the replacement with k repetitions of r(p, β, σ) is of the order of ε^k. Thus, the marginal optimality condition only compares the profitability of the termination after the first run of r(p*, β*, σ*) with its single repetition, and neglects all further repetitions. At the optimum, the agent is indifferent between the termination with a and the single repetition of the optimal process.
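The link between the first-order condition and the indifference can be illustrated numerically. The sketch below revisits the investment example from the Introduction, normalizes the termination probability after a bad signal to one (without loss, since only the ratio of the termination probabilities matters), and searches a grid over the termination probability b after a good signal. The grid search, the function names, and the normalization are assumptions made for illustration; they are not the authors' solution method.

```python
PRIOR = {"good": 0.5, "bad": 0.5}
P_GOOD = {"good": 0.9, "bad": 0.1}     # probability of a good signal in each state
U_INVEST = {"good": 1.0, "bad": -2.0}  # not investing pays 0


def invest_prob(theta, b):
    """r(invest | theta) when the agent terminates w.p. b after good and w.p. 1 after bad."""
    g = P_GOOD[theta]
    return b * g / (b * g + (1.0 - g))


def expected_payoff(b):
    """Ex ante expected payoff of the rule induced by b."""
    return sum(PRIOR[t] * invest_prob(t, b) * U_INVEST[t] for t in PRIOR)


def second_thought_gain(b):
    """Gain from one rerun after an 'invest' recommendation, weighted by P(invest).

    Equals sum over theta of prior * r(invest|theta) * (r(invest|theta) - 1) * u(invest, theta).
    """
    return sum(
        PRIOR[t] * invest_prob(t, b) * (invest_prob(t, b) - 1.0) * U_INVEST[t]
        for t in PRIOR
    )


def best_on_grid(n=10000):
    """Grid point in (0, 1] with the highest expected payoff."""
    return max((k / n for k in range(1, n + 1)), key=expected_payoff)


if __name__ == "__main__":
    b_star = best_on_grid()
    print(b_star, expected_payoff(b_star), second_thought_gain(b_star))
    print(expected_payoff(1.0), second_thought_gain(1.0))
```

On this grid the payoff-maximizing b lies strictly below one, so the agent sometimes discards a good signal, and the gain from a second thought is approximately zero at the maximizer, whereas it is strictly positive under immediate termination (b = 1).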
Comment. Proposition 1 does not make reference to the set P of the feasible experiments.
Rather, the second-thought-free condition is a restriction imposed only on the choice rule r * , on the prior, and on the utility function. This makes our result relevant to those analysts who do not know the set P of the feasible primitive cognition processes, but observe the joint distribution of the actions and the payoff states.

Benefit of cognition biases
For an illustration of how the agent can benefit from a cognition bias that favors termination of the decision-making after encountering certain types of signals, we review a special case of our elementary setting from Example 1. The agent's task is to announce the realized value of the binary state θ ∈ {0, 1}. She is rewarded with payoff u(a, θ) = 1 if her announcement a ∈ {0, 1} matches the state θ, and she receives zero otherwise. The state θ is drawn from the uniform prior and the agent has access only to one statistical experiment; P = {p} is a singleton. The available statistical experiment is asymmetric in that it satisfies p(1 | 1) = 0.36 and p(1 | 0) = 0.04, so that p(1 | 1)/p(1 | 0) = 9 > p(0 | 0)/p(0 | 1) = 1.5 > 1. That is, x = 1 (resp. x = 0) is informative that the state θ is more likely to be 1 (resp. 0) and signal x = 1 is more informative than x = 0. Let us fix the action strategy to be the identity function σ_I(x) = x.[5] With probability .2, the experiment delivers the signal x = 1, which provides the agent with the correct advice in 90% of cases, whereas the somewhat less informative signal x = 0, which is correct in only 60% of the cases, is observed with probability .8.
The choice rule r(p, 1, σ_I) = p, however, is not optimal since it fails to be second-thought-free.
To verify this, consider the agent who has run the experiment p, has received the muddled signal x = 0, and is about to conclude the decision-making with action a_1 = 0. Before she implements a_1, let her contemplate having a second thought that involves running the experiment p once again and implementing whichever action a_2 = x_2 the new run recommends. If a_2 = a_1, then the second thought will have been inconsequential. If a_2 ≠ a_1, then it is more likely that θ = 1 than θ = 0 (because x = 1 is more informative than x = 0), and thus the agent benefits from switching her choice from a_1 = 0 to a_2 = 1. Thus, overall, the agent who is about to terminate the decision process r(p, 1, σ_I) with action 0 benefits from having the second thought. By Proposition 1, the process r(p, 1, σ_I) is not optimal.
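The comparison can be worked out numerically. In the sketch below, the conditional probabilities p(1 | 1) = 0.36 and p(1 | 0) = 0.04 are backed out from the stated facts that, under the uniform prior, the signal x = 1 occurs with probability .2 and is correct in 90% of cases while x = 0 is correct in 60% of cases; the function names are illustrative assumptions.

```python
PRIOR = {0: 0.5, 1: 0.5}
P_SIGNAL_1 = {0: 0.04, 1: 0.36}  # p(x = 1 | theta), backed out from the stated frequencies


def p(x, theta):
    """Probability of signal x in state theta under the primitive experiment."""
    return P_SIGNAL_1[theta] if x == 1 else 1.0 - P_SIGNAL_1[theta]


def posterior(x1):
    """Posterior over theta after a single signal x1."""
    joint = {t: PRIOR[t] * p(x1, t) for t in PRIOR}
    total = sum(joint.values())
    return {t: j / total for t, j in joint.items()}


def keep_payoff(x1):
    """Probability of a correct announcement when the agent announces a1 = x1."""
    return posterior(x1)[x1]


def rerun_payoff(x1):
    """Probability of a correct announcement when she forgets x1 and announces a fresh x2."""
    post = posterior(x1)
    return sum(post[t] * p(t, t) for t in post)


if __name__ == "__main__":
    for x1 in (0, 1):
        print(x1, keep_payoff(x1), rerun_payoff(x1))
```

After the muddled signal x = 0, keeping the announcement is correct with probability 0.6, whereas a fresh run is correct with probability 0.6 · 0.96 + 0.4 · 0.36 = 0.72, so the second thought pays. After x = 1 the ranking is reversed (0.9 against 0.42).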
Generically, choice rules are not second-thought-free, and therefore, as in this example, the agent who has access to a generic experiment p and has a generic utility u and a prior π benefits from selective repetitions of the experiment p. In this example, due to the symmetry of the prior and of the utility function, the second-thought-free rule must be symmetric: the optimal termination strategy β* must satisfy

\[ r(1 \mid 1; p, \beta^*, \sigma_I) = r(0 \mid 0; p, \beta^*, \sigma_I). \tag{8} \]

To see why the optimal rule must be symmetric, recall that the second thought affects the payoff only if the two runs of the decision process result in the conflicting signals x_1 ≠ x_2. If the rule were not symmetric, then the agent would fail to be indifferent between the two actions in this contingency, and hence would want to have a second thought after terminating with one of the two actions. Conversely, when the rule satisfies the symmetry (8), then both states are equally likely in this contingency, and thus the agent is indeed indifferent between the two actions; the symmetric rule is second-thought-free.
The symmetry condition (8) implies that β * 0 /β * 1 = 0.15, and r(1 | 1; p, β * , σ I ) = r(0 | 0; p, β * , σ I ) = 0.79. 6 The fastest such rule terminates immediately after the agent observes the highly informative signal x = 1, but when the agent observes the muddled signal x = 0, then she is likely to hesitate: she reruns her cognition p with probability 1 − .15 = .85. Such a bias in her cognition inflates the likelihood of her decision taking place at the information set x = 1 and this modifies the precisions of posteriors under the effective experiment s(p, β * ). The probability of the correct choice after terminating at either signal realization becomes 0.79, and thus the overall expected payoff increases to 0.79 compared to only .66 under the unbiased strategy β = 1.
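The numbers in this example can be reproduced in a few lines. The sketch assumes the conditional probabilities p(1 | 1) = 0.36 and p(1 | 0) = 0.04, backed out from the stated signal frequencies and accuracies, and uses the fact that the symmetric rule r(1 | 1) = r(0 | 0) with β_1 = 1 reduces to β_0² = p(1 | 1) p(1 | 0) / (p(0 | 0) p(0 | 1)); that reduction is our own algebra, and the helper names are hypothetical.

```python
from math import sqrt

# p[theta][x]; values backed out from the example (an assumption of this sketch).
p = {0: {0: 0.96, 1: 0.04},
     1: {0: 0.64, 1: 0.36}}

def choice_rule(p, beta):
    """r[theta][a] = beta_a p(a|theta) / sum_x beta_x p(x|theta), with sigma = identity."""
    rule = {}
    for th in (0, 1):
        rate = sum(beta[x] * p[th][x] for x in (0, 1))
        rule[th] = {x: beta[x] * p[th][x] / rate for x in (0, 1)}
    return rule

def expected_payoff(rule, prior=(0.5, 0.5)):
    """Probability of a correct announcement under the given prior."""
    return sum(prior[th] * rule[th][th] for th in (0, 1))

# Fastest symmetric rule: beta_1 = 1 and beta_0 from the symmetry condition.
beta0 = sqrt(p[1][1] * p[0][1] / (p[0][0] * p[1][0]))
biased = choice_rule(p, {0: beta0, 1: 1.0})
unbiased = choice_rule(p, {0: 1.0, 1: 1.0})

print(f"beta_0/beta_1 = {beta0:.3f}")
print(f"payoff biased = {expected_payoff(biased):.3f}, "
      f"unbiased = {expected_payoff(unbiased):.3f}")
```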

The binary setting
We extend the example studied in subsection 3.3 to cover the general setting with binary action and state spaces. The agent chooses a ∈ A = {0, 1} and receives u(a, θ), where the payoff state θ is drawn from an interior prior π ∈ Δ(Θ), and Θ = A. To avoid a trivial case, we assume that neither action is dominant. Then, without loss of generality, u(a, θ) = u θ > 0 if a = θ and u(a, θ) = 0 otherwise. The set P of the feasible statistical experiments is finite, and each p ∈ P delivers a signal x from a finite signal space X with probability p(x | θ). The agent chooses p ∈ P, the termination strategy β = (β x ) x∈X and action strategy σ : X −→ A to maximize the expected payoff as in (3).
The first result states that there exists a solution in which the agent ignores all but two signal realizations of the chosen experiment p. That is, she always repeats the experiment upon encountering all but two signals. Roughly, the result follows because it is advantageous to consider only the most informative signal realizations for each state. 7

Lemma 3. There exists a solution in which the termination probability β_x is positive for at most two signal values x ∈ X.
Based on the lemma, we can, without loss of generality, restrict the signal space X to be binary, and identify it with the action and state spaces, X = A = Θ. Again without loss of generality, we choose signal labels in such a way that each experiment p ∈ P satisfies the monotone likelihood ratio property: p(1 | θ)/p(0 | θ) increases in θ. We continue to assume that p(x | θ) > 0 for all x and θ.
Recall that σ_I is the identity function, and let the agent employ experiment p and the action strategy σ_I. The next lemma characterizes the set R_{p,σ_I} = {r(p, β, σ_I) : β ∈ B} of the feasible choice rules that such an agent has access to. To characterize this set, we introduce a parameter that we dub the perceptual distance between states 0 and 1 under the experiment p:

\[ d_p = \frac{p(1 \mid 1)\, p(0 \mid 0)}{p(0 \mid 1)\, p(1 \mid 0)}. \]

The perceptual distance is a summary statistic of the information structure of the experiment p. The larger it is, the more reliably p discriminates between the two states. The monotone likelihood ratio property of each p implies that d_p > 1. The next lemma states that the perceptual distance is preserved under any termination strategy β.

That is, a rule r can be constructed from p if and only if it preserves the perceptual distance, r(1 | 1) r(0 | 0) / (r(0 | 1) r(1 | 0)) = d_p (or if it always selects the same action). By controlling the termination strategy β, the agent trades off the likelihoods r(0 | 0; p, β, σ_I) and r(1 | 1; p, β, σ_I) of the correct choice in the states 0 and 1, respectively. See Figure 2. The set R_{p,σ_I} of rules accessible from p is compact.

6 [...] solving (8) for β*_0/β*_1.

7 This insight exploits the assumption of perfect patience, since impatient agents would trade off informativeness against delay costs. We conjecture that when exponential discounting is considered, then the result that the agent ignores all but two signal realizations continues to hold for sufficiently patient agents and generic signal structures.
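The invariance of the perceptual distance is easy to confirm numerically. The sketch below takes the perceptual distance to be the cross-ratio p(1 | 1) p(0 | 0) / (p(0 | 1) p(1 | 0)), which is our reading of the definition, and uses an illustrative experiment with p(1 | 1) = 0.36 and p(1 | 0) = 0.04 (assumed values). Function names are hypothetical.

```python
import random

def choice_rule(p, beta):
    """r[theta][a] induced by experiment p[theta][x], termination strategy beta, sigma = identity."""
    rule = {}
    for th in (0, 1):
        rate = sum(beta[x] * p[th][x] for x in (0, 1))
        rule[th] = {x: beta[x] * p[th][x] / rate for x in (0, 1)}
    return rule

def perceptual_distance(r):
    """Cross-ratio r(1|1) r(0|0) / (r(0|1) r(1|0)) for a table r[theta][a]."""
    return r[1][1] * r[0][0] / (r[1][0] * r[0][1])

p = {0: {0: 0.96, 1: 0.04},
     1: {0: 0.64, 1: 0.36}}
d_p = perceptual_distance(p)

rng = random.Random(0)
distances = []
for _ in range(5):
    # Arbitrary interior termination probabilities.
    beta = {0: rng.uniform(0.05, 1.0), 1: rng.uniform(0.05, 1.0)}
    distances.append(perceptual_distance(choice_rule(p, beta)))

print(d_p, distances)
```

The termination probabilities cancel in the cross-ratio because β_1 multiplies r(1 | 1) and r(1 | 0) in the same way, and likewise for β_0.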
Thanks to the chosen labeling of the signals, the agent can equate her choice to the observed signal without a loss:

Lemma 5. For any rule r(p, β, σ) there exists β′ such that the rule r(p, β′, σ_I) achieves at least as high an expected payoff as r(p, β, σ).
The solution to the repeated-cognition problem in the binary setting exists since the objective is continuous in the choice rule and the agent optimizes on the compact set ∪_{p∈P} R_{p,σ_I} of the rules.
Let p be the experiment with the maximal perceptual distance: p ∈ arg max_{p′∈P} d_{p′}, and let d = max_{p′∈P} d_{p′}. In line with the intuition that the agent should go for the most informative experiment, we establish:

Lemma 6. There exists a solution to the repeated-cognition problem in which the agent employs the experiment p with the maximal perceptual distance.
The last lemma implies that all details of the set P relevant for the solution are summarized in the one-dimensional statistic d that is independent of the payoff function u. 8 We are now ready to solve the binary setting. The optimal effective choice rule r*(a | θ) = r(a | θ; p, β*, σ_I) consists of four unknown probabilities and it is determined by four conditions: the second-thought-free condition (5), the feasibility condition from Lemma 4, and two normalization conditions. Let the parameter R = π_1 u_1 / (π_0 u_0) measure the relative a priori attractiveness of action 1.
When the ex ante attractiveness of one of the actions is too strong relative to the precision of the best available experiment, then the agent ignores the signals generated by the experiment and always chooses the ex ante attractive action. This arises when the linear indifference line in Figure 2 is steeper than the slope of the set R_{p,σ_I} at the corners. These slopes at the corners are −d and −1/d, which implies the range (1/d, d) of R for which the optimal rule is interior. In Section 6.1 we show how Proposition 2 extends to cover the agents endowed with more elaborate memory.
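For readers who want to experiment with the binary setting, the sketch below maximizes R · r(1 | 1) + r(0 | 0) subject to the feasibility constraint r(1 | 1) r(0 | 0) = d (1 − r(1 | 1)) (1 − r(0 | 0)). The closed form in `optimal_binary_rule` is our own derivation from the first-order condition of this problem, not a quotation of Proposition 2, so it is cross-checked against a brute-force grid search. All function names are hypothetical.

```python
from math import sqrt

def optimal_binary_rule(d, R):
    """Return (r(1|1), r(0|0)); interior for 1/d < R < d, corner otherwise."""
    if R >= d:
        return 1.0, 0.0          # always choose action 1
    if R <= 1 / d:
        return 0.0, 1.0          # always choose action 0
    r11 = (d - sqrt(d / R)) / (d - 1)
    r00 = (d - sqrt(d * R)) / (d - 1)
    return r11, r00

def brute_force_value(d, R, n=20_000):
    """Grid search over the odds A = r(1|1) / (1 - r(1|1)).

    Feasibility pins down r(0|0) = d / (A + d). Corner rules are included.
    """
    best = max(R, 1.0)
    for i in range(1, n):
        A = 10 ** (-4 + 8 * i / n)
        best = max(best, R * A / (1 + A) + d / (A + d))
    return best

d, R = 13.5, 3.0
r11, r00 = optimal_binary_rule(d, R)
closed_form_value = R * r11 + r00
grid_value = brute_force_value(d, R)
print(r11, r00, closed_form_value, grid_value)
```

With R = 1 the formula collapses to r(1 | 1) = r(0 | 0) = √d / (1 + √d), the symmetric rule of the earlier example.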
The solution of the repeated-cognition problem has natural comparative statics reminiscent of wishful thinking. As the ex ante attractiveness of a state increases, the agent searches more intensively for evidence in support of this state, her relative performance in this state improves, and the relative speed of the decision-making in this state increases. Let f_θ = Σ_x β_x p(x | θ) stand for the decision rate in state θ; it is the per-round probability that the agent in state θ terminates the decision-making. The response time in state θ is geometrically distributed with the expected response time 1/f_θ.

Behavioral applications
This section presents four behavioral effects generated by our model. We demonstrate the first three effects (confirmation bias, speed-accuracy complementarity, and overweighting of rare events) in the binary setting from Section 4. The fourth effect, salience of distinct states, will be presented in a setting with multiple states.

Confirmation bias
Psychologists and economists distinguish at least three mechanisms leading to confirmation bias: (i) People search for evidence selectively, targeting the evidence type in accord with their priors, e.g. Nickerson (1998); (ii) they selectively memorize and recall the data supporting their priors, e.g. Oswald and Grosjean (2004); and (iii) they selectively interpret ambiguous evidence, e.g. Rabin and Schrag (1999) and Fryer et al. (2016). We focus on the first two mechanisms and interpret them in light of our optimal repeated-cognition result.
The next result provides a simple illustration of why some form of confirmation bias is constrained optimal. We consider here an agent who has access to only one symmetric primitive experiment.

Corollary 2. When action 1 is a priori more attractive, R ∈ (1, d), and the unique primitive experiment is symmetric, p(1 | 1) = p(0 | 0), then the agent searches relatively more intensively for [...]

To see how the above result relates to confirmation bias, consider an agent whose task is to announce the realized state of the world: she receives reward u_1 = u_0 = 1 if she makes the correct announcement and receives 0 otherwise. The agent finds the state θ = 1 a priori more likely than the state 0, π_1 > π_0.
Consider in this setting the suboptimal decision process that terminates immediately after the first run of the experiment and chooses the action equal to the observed signal: β 0 = β 1 = 1, σ = σ I . We first observe, paralleling an argument made in subsection 3.3, that such an unbiased process is suboptimal. To see this, assume that the agent has observed the a priori unlikely signal 0. Such a surprised agent is better off by restarting the decision-making instead of terminating with action 0, since if the new run of the process concludes with signal and action 1, then the switch from action 0 to 1 is beneficial. This is because when the experiment p is symmetric, then, conditional on the two conflicting signals, the a priori more common state 1 is relatively more likely.
The agent benefits from the second thought whenever she receives the surprising recommendation, and thus will deviate from the uniform search in favor of the a priori likely signal 1.
The optimal strategy resembles the natural process in which selective memory gives rise to confirmation bias. We consider the fastest optimal strategy, letting β*_1 = 1. When the agent observes signal 1 that confirms her prior belief, then she terminates and immediately announces the state 1. But if she is surprised, observing signal 0 that contradicts her prior, then she discards the signal with positive probability 1 − β*_0 and repeats the experiment. Although finding the exact optimal value β*_0 may be difficult, the fact that double-checking one's own reasoning when one arrives at a surprising conclusion is a common practice suggests that people are able to deviate from the unbiased suboptimal process in the payoff-improving direction.
Let us relate the above effect of confirmation bias to the choice of media outlets. Let each state of the world θ ∈ {0, 1} generate an infinite sequence of signals x_k drawn i.i.d. from the conditional distribution p(x | θ). The agent, whose task is to identify the realized state, can comprehend only one such signal realization and she can access it only via a media outlet. The media outlets differ in their editorial policies (β_x)_x. Each outlet draws a first signal x_1, terminates and reports x_1 to its readers with probability β_{x_1}, and with the residual probability 1 − β_{x_1} the outlet redraws a new signal x_2, and so on, until the outlet terminates its search and reports the last observed signal. The reader of the outlet with an editorial policy β observes signal x in the state θ with probability s(x | θ; p, β) = β_x p(x | θ) / Σ_{x′} β_{x′} p(x′ | θ). As in Gentzkow and Shapiro (2006), our agent prefers outlets biased in favor of her prior belief; she prefers an editorial search policy β that is biased in favor of the signal realization that confirms the agent's own prior. For example, a voter a priori favouring, say, Trump, who has time to read only one headline in the media outlet of her choice, will optimally choose an outlet that persistently searches for evidence favorable to Trump, and that will report encountered evidence that is unfavorable to Trump only with a relatively smaller probability.
The source of the media bias in Gentzkow and Shapiro is reputation: Their agents evaluate media outlets confirming their prior beliefs as being relatively reliable information sources. In our case, all outlets are ex ante identical in that they have access to the same signal-generating process and thus reputation does not play a role. Rather, in our case, the demand for the prior-confirming outlets is driven by the information-aggregation friction. When the reader's attention span allows the outlets to report only one signal, then the optimal editorial policy favors the signal that advises the reader correctly in the a priori more likely state, since this state has a large weight in the reader's a priori objective. 9
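The editorial search process can be simulated directly and compared with the closed-form reporting probability s(x | θ; p, β) = β_x p(x | θ) / Σ_{x′} β_{x′} p(x′ | θ). The signal probabilities and the editorial policy below are illustrative assumptions, not values from the text, and the function names are ours.

```python
import random

def effective_signal_prob(p_theta, beta):
    """Closed form s(x|theta) for signal probabilities p_theta[x] and policy beta[x]."""
    rate = sum(beta[x] * p_theta[x] for x in p_theta)
    return {x: beta[x] * p_theta[x] / rate for x in p_theta}

def outlet_report(p_theta, beta, rng):
    """Redraw i.i.d. signals until the editorial policy terminates; return the report."""
    signals = list(p_theta)
    weights = [p_theta[x] for x in signals]
    while True:
        x = rng.choices(signals, weights=weights)[0]
        if rng.random() < beta[x]:
            return x

rng = random.Random(1)
p_state1 = {0: 0.3, 1: 0.7}   # assumed p(x | theta = 1)
policy = {0: 0.4, 1: 1.0}     # assumed policy biased towards reporting x = 1

n = 50_000
reports = [outlet_report(p_state1, policy, rng) for _ in range(n)]
simulated = sum(reports) / n                       # share of reports equal to 1
predicted = effective_signal_prob(p_state1, policy)[1]
print(f"simulated {simulated:.3f} vs predicted {predicted:.3f}")
```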

Speed-accuracy complementarity
The binary setting generates the speed-accuracy complementarity effect, a stylized fact stating that delayed choices tend to be less accurate than speedy choices; see the psychology studies of Swensson (1972) and Luce (1986). We establish this effect in the setting from the previous subsection: u_0 = u_1 = 1, π_1 > π_0, considering again a symmetric primitive signal distribution, p(1 | 1) = p(0 | 0) > 1/2. Let ϕ(θ, a, t) be the joint probability distribution of the state θ, chosen action a, and the reaction time t generated by the solution (p, β*, σ_I) of the repeated-cognition problem.
Due to the stationarity of the decision process, the probability of the correct choice conditional on the payoff state is independent of the reaction time: Pr ϕ (a = θ | θ, t) = Pr ϕ (a = θ | θ). At optimum, this conditional probability of the correct choice is larger in the a priori more likely state 1 than in the state 0, reflecting the relative weights of the two states in the a priori objective. Overall, unconditionally on the payoff state, the probability Pr ϕ (a = θ | t) of the correct choice depends on the response time because t correlates with θ. A long response time indicates that the agent has repeatedly encountered the a priori surprising signal x = 0 and has hesitated to terminate. Hence, conditional on large t, the likelihood of the a priori surprising state becomes high. The longer the agent has hesitated, the more likely it is that she is facing the a priori surprising state in which she is making more mistakes.
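The dependence of accuracy on response time can be illustrated numerically. The sketch below assumes a symmetric experiment with precision q = p(1 | 1) = p(0 | 0), the fastest optimal strategy β_1 = 1, and an interior optimum; the expression for the optimal odds of a correct choice in state 1 is our own derivation from the feasibility constraint and the first-order condition, not a formula quoted from the text. Parameter values and names are hypothetical.

```python
from math import sqrt

def accuracy_by_response_time(q, prior1, t_max):
    """Pr(a = theta | t) for t = 1..t_max, with u_0 = u_1 = 1 and beta_1 = 1."""
    R = prior1 / (1 - prior1)
    d = (q / (1 - q)) ** 2                      # perceptual distance of a symmetric p
    if not 1 <= R < d:
        raise ValueError("sketch covers the interior case 1 <= R < d only")
    # Optimal odds r(1|1) / r(0|1), derived by us for this sketch.
    odds1 = sqrt(d) * (sqrt(d * R) - 1) / (sqrt(d) - sqrt(R))
    beta0 = q / ((1 - q) * odds1)               # termination probability after x = 0
    f = {1: q + (1 - q) * beta0,                # decision rate in state 1
         0: (1 - q) + q * beta0}                # decision rate in state 0
    correct = {1: q / f[1], 0: q * beta0 / f[0]}
    prior = {1: prior1, 0: 1 - prior1}
    out = []
    for t in range(1, t_max + 1):
        # Joint weight of state th and termination exactly in round t.
        w = {th: prior[th] * (1 - f[th]) ** (t - 1) * f[th] for th in (0, 1)}
        out.append((w[1] * correct[1] + w[0] * correct[0]) / (w[0] + w[1]))
    return out

acc = accuracy_by_response_time(q=0.7, prior1=0.6, t_max=6)
for t, a in enumerate(acc, start=1):
    print(t, round(a, 4))
```

Accuracy conditional on the state is constant in t; the decline in the printed sequence comes only from long delays shifting the posterior weight towards the a priori less likely state, in which the rule is less accurate.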
Corollary 3 is in line with empirical evidence from Ratcliff and McKoon (2008), who asked lab participants to report a binary state visually encoded by a pattern of moving dots on a computer screen. We conceptualize the sequence of the observed dot movements as a sequence of the signals generated by a primitive experiment p in our model. Ratcliff and McKoon's design exhibits a symmetry that justifies the symmetry p(1 | 1) = p(0 | 0) assumed here. The experiment consisted of two treatments that differed in the prior distribution of the two states. The prior in each treatment was announced to participants in the instructions. In line with our predictions, the posterior probability that the participant's announcement is correct is higher when she announces the a priori expected state than when she makes the surprising announcement. The announcements of the surprising state are stochastically delayed relative to the announcements of the a priori expected state. [...] as an indication that the stake is small, thereby leading her to optimally terminate the process sooner with relatively imprecise posteriors. The authors' proposed mechanism differs from ours.
While their agent "gives up" after long delays, in our model long delays indicate that the agent has encountered the surprising state in which her decision strategy is less well-adapted. Since there is no uncertainty about the magnitude of the stakes in Ratcliff [...] We show in Section 6 that our solution of the binary setting and all the related effects extend to agents who partially aggregate information, although we abstract from the trade-off between longer aggregations and the aggregation cost.

Overweighting of rare events
As in previous subsections, we consider a task consisting of state recognition (i.e. u(a, θ) = 1 if a = θ and u(a, θ) = 0 otherwise). In contrast to previous applications, we assume that the two states θ ∈ {0, 1} are equally likely, but the distribution of the signal x ∈ {0, 1}, referred to as an event in this section, is asymmetric across states. Specifically, the probability of x = 1 in the state of the world θ is ρ_θ ∈ (0, 1) and the probability of x = 0 is 1 − ρ_θ.
Let us assume, essentially without loss of generality, that ρ 0 < ρ 1 < 1 − ρ 0 . 11 The a priori probability of event x = 1 is (ρ 0 + ρ 1 )/2 < 1/2, and thus the event x = 1 is relatively rarer than x = 0. We observe that at the optimum, the agent is relatively more likely to discard the more common event x = 0 in agreement with Kahneman and Tversky (1979), who observe that agents tend to overweight rare events.
Corollary 4. At the optimum, the agent is biased in favor of the rare event x = 1: β * 1 > β * 0 > 0 (and her guess of the state equals the observed signal, i.e. σ = σ I ).
To illustrate the result, consider an agent who is forming her probability belief over a flight accident. The accident probability per flight in the safe state of the world is 10^−6, whereas it is 10^−5 in the dangerous state of the world, and both states are a priori equally likely. The agent can sequentially observe arbitrarily many past flight outcomes, but cannot aggregate the information, and recalls only the last observed flight. She guesses that the state of the world is dangerous if and only if the last observed flight is eventful.
Consider first an agent who always terminates right after the observation of the first datapoint. Such an agent benefits from a "second thought" whenever she observes an uneventful flight: Either the second observed flight will be uneventful, in which case the second thought will have been inconsequential, or the redrawn flight will be eventful and the agent will switch her assessment from the safe to the dangerous state. Such a switch is beneficial since conditional on two contradicting data-points the dangerous state is relatively more likely. More generally, conditional on two contradicting data-points, the state θ for which ρ θ is closer to 1/2 is relatively more likely, since it is relatively likely that such a state generates contradicting signals. Thus, relative to the immediate termination strategy, the agent will benefit from discarding the uneventful flight observations with positive probability.
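The flight example can be worked out numerically. The sketch relies on two facts used earlier in the paper: with a uniform prior and symmetric payoffs the optimal rule is symmetric, and the cross-ratio r(1 | 1) r(0 | 0) / (r(0 | 1) r(1 | 0)) equals the perceptual distance of the experiment. The resulting formulas for the accuracy and for the ratio β_1/β_0 are our own algebra, and variable names are hypothetical.

```python
from math import sqrt

rho0, rho1 = 1e-6, 1e-5          # accident probability in the safe / dangerous state

# Perceptual distance of the experiment with p(1 | theta) = rho_theta.
d = rho1 * (1 - rho0) / (rho0 * (1 - rho1))

# Symmetric rule r(1|1) = r(0|0) = r with r^2 / (1 - r)^2 = d.
accuracy = sqrt(d) / (1 + sqrt(d))

# r(1|1) / r(0|1) = beta_1 rho1 / (beta_0 (1 - rho1)) = sqrt(d).
beta_ratio = sqrt(d) * (1 - rho1) / rho1

# Immediate termination: guess "dangerous" iff the single observed flight is eventful.
immediate = 0.5 * rho1 + 0.5 * (1 - rho0)

print(f"beta_1/beta_0 = {beta_ratio:,.0f}")
print(f"accuracy with optimal bias = {accuracy:.3f}, without = {immediate:.6f}")
```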
For the assumed accident rates, the optimal ratio of the search intensities is β*_1/β*_0 ≈ 316,000, where the signal x = 1 stands for the eventful flight observation. Thus, the agent's search for data is heavily biased towards the rare accidents. This strategy generates probabilities of the correct state identification approximately equal to 0.76 in both states of the world. Since the immediate termination strategy identifies the correct state with a probability approximately equal to one half, the bias significantly improves the payoff.

Salience
Bordalo, Gennaioli, and Shleifer (2012) interpret salience as directed attention focus and quote the popular work by Daniel Kahneman (2011): "Our mind has a useful capability to focus on whatever is odd, different or unusual." The quote states a causal relation between the two features of the salient phenomena: These are (i) odd, different or unusual, and because of (i), people benefit from (ii) focusing their attention on such phenomena. Here, we confirm Kahneman's intuition within our proposed framework. Our microfoundation of the salience effect is related to the insight emerging in psychological research on visual salience. Itti (2007) conceptualizes the visual salience effect as attention allocation to a subset of the visual field that is "sufficiently different from its surroundings to be worthy of [one's] attention." Similarly, in our model, a payoff state is salient if it stands out sufficiently from similar states to be worthy of the focus of the agent's information search.
As before, the agent faces a perceptual task that requires her to announce a random state θ. She is endowed with a primitive perception technology that generates a perceived value θ̂ of the state.
The primitive perception is informative but noisy: The perceived value θ̂ equals the true state θ with a high probability, but mistakes, θ̂ ≠ θ, occur sometimes. We view the primitive perception technology as a black-box model of a physiological sensor that generates a noisy first impression θ̂ of the true state θ. The agent can use the sensor repeatedly but is not able to aggregate the information. She conditions the repetition of the sensor's use on the most recent perception and announces the terminal perception.
We formalize this perception task as follows. The agent makes an announcement a ∈ A = Θ, where 2 < |Θ| < ∞, and receives payoff u(a, θ) = 1 if her announcement is correct, a = θ, and u(a, θ) = 0 if a ≠ θ. The prior is uniform. Each use of the agent's sensor generates a signal/perception θ̂ ∈ X = Θ, with conditional probabilities p(θ̂ | θ). We make two assumptions on p: [...] For two states θ_1 and θ_2, we say that θ_1 is more distinct than θ_2 if for each state θ_3 ≠ θ_1, θ_2, p(θ_1 | θ_3) < p(θ_2 | θ_3). Suppose for illustration that the perceptual task involves recognition of a color from a set {azure, indigo, red}. Intuitively, the red color stands out of this set, and this is captured by the above definition. Assume that the two shades of blue are similar in that the agent's first impression confuses them in 10% of cases, p(azure | indigo) = p(indigo | azure) = 0.1, but p(θ | red) = p(red | θ) = 0.01 for θ ∈ {azure, indigo}. Then, the red color is more distinct according to our definition than either of the two blue shades.
To simplify the exposition, we assume that the agent uses the identity action strategy σ I ; she announces the state equal to her last perception. We also make a regularity assumption that the optimal termination probabilities β x are positive for all x ∈ Θ. 12 Let r * = r(p, β * , σ I ) be the optimal feasible choice rule.
Proposition 3. If state θ_1 is more distinct than state θ_2, then the agent's terminal perception is biased in favor of the more distinct state θ_1 at the expense of the less distinct state θ_2: [...]

Since the primitive perception technology p is symmetric by assumption, the asymmetry in favor of the distinct state of the optimal terminal perception r* is driven solely by the optimization of the termination strategy. To gain the intuition for the salience of the distinct states, consider a state θ* that is similar to many other states and an agent who always terminates the process after the first round: β = 1. This agent is relatively uninformed whenever she forms perception θ*, since the true state differs from θ* with a sizeable probability. The agent with the indistinct perception θ* would thus benefit from "having a second thought", i.e., from running the primitive perception formation process once again. The optimal termination strategy involves repeating the primitive process with relatively high probability whenever the agent forms a perception of an indistinct state, and this shifts the terminal perception in favor of the distinct states.
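The color illustration can be explored with a small grid search. The sketch below makes several assumptions that go beyond the text: the diagonal entries of p are filled in so that each row sums to one (0.89 for the blue shades, 0.98 for red), the two blue shades are given a common termination probability b, and the termination probability after a red perception is normalized to 1 (only ratios of termination probabilities matter for the choice rule). Function names are ours.

```python
STATES = ("azure", "indigo", "red")

def perception(theta_hat, theta):
    """Primitive perception p(theta_hat | theta) for the three-color illustration."""
    if theta_hat == theta:
        return 0.98 if theta == "red" else 0.89   # implied by rows summing to one
    if "red" in (theta_hat, theta):
        return 0.01
    return 0.10                                    # azure and indigo confused

def payoff(beta):
    """Probability of a correct announcement: uniform prior, identity action strategy."""
    total = 0.0
    for theta in STATES:
        rate = sum(beta[x] * perception(x, theta) for x in STATES)
        total += beta[theta] * perception(theta, theta) / rate
    return total / len(STATES)

def blue_policy(b):
    return {"azure": b, "indigo": b, "red": 1.0}

def best_blue_termination(grid=10_000):
    """Grid search over b in (0, 2] for the common blue termination probability."""
    candidates = (2 * i / grid for i in range(1, grid + 1))
    return max(candidates, key=lambda b: payoff(blue_policy(b)))

b_star = best_blue_termination()
print(f"b* = {b_star:.4f}")
print(f"payoff at b* = {payoff(blue_policy(b_star)):.6f}, "
      f"at b = 1: {payoff(blue_policy(1.0)):.6f}")
```

Within this restricted family the agent repeats the sensor slightly more often after a blue perception than after a red one, which tilts the terminal perception towards the distinct color; the payoff gain is small here because the primitive sensor is already quite accurate.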

Robustness checks
We explore here the robustness of our results to an increase in the memory capacity of the agent and to discounting. The first subsection returns to the examples 2 and 3 from Section 2.2 in which the agent is endowed with nontrivial memory and aggregates many observed signals. We show that the two examples are special cases of our baseline model, and hence the optimal rules solving them are second-thought-free. The second subsection accommodates discounting.

Sophisticated decision processes
Examples 2 and 3 from Section 2.2 seemingly violate our baseline model in that the agent can aggregate information across signal realizations. We show, however, that these examples are formally special cases of our baseline model appropriately specified.
Recall that P_iia is the set of all stochastic choice rules that the agent from example 2 (imperfect information aggregation) can construct. Consider our baseline model with the signal space X = A and the set of the feasible primitive experiments P = P_iia. The set R(P_iia) = {r(p, β, σ) : p ∈ P_iia, β ∈ B, σ ∈ S} is then a set of the stochastic choice rules that can be constructed as follows. The agent runs any process p ∈ P_iia, and observes a signal/action recommendation a with probability p(a | θ). She terminates with probability β_a, according to the termination strategy β = (β_a)_{a∈A}, and upon the termination chooses an action a′ = σ(a), where σ ∈ S is any mapping A → A. She reruns the process p with probability 1 − β_a, observes a new action recommendation generated by p, et cetera, until she terminates after a stochastic number of repetitions of the process p.
As it turns out, no new choice rules beyond those from P_iia can be constructed by these selective repetitions. This follows because the repetitions of the rule p ∈ P_iia with the termination strategy β can always be replicated with an appropriate choice of a different rule in P_iia that, whenever p would terminate with a, restarts the process from scratch with probability 1 − β_a. Formally:

According to the lemma, example 2 is the special case of our baseline model with P = P_iia, since in such a specification of the baseline model, the set of the feasible rules coincides with those in the example. In particular, the optimal choice rule p* ∈ P_iia solving the example coincides with the optimal rule r* ∈ R(P_iia) solving this specification of the baseline model.
The repeated-cognition problem with P = P iia is purely formal in that the optimal termination probabilities β * x = 1 for all x ∈ X = A, and thus the agent conducts the optimal process p * ∈ P iia only once and terminates. Nevertheless, the observation that p * solves the repeated-cognition problem has an important implication.

Corollary 5. The choice rule that solves example 2 (imperfect information aggregation) is second-thought-free.
Wilson (2014) differs from this example mainly in that she assumes exogenous termination probabilities. By adding optimization over the terminations to the model of Wilson, we gained the partial characterization of the optimal choice rule without fully solving the problem: One can conclude that the optimal choice rule is second-thought-free without analyzing the optimal use of the memory states.
The same argument applies to example 3 (partial forgetting). As with the previous example, let R(P pf ) be the set of the feasible choice rules in our baseline model with the set of the feasible primitive experiments P identified with P pf .
Thus, again, the rule p * ∈ P pf solving example 3, and the optimal rule r * ∈ R(P pf ) coincide, and thus the rule solving the example must be second-thought-free.

Corollary 6. The choice rule that solves example 3 (partial forgetting) is second-thought-free.
Importantly, when the action and state sets are binary (while the signal space can be arbitrary), then Proposition 2 derived above for memoryless agents applies to the sophisticated agents from examples 2 and 3. The optimal choice rules solving the examples simply satisfy Proposition 2 with d set to the maximal perceptual distance among the rules p in P iia and P pf , respectively.

Costly delay
Our baseline model abstracts from the cost of time in that the agent is only concerned with how the repetitions of the signal extraction affect the correlation of the signal with the state. We now incorporate discounting.
We continue to study the baseline model from Section 2.1, except that the agent discounts future payoffs exponentially with the discount factor δ ∈ (0, 1). To accommodate discounting, we redefine the choice rule induced by the experiment p, the termination strategy β and the action strategy σ as follows.
where ρ(x^t | θ; p, β), defined in (1), is the conditional probability of the signal history x^t. That is, r_δ(a | θ; p, β, σ) is the discounted probability of the choice of action a in the state θ. When δ = 1, then (11) coincides with our baseline definition of the choice rule r(p, β, σ).
The set of the feasible discounted rules is R δ (P) = {r δ (p, β, σ) : p ∈ P, β ∈ B, σ ∈ S}. The discounted repeated-cognition problem is to select a feasible rule r δ that maximizes the expected payoff: where discounting is incorporated in the definition of the feasible rules.

Proposition 4. If the termination strategy
The condition has the same interpretation as the second-thought-free condition in the absence of discounting. The left-hand side is the payoff for following the optimal decision process r * δ summed up across all contingencies that terminate with choice of a. The right-hand side is the payoff that the agent would get across the same contingencies if she restarted the decision process r * δ instead of the termination.
For illustration, we now revisit the confirmation bias application from Section 5.1 with an impatient agent. We will find that, unless discounting is too strong, the impatient agent chooses qualitatively the same strategy as the patient one, although the impatient agent speeds up her decision-making by choosing larger termination probabilities.

Discussion
The second-thought-free condition is related to Piccione and Rubinstein (1997), who show that the ex ante optimal decision strategy of a forgetful decision-maker can be thought of as a team equilibrium of multiple selves, with each self representing the decision-maker at an information set.
Each self takes the strategies of the other selves as given, internalizes the other selves' strategies in her Bayesian inference, and maximizes the decision-maker's payoff. Our agent is forgetful in that she conditions terminations only on the last signal, not on the signal history. As in Piccione and Rubinstein, the optimal termination strategy is a team equilibrium in the sense of the modified multi-self equilibrium defined there. The self who has received a signal makes inferences about the state both from the observed signal and from the fact that the previous selves (if any) have not terminated. Given her equilibrium posterior belief, each self decides whether to terminate with the action that is optimal under the formed posterior or to delegate the decision to the next self.
The equilibrium is mixed, with each self indifferent between terminating and passing the decision forward.
Meyer (1991) studies optimal biases in a sequential-learning problem of an agent who receives a sequence of binary signals and, unlike our agent, can aggregate the sequence of signals. She analyzes the best signal structure maintaining that the signal realization is binary. Her main insight is that some asymmetries in the signal structure are optimal. This can easily be understood in a two-period version, as symmetry of the conditional signal distributions would imply that the second signal is worthless. Although the observation that asymmetries in the information structure may be optimal arises both in her framework and in ours, the two papers study distinct optimizations.
While our agent controls termination probabilities in a stationary decision process, Meyer's agent controls the choice of a Blackwell experiment in each round of a non-stationary process.
In the above setting, we have assumed that the agent chooses optimally the action strategy σ : X → A. In some contexts, the action strategy may instead be hardwired by automatic responses and possibly suboptimal. We note that as long as the agent can optimally adjust the termination strategy β, the second-thought-free condition continues to hold since it follows from the first-order condition with respect to the termination strategy only. Behavioral insights such as those highlighted in Section 5 would be unaltered in such environments.

Summary
Agents who cannot comprehend all facts available to them benefit from selective attention. We show that agents can implement a targeted information search in a process that resembles the natural phenomenon of hesitation. Like a hesitant person, the agent can, conditional on the action contemplated, decide whether she implements the action or whether she will have a second thought and run the cognition process once more. Such hesitation can be productive, despite consisting of repetitions of the same stochastic cognition process. By conditioning the probability of the repetition on the conclusion of the reasoning, the agent controls the correlation of her terminal conclusion and the payoff state. The optimal decision process arising in our model exhibits natural hesitation patterns: the agent will have second thoughts (that is, she will repeat her cognition) whenever the expected payoff for the currently favored choice is inferior to the expected payoff for continuing decision-making. At the optimum, the agent terminating the decision-making must be indifferent between terminating with the currently contemplated action and repeating the process.
In a sense, the condition formalizes the concept of a reasonable doubt. Abstracting from many considerations such as information aggregation, a jury deciding a trial under common law should be, if using the optimal decision procedure, indifferent between declaring a verdict and announcing a hung jury and initiating a retrial.
Let us conclude by reviewing the limitations of our main result. The central assumption, the ability of the agent to freely repeat her decision process, may fail for several reasons. One reason is that the agent may only have access to a limited data set that constrains her to a finite number of repetitions of the primitive decision process, making the optimal termination strategy non-stationary. Another complication arises if the outcomes of distinct runs of the same cognition process are not conditionally independent as assumed in our model; this may arise if some cognition errors are systematic and are likely to emerge in distinct repetitions of the cognition. We conjecture that the second-thought-free condition holds in such a case, with the agent internalizing the correlations between the cognition runs.

A.1 Proofs for Section 3
Proof of Lemma 1. Suppose by contradiction that (4) holds with strict inequality for some $a$ chosen with positive probability. Then, , and applying the law of iterated expectations, this simplifies to $E_\alpha[u(a_1, \theta)] > E_\alpha[u(a_2, \theta)]$. This establishes the contradiction, since $a_1$ and $a_2$ are conditionally i.i.d. draws generated by the rule $r$.
Proof of Lemma 2. The effective experiment $s(p, \beta)$ satisfies the recursion $s(x \mid \theta; p, \beta) = \beta_x p(x \mid \theta) + \bigl(1 - \sum_{x'} \beta_{x'} p(x' \mid \theta)\bigr)\, s(x \mid \theta; p, \beta)$, where the first summand is the probability that the agent terminates with signal $x$ after the first run of the primitive experiment $p$, and the second summand is the probability that the agent continues with decision-making after the first run of $p$ and terminates with $x$ later. Solving for $s(x \mid \theta; p, \beta)$ gives (6).
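The recursion described in the proof of Lemma 2 can be checked numerically. The following is a minimal sketch for one fixed state $\theta$, assuming that the recursion reads $s = \beta_x p(x \mid \theta) + (1 - \sum_{x'} \beta_{x'} p(x' \mid \theta))\, s$ and that (6) is its fixed point $\beta_x p(x \mid \theta) / \sum_{x'} \beta_{x'} p(x' \mid \theta)$. The function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
def effective_experiment(p_theta, beta):
    """Assumed closed form: s(x) = beta_x p(x) / sum_x' beta_x' p(x'), for one fixed state."""
    stop = sum(beta[x] * p_theta[x] for x in p_theta)  # per-round termination probability
    return {x: beta[x] * p_theta[x] / stop for x in p_theta}


def iterate_recursion(p_theta, beta, rounds=200):
    """Iterate s <- beta_x p(x) + (1 - stop) * s, starting from s = 0."""
    stop = sum(beta[x] * p_theta[x] for x in p_theta)
    s = {x: 0.0 for x in p_theta}
    for _ in range(rounds):
        s = {x: beta[x] * p_theta[x] + (1.0 - stop) * s[x] for x in p_theta}
    return s


# Hypothetical primitive experiment in one state and a termination strategy.
p_theta = {0: 0.7, 1: 0.3}
beta = {0: 0.5, 1: 1.0}

closed = effective_experiment(p_theta, beta)
iterated = iterate_recursion(p_theta, beta)
```

Under these assumptions the iterated recursion converges to the closed form, and the closed form is a probability distribution over terminal signals.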
Proof of Proposition 1. Let $(p^*, \beta^*, \sigma^*)$ solve the repeated-cognition problem. Consider an action $a$ chosen with positive probability. There must exist $x$ such that $\sigma^*(x) = a$ and $\beta^*_x > 0$. Therefore, the constraint $\beta_x \geq 0$ is not binding for this $x$, and the first-order condition of problem (7) with respect to $\beta_x$ is: where we have summed over all $x'$ such that $\sigma^*(x') = a$ in the second line. Multiplication by $\beta^*_x$ and summation over all $x$ such that $\sigma^*(x) = a$ and $\beta_x > 0$ give Division of this by $\sum_\theta \pi_\theta\, r(a \mid \theta; p^*, \beta^*, \sigma^*)$ leads to (4). Lemma 1 implies (5).

A.2 Proofs for Section 4
Proof of Lemma 3. Assume that there exists a solution with $\beta_x$ positive for $n > 2$ signals $x \in X$. We show that there then exists a solution with $\beta_x$ positive for only $n - 1$ signals. The lemma follows by induction on $n$.
Let us prove the induction step. Fix the primitive experiment $p$ employed by the agent, let $\beta$ be an optimal termination strategy for the given $p$, let $X$ be the set of signals with positive $\beta_x$, and write $s(x \mid \theta)$ for the effective experiment $s(x \mid \theta; p, \beta)$ induced by $p$ and $\beta$. Let us abuse notation by letting $s(x) = \sum_\theta \pi_\theta s(x \mid \theta)$ stand for the unconditional effective probability of $x$. For $x \in X$, let $q_x \in \Delta(\Theta)$ be the posterior belief upon terminating with $x$: $q_x(\theta) = \pi_\theta s(x \mid \theta)/s(x)$.
Since $|X| > 2$ and the state space $\Theta$ is binary, there exists a signal $x^* \in X$ such that $q_{x^*}$ is in the convex hull of the posteriors $q_x$, $x \in X \setminus \{x^*\}$. Let $\mu_x$ be the coefficients that decompose $q_{x^*}$ as $q_{x^*} = \sum_{x \in X \setminus \{x^*\}} \mu_x q_x$. We will construct an alternative feasible effective experiment $\tilde{s}(x \mid \theta)$ with unconditional probabilities of $x$ denoted by $\tilde{s}(x)$ and the posteriors $\pi_\theta \tilde{s}(x \mid \theta)/\tilde{s}(x)$ denoted by $\tilde{q}_x(\theta)$ such that: and $\tilde{q}$ Since the experiment $\tilde{s}$ is more informative than $s$ (in the sense of the Blackwell comparison), there exists a solution with this alternative feasible effective experiment $\tilde{s}$, as needed for the induction step.
It remains to construct $\tilde{s}$. Note that if an effective experiment $s(x \mid \theta; p, \beta) = \frac{\beta_x p(x \mid \theta)}{\sum_{x'} \beta_{x'} p(x' \mid \theta)}$ is induced by some $p$ and $\beta$, then for any vector of probabilities $\tilde{\beta}_x$, the experiment $\frac{\tilde{\beta}_x s(x \mid \theta)}{\sum_{x'} \tilde{\beta}_{x'} s(x' \mid \theta)}$ is also feasible, since it is induced by $p$ and $\beta' = (\tilde{\beta}_x \beta_x)_{x \in X}$.
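The feasibility observation used in this construction can be illustrated numerically. The sketch below assumes that an effective experiment has the form $\beta_x p(x \mid \theta) / \sum_{x'} \beta_{x'} p(x' \mid \theta)$ and checks, for one fixed state, that applying a further vector of termination probabilities to $s$ gives the same experiment as applying the componentwise product of the two vectors directly to $p$. All names and numbers are hypothetical.

```python
def effective(experiment, beta):
    """Assumed form of the effective experiment for one fixed state."""
    stop = sum(beta[x] * experiment[x] for x in experiment)
    return {x: beta[x] * experiment[x] / stop for x in experiment}


# Hypothetical primitive experiment (one state) and two termination vectors.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
beta = {"a": 0.9, "b": 0.4, "c": 0.7}
beta_tilde = {"a": 0.2, "b": 1.0, "c": 0.5}

s = effective(p, beta)

# Thinning the effective experiment s with beta_tilde ...
thinned = effective(s, beta_tilde)

# ... coincides with the experiment induced by p and the product strategy.
product = {x: beta_tilde[x] * beta[x] for x in p}
direct = effective(p, product)
```

The two dictionaries `thinned` and `direct` agree up to floating-point error, because the common normalizing factor $\sum_{x'} \beta_{x'} p(x' \mid \theta)$ cancels.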
The last two inequalities imply which establishes a contradiction because, by Lemma 4, the rule $r(a \mid \theta; p, \beta, \sigma^I)$ satisfies and therefore it inherits the monotone likelihood ratio property from $p$.
Proof of Lemma 6. Consider the choice rule $r(p, \beta, \sigma^I)$ constructed from the experiment $p$ with perceptual distance $d_p = d$, and fix the probability $r(0 \mid 0; p, \beta, \sigma^I) = \alpha$ of the correct choice in state 0 to a value $\alpha \in (0, 1)$. Then, by Lemma 4, the probability $r(1 \mid 1; p, \beta, \sigma^I)$ of the correct choice in state 1 satisfies For each $\alpha$, the solution for $r(1 \mid 1; p, \beta, \sigma^I)$ of this equation increases in $d$.
Proof of Proposition 2. The agent's objective is linear with respect to the choice rule $r(p, \beta, \sigma)$. Thus, the optimal rule is the point of tangency of the set $R_{p,\sigma^I}$ of the feasible rules and of an indifference line; see Figure 2. The slope $\frac{dr(0 \mid 0; p, \beta, \sigma^I)}{dr(1 \mid 1; p, \beta, \sigma^I)}$ is decreasing in $r(1 \mid 1; p, \beta, \sigma^I)$ and attains the value $-1/d$ for $r(1 \mid 1; p, \beta, \sigma^I) = 0$ and the value $-d$ for $r(1 \mid 1; p, \beta, \sigma^I) = 1$. Thus, when $R < 1/d$ or $R > d$, the problem has the corner solution specified in statements 1 and 2 of the proposition.

A.3 Proofs for Section 5
Proof of Corollary 2. Since $\beta^*_1/\beta^*_0$ increases in $R$, it suffices to show that $\beta^*_1/\beta^*_0 = 1$ when $R = 1$ and the primitive experiment $p$ is symmetric. Indeed, when $R = 1$, then by (10), where the last equality follows from the symmetry of $p$.
Proof of Corollary 4. The belief formation problem studied is a special case of our binary setting with the primitive experiment $p(x \mid \theta) = \rho_\theta$ if $x = 1$, $p(x \mid \theta) = 1 - \rho_\theta$ if $x = 0$, and with equally a priori attractive actions, $R = 1$. Since $\rho_0 < \rho_1$, the setting satisfies the monotone likelihood ratio property, and thus by Lemma 5, there exists a solution with $\sigma(x) = x$. Since $R = 1 \in (1/d, d)$, Proposition 2 implies that the agent's behavior is stochastic, both $\beta^*_0$ and $\beta^*_1$ are positive, and the ratio of the search intensities $\beta^*_1/\beta^*_0$ satisfies (10). Since $R = 1$, (10) simplifies to
The next result is an auxiliary lemma used in the proof of Proposition 3.
Lemma 9. Suppose that termination probabilities $\beta_x$ are positive for all $x \in \Theta$. Then, the optimal effective choice rule $r^*$ satisfies for any pair of states $\theta, \theta' \in \Theta$:
Condition (16) is a strengthening of the second-thought-free condition (5). It requires that the agent who has terminated the decision process with perception $\theta$, and knows that the second run of the process $r^*$ terminates with a value $\theta'$, is indifferent between $\theta$ and $\theta'$. This condition is stronger than the second-thought-free condition (5), since (5) requires (16) to hold only on average across all $\theta'$. This strengthening holds for the special case of this application with a symmetric experiment $p$.
Proof of Lemma 9. The optimal effective choice rule satisfies the second-thought-free condition (5), equivalent to: which after two algebraic steps gives: The last system of equations is formally equivalent to the system of balance conditions for a Markov chain. To see this, consider an ergodic Markov chain with transition probabilities from $\theta$ to $\theta'$ equal to $r^*(\theta' \mid \theta)$. The balance condition for the stationary distribution $\mu(\theta)$ of this chain is $\mu(\theta') = \sum_{\theta} \mu(\theta)\, r^*(\theta' \mid \theta)$, and thus $r^*(\theta \mid \theta)$ is proportional to the ergodic probability $\mu(\theta)$ of the state $\theta$ for the chain with transition probabilities $r^*(\theta' \mid \theta)$.
Recall that if a Markov chain with transition probabilities $m(\theta' \mid \theta)$ is reversible, then its stationary distribution $\mu(\theta)$ satisfies the detailed balance conditions $\mu(\theta)\, m(\theta' \mid \theta) = \mu(\theta')\, m(\theta \mid \theta')$. Thus, it suffices to prove that the probabilities $r^*(\theta' \mid \theta)$ constitute a reversible Markov chain.
It states that the decision rate $f_\theta = \sum_{\theta'} \beta^*_{\theta'}\, p(\theta' \mid \theta)$ in each state $\theta$ is proportional to $p^{1/2}(\theta \mid \theta)$ and is thus high in those states that are reliably identified by the primitive experiment $p$.
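The Markov-chain fact invoked here is standard: detailed balance implies the balance (stationarity) condition. A minimal sketch follows, using a hypothetical three-state chain built from a symmetric weight matrix; the weights are illustrative and are not derived from the model.

```python
# Symmetric weights w[i][j] = w[j][i]; any such matrix yields a reversible chain.
w = [
    [2.0, 1.0, 0.5],
    [1.0, 3.0, 1.5],
    [0.5, 1.5, 1.0],
]
n = len(w)
row = [sum(r) for r in w]
total = sum(row)

# Transition probabilities m[i][j] (from i to j) and candidate stationary distribution mu.
m = [[w[i][j] / row[i] for j in range(n)] for i in range(n)]
mu = [row[i] / total for i in range(n)]

# Detailed balance: mu(i) m(j | i) = mu(j) m(i | j) for every pair of states.
detailed_balance = all(
    abs(mu[i] * m[i][j] - mu[j] * m[j][i]) < 1e-12
    for i in range(n) for j in range(n)
)

# Balance (stationarity): mu(j) = sum_i mu(i) m(j | i).
stationary = all(
    abs(sum(mu[i] * m[i][j] for i in range(n)) - mu[j]) < 1e-12
    for j in range(n)
)
```

Both checks hold for this chain: summing the detailed balance equalities over one index gives the stationarity condition.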

A.4 Proofs for Section 6
Proof of Lemma 7. All rules feasible in $P^{iia}$ are feasible in $R(P^{iia})$: $R(P^{iia}) \supset P^{iia}$. This is immediate, since when $\beta_a = 1$ for all $a \in A$, then $r(p, \beta, \sigma^I) = p$ for all $p \in P^{iia}$.
It remains to show $R(P^{iia}) \subset P^{iia}$. Consider $p(\gamma, \sigma) \in P^{iia}$ constructed in the setting of Example 2 by the use of the generalized termination strategy $\gamma(m, x)$ and the action strategy $\sigma(m, x)$. Recall that $r(p(\gamma, \sigma), \beta, \tilde{\sigma})$ is the choice rule constructed by repetitions of the rule $p(\gamma, \sigma)$ according to the termination strategy $\beta = (\beta_a)_{a \in A}$ and by applying the action strategy $\tilde{\sigma} : A \to A$ upon termination. We need to show that there exist $\gamma'$ and $\sigma'$ such that $r(p(\gamma, \sigma), \beta, \tilde{\sigma}) = p(\gamma', \sigma')$. This is indeed so when the termination probability is $\gamma'(t \mid m, x) = \gamma(t \mid m, x)\, \beta_{\sigma(m,x)}$ and the transition probability to the original memory state $m_0$ is $\gamma'(m_0 \mid m, x) = \gamma(m_0 \mid m, x) + \gamma(t \mid m, x) \bigl(1 - \beta_{\sigma(m,x)}\bigr)$, which is the sum of the probabilities that the original process $\gamma$ transits to $m_0$ and that the decision process $r(p(\gamma, \sigma), \beta, \tilde{\sigma})$ restarts after termination of $p(\gamma, \sigma)$. Additionally, for all $\tilde{m} \neq m_0$, $\gamma'(\tilde{m} \mid m, x) = \gamma(\tilde{m} \mid m, x)$. The above choice of $\gamma'$ implies that the process $p(\gamma', \sigma')$ replicates the Markov process over the memory states under $r(p(\gamma, \sigma), \beta, \tilde{\sigma})$. Finally, to replicate the choices upon termination, we set the action strategy $\sigma'(m, x) = \tilde{\sigma}(\sigma(m, x))$ for all $(m, x)$.
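The construction in this proof can be sanity-checked for a single pair $(m, x)$. The sketch below assumes a hypothetical representation in which the generalized termination strategy is a dictionary mapping the termination outcome `"t"` and the next memory states to probabilities; the helper `restart_adjusted` and all numbers are illustrative only.

```python
def restart_adjusted(gamma, beta_a, restart_state="m0", stop="t"):
    """Fold the outer restart probability 1 - beta_a into the inner transition rule.

    gamma: outcome probabilities for one fixed (m, x); `stop` denotes termination.
    beta_a: outer termination probability for the action a chosen at (m, x).
    """
    adjusted = dict(gamma)
    # Terminate only if the inner process terminates AND the outer rule accepts.
    adjusted[stop] = gamma[stop] * beta_a
    # A rejected termination restarts the process in the initial memory state.
    adjusted[restart_state] = gamma.get(restart_state, 0.0) + gamma[stop] * (1.0 - beta_a)
    return adjusted


# Hypothetical inner transition rule at one (m, x) and outer termination probability.
gamma = {"t": 0.5, "m0": 0.2, "m1": 0.3}
gamma_prime = restart_adjusted(gamma, beta_a=0.6)
```

Under these assumptions the adjusted rule remains a probability distribution, and transitions to memory states other than the initial one are unchanged.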
Proof of Proposition 4. We extend the definition of the effective experiment to the setting with discounting. Let where $\rho(x^t \mid \theta; p, \beta)$ is the probability of the signal history $x^t$ defined in (1). Thus, $s_\delta(x \mid \theta; p, \beta)$ is the discounted probability that the agent's last observed signal is $x$. It satisfies the recursion $s_\delta(x \mid \theta; p, \beta) = \beta_x p(x \mid \theta) + \delta \bigl(1 - \sum_{x'} \beta_{x'} p(x' \mid \theta)\bigr)\, s_\delta(x \mid \theta; p, \beta)$, where the first summand is the probability that the decision process terminates with $x$ in the first round and the second summand is the discounted probability that the process terminates with $x$ later. Solving (19) for $s_\delta$ gives $s_\delta(x \mid \theta; p, \beta) = \frac{\beta_x p(x \mid \theta)}{1 - \delta + \delta \sum_{x'} \beta_{x'} p(x' \mid \theta)}$. The discounted repeated-cognition problem (12) is thus equivalent to $\max_{p \in P,\, \beta \in B,\, \sigma \in S} \sum_{\theta \in \Theta,\, x \in X}$ Consider $x$ with an interior termination probability $\beta^*_x \in (0, 1)$ and let $a = \sigma^*(x)$. The first-order
Further, it must hold that $\beta^*_0 = 1$ or $\beta^*_1 = 1$. Otherwise, if both $\beta^*_0 < 1$ and $\beta^*_1 < 1$, then the agent can increase both $\beta^*_x$ by the same factor. This preserves the conditional action distribution in each state $\theta$ and increases the decision rates in both states, and thus it is a profitable deviation.
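The scaling argument can be illustrated with a small computation. The sketch assumes that the discounted effective experiment takes the form $\beta_x p(x \mid \theta) / (1 - \delta + \delta \sum_{x'} \beta_{x'} p(x' \mid \theta))$ for one fixed state; the function name and the numerical values are hypothetical.

```python
def discounted_effective(p_theta, beta, delta):
    """Assumed discounted probability of terminating with each signal (one fixed state)."""
    stop = sum(beta[x] * p_theta[x] for x in p_theta)  # per-round termination probability
    return {x: beta[x] * p_theta[x] / (1.0 - delta + delta * stop) for x in p_theta}


def normalize(weights):
    """Conditional distribution over terminal signals."""
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


p_theta = {0: 0.7, 1: 0.3}
delta = 0.9

beta_low = {0: 0.4, 1: 0.8}                           # both termination probabilities below 1
beta_high = {x: 1.25 * b for x, b in beta_low.items()}  # same ratio, scaled up to {0.5, 1.0}

s_low = discounted_effective(p_theta, beta_low, delta)
s_high = discounted_effective(p_theta, beta_high, delta)

# Total discounted mass: the expected discount factor at the termination date.
mass_low = sum(s_low.values())
mass_high = sum(s_high.values())
```

In this example the normalized distributions of `s_low` and `s_high` coincide, while the total discounted mass is larger under `beta_high`, which is the sense in which a common upward scaling is a profitable deviation when $\delta < 1$.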

$\frac{\beta_x p(x \mid \theta)}{1 - \delta + \delta \sum_{x'} \beta_{x'} p(x' \mid \theta)}$, this condition simplifies into a quadratic equation for $\beta^*_0$. When $\delta < \frac{1}{\alpha + (1 - \alpha) R}$, this condition does not have an interior solution and the derivative of the value (22) with respect to $\beta_0$ at $\beta_0 = 1$ is positive. Thus, in this case, the unique $\beta^*_0$ satisfying the first-order condition is $\beta^*_0 = 1$. When $\delta > \frac{1}{\alpha + (1 - \alpha) R}$, the condition has an interior solution and the derivative of the value (22) with respect to $\beta_0$ at $\beta_0 = 1$ is negative. Thus, for this range of parameters, the unique $\beta^*_0$ satisfying the first-order condition is the interior value that solves the quadratic equation, and this solution decreases in $\delta$.