Learning biases in person-number linearization

The idea that universal representations of hierarchical structure constrain patterns of linear order is central to many linguistic theories. In this paper we use Artificial Language Learning techniques to experimentally probe this claim. Specifically, we investigate how a hypothesized hierarchy of φ-features impacts the linearization of person and number affixes by (English-speaking) learners in the lab.


Introduction

Compositional transparency in typology and learning
It is often assumed that linearization strategies are compositionally transparent: linear order is taken to reflect (morpho)syntactic composition such that elements that compose together in the derivation are expected to be linearly adjacent (Kayne 1994, Embick and Noyer 2007, Baker 1985). As illustrated in Fig. 1, if X and Y form a constituent, the phonological content of Y, y, is expected to be linearly adjacent to the exponent of X, x. This should happen independently of whether the language linearizes to the left or the right; that is, whether it is head-initial or head-final.
Figure 1: Illustration of compositionally transparent linearization strategies. (a) Head-initial: x-y-z, *x-z-y. (b) Head-final: z-y-x, *y-z-x. Regardless of head direction, the sisterhood relation is mapped into one of linear adjacency. Orders that do not respect this adjacency are marked with a star.
Evidence that compositionally transparent linearizations are preferred comes largely from typological observations. In the domain of word order, for example, it is typically assumed that among nominal modifiers, Adjectives compose with the Noun first, then Numerals, then Demonstratives (Adger 2003, Alexiadou et al. 2008). This is reflected in common patterns of linearization: the vast majority of languages feature an order in which the Adjective is linearly adjacent to the Noun, the Demonstrative is linearly farthest from the Noun, and the Numeral is in between (e.g., N-Adj-Num-Dem; Greenberg 1963, Cysouw 2010, Dryer 2018). Interestingly, behavioral experiments reveal that these compositionally transparent orderings are also the patterns that learners prefer when acquiring a new language in the lab (Culbertson and Adger 2014, Martin et al. 2019a,b).
Compositional transparency in linearization has also been argued to hold in morpheme ordering (Baker 1985, Rice 2000). A classical example comes from the relative order of derivational and inflectional affixes with respect to the stem: in complex words, derivational affixes (which are lexeme-forming) are typically placed closer to the stem than inflectional affixes (e.g., marri-age-s). A similar argument has been made with respect to the relative ordering of case and number morphemes (Bybee 1985, Rice 2000). Number is assumed to compose with the stem before case, and accordingly, in languages where there is a clear morpheme boundary between case and number marking (e.g., agglutinative languages like Hungarian or Turkish), the expression of number is realized closer to the noun stem than the expression of case (Universal 39, Greenberg 1963). As with nominal word order, recent laboratory language learning experiments confirm a preference for these compositionally transparent patterns of case-number ordering (Saldana et al. 2019).
However, the typological distribution of morpheme ordering patterns does not always appear to be constrained by compositional transparency, at least in its simplest form. Here, we investigate one such case: the relative ordering of person and number markers.

Relative order of person and number marking
The φ-feature bundle including person and number has been argued to have an internal structure wherein each feature occupies a fixed morphosyntactic position ([...], and references therein). Following Harbour (2016), we assume that number (N) syntactically dominates person (P), as exemplified in Fig. 2a. We additionally assume that, even though the φ-feature bundle itself can be thought of as a complete structure, the features compose with the stem cyclically: person features have an impact on the stem they attach to before number features do (Béjar 2003). These assumptions apply independently of whether the φ-feature bundle adjoins to a variable (pronoun) or a verbal stem.[1]
[1] In the literature, the hierarchy in Fig. 2a is typically represented using different notation: a φ head immediately dominating a person node, which immediately dominates a number node (Noyer 1992, Harbour 2008, 2016). This representation, originally proposed by Noyer (1992) to explain why morphological contrasts in person are more pervasive cross-linguistically than contrasts in number, also serves to derive patterns of morpheme order (see further discussion below; Silverstein 1976, Noyer 1992, Harley and Ritter 2002).
Figure 2: (a) Hierarchy over N, P and the φ-stem, with N dominating P. (b) Typological distribution (Trommer 2003): Prefixal (9/58): N-P-stem (1/9), P-N-stem (8/9); Suffixal (22/58): stem-P-N (17/22), stem-N-P (5/22); Discontinuous (25/58): P-stem-N (25/25), N-stem-P (0/25). (c) Linearization strategy preferred if a bias for compositional transparency is at play (N-P-stem, stem-P-N). (d) Linearization strategy preferred if a bias for linearizing person to the left is at play (P-N-stem, stem-P-N). Note that the latter appears to capture the typology better.
The main argument for positing the hierarchy in Fig. 2a comes from semantic interpretation: φ-features restrict the range of values that variables like pronouns can take (Kratzer and Heim 1998, Heim 2008). At first glance, the order in which number and person restrictions apply may not appear consequential. For example, take the English pronoun I. Whether the range of possible values is first restricted to speaker(s) and only afterwards to atoms, or the other way around, does not in principle change the overall interpretation of the pronoun. However, the restrictions imposed by person and number interact such that applying number first, followed by person, would make certain types of distinctions impossible to derive. This happens perhaps most obviously for the inclusive person feature, which picks out groups that include both speaker and addressee.
In principle, based on the discussion of compositional transparency above, one might assume that the hierarchical structure posited for φ-features has straightforward consequences for the linear order of person and number morphemes with respect to the stem (verbal or pronominal). If linearization directly reflects composition, one might expect person morphology to appear closer to the stem than number morphology. The result of this person-closer linearization strategy is illustrated in Fig. 2c.
This prediction is not borne out by the available typological data (Trommer 2003). A small sample of languages with a boundary between person and number morphemes reveals that, regardless of whether they precede or follow the stem, person morphemes tend to linearly precede number morphemes: in 8/9 of prefixal languages, 17/22 suffixal languages, and 25/25 languages with discontinuous affixes (see Fig. 2b). That φ -feature linearization does not appear to obey the requirements of compositional transparency can be seen most clearly for prefixal languages, where we would expect number to precede person but in almost all cases it is the reverse (bold in Figs. 2c and 2d). These data therefore suggest the possibility that a person-left linearization strategy may be operating in this domain.
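The comparison between the two strategies can be made concrete with a short tally. The following is only an illustrative sketch: the counts are those given in the text for the sample from Trommer (2003), while the function names and the string encoding of orders (such as "P-N-stem") are our own assumptions.

```python
from fractions import Fraction

# Language counts as given in the text for the typological sample
# (Trommer 2003), grouped by affix type.
COUNTS = {
    "prefixal": {"P-N-stem": 8, "N-P-stem": 1},
    "suffixal": {"stem-P-N": 17, "stem-N-P": 5},
    "discontinuous": {"P-stem-N": 25, "N-stem-P": 0},
}

def _positions(order):
    parts = order.split("-")
    return parts.index("P"), parts.index("N"), parts.index("stem")

def person_left(order):
    """True if the person marker linearly precedes the number marker."""
    p, n, _ = _positions(order)
    return p < n

def person_closer(order):
    """True if person is nearer the stem than number; None if equidistant."""
    p, n, s = _positions(order)
    dist_p, dist_n = abs(p - s), abs(n - s)
    return None if dist_p == dist_n else dist_p < dist_n

def support(affix_type, strategy):
    """Share of languages of this affix type whose order fits the strategy.

    Returns None when the strategy cannot tell the attested orders apart.
    """
    verdicts = {order: strategy(order) for order in COUNTS[affix_type]}
    if None in verdicts.values():
        return None
    total = sum(COUNTS[affix_type].values())
    fitting = sum(n for order, n in COUNTS[affix_type].items() if verdicts[order])
    return Fraction(fitting, total)

for affix_type in COUNTS:
    print(affix_type, support(affix_type, person_left), support(affix_type, person_closer))
```

Under this encoding the two strategies receive identical support among suffixal languages and come apart only among the nine prefixal languages, which is why the prefixal cases carry the weight of the argument.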
There have been several attempts to formally account for the linear order of person and number morphemes (Noyer 1992, Halle 2000, Trommer 2003, Harbour 2008, 2016). In all these theories, ordering person to the left is predicted to be preferred, in accordance with the typological data cited above. For example, working within an Optimality-Theoretic framework, Trommer (2003) proposes two violable ordering constraints: a constraint penalizing orders in which person is not maximally leftwards, and another penalizing orders in which number is not maximally rightwards. These constraints predict the discontinuous P-stem-N to be the preferred order across languages because it violates neither constraint. The next best are P-N-stem and stem-P-N, each of which violates one constraint and satisfies the other. These are predicted to arise in languages depending on which constraint is more highly ranked. Other orders involve both markers in non-optimal positions, and are therefore not predicted to arise. Trommer (2003) thus predicts both person-left and the general preference for discontinuous affixation.
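The logic of the two ordering constraints can be sketched as a small evaluation procedure. This is a simplified illustration rather than Trommer's (2003) formalization: the constraint names PERSON-LEFT and NUMBER-RIGHT are hypothetical labels, violations are assumed to be binary rather than gradient, and ranking is modeled as lexicographic comparison of violation profiles.

```python
ORDERS = ["P-stem-N", "P-N-stem", "stem-P-N", "N-P-stem", "stem-N-P", "N-stem-P"]

def violations(order):
    """Binary violation profile for one candidate order (assumed encoding)."""
    parts = order.split("-")
    return {
        "PERSON-LEFT": int(parts[0] != "P"),    # person is not leftmost
        "NUMBER-RIGHT": int(parts[-1] != "N"),  # number is not rightmost
    }

def winner(candidates, ranking):
    """Pick the candidate with the best profile under a strict ranking.

    Higher-ranked constraints are compared first (lexicographic order).
    """
    return min(candidates, key=lambda o: tuple(violations(o)[c] for c in ranking))

for order in ORDERS:
    print(order, sum(violations(order).values()))

# With the discontinuous order unavailable, the ranking decides the outcome.
continuous = ["P-N-stem", "stem-P-N"]
print(winner(continuous, ["PERSON-LEFT", "NUMBER-RIGHT"]))
print(winner(continuous, ["NUMBER-RIGHT", "PERSON-LEFT"]))
```

When all six orders compete, P-stem-N wins under either ranking because it incurs no violations. Between P-N-stem and stem-P-N, the outcome depends on which constraint is ranked higher, as described above.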
Most accounts, however, focus on explaining the pervasiveness of discontinuous agreement, that is, why there is a general propensity for number to be linearized as a suffix, and for person to be linearized as a prefix.
For instance, Harbour (2016) derives discontinuous agreement from the hierarchy in Fig. 2a via Mirror Theory (Adger et al. 2009). Under this account, because person dominates number within the φ-feature bundle, it must linearly precede it, and because person composes with the stem first, it must also be adjacent to the stem. When person and number are realized by different affixes, the only pattern which satisfies both these requirements is P-stem-N. There is no more general prediction favoring person-left. For example, among prefixal patterns, N-P-stem violates the requirement that person linearly precede number, while P-N-stem violates the required adjacency between person and stem (see Harbour 2016 for the details of this analysis).
Here we are interested in whether the linearization of φ-features is influenced by the straightforward and potentially very general notion of compositional transparency described above, or whether in this domain there is some principle which favors person-left. We therefore focus on the relative order of person and number markers in fully prefixal and suffixal agreement, and set discontinuous agreement aside.
While the person-left linearization strategy seems to better capture the typological distribution, the language sample supporting this is very small. Indeed, the critical cases for us are those which involve distinct person and number prefixes; suffixal and discontinuous cases do not allow us to compare the person-closer and person-left strategies directly. In the sample above, we are thus left with 9 languages. At the same time, as mentioned above, there is independent evidence from laboratory language learning studies that learners generally prefer compositionally transparent orders of both words (Culbertson and Adger 2014, Martin et al. 2019a,b) and morphemes (Saldana et al. 2019). Thus it is possible that the typological sample we have in this case is misleading, or at least does not obviously reflect the kind of universal features of the cognitive or linguistic system that are of primary interest to linguists (for a general discussion of the problem of small typological samples for linguistic theorizing see Culbertson 2012, Piantadosi and Gibson 2014). We therefore follow these previous studies in seeking independent evidence of learners' biases using Artificial Language Learning. Specifically, we test whether participants learning a miniature artificial language are biased in favor of placing person morphemes closer to the stem than number, or whether they place person to the left of number. If person-left is indeed preferred, then theorists are justified in positing, for example, specialized constraints for the domain of φ-feature linearization. If person-closer is preferred, this would suggest that the general tendency for linear order to reflect composition is at play in this domain as well.

Methods
The artificial language learning experiment described here uses an extrapolation paradigm (also known as the 'Poverty of the stimulus' paradigm; Wilson 2006, Culbertson and Adger 2014).
In this paradigm, learners are trained on input that is ambiguous by design, and then must extrapolate beyond this input in a way that disambiguates hypotheses of interest. Here, the two hypotheses of interest are person-closer (person is ordered closer to the stem than number) or person-left (person is ordered to the left of number). In our experiment, all participants are taught a miniature language with three verbal stems, and two affixes, one for second person and one for plural number. Singular and first person agreement are unmarked in the language. The input also indicates whether affixes in the language precede the verbal stem (PREFIX condition) or follow it (SUFFIX condition), and our participants are randomly assigned to one of these two experimental conditions. Crucially, the input does not include any examples in which the two morphemes co-occur. That is, participants are trained on 1SG, 1PL and 2SG verbal agreement, but the 2PL meaning, which requires two overt affixes, is held out by design. The input is therefore ambiguous between the two hypotheses of interest. At test, learners are asked to extrapolate to the unseen meaning: they must choose between two possible ways of ordering morphemes to express 2PL verbal agreement. The order they infer will indicate whether they have a preference for person-left or for person-closer linearization patterns. Note that, in the SUFFIX condition, both person-closer and person-left strategies predict stem-P-N linear order (rather than stem-N-P). Therefore, as with the typology, the crucial case is the PREFIX condition: person-closer predicts N-P-stem order, while person-left predicts P-N-stem order. A summary of the experimental design is given in Fig. 3.
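The ambiguity built into the design can be illustrated with a short sketch. It enumerates, for each meaning, the affix orders compatible with a condition, and then derives what each hypothesis predicts for the held-out 2PL cell. The encoding (the labels "P", "N" and "stem", and all function names) is an assumption made for illustration and is not part of the experimental software.

```python
from itertools import permutations

# Overt markers per meaning: first person and singular are unmarked.
MARKERS = {"1SG": [], "1PL": ["N"], "2SG": ["P"], "2PL": ["P", "N"]}
TRAINED = ["1SG", "1PL", "2SG"]
HELD_OUT = "2PL"

def realizations(meaning, condition):
    """All morpheme orders compatible with the condition for a meaning."""
    forms = set()
    for affixes in permutations(MARKERS[meaning]):
        if condition == "PREFIX":
            parts = list(affixes) + ["stem"]
        else:  # SUFFIX
            parts = ["stem"] + list(affixes)
        forms.add("-".join(parts))
    return sorted(forms)

def prediction(hypothesis, condition):
    """Order each hypothesis predicts for the held-out meaning."""
    for order in realizations(HELD_OUT, condition):
        parts = order.split("-")
        p, n, s = parts.index("P"), parts.index("N"), parts.index("stem")
        if hypothesis == "person-left" and p < n:
            return order
        if hypothesis == "person-closer" and abs(p - s) < abs(n - s):
            return order
    return None

for condition in ("PREFIX", "SUFFIX"):
    # Each trained meaning has exactly one possible form; 2PL has two.
    print(condition, [len(realizations(m, condition)) for m in TRAINED],
          realizations(HELD_OUT, condition),
          prediction("person-closer", condition),
          prediction("person-left", condition))
```

In the SUFFIX condition both hypotheses select the same order, so only the PREFIX condition separates them.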
All experimental data reported here are available here, and the pre-registered design and analysis plan is accessible here.
Figure 3: Experiment design for (a) PREFIX and (b) SUFFIX conditions: the missing cell represents the held-out meaning; arrows indicate the morpheme orders participants could infer to express the held-out meaning depending on whether they prefer person-closer or person-left.

Participants
A total of 194 English-speaking adults (PREFIX: 96; SUFFIX: 98) were recruited via Amazon Mechanical Turk. Importantly, English does not itself provide evidence for the relative ordering of person and number, either on the verb or in the pronominal system. Participants were paid 2.5 USD for a 15-minute experimental session. As per our pre-registered plan, participants were excluded from the analyses if: (a) they failed to pass two attention checks included in the experiment, and/or (b) they failed to correctly answer more than 2/3 of the taught forms during testing. This resulted in the analysis of 99 participants in total (PREFIX: 45; SUFFIX: 54).
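The exclusion rule and the resulting sample sizes can be sketched as follows. Two readings are assumed here and are not confirmed by the text: that a participant must pass both attention checks, and that accuracy on trained forms must be strictly greater than 2/3. The record fields are hypothetical.

```python
from dataclasses import dataclass

# Sample sizes as reported above.
RECRUITED = {"PREFIX": 96, "SUFFIX": 98}
ANALYZED = {"PREFIX": 45, "SUFFIX": 54}

@dataclass
class Participant:
    condition: str
    attention_checks_passed: int  # out of 2 (hypothetical field)
    trained_correct: int          # correct answers on trained forms at test
    trained_total: int            # trained-form trials at test

def is_included(p):
    """Apply both criteria under the assumed readings described above."""
    passed_attention = p.attention_checks_passed == 2
    # Integer form of trained_correct / trained_total > 2/3.
    accurate = 3 * p.trained_correct > 2 * p.trained_total
    return passed_attention and accurate

for condition in RECRUITED:
    excluded = RECRUITED[condition] - ANALYZED[condition]
    print(condition, "excluded:", excluded, "of", RECRUITED[condition])
```

Under the reported figures, roughly half of the recruited participants in each condition were excluded, which is worth keeping in mind when interpreting the results.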

Materials
The lexicon included three semi-nonce verbal stems: 'miti', 'kizi' and 'higi', meaning meet, kiss and hug respectively. In addition, there were two nonce affixes: 'lu' and 'pa'. These were randomly mapped to plural number and second person for each participant. All words were presented in written form. Recall that affixes in the input language could either appear before the verbal stem (PREFIX condition) or after it (SUFFIX condition). Person and number markers served to indicate the subject of an event; that is, they instantiated subject-verb agreement. Inflected verbs (1SG, 1PL, 2SG and 2PL verbs) were used as one-word answers to English interrogative sentences of the form 'Who will meet/kiss/hug [celebrity]?', which were randomly drawn from a list of 60 different tokens. For example, if the question was 'Who will kiss Cardi B?', the answer might be 'kizilu' (KISS.2SG), indicating that the addressee is the person who will kiss Cardi B.
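How the written forms follow from the lexicon can be sketched in a few lines. The stems and affixes are those listed above; the function names, the per-participant mapping structure and the `order` parameter are assumptions introduced only for this illustration.

```python
import random

STEMS = {"meet": "miti", "kiss": "kizi", "hug": "higi"}
AFFIXES = ("lu", "pa")

def assign_affixes(rng):
    """Randomly map the two nonce affixes to person (P) and number (N)."""
    person, number = rng.sample(AFFIXES, 2)
    return {"P": person, "N": number}

def inflect(verb, meaning, condition, mapping, order=("P", "N")):
    """Build the written form; `order` is the left-to-right affix order for 2PL."""
    markers = {"1SG": [], "1PL": ["N"], "2SG": ["P"], "2PL": list(order)}[meaning]
    affixes = "".join(mapping[m] for m in markers)
    stem = STEMS[verb]
    return affixes + stem if condition == "PREFIX" else stem + affixes

# One possible mapping, chosen to match the 'kizilu' example in the text.
mapping = {"P": "lu", "N": "pa"}
print(inflect("kiss", "2SG", "SUFFIX", mapping))
print(inflect("kiss", "2PL", "PREFIX", mapping, order=("N", "P")))
print(inflect("kiss", "2PL", "PREFIX", mapping, order=("P", "N")))
```

With this mapping, the two PREFIX candidates for the held-out meaning are 'palukizi' (N-P-stem) and 'lupakizi' (P-N-stem).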
To express the person and number meanings, we commissioned a cartoonist to draw scenarios involving a family of three sisters and their parents. Each family member has a clearly-defined role in the conversational context. The two older sisters are speech act participants (in all scenarios they are either speaker or addressee). The third (little) sister was spatially close, but never a speech act participant. The parents were seated in the background (serving as additional others). Person and number meanings were expressed by highlighting subsets of family-members, including either the speaker alone (1SG), the addressee alone (2SG), the speaker plus other non-participant(s) (1PL), or the addressee plus other non-participant(s) (2PL). The set-up is illustrated in Fig. 4.

Procedure
Participants were first introduced to the family, including the names of the sisters, and were told they were going to see the sisters playing with a hat that has two magical properties: whoever wears it can see the future, but they also talk in a mysterious ancestral language. Participants were instructed to figure out the meanings of words in this new language. The general structure of trials was as follows: one of the sisters (not wearing the hat) asks a question about which member or members of the family will meet/kiss/hug some celebrity. Participants were explicitly told that the sister wearing the hat would answer the question using a verb in the ancestral language, and that these verbs would vary depending on who, among the family members in the scene, will perform the action described by the verb.
The experiment had three phases. Participants were first exposed to the three verbs ('kizi', 'miti', 'higi') in their 1SG, 1PL and 2SG forms. Each exposure trial had two parts: a scene where a question is asked and a scene where the question is answered using the corresponding verb. There were 18 exposure trials (2 repetitions for each form, presented in random order). After this exposure phase, participants were given an initial test on the trained meanings. Participants were presented with a question, just as before, but they were only given the meaning the answer should convey (i.e., by highlighting the specific subset of the family members) and not the verbal form. They were asked to pick the correct form for that meaning from two options: the target form and an alternative involving the correct verb but the wrong agreement marker. Participants were given feedback on their answers. This phase consisted of 27 trials (3 repetitions for each trained form, presented in random order).
The critical test phase involved a similar procedure but also included trials for the held-out category (i.e., 2PL). In held-out trials, participants had to choose which of two options they would use to express 2PL verbal agreement. The two possible choices instantiate two alternative orderings of person and number affixes (N-P or P-N) with respect to the stem. This phase consisted of 30 trials (presented in random order), 12 of which tested the held-out meaning. No feedback was provided to participants at this stage. The order of presentation of meanings was fixed by verb in the exposure phase, and fully randomized in both training and test phases. An illustration of the procedure is given in Fig. 4.
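The trial counts for the three phases can be checked with a small sketch. The per-item repetition figures for the critical test (2 per trained form and 4 per verb for the held-out meaning) are inferred from the reported totals rather than stated in the text, and shuffling every phase is a simplification of the presentation order described above.

```python
import random

VERBS = ["miti", "kizi", "higi"]
TRAINED = ["1SG", "1PL", "2SG"]
HELD_OUT = "2PL"

def build_phase(repetitions, rng):
    """Return a shuffled list of (verb, meaning) trials.

    `repetitions` maps each meaning to its number of repetitions per verb.
    """
    trials = [
        (verb, meaning)
        for verb in VERBS
        for meaning, reps in repetitions.items()
        for _ in range(reps)
    ]
    rng.shuffle(trials)
    return trials

rng = random.Random(1)  # fixed seed only so the sketch is reproducible
exposure = build_phase({m: 2 for m in TRAINED}, rng)
initial_test = build_phase({m: 3 for m in TRAINED}, rng)
# 12 held-out trials over 3 verbs implies 4 per verb; the remaining 18 trials
# imply 2 per trained form. Both per-item figures are inferred, not reported.
critical_test = build_phase({**{m: 2 for m in TRAINED}, HELD_OUT: 4}, rng)

print(len(exposure), len(initial_test), len(critical_test))
```

These repetition figures reproduce the reported totals of 18, 27 and 30 trials, with 12 of the critical test trials probing the held-out meaning.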
At the end of the experiment, participants were asked to complete a questionnaire in which they had to provide meanings for one verbal stem in each of the three training configurations: with no overt marking (1SG), with plural marking only (1PL) and second person marking only (2SG).

Results
Recall that participants were exposed to two markers, one for plural number and one for second person, and they learned that these affixes either preceded (PREFIX condition) or followed (SUFFIX condition) the verbal stem. However, they were provided with no evidence about the relative order of the two affixes. At test, the held-out 2PL meaning was added and participants had to select between the two possible relative orderings of person and number. Fig. 5 shows the frequency with which participants selected person-closer responses (i.e., stem-P-N or N-P-stem). A visual inspection of this figure suggests that participants in both conditions prefer person-closer orders, although participants in the SUFFIX condition appear to choose this order more consistently than participants in the PREFIX condition.
Following our pre-registered plan, we ran a logistic mixed-effects regression model predicting the choice of person-closer morpheme order by Condition (PREFIX; SUFFIX). Morpheme order was a binary variable (coded as 1 for person-closer patterns, and 0 for patterns with person at the periphery). Fixed effects were treatment coded (with SUFFIX condition as baseline). Random by-participant intercepts were also included. This model revealed that the log odds of choosing person-closer patterns in the SUFFIX condition were significantly above chance (intercept: β = 2; p < .001). Moreover, the effect of Condition was not significant (p = .12), indicating that a preference for person-closer patterns held regardless of whether the morphemes were prefixes or suffixes.
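To make the reported intercept easier to interpret, the sketch below shows the binary coding of the dependent variable and converts a log-odds value to a probability. It is not the analysis code: the function names are assumptions, and in a mixed-effects model the converted value describes a typical participant (random intercept of zero) rather than the sample average.

```python
import math

# Person-closer order in each condition, as defined in the text.
PERSON_CLOSER = {"PREFIX": "N-P-stem", "SUFFIX": "stem-P-N"}

def code_response(chosen_order, condition):
    """Binary outcome: 1 for a person-closer choice, 0 otherwise."""
    return int(chosen_order == PERSON_CLOSER[condition])

def inv_logit(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

suffix_intercept = 2.0  # reported intercept for the SUFFIX baseline
print(round(inv_logit(0.0), 2))               # chance level in a two-way choice
print(round(inv_logit(suffix_intercept), 2))  # SUFFIX baseline
```

A log-odds value of 0 corresponds to chance (0.5), so an intercept of 2 corresponds to choosing the person-closer order on roughly 88% of held-out trials for a typical SUFFIX participant.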

Discussion
In this experiment, we used an artificial language learning paradigm to investigate learners' implicit assumptions about the relative order of person and number markers. The aim was to test whether a general preference for compositional transparency is active in this domain (e.g., as has been previously found for nominal word order and relative order of case and number morphemes), or whether instead there is a preference for linearizing person to the left (e.g., as is suggested by a typological sample). We found that learners were more likely to infer a relative order of person and number morphemes in which person is linearly closer to the stem than number, regardless of whether these morphemes are prefixes or suffixes.
This person-closer linearization is in line with the predictions of the compositional transparency hypothesis: learners are biased in favor of linear orders that mirror composition, as represented in an underlying hierarchical structure. It is worth noting, however, that the bias for compositional transparency we find in this case is relatively weak compared to parallel biases found in similar experiments. For example, Saldana et al. (2019) find a near categorical preference for compositionally transparent ordering of case and number morphemes. Besides a number of methodological differences (i.e., online vs. lab experiments, length of experiment, etc.), there are several possible explanations for the particular pattern of results we see here. All of these have the potential to increase the level of noise in the inferences learners make. We will discuss each in turn.
Before discussing more theoretically interesting explanations for our results, we will first discuss some potentially troubling findings from a qualitative assessment of our debrief questionnaire. Table 1 shows the proportion of participants who accurately provided the intended meanings for the forms that they were trained on. What we see is that participants generally understood the 1SG and 2SG forms as we intended. However, there was significant variation in responses for 1PL, and less than 50% of participants provided the correct interpretation.
Looking more closely, the two most common responses for the 1PL can be classified as 'we' (the expected response), or 'they'. Why might participants have interpreted the 1PL form in this way? Essentially, we think that, because 1SG was unmarked, some participants might have taken the presence of the speaker in 1SG to be unmarked, and the 1PL marker as specifically referring to 'other(s)'. For these participants, both morphemes would then essentially be conveying person information (i.e., second or third), and our predictions for 2PL held-out responses would then not hold.
To investigate whether we nevertheless still see a person-closer preference among participants who do interpret 1PL as expected, we ran a post-hoc logistic regression analysis. The model predicted person-closer choices in held-out trials as a function of whether participants interpreted 1PL as 'we' (PREFIX: 16; SUFFIX: 22) or as 'they' (PREFIX: 11; SUFFIX: 13). Participants who provided some other interpretation were excluded. Interpretation was treatment coded, with 'we' interpretation as the baseline. The model revealed that the log odds of choosing person-closer patterns for participants who interpreted 1PL as 'we' were significantly above chance (intercept: β = 1.34; p = .05). The two groups also differed significantly from one another, with participants who interpreted 1PL as 'they' showing a stronger preference for person-closer (estimate: β = 2.75; p = .02).
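Because interpretation was treatment coded, the estimate for the 'they' group is obtained by adding its coefficient to the baseline intercept. The sketch below works through this arithmetic with the reported values. It assumes a plain logistic link with no further predictors, and the variable names are our own.

```python
import math

INTERCEPT = 1.34    # reported log odds for the 'we' baseline
THEY_EFFECT = 2.75  # reported difference for the 'they' group

# Group sizes as reported above.
GROUP_SIZES = {
    "we": {"PREFIX": 16, "SUFFIX": 22},
    "they": {"PREFIX": 11, "SUFFIX": 13},
}

def predicted_probability(interpretation):
    """Fitted probability of a person-closer choice under treatment coding."""
    dummy = 1 if interpretation == "they" else 0
    log_odds = INTERCEPT + THEY_EFFECT * dummy
    return 1.0 / (1.0 + math.exp(-log_odds))

for group, sizes in GROUP_SIZES.items():
    print(group, sum(sizes.values()), round(predicted_probability(group), 2))

included = sum(sum(sizes.values()) for sizes in GROUP_SIZES.values())
print("included:", included, "of 99 analyzed participants")
```

On these figures the fitted probability of a person-closer choice is about 0.79 for the 'we' group and about 0.98 for the 'they' group, based on 62 of the 99 analyzed participants.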
Importantly, this post-hoc analysis suggests that participants prefer what we have called a person-closer order regardless of how they have interpreted these markers. For participants who interpreted 1PL as 'we', this is consistent with the compositional transparency hypothesis as described here. For those who interpreted 1PL as 'they', the explanation is less clear. If the 1PL marker is treated as +others, or as a third person marker, in the held-out forms participants would be showing ordering preferences regarding +addressee and +others morphemes. Then, we would potentially want to consider whether the internal structure of the φ-feature bundle makes predictions about how specific person features might be linearized relative to each other. We leave this aside here for future investigation (although see Noyer 1992).
Above we saw that a preference for person-closer was still observed among participants who correctly interpreted the morphemes they were trained on. Therefore, the question remains as to why this preference is relatively weak compared to that found in other domains (namely nominal word order and case-number morpheme order). One potential explanation is that, unlike in these domains, for φ-features, a general bias for compositional transparency and a second more specific bias (for person to the left) may both be influencing learners' behavior. When these are in competition, the latter may temper the former. If this is the case, then we would expect a relatively weak preference for person-closer in the PREFIX condition (assuming the general preference for compositional transparency is stronger), but a strong preference in the SUFFIX condition, where the two biases align. However, while there is a trend in this direction in our data (see Section 3), it was not significant. We therefore set this aside as a potential explanation.
A second possibility is that compositional transparency rather than person-left is indeed the main driver of participants' ordering preference, but this effect is weaker for person and number than in other domains due to the specific nature of these morphemes and how they combine with verb stems. In particular, it may be that person and number are more tightly related to one another than number and case morphemes, or different nominal modifiers (like Adjectives and Numerals). For example, as outlined above, these morphemes crucially interact to constrain the set of entities being expressed, and are typically treated as part of a φ-feature bundle in a way that case and number, or nominal modifiers, are not. This may lead to weaker preferences regarding the linear order of these elements. Indeed, although learners' ordering preferences are overall stronger in the nominal domain, there is nevertheless evidence that using non-transparent orders is tolerated more for structurally closer modifiers (Culbertson and Adger 2014, Martin et al. 2019a).
In addition to this, it is worth noting that the semantic effect of composition between person and number and the verbal stem is potentially quite distinct from composition of case and number with a noun stem, or of nominal modifiers with a noun. For example, plural number marking has a clear impact on the interpretation of a noun like 'cow', by restricting its denotation: if the bare noun 'cow' denotes a set containing both atoms and sums, the bare plural 'cows' denotes a set of only sums (see Sauerland et al. 2005, Sauerland 2008, for a developed account). In contrast, whether φ-features have an impact on the meaning of the verb is debatable; rather than restricting the denotation of the verb, their inclusion might be purely grammatical (Chomsky 1995, Harbour et al. 2008, and references therein). This lack of impact on verb meaning might lead to a weaker preference for having composition reflected in linear order.
Regardless of its strength, the bias uncovered here contradicts the apparent cross-linguistic tendency for placing person linearly left of number, suggesting the typology may not reflect any principle active during learning. Accordingly, our results are not fully consistent with any of the theories built to account for person-left agreement patterns (Trommer 2003, Harbour 2008, 2016). For example, Trommer (2003) predicts a preference for P-N-stem over N-P-stem order, and Harbour (2008) predicts no asymmetry between them. Neither of these predictions straightforwardly matches our learning results.
If there is no person-left bias in learning, then why would 'person-left' patterns be more prevalent in the typology? One possibility is that the tendency is accidental: the difference in counts between P-N-stem and N-P-stem (see Fig. 2b) is the product of a small sample of languages, which is probably unavoidable, since many languages do not exhibit a clear split between person and number morphemes. In our view, this is not totally satisfactory, since it does not address what appears to be a relatively strong preference for person-left in languages with discontinuous affixation (25/25 in Fig. 2b above).
In principle, one could test whether, for discontinuous affixation, where the pressure for compositional transparency is not relevant, behavioral evidence for a person-left bias can be found. However, conducting such an experiment would in practice be problematic: the P-stem-N pattern is superficially similar to the word order attested in English (as well as in many other languages), where number is expressed as a suffix while person is encoded in pre-verbal pronouns. If English-speaking learners indeed preferred these P-stem-N orders, we would not be able to tease apart whether this is due to a person-left bias or to the similarity with participants' native language. We leave to future research the task of testing learners' preference for discontinuous affixation, perhaps with speakers of a language which does not have any evidence for this order (e.g., a verb-first language).

Conclusion
In this paper, we aimed to provide a new source of evidence regarding how the hypothesized hierarchical structure of φ-features is linearized, using artificial language learning experiments. Such experiments have provided strong evidence, corroborating typology, that learners prefer transparent mappings between hierarchical structure and linear order (Culbertson and Adger 2014, Martin et al. 2019a,b, Saldana et al. 2019). The linear order of φ-features, specifically person and number, presents an interesting extension to this work, since there is evidence from the typology that compositional transparency may not be obviously at work in this domain. In particular, while it is generally agreed that number hierarchically dominates person, in the typology person marking tends to linearly precede number, rather than occurring systematically closer to the stem. The results of our experiment suggest that this apparent cross-linguistic tendency does not correlate with a learning preference: consistent with a general bias for compositionally transparent order, learners were more likely to infer orders with person closer to the verbal stem than number. This suggests the possibility that the 'person-left' tendency in typology may be the result of sparse sampling. We would argue that in such cases, evidence from behavioral experiments is key to providing justification for strong theoretical claims.