Articles in Natural Languages and Artificial Languages

Sunyoung Park1, Jin-young Tak2
1Sejong University, Korea
2Sejong University, Korea
Corresponding Author : Department of English Language and Literature, Sejong University 98 Gunja-dong, Gwangjin-gu, Seoul, Korea 143-747. E-mail :

Copyright ⓒ 2017, Sejong University Language Research Institute. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Jan 20, 2017; Revised: Feb 19, 2017; Accepted: Mar 2, 2017

Published Online: Mar 31, 2017


The aim of this paper is to investigate how articles encode definiteness and specificity in natural languages (English, Samoan, and Lillooet Salish) and in artificial languages (Esperanto, Unish, and Sambahsa), and then to suggest an optimal pattern of article use for artificial languages. The observations are supported by the typology of articles in natural languages, findings from language acquisition, the markedness of articles, and the historical development of articles. On this basis, the paper argues that articles are a recent and uncommon grammatical category that shows great variation across languages. Finally, it proposes the ‘Minimal Realization Principle’ for article use in artificial languages.

Keywords: articles; definiteness; specificity; universals; typology; language acquisition; artificial language

1. Introduction

An article is a lexical category that co-occurs with a noun to indicate the type of reference (i.e., definiteness, specificity, or genericity) made by the noun.1

With respect to distribution, articles are frequently attested in Indo-European languages (e.g., the Romance languages) as well as in Semitic and Polynesian languages; among these, English and German obligatorily use both definite and indefinite articles. In Semitic languages, only definite articles occur, and the absence of an article expresses indefiniteness. By contrast, some North Caucasian languages use articles only optionally.2 Even though many Indo-European languages are reported to employ articles, there is still a large number of languages (e.g., Korean, Chinese, the majority of Slavic and Baltic languages, and Bantu languages) that do not have articles. According to Dryer (1989), the article is an uncommon grammatical category; based on his empirical findings, he proposes that about one third of the world’s languages possess articles, and only 8% have both definite and indefinite articles (Mulder & Carlier 2010).

From a historical viewpoint, articles are also believed to be a recent grammatical category, as supported by the evidence that Proto-Indo-European did not have articles (Mulder & Carlier 2010). Additionally, Leiss (2000) and Abraham (2007) report that, in the Germanic languages, articles appeared in Middle High German and Middle English around the 11th century; Harris (1980), Selig (1992), and Putzu & Ramat (2001) show that, in the Romance languages, definite articles first emerged in Late Latin.

Furthermore, from the perspective of typology, Moravcsik (1969) and Heine (1997) posit that definite articles are less marked than indefinite articles; that is, if a language has an indefinite article, it is likely to have a definite article as well, but not vice versa.3

Based on the above observations, this paper accepts that articles are rather uncommon and recent. Nevertheless, the use of articles has been treated as an area of special significance in the field of language acquisition, since it is characterized by large variation across languages. Given this, this paper investigates the use of articles in natural languages and then proposes the ‘Minimal Realization Principle’ for article use in artificial languages developed in the future.

2. Article Uses in Natural Languages

2.1. Cross-linguistic Variation in Specificity/Definiteness Marking

Articles encode different semantic features cross-linguistically. This section concerns the features of definiteness and specificity. Even though specificity can be defined in several ways, throughout this paper the term is used in the precise sense of the speaker’s intent to refer (Fodor & Sag 1982). Some languages, like English, encode ‘definiteness’ with articles, whereas languages like Samoan and Lillooet Salish encode ‘specificity’ with articles. Let us first examine the definitions of definiteness and specificity.

The features [definite] and [specific] are both discourse-related; that is, they concern the knowledge of the speaker and/or the hearer in the discourse. While the [+definite] feature reflects the state of knowledge of both the speaker and the hearer, the [+specific] feature reflects the state of knowledge of the speaker only. Informal definitions of definiteness and specificity are proposed by Ionin et al. (2004) and are given in (1).

In English, the [+definite] feature is represented morphologically by the article ‘the’, and the [-definite] feature is marked by the article ‘a’. The definite use of ‘the’ and the indefinite use of ‘a’ are exemplified in (2).

Since our world knowledge already tells us that there will be only one winner of the tournament, the uniqueness of the DP is obvious. Thus, the definite article the is used.

As we have seen, the two English articles, the and a, mark definiteness and indefiniteness, respectively. Standard English does not mark ‘specificity’ in its article system. In colloquial English, however, speakers can mark specificity by using the demonstrative this. Consider the examples in (4) and (5), taken from Lyons (1999) and MacLaran (1982), respectively.

In example (4a), the speaker intends to refer to a unique individual with whom he does not get on at all. Likewise, in (5a), the speaker intends to refer to a particular telephone that has the property of being a weird purple. In contrast, in (4b) and (5b), the speakers do not intend to refer to a particular merchant banker or a particular telephone. One can therefore claim that [+specific] can be encoded by referential this in spoken English. The English indefinite article a, by contrast, does not itself encode specificity, and thus it can be used both in [+specific, -definite] contexts, as in (4a) and (5a), and in [-specific, -definite] contexts, as in (4b) and (5b). A more detailed discussion of referential this can be found in MacLaran (1982).

While we have seen that the feature [+specific] can be marked by this in English, the conditions on specificity can also be met in definite contexts.

In (6a), the speaker intends to refer to a unique individual who is (i) the winner of today’s race [+definite] and who (ii) also has the property of being the speaker’s best friend [+specific]. In (6b), the speaker refers to a unique individual who is the winner of today’s race [+definite], but the speaker does not intend to refer to a particular individual [-specific]. As examples (6a) and (6b) show, the definite article the can be used in specific as well as non-specific definite contexts. Therefore, one can conclude that the specificity distinction is independent of the definiteness distinction. Given that English articles distinguish definiteness, the next logical question is whether there is any language whose articles mark specificity. Let us now examine Samoan, which uses articles to encode specificity rather than definiteness.

According to Mosel & Hovdhaugen (1992), Samoan uses the article le (or l) to mark [+specific] and se (or s) to mark [-specific]. The use of le indicates that the DP refers to a specific, particular entity, independently of definiteness. See the example in (7).4

In (7a), the speaker is beginning to tell a story about a couple with a child, with whom she or he is acquainted. Since the couple is mentioned for the first time, the DP is [-definite]; since the speaker has already established the existence of the couple in his or her mind, it is [+specific]. Thus, the article le is used. Example (7b) continues the story from (7a), so the referent has become familiar to both speaker and hearer. Notice that even though the existence of the couple is now knowledge shared by speaker and hearer, and the DP is thus [+definite], the article le is still used in (7b), regardless of the definiteness of the DP.

Let us now consider the use of se in Samoan. Mosel & Hovdhaugen (1992) describe it as follows: “the nonspecific singular article se/s=ART(nsp. sg.) expresses the fact that a noun phrase does not refer to a particular, specified item, but to any member of the conceptual category denoted by the nucleus of the noun phrase and its adjuncts” (Mosel & Hovdhaugen 1992: 261). The use of se is illustrated in (8).

In (8a), the speaker does not refer to a particular coconut. Therefore, the context is [-specific, -definite] and se is used. In (8b), there exists a family that the boy belongs to, but the speaker does not necessarily know which family that is. Thus, the context is [-specific, +definite], and se is used regardless of definiteness. In sum, Samoan articles indicate specificity, whereas English articles indicate definiteness.
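
The contrast between the two systems can be sketched as a small lookup. The following Python snippet is only an illustration under simplifying assumptions: the function names and the boolean encoding of [±definite] and [±specific] are ours, only singular forms are covered, and the variants l and s are ignored.

```python
def english_article(definite: bool, specific: bool) -> str:
    """English (definiteness setting): the choice ignores specificity."""
    return "the" if definite else "a"


def samoan_article(definite: bool, specific: bool) -> str:
    """Samoan (specificity setting): the choice ignores definiteness."""
    return "le" if specific else "se"


# Enumerate the four feature combinations discussed in the text.
for definite in (True, False):
    for specific in (True, False):
        context = (
            f"[{'+' if definite else '-'}definite, "
            f"{'+' if specific else '-'}specific]"
        )
        print(
            context,
            english_article(definite, specific),
            samoan_article(definite, specific),
        )
```

In this sketch, the English function never consults the specificity value and the Samoan function never consults the definiteness value, which is the point of the comparison.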

Lillooet Salish, also known as St’at’imcets, is an Interior Salishan language spoken in southwest British Columbia, Canada. Matthewson, Bryant & Roeper (2001) investigated the specificity distinction in Salish and showed that its articles distinguish “speakers’ familiarity to given entities” (i.e., specificity). According to them, the article ti … a is used to refer to entities whose existence is already known to the speaker, and is thus specific. The article ku is used when the entity is not known to the speaker, and is thus non-specific. Consider the following examples in (9).

As the examples in (9) show, the Salish article ti … a does not encode (in)definiteness: the nouns in (9a-c) can be interpreted as either definite or indefinite. The following examples in (10) show the use of the Salish article ku:

In (10a), as the English translation shows, there is no one who sang; the DP is thus non-specific. In (10b), the elder is not known to the speaker, and thus ku is used. In Salish, as these examples show, the speaker’s familiarity with the referent (i.e., specificity) determines the article.

2.2. Difficulties of Article Acquisition in Natural Languages
2.2.1. Cross-sectional Studies

The acquisition of articles is known to be a notoriously difficult process for L2 learners (Huebner 1983, Master 1987, Parrish 1987, Murphy 1997, Robertson 2000, Leung 2001, among many others). Since English is so widely spoken, article acquisition has been studied mostly with respect to English articles. Previous studies have shown that L2-English learners make errors by omitting or misusing articles. Such errors seem to be more prevalent among L2 learners whose native language has no article system at all.

Ionin and her colleagues have conducted a number of studies on the acquisition of English articles (Ionin et al. 2004, Ionin & Montrul 2009, among many others). Ionin et al. (2004) tested whether adult L2 learners whose L1s do not have article systems can acquire the specificity and definiteness distinctions in article semantics. They proposed a semantic parameter, the Article Choice Parameter, that determines the distribution of articles. It is a binary parameter with a definiteness setting and a specificity setting. In the definiteness setting, articles are distinguished on the basis of (in)definiteness, whereas in the specificity setting, articles are distinguished on the basis of the specificity of DPs. For example, languages like English encode definiteness, but languages like Samoan encode specificity.

Under this assumption, definiteness and specificity determine the use of articles cross-linguistically. It was proposed that, without such features in their native language, L2 learners will not know which setting is appropriate for the language they are learning. Ionin et al. (2004) proposed the ‘Fluctuation Hypothesis’, which claims that learners fluctuate between the definiteness and specificity settings until sufficient L2 input leads them to the proper parameter value. In other words, during the fluctuation period, L2 learners are expected to use the definite article ‘the’ in indefinite specific contexts as well as in definite contexts, and to use the indefinite article ‘a’ in definite non-specific contexts as well as in indefinite contexts.
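
The predictions of the Fluctuation Hypothesis can be illustrated with a short sketch. The Python code below is only an illustration under our own assumptions: the function names are hypothetical, the features are encoded as booleans, and a learner on the specificity setting is assumed to map [+specific] to ‘the’ and [-specific] to ‘a’. It lists the contexts in which the two settings of the Article Choice Parameter disagree, which are the contexts where fluctuation is expected.

```python
from itertools import product


def definiteness_setting(definite: bool, specific: bool) -> str:
    """Article chosen if the parameter is set to definiteness."""
    return "the" if definite else "a"


def specificity_setting(definite: bool, specific: bool) -> str:
    """Article chosen if the parameter is set to specificity.

    Assumption: the learner maps [+specific] to 'the' and [-specific] to 'a'.
    """
    return "the" if specific else "a"


def predicted_choices(definite: bool, specific: bool) -> set:
    """Articles a learner fluctuating between both settings may produce."""
    return {
        definiteness_setting(definite, specific),
        specificity_setting(definite, specific),
    }


def fluctuation_contexts() -> list:
    """(definite, specific) contexts in which the two settings disagree."""
    return [
        (definite, specific)
        for definite, specific in product((True, False), repeat=2)
        if len(predicted_choices(definite, specific)) > 1
    ]


print(fluctuation_contexts())
```

Under these assumptions, the two settings disagree only in [+definite, -specific] and [-definite, +specific] contexts; in the other two contexts both settings select the same article, so no fluctuation is predicted there.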

In order to test this hypothesis, L2 learners whose native languages do not have articles (Korean and Russian) were tested with a forced elicitation task and a production test. According to the results, both Korean and Russian learners misused articles. The results of the production test revealed overuse of ‘the’ in [-definite, +specific] contexts and overuse of ‘a’ in [+definite, -specific] contexts. In other words, learners tended to use ‘the’ to denote specificity.

Since articles seem to be one of the most difficult properties of language for adult learners to acquire, Ionin & Montrul (2009) conducted a study testing the effect of age on article acquisition. They compared the acquisition patterns of adult L2 learners and child L2 learners. An elicitation task was given to both groups of English learners. The results revealed that child and adult learners showed similar accuracy rates, but their acquisition patterns were slightly different. More specifically, child learners showed a specificity distinction only in indefinite contexts, whereas adult learners displayed a specificity distinction in both indefinite and definite contexts. It was concluded that the child learners’ pattern is closer to that of native speakers.

Park (2014) studied the acquisition of English articles with respect to (in)definiteness by the most advanced English learners. In order to investigate whether the English article system is actually acquirable, the most advanced L1-Korean learners of English were tested. While the study is limited to one L1 group (Korean), the results indicated that even L2 English learners with end-state grammars had difficulties, misusing ‘the’ in indefinite contexts and ‘a’ in definite contexts. The study revealed that the specificity distinction played a role in the choice between the definite and indefinite articles.

2.2.2. Longitudinal Studies and Acquisition Process

Huebner (1985) conducted a one-year longitudinal study of an L1 Hmong speaker from Laos who was living in the United States at the time of data collection. A follow-up study was conducted 20 months later.

During the one-year study, data were collected every three weeks. At the beginning, the subject showed a tendency to use the zero article and ‘da’,5 but not ‘a’. Six weeks later, the subject started to overuse ‘da’ in all contexts. At 21 weeks, the learner began to stop using ‘da’ in [-specific, -definite] contexts, and at 27 weeks finally stopped using ‘da’ in [+specific, -definite] contexts. That is, the learner stopped using ‘da’ in situations where a native speaker would use ‘a’. In the follow-up study, Huebner (1985) found that ‘a’ had started to be used in [+specific, -definite] contexts. This longitudinal study seems to suggest an order of acquisition for English articles, which is further supported by Parrish (1987).

In Parrish’s (1987) study, data were collected from a 19-year-old Japanese woman. At the time of data collection, the subject had been in the United States for 3 weeks. She had received English instruction in Japan for 6 years, but she was assessed as a beginner. Data were collected through storytelling and the description of a place every ten days for four months. Data collection focused on the use of articles, including ‘the’, ‘a’, and the zero article.

The results are as follows. First, at the initial stage, the participant showed a tendency to omit articles (zero article). Second, the participant overused ‘the’ in contexts where a native speaker would have used ‘a’ (the-overuse). Third, the use of ‘a’ was very low. One can therefore suggest that the indefinite article ‘a’ emerges later than ‘the’.

Considering these previous studies of English articles, even though the participants and data elicitation methods were not entirely identical, the literature seems to suggest that the acquisition of English articles occurs incrementally, in the following order:

Bare NP → the → a

One can question whether the sequence above reflects a developmental process or is merely influenced by other variables such as the L1; this assumption should therefore be examined further. According to Hawkins (2001), however, learners start with simple forms of grammar, for instance moving from V to its projection VP. Likewise, the acquisition of DP structure would proceed incrementally from the bare NP to its projection DP.

What is more, the earlier acquisition of ‘the’ relative to ‘a’ can be accounted for theoretically. First, ‘the’ takes minimally restricted complements: it can be used with countable and non-countable nouns, in both singular and plural forms. In contrast, ‘a’ takes a quite limited range of complements (i.e., countable singular nouns). Second, when ‘the’ first emerges, it seems to mark the specificity of the NP, and specificity is said to be a local modification, whereas definiteness involves a D-operator. Thus, it is logical to assume that the acquisition of DP (the in definite contexts, a in indefinite contexts) develops incrementally, as shown for the acquisition of IP (Hawkins 2001).

As we have seen in the previous literature, articles seem to be one of the most difficult properties to acquire because they involve complicated semantic representations that vary across languages with article systems. Therefore, language learners whose native language does not have articles at all must (i) first learn that articles exist in certain languages and (ii) be able to identify the semantic representation of articles in the language they intend to acquire.

3. Article Uses in Artificial Languages

In this section, three artificial languages (i.e., Esperanto, Unish, and Sambahsa) are investigated with respect to the usage of articles.

An artificial language is a language devised for a specific purpose, such as international communication, a secret society, or computer programming.6 In particular, when artificial languages are constructed to take the place of natural languages, they are intended to make communication simpler.7 Therefore, it can reasonably be assumed that the grammar patterns of artificial languages are typically as simple as possible.

Given this assumption, first consider Esperanto, one of the best-known artificial languages. Esperanto uses only the definite article la to mark the definiteness of a noun, as in la libro ‘the book’ and la domo ‘the house’. Unlike English, Esperanto does not have indefinite articles; bare nouns without articles denote [-definite]. Consider the relevant data in (11), drawn from MYLANGUES.ORG (2015).8

As depicted in (11), nouns that appear without an article are interpreted as [-definite].

Unlike Esperanto, Unish, an artificial language developed from Esperanto and 14 natural languages that have 70 million or more native speakers, in principle does not have any articles. Instead, the [+definite] feature is contextually encoded. Consider the following data from Sejong University (2014) and <>:

As seen in the data in (12), dog in the first sentence is realized with the feature [-definite], while dog in the second sentence is translated as ‘the dog’, since it is a referent that has already been mentioned in the first sentence; it is thus endowed with the feature [+definite]. Even though Unish eliminates articles to make its grammar simpler, the definiteness of a noun can be recovered from the context.
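
The contextual encoding of definiteness described for (12) can be sketched as a simple first-mention rule. The following Python snippet is a deliberately simplified illustration, not a description of Unish grammar: the function name is hypothetical, and it assumes that two mentions with the same noun form refer to the same referent.

```python
def label_mentions(nouns):
    """Label each bare noun by whether it has been mentioned before.

    First mention  -> "-definite" (translated with 'a' in English)
    Later mentions -> "+definite" (translated with 'the' in English)
    """
    seen = set()
    labels = []
    for noun in nouns:
        labels.append((noun, "+definite" if noun in seen else "-definite"))
        seen.add(noun)
    return labels


# Two successive sentences that each contain the bare noun 'dog'.
print(label_mentions(["dog", "dog"]))
```

Real discourse requires more than matching noun forms (for example, world knowledge and uniqueness), so the rule above covers only the previous-mention case discussed here.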

Furthermore, if it is necessary to assign the feature [+specific] to a noun, the demonstrative da can be used in Unish. This is shown in (13).

The expressions in (13) have almost the same meaning as those in (12). However, when dog in the second sentence is used with the demonstrative da, it refers to a very particular, specific referent that has already been mentioned and is located close to the speaker and the hearer.

Sambahsa, an artificial language constructed in 2007 on the basis of Proto-Indo-European, has a very peculiar characteristic with respect to the usage of articles: it uses the same words for the definite article (the in English) and the third person pronouns. Before introducing the relevant data, the pronouns of this language are given in Table 1, based on Stelo (2011):

Table 1. Pronouns in Sambahsa (Stelo 2011)
Person  Gender    Subject               Object
                  Singular   Plural     Singular   Plural
1st     -         ego        wey        me         us
2nd     -         tu         yu         te         vos
3rd     Male      is         les        iom        lens
        Female    la         las        iam        lans
        Neutral   id         ia         id         them

As shown in Table 1, the pronoun system of Sambahsa is very similar to that of English.

Although using the third person pronouns as definite articles saves Sambahsa from coining new vocabulary, it may be an obstacle for language learners trying to comprehend literature written in Sambahsa (Winter 2015). Consider the following data in Sambahsa9:

As shown in Table 1 and (14a), the third person pronoun ‘she’ in Sambahsa is la. In (14b), by contrast, la precedes the noun gwena ‘woman’ and functions as the definite article, marking the noun as [+definite]. Since gwena is the subject of the sentence, the third person subject pronoun la, not the third person object pronoun iam ‘her’, is used. By the same token, the third person object pronoun iom ‘him’ must be used to encode the definiteness of the object wir ‘man’. At first glance, the articles of Sambahsa seem difficult to acquire, since learners are required to understand a complex pronoun system in advance.
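
The choice of article form in Sambahsa can be sketched as a lookup in the pronoun table. The Python snippet below is a minimal illustration that assumes only the third person singular forms listed in Table 1; the function name and the labels for gender and grammatical role are hypothetical.

```python
# Third person singular forms as listed in Table 1 (Stelo 2011).
THIRD_PERSON_SINGULAR = {
    ("male", "subject"): "is",
    ("male", "object"): "iom",
    ("female", "subject"): "la",
    ("female", "object"): "iam",
    ("neutral", "subject"): "id",
    ("neutral", "object"): "id",
}


def definite_np(noun: str, gender: str, role: str) -> str:
    """Build a [+definite] noun phrase by prefixing the matching pronoun form."""
    article = THIRD_PERSON_SINGULAR[(gender, role)]
    return f"{article} {noun}"


print(definite_np("gwena", "female", "subject"))  # 'woman' as subject
print(definite_np("wir", "male", "object"))       # 'man' as object
```

The sketch makes the learning burden visible: before choosing an article, the learner has to know both the gender of the noun and its grammatical role in the sentence.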

4. Implications and Conclusion

In this paper, article use has been investigated in both natural and artificial languages. Central to this investigation is the view that articles are a recent grammatical phenomenon and that only a small number of natural languages possess both definite and indefinite articles. In particular, the data in (11)-(14) show that article use varies across artificial languages, even though artificial languages are presumed to have simplified article systems. Since languages vary in this way, this paper cannot determine which artificial language has the better article system.

However, following Krámsky (1972) and Mulder & Carlier (2010), this paper suggests that the article is not the only way to express definiteness or specificity; these features may be marked by various grammatical means, including word order, case inflection, stress, and intonation. Therefore, this paper proposes the “Minimal Realization Principle” for article use. That is, to lessen the burden on speakers of mastering a new language, a simple article system is preferable in newly developed artificial languages, with definiteness and specificity derived from context.

Even though this paper cannot strongly assert that the proposal is right, other aspects of article use clearly remain that demand future study; article use in natural and artificial languages is worth investigating further.


1. This paper examines only the concepts of definiteness and specificity in articles. The generic use of articles is outside the scope of this paper and is not investigated further.

3. Krámsky (1972) and Mulder & Carlier (2010) propose that Turkish is a counterexample to this argument. However, they also attest that 95% of the languages comply with this markedness principle with respect to definite and indefinite articles.

4. The abbreviations that will be used in this paper are as follows: ART = article; DET = determiner; SG = singular; PL = plural; DU = dual number; PRES = present tense; PAST = past tense; POSS = possessive marker; LD = locative case & directional; DIR = direct case; INTR = Intransitive; 1 = first person; 2 = second person; 3 = third person; NOM = nominative case; NEG = negative marker; OBJ = object; HYP = hypothetical mood; CONJ = conjunction.

5. ‘Da’ is a phonological approximation to ‘the’ in native English.

6. The definition of an artificial language is drawn from < browse/artificial-language>.



Abraham, W. 1997. The Interdependence of Case, Aspect, and Referentiality in the History of German: The Case of the Verbal Genitive. In A. Van Kemenade et al. (eds.), Parameters of Morphosyntactic Change 29-61. Cambridge: CUP.


Donnellan, K. 1966. Reference and Definite Descriptions. The Philosophical Review 75, 281-304.


Dryer, M. 1989. Article-Noun Order. Chicago Linguistic Society 25, 83-97.


Fodor, J. & I. Sag. 1982. Referential and Quantificational Indefinites. Linguistics and Philosophy 5, 355-398.


Harris, M. 1980. The Marking of Definiteness in Romance. In J. Fisiak (ed.), Historical Morphology 141-156. The Hague: Mouton.


Hawkins, R. 2001. Second Language Syntax. Oxford: Blackwell Publishing.


Heine, B. 1997. Cognitive Foundations of Grammar. Oxford: OUP.


Huebner, T. 1983. A Longitudinal Analysis of the Acquisition of English. Ann Arbor: Karoma.


Ionin, T., H. Ko & K. Wexler. 2004. Article Semantics in L2-Acquisition: the Role of Specificity. Language Acquisition 12, 3-69.


Ionin, T. & S. Montrul. 2009. Second Language Acquisition of Articles: Empirical Findings and Theoretical Implications. Amsterdam: John Benjamins.


Krámsky, J. 1972. The Article and the Concept of Definiteness in Language. The Hague: Mouton.


Leiss, E. 2007. Covert Patterns of Definiteness/Indefiniteness and Aspectuality in Old Icelandic, Gothic, and Old High German. In E. Stark et al. (eds.), Nominal Determination: Typology, Context Constraints, and Historical Emergence 73-102. Amsterdam: Benjamins.


Leung, Y-K. 2001. The Initial State of L3A: Full Transfer and Failed Features? In X. Bonch-Bruevich et al. (eds.), The Past, Present, and Future of Second Language Research: Selected Proceedings of the 2000 Second Language Research Forum 55-75. Somerville, MA: Cascadilla Press.


Lyons, C. 1999. Definiteness. Cambridge: CUP.


MacLaran, R. 1982. The Semantics and Pragmatics of the English Demonstratives. Ph.D. Dissertation, Cornell University.


Master, P. 1987. A Cross-linguistic Interlanguage Analysis of the Acquisition of the English Article System. Ph.D. Dissertation, University of California, Los Angeles.


Matthewson, L., T. Bryant & T. Roeper. 2001. A Salish Stage in the Acquisition of English Determiners: Unfamiliar 'Definites'. The Proceedings of SULA: The Semantics of Under-Represented Languages in the Americas, University of Massachusetts Occasional Papers in Linguistics 25.


Moravcsik, A. 1969. Determination. Working Papers on Language Universals 1, 63-98.


Mosel, U. & E. Hovdhaugen. 1992. Samoan Reference Grammar. Oslo: Scandinavian University Press.


Mulder, W. & Q. Carlier. 2010. The Emergence of the Definite Article in Late Latin: Ille in Competition with Ipse. In H. Cuykens, K. Davidse & L. van de Lanotte (eds.), Subjectification, Intersubjectification, and Grammaticalization 241-275. The Hague: Mouton De Gruyter.


Murphy, S. 1997. Knowledge and Production of English Articles by Advanced Second Language Learners. Ph.D. Dissertation, University of Texas, Austin.


MYLANGUES.ORG. 2015. Esperanto Articles. Available at <>.


Park, S. 2014. L2 Acquisition of Genericity in English Articles: The Case of Korean Adult Learners of L2 English. Ph.D. Dissertation, University of Sheffield.


Parrish, B. 1987. A New Look at Methodologies in the Study of Article Acquisition for Learners of ESL. Language Learning 37, 361-383.


Putzu, I. & P. Ramat. 2001. Articles and Quantifiers in the Mediterranean Languages: A Typological Diachronic Analysis. In W. Bisang (ed.), Aspects of Typology and Universals 99-132. Berlin: Akademie.


Robertson, D. 2000. Variability in the Use of the English Article System by Chinese Learners of English. Second Language Research 16, 135-172.


Sejong University. 2014. Universal Language: Unish. Seoul: Language Research Center at Sejong University.


Selig, M. 1992. Die Entwicklung der Nominaldeterminanten im Spätlatein. Tübingen: Narr.


Stelo, V. 2011. Understanding Sambahsa Pronoun 1. Available at <>.