Looking at specific areas of the vocabulary of artificial languages (henceforth ALs) can give one an idea of the nature of such languages, at least with respect to the lexicon and perhaps also concerning derivational morphology. In this paper we present data from a semantic field which will be represented in all fully developed artificial languages, as well as in most or all natural languages: terms for bodies of water. This semantic domain may be of particular interest because, at least in natural languages, there are terms for objects which are basically the same, except for their size, e.g., ocean vs. sea and river vs. creek and brook. We will see that different languages have used different strategies for building and expanding vocabulary from this domain. In the second section of the paper we will present data and in the third section we will analyze some aspects of these data. Much more analysis could be done on these data, and the analyses in Section 3 will only serve as an example of the kind of work which could be done on the vocabulary of ALs in this, and other, semantic fields.
There are many ALs which we will not discuss here; some did not get past the stage of being outlines of languages, with only a small number of words. In other cases there is no dictionary or vocabulary list(s), so that it is difficult to find particular words. This is, for example, the case with Donisthorpe (1913) on Uropa (an AL derived in part from Latin); there is a word for ‘river’ in Uropa, riva, but one can only find it by searching through the practice readings in this book. (It would obviously be wise for AL designers to provide vocabulary lists, but this may not have occurred to some of them.)
We are limiting our survey to a posteriori and mixed ALs. There are interesting relevant data from a priori languages, which we hope to discuss in a subsequent paper. We also do not discuss ALs based on a single natural language (e.g., Peano’s Latino sine flexione); they are less interesting from the point of view of one of the issues we will discuss, the choice of words, as all words will come from the one language. Further, as in our previous work, we are only interested in ALs designed for the particular purpose of making international communication easier, or languages whose designers seem to have had the idea of an auxiliary language in mind when creating their language.2 Even within this limited set of ALs, we have not presented data from all of them. Nevertheless we hope that we have given data from a reasonably representative sample of such ALs, and we have included most of the best known international auxiliary languages, and quite a few lesser known ones.3
We also have not included all terms for bodies of water; for one thing, we have left out some words which we believe are less common, e.g., those meaning ‘cove’. We also do not treat terms for temporary bodies of water, such as those equivalent to English puddle.
2. Data and Preliminary Discussion
We now present the data which we have collected. We have divided the languages which we treat into three groups: a posteriori languages which take vocabulary from several languages, languages which have taken over much of Esperanto’s vocabulary, and mixed languages. The three-way classification of ALs into a priori, mixed, and a posteriori languages involves a spectrum rather than discrete categories, and one could dispute our classification of some of the ALs. However, for our purposes this classification does not have much theoretical importance, and is largely a way of breaking the data into more manageable chunks.
Table 1 presents terms for stationary bodies of water in many a posteriori languages, in alphabetical order of languages. In some cases a language designer or describer explicitly gives the source for the word (and this is also given in the table), but unfortunately usually this is not the case. One can often guess what the source is, although one cannot be entirely sure of this. An empty cell in the table can mean various things, e.g., that the works at our disposal do not have a term for the concept in question, or that they do have a term, but it is problematic for some reason. The reason is generally that we cannot be very confident that such a term has the meaning in which we are interested. Thus a dictionary of an AL may have a term for ‘stream’, but without any indication that the term means ‘small river’ rather than, or in addition to, some other meaning of stream.
We first look at ALs which take vocabulary from several natural languages and/or use “international” words, i.e., ALs which cannot be said to be based on a single language. Of course the best known of such languages is Esperanto.
Most of the Ardano words here will be unfamiliar to many people, since its designer said that his language “contains words from every natural language in the world” (Elhassi 2008a: 2). Even so, one of the words for ‘sea’, mar, is recognizable to speakers of some major European languages, and similar to words meaning ‘sea’ in many other ALs. Euransi also has many sources for its vocabulary, including English, Spanish, Tajik, Japanese, and Chechen, but several of its words in the above table are not very unlike their equivalents in English or other Western European languages. However, some Euransi words, e.g., buhayri ‘lake’, are quite different from their counterparts in English and other Western European languages. Another AL with a wide range of lexical sources is Ceqli, as one might guess from the three words in the above tables.
Neo Patwa also has vocabulary from many languages. Its word moana covers the meanings of three English terms, perhaps because this language was meant to have a limited (and thus easy to learn) vocabulary.90
Jaque (1944: 24) says, “Olingo is basically Neo-Latin and Anglo-Saxon with roots and words selected from all of the major languages of both the Western and Eastern Hemispheres”, but the words in the semantic field of bodies of water all seem to have a Romance origin.
Romanova draws its vocabulary from French, Italian, Portuguese, and Spanish. Given this, the Romanova words in the tables above are not surprising.
Unish uses 15 languages as sources for its vocabulary and this variety is reflected in the words in the above tables.
Even though English, according to Stadelmann (1945: 1), “provided the base for Voldu”, some of the Voldu words in the tables above do not seem to have come from English, e.g., mer looks like a Romance root. Ling also uses English as its main vocabulary source, but once again the word meaning ‘sea’, mar, seems to be Romance in origin.
Several of the languages in the table, particularly Esperanto, make use of a diminutive suffix to form some of the terms (rather than having completely different words). Most of the Sasxsek words were formed with the diminutive or augmentative suffixes, -im and -is respectively. What is interesting about its terms for moving bodies of water is that the base term is the word meaning ‘stream’, rivo, rather than the word meaning ‘river’, as in some other ALs. From our sources it is not entirely clear that rivo meant ‘stream’ in the relevant sense (‘a small river’), but the fact that words meaning ‘brook’, ‘creek’, and ‘river’ are built from it, and the fact that it can mean ‘creek’ without a suffix, indicate that it does mean ‘stream’ in this sense.
UNI makes use of its diminutive and augmentative prefixes, BE- and BA- respectively (they can also be independent words), in combination with VAM ‘water’, to form two of the words in the tables above. Diminutive and augmentative affixes are extensively used in E. Courtonne’s langue internationale néo-latine; as seen in Table 1, terms with meanings from ‘pond’ to ‘ocean’ are derived from the root meaning ‘lake’.
Another way of reducing the number of roots in a lexicon is to have one word as an equivalent to two or more words of e.g., English; thus Euransi gelfi covers both bay and gulf. Pandunia goes even further in this direction; the word daria, in addition to meaning ‘lake, sea, ocean’, also means ‘river’.
Let us now look at Esperanto again, and at some ALs whose lexicon (apparently) has been derived from it, or largely so. (Some of the languages already discussed have also taken some material from Esperanto.)
It can be seen that Mondlango’s terms are often the same as those of Esperanto. The Arlipo and Mondlango words for ‘gulf’ are slightly different from that of Esperanto; perhaps they were changed to avoid the homonymy of Esperanto golfo, which can also mean ‘golf’ (the sport). Like Esperanto, Arlipo has the diminutive suffix -et-; perhaps the reason why Arlipo terms for ‘bay’, ‘brook’, ‘creek’, and ‘stream’ are not given in the vocabulary lists is that in general these lists do not contain words containing this suffix. We only know of one Arulo word in this semantic field, maro ‘sea’ (Talmey 1925: 18), i.e., it has the same word for ‘sea’ as Esperanto (and Arlipo and Mondlango).
The Arlipo word for ‘pond’ contains the roots fish- ‘fish’ and lag- ‘lake’, so one might think that it is limited to meaning ‘fish pond’, although to our knowledge there is no indication of this in materials about the language.
Let us now turn to some mixed ALs. The best known mixed AL, Volapük, used English as a vocabulary source more than any other language (although the words taken from English could be considerably changed in form), and some other mixed ALs were based on Volapük, so the lists below might have many items derived from English (although perhaps difficult to see as such). Algilez also draws its vocabulary in large part from English (although again substantial changes have been made to the words).98
The Perio word for lake certainly appears to be a posteriori, but it forms part of an a priori-type set with liko ‘island’. The word for ‘sea’ is also part of such a set, the other members of it being chulo ‘land’ and chilo ‘sky’.129 The word for ‘gulf’ belongs to an a priori set, the other member of which is kabo ‘cape’.
The Spelin word for ‘ocean’, sian, is similar to its equivalent in Volapük, which is not surprising, as Spelin was more or less an attempt to improve Volapük, although on the surface it appears quite different.
The Tal word rivo is glossed as ‘rivière’ in Couturat & Leau (1907: 15), so it may not correspond exactly to English ‘river’.
The Vela vocabulary shown appears to be a mixture of a posteriori (e.g., lago) and a priori roots. One might note that the word for ‘waterfall’ does not contain the root meaning ‘water’, which is vuvo.
As we have seen, some ALs simplify their lexicon by having only one term for what would be separate words in English or other natural languages; others use derivational affixes to create some words in this semantic field. There are very few, if any, ALs which have underived equivalents of each of the English words at the top of the tables. However, what might be interesting is which terms serve as the base for other terms, and which terms have meanings that correspond to those of more than one English word.
To take a relatively simple example, consider the case of words meaning ‘bay’ or ‘gulf’; for many speakers of English the difference between bay and gulf may be a question of size. There are several possibilities for ALs which have words for both these meanings: 1) the same word can be used for both meanings, and it is the only word (that we know of) for each meaning; 2) there is a set of words which can be used for both meanings, and there is no word which means ‘bay’ that does not also mean ‘gulf’, and there is no word which means ‘gulf’ that does not also mean ‘bay’; 3) there is one underived word for each meaning, and they are the only words (that we know of) for each meaning; 4) there is one underived word for ‘bay’, but several underived words for ‘gulf’, none of which can also mean ‘bay’; 5) there is one underived word for ‘gulf’, but several underived words for ‘bay’, none of which can also mean ‘gulf’; 6) there are several underived words for each meaning, none of which can have the other meaning; 7) there is a single underived word for ‘bay’, and it can also mean ‘gulf’, along with a word which can only mean ‘gulf’; 8) there is a single underived word for ‘gulf’, and it can also mean ‘bay’, along with a word which can only mean ‘bay’; 9) there is a single underived word for ‘bay’, and the only word for ‘gulf’ is derived from it; 10) there is a single underived word for ‘gulf’, and the only word for ‘bay’ is derived from it. (We do not mention some of the more complex possibilities; as we have seen, Volapük forms one of its words for ‘gulf’ by compounding.) The difference between 9) and 10) seems to be linked to which of the two terms (or their meanings) is seen as marked in some sense.
These possibilities are shown in Table 7 (where x ≠ y ≠ z, etc., and where x, y, etc. are all underived words). The situation is more complicated if different authorities disagree on this question, as is the case with Esperanto (see note 10).
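The ten possibilities can be read as a decision procedure over a language's known words for ‘bay’ and ‘gulf’. The following Python fragment is only a rough sketch of that procedure. The `Word` record, the helper `w`, the function `classify`, and the placeholder forms `x`, `y`, `z`, and `x-aff` are hypothetical names introduced for illustration and are not data from any AL; patterns outside the ten listed (e.g., compounding) are returned as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    form: str                           # placeholder form, not real AL data
    meanings: frozenset                 # subset of {"bay", "gulf"}
    derived_from: Optional[str] = None  # form of the base word, if derived


def w(form: str, *meanings: str, base: Optional[str] = None) -> Word:
    """Hypothetical convenience constructor."""
    return Word(form, frozenset(meanings), base)


def classify(words: List[Word]) -> Optional[int]:
    """Return the number (1-10) of the matching possibility, or None."""
    both = [x for x in words if {"bay", "gulf"} <= x.meanings]
    only_bay = [x for x in words if x.meanings == {"bay"}]
    only_gulf = [x for x in words if x.meanings == {"gulf"}]
    has_bay = bool(both or only_bay)
    has_gulf = bool(both or only_gulf)
    if not (has_bay and has_gulf):
        return None  # the language lacks a word for one of the meanings

    def underived(ws: List[Word]) -> bool:
        return all(x.derived_from is None for x in ws)

    # 1) and 2): every known word covers both meanings
    if both and not only_bay and not only_gulf:
        return 1 if len(both) == 1 else 2
    # 3) to 6): no shared word, and all words are underived
    if not both and underived(only_bay + only_gulf):
        if len(only_bay) == 1 and len(only_gulf) == 1:
            return 3
        if len(only_bay) == 1:
            return 4
        if len(only_gulf) == 1:
            return 5
        return 6
    # 7) and 8): one shared underived word plus a word with a single meaning
    if len(both) == 1 and underived(both):
        if only_gulf and not only_bay:
            return 7
        if only_bay and not only_gulf:
            return 8
    # 9) and 10): one term is derived from the other
    if not both and len(only_bay) == 1 and len(only_gulf) == 1:
        b, g = only_bay[0], only_gulf[0]
        if b.derived_from is None and g.derived_from == b.form:
            return 9
        if g.derived_from is None and b.derived_from == g.form:
            return 10
    return None  # a more complex pattern than the ten listed


# A 'gulf' root with an affixed form for 'bay' falls under possibility 10.
print(classify([w("x", "gulf"), w("x-aff", "bay", base="x")]))
```

A procedure of this kind assigns a language to exactly one possibility only when its sources agree; where authorities disagree, the input word list itself is in question and no single classification follows.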
However, we can combine some of these possibilities and still retain what we believe to be the most important information, namely whether one or the other of these terms is more basic or unmarked, whether a language makes a lexical distinction between the two meanings, and whether there is any overlap (i.e., whether one term can be used with both meanings). We would conflate 1) and 2) on the one hand, and 3)-6) on the other hand, which would yield Table 8, in which we have placed the languages shown in previous tables. (The difference between e.g., 1) and 2) is not without interest, as it may indicate the extent to which an AL allows synonymy, but we will not deal with that issue here.) Table 8 shows the languages appearing in earlier charts classified along these lines.130
With 1)-2) there is a complete merging of the meanings, while with 3)-6) there is a complete separating of the meanings; the other lines in the table represent some degree of overlap. (Of course there might never be complete separation of meanings, since they are so close and there is a continuum (of size, assuming that the only difference between a bay and a gulf is size), and particular speakers (if there were any) might use a word for ‘gulf’ to describe a body of water for which another speaker might use ‘bay’.)
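The conflation just described amounts to a many-to-one mapping from the ten fine-grained possibilities onto a smaller set of classes. The sketch below illustrates the idea in Python. The class labels, the names `COARSE`, `conflate`, and `tabulate`, and the language names in `sample` are hypothetical placeholders, not the row labels or the contents of the actual table.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical shorthand labels for the conflated classes.
COARSE: Dict[int, str] = {
    1: "complete merging",
    2: "complete merging",
    3: "complete separation",
    4: "complete separation",
    5: "complete separation",
    6: "complete separation",
    7: "'bay' word also covers 'gulf'",
    8: "'gulf' word also covers 'bay'",
    9: "'gulf' derived from 'bay'",
    10: "'bay' derived from 'gulf'",
}


def conflate(possibility: int) -> str:
    """Map a fine-grained possibility (1-10) to its conflated class."""
    return COARSE[possibility]


def tabulate(assignments: Dict[str, int]) -> Dict[str, List[str]]:
    """Group languages by conflated class, languages in alphabetical order."""
    table = defaultdict(list)
    for language, possibility in sorted(assignments.items()):
        table[conflate(possibility)].append(language)
    return dict(table)


# Invented assignments, for illustration only.
sample = {"Lang A": 1, "Lang B": 2, "Lang C": 4, "Lang D": 10}
print(tabulate(sample))
```

Keeping the fine-grained number in the input means the distinction between 1) and 2), which bears on synonymy, is not lost and can be recovered if needed.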
Some ALs may have a complete separation between ‘bay’ and ‘gulf’ because their source language(s) has/have such a separation; for example, to our knowledge there is no English word such as *gulfet meaning ‘bay’, nor is there an English word for ‘gulf’ consisting of bay and an augmentative suffix. (This also assumes that the only, or at least the main, difference between a bay and a gulf is their size, which may not be the case for all speakers.)
We see from Table 8 that all but one of the possibilities occur in at least one AL, the missing one being the situation where a word for ‘bay’ can also be used to mean ‘gulf’, but where there is also a word which only means ‘gulf’. We cannot claim any statistical validity with our surveys, since, for one thing, some ALs have influenced other ALs, so they are not all totally independent of one another. (Also AL designers may well have been influenced by the natural languages from which they took words.) However, the fact that one AL may have borrowed a property from another AL may mean that the property in question did not seem problematic to the designer(s) of the borrowing AL. For example, the idea of using a diminutive suffix with the root meaning ‘gulf’ to form the word for ‘bay’ was borrowed from Esperanto into Ido; given that the designers of Ido did not accept all the properties of Esperanto (if they had, then there would not have been a language Ido), this particular property of Esperanto may well have seemed reasonable to them (unless they did not notice or pay attention to such a small detail).
In any case, Table 8 shows that, at least among ALs, it is not uncommon for ‘bay’ and ‘gulf’ to (apparently) completely merge (possibilities 1)-2)), nor is it rare for them to be kept completely separate. The former fact means that it is acceptable in the eyes of some AL designers to have no lexical means of distinguishing between the meanings ‘bay’ and ‘gulf’; one would need to use phrases such as “a large bay/gulf”.132 The fact that there are no languages exemplifying 7), but two languages exemplifying 8) may be an indication that ‘gulf’ is more dominant than ‘bay’, or less marked than it. The fact that there are equal numbers for 9) and 10), on the other hand, may be thought to argue against such a conclusion, but recall that Esperanto also (basically) follows this pattern (as might some Esperantid languages which we have not treated), and given such small numbers of languages, we would not venture to base any arguments or counter-arguments on this fact.133 Overall, based on all the data in Table 8 we would not make any claims about the dominance or markedness of either member of the ‘bay’ and ‘gulf’ pair, nor about any tendency among ALs to merge or separate these meanings.
There are some interesting questions about language design which come up here. Does an AL (really) need (lexically distinct) words for both ‘bay’ and ‘gulf’? What are the benefits (if any) of having distinct words for both, and what are the drawbacks (if any)? Such questions become more important or obvious when we look at words for ‘river’ and for smaller river-like bodies of water, as they make up a larger set of words, at least in some languages. We will now turn to such words.
The possibilities for relations will be more complicated, as there are four meanings involved. Table 9 shows some of these possibilities, using the same sort of simplification involved in deriving Table 8 from Table 7 (and with darker lines separating possibilities with different numbers of (sets of) words).
The group of languages involved is smaller, as there are few ALs in which we have found equivalents of all four English words listed,134 and most of the possibilities are not instantiated. Pattern 2) is the most common, which is not surprising: it instantiates the opposition ‘river’: ‘river-like body but smaller than a river’, which may seem like the most intuitively natural distinction among the two-way distinctions.
Table 9 does not include the possibility of derivation through affixation, and again the range of possibilities would be large. If the most common split is between ‘river’ and smaller, but similar bodies of water, it might be useful to look at this binary distinction with regard to affixation; and we could also see a wider range of languages. Table 10 unifies ‘brook’, ‘creek’, and ‘stream’ and includes ALs which have words for at least one of these (i.e., we are not excluding languages which lack one or more of these, as we did in Table 9).
Again Esperanto is more complicated because x-aff (rivereto) means ‘brook’ or ‘creek’, while either x (rivero) or x-aff can be used for ‘stream’. A somewhat similar situation holds in Sasxsek, in that one can express the meaning ‘creek’ with either rivo or rivo-aff, namely rivimo. We see the same general type of situation in Algilez. American is also more complicated, as all of its words for ‘brook’ or ‘creek’ appear to be derived, but the word from which one of them, fluitu, would have been derived, does not appear in O’Connor (1917) to our knowledge.
Another language that does not fit into this table is Glosa; for one thing, two of its words for ‘brook’/‘stream’/‘creek’ seem to be compounds. Novial shows a different pattern, with underived words meaning ‘river’ and ‘stream’, and a word derived from one of them meaning ‘brook’, i.e., the opposition is not ‘brook’/‘creek’/‘stream’ vs. ‘river’, but ‘brook’ vs. ‘stream’/‘river’.
In any case, ALs which follow the simple patterns of Table 10 or a more complex pattern all make use of affixation to derive words meaning ‘brook’/‘creek’/‘stream’ from words meaning ‘river’; we have seen no AL in which a word for ‘river’ is derived from a word for ‘brook’/‘creek’/‘stream’. Note that this is true even of Afrihili, which, unlike most auxiliary ALs, is not based on European languages. This indicates that ‘river’ is an unmarked meaning with respect to ‘brook’/‘creek’/‘stream’.
For our last sample analysis, we return to standing bodies of water, and look at terms for different sizes of them, although the bodies referred to can differ in more than just size (e.g., oceans consist of salt water, lakes usually do not). Table 11 shows relations in languages that have words for all four meanings ‘Pond’, ‘Lake’, ‘Sea’, and ‘Ocean’. (Here darker lines separate possibilities with different numbers of (sets of) underived words and possibilities involving affixation from those not involving it.)
Here, unlike the ‘brook’, etc. situation, it is not uncommon for there to be different underived terms for each of the meanings; indeed it is the most common pattern. In other words, the differences among ‘pond’, ‘lake’, ‘sea’, and ‘ocean’ are more important and salient than those among ‘brook’, ‘creek’, ‘stream’, and ‘river’. This may be partly due to the just mentioned fact that size is not the only factor differentiating them.
Although not all languages show up in Table 11 (as those which are missing words for one or more of the terms are not there140), it seems rare for languages to combine two or more of these meanings (as shown by the empty cells in the rightmost column in the table), and no AL that we know of has a single term for all four meanings. The Neo Patwa word moana and the Pandunia word daria combine three of them, ‘lake’, ‘sea’, and ‘ocean’ (but a term for ‘pond’ is missing in each language). No language in our sample combines ‘pond’, ‘lake’, and ‘sea’. In addition to the languages in Table 11 which combine ‘sea’ and ‘ocean’, Romanova, UNI, and Voldu have one word for these two meanings. It may be rare for an AL to combine only ‘pond’ and ‘lake’, which would be somewhat surprising, since both generally contain fresh water (recall Sulky’s (n.d.) gloss of the Konya word leki, ‘body of fresh water; lake; pond’), and no AL in our sample combines only ‘lake’ and ‘sea’.
Many other observations could be made. For example, in terms of the forms of words and their sources, it is interesting that even in some ALs that take their lexicon partly from non-European languages there is a word for ‘sea’ which is of Western European origin: Ardano’s mar (cf. etale ‘lake’) and Unish’s mer (although other Unish words with similar meanings also come from Western European languages). We would need far more data (from languages which are not so reliant on European languages for their vocabulary) to be able to say anything definitive, but one might wonder whether certain roots are so prominent (and “international”) that they are often chosen even for languages which strive for equality and a non-European bias in their vocabulary. This particular statement about words for ‘sea’ would probably turn out to be incorrect with respect to other such languages, but it would be interesting to attempt to determine the “strength” of some roots among artificial languages.
We have looked at one semantic domain (bodies of water) in considerable detail and have presented a partial analysis of the data obtained. We believe that such fine-grained analyses can reveal something about the nature of ALs (and perhaps also of natural languages, if comparisons are made), particularly if considered together with analyses of other semantic domains (e.g., how many ALs have distinct words for ‘house’ and ‘mansion’, or for ‘flute’ and ‘piccolo’), and we hope to carry out such analyses in the future. From the present analysis alone, however, some tentative conclusions can be drawn.
For one thing, few ALs have the lexical complexity of at least some natural languages. For example, as we have seen, only a small number of the ALs examined lexically distinguish among ‘brook’, ‘creek’, ‘stream’, and ‘river’ (and this is not even considering other terms which occur in English such as rivulet and rill or the distinction that French makes between fleuve and rivière). In some cases this may be because having fewer (underived) words is seen as a virtue (as it supposedly makes for a simpler language, with less memorization required), in other cases the AL simply has not been developed to the point where the possible need for some such words has been considered, or acted upon, and in still others, it may be due to the fact that there is a set of derivational affixes which can regularly be employed to create words with less basic meanings, should the need arise.
Also, there may be some tendencies in derivation, e.g., that terms for ‘brook’ are often derived from terms for ‘river’, but never the reverse, and there may be some tendencies regarding what meanings are represented by the same word, e.g., that ‘bay’ and ‘gulf’ might more often have a single equivalent in an AL than do ‘pond’ and ‘lake’ (although data from more ALs would be required to determine this). Here, as elsewhere, it would be interesting to compare natural languages to ALs; we suspect that the same tendencies will be found, but confirmation of that awaits further research.