What causes relative frequency of consonants?

Can you point me to some research on what causes the relative frequency of consonants in various languages?
The fact that vowels are more common than consonants is obviously caused by phonotactics, but I don't see a simple explanation for the fact that some consonants appear to be far more frequent than others. For much of my research, I simply assumed that most of the difference was caused by syntax, but, evidently, syntax plays only a minor role. As I've explained on this web-page, it's relatively easy to measure the effect syntax has on relative frequency of consonants.

To summarize the relevant part of the web-page: I wrote a simple C program (source code is available on the web-page) that randomly picks two consonants from a text file a million times and counts how often the two consonants are the same. If you run it on a long English text, it reports that the probability of picking the same consonant twice in a row is 1/11, and that the most common consonant is "t" (presumably because of words like "the" and "that"). However, if you run it on an English spell-checker word-list, the probability drops to 1/13, and the most common consonant is "r" (probably because of the common English prefix "re-" and the common English suffix "-er"). Similarly, for a long Croatian text the probability is 1/13, and for a Croatian word-list it is 1/14 (in both cases the most common consonant is "n", probably because "ne-" and "na-" are very common prefixes in Croatian word formation). For a long German text the probability is 1/12, and for a German spell-checker word-list it is 1/15. In both cases, the most common consonant is "n", and I can't really guess why.
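The measurement described above can be sketched roughly as follows. This is not the C program from the web-page, only a minimal Python approximation under stated assumptions: consonants are the plain ASCII letters in `CONSONANTS` (with "y" counted as a consonant, and no handling of digraphs or letters with diacritics, which matters for Croatian and German), and the two picks are independent draws with replacement. The function names are made up for this illustration. Under those assumptions, the sampled estimate should converge to the sum of the squared relative frequencies, which can also be computed directly.

```python
import random
from collections import Counter

# Assumption: plain ASCII consonant letters, with "y" treated as a consonant.
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")

def consonant_counts(text):
    """Count each consonant letter in the text, ignoring case."""
    return Counter(ch for ch in text.lower() if ch in CONSONANTS)

def exact_match_probability(counts):
    """Chance that two independent draws give the same consonant: sum of p_i squared."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

def sampled_match_probability(text, trials=1_000_000, seed=0):
    """Monte Carlo estimate: pick two consonants at random and count the matches."""
    pool = [ch for ch in text.lower() if ch in CONSONANTS]
    rng = random.Random(seed)
    same = sum(rng.choice(pool) == rng.choice(pool) for _ in range(trials))
    return same / trials

sample = "the quick brown fox jumps over the lazy dog"
counts = consonant_counts(sample)
print(counts.most_common(3))
print(exact_match_probability(counts))
```

Running the same functions on a running text and on a word-list for the same language gives the two numbers being compared in the question.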

So, as you can see from the above data, while syntax indeed plays some role in the relative frequency of consonants, that's not all there is to it. To what extent is the rest of the effect caused by phonology, and to what extent is it caused by morphology?

asked 18 hours ago

FlatAssembler

895

For consonants, English spelling is different enough from English phonology that I think you won't get very accurate results by looking at letter frequencies. The word "the" doesn't contain the consonant sound /t/, but rather the consonant sound /ð/.

– sumelic
18 hours ago

Unfortunately, I don't see anything about syntax on the page that you linked to, or on other posts by you that I looked at. Can you clarify what you mean by "it's relatively easy to measure the effect syntax has on relative frequency of consonants"?

– sumelic
18 hours ago

I mean, it can be measured by comparing the relative frequencies of consonants in texts versus in word-lists. I thought I was clear enough.

– FlatAssembler
18 hours ago

Oh, I see, it's the second-to-last paragraph.

– sumelic
18 hours ago

Even completely ignoring possible biological (different simplicity in speaking or hearing different sounds) or linguistic (e.g., historical development) effects, we would probably expect something close to a Zipf-Mandelbrot distribution (i.e., the nth most common consonant occurring with roughly 1/n the frequency of the most common, or: rank times frequency roughly constant) ...

– Hagen von Eitzen
11 hours ago
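To see what such a baseline would imply for the "same consonant twice" probability from the question, one can compute it for an idealized inventory. This is only a sketch: the 1/n weighting is the simplified form given in the comment, the inventory sizes are arbitrary illustrative values, and the function names are made up here.

```python
def zipf_probabilities(k):
    """Relative frequencies for k consonants where the nth most common has weight 1/n."""
    weights = [1 / rank for rank in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def match_probability(probs):
    """Chance that two independent draws give the same item: sum of squared probabilities."""
    return sum(p * p for p in probs)

# Illustrative inventory sizes only; real consonant inventories vary by language.
for k in (15, 20, 25):
    p = match_probability(zipf_probabilities(k))
    print(f"{k} consonants: about 1/{1 / p:.1f}")
```

Comparing these baseline values with the measured ones would show how much of the skew needs any explanation beyond a generic rank-frequency pattern.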


computational-linguistics linguistic-typology


2 Answers

Frequency of a thing can be in terms of all languages or a single language; it can be in terms of yes/no existence or in terms of actual use; if the latter, it has to be relative to some defined corpus. As an example, [ʕ] is a zero-frequency consonant in English, and a low-frequency consonant in human language. I won't venture a guess about its frequency in Arabic, but it is not the least-frequent consonant of Arabic (Classical, at least). [t], on the other hand, is very high frequency in and across languages. There is a vague concept out there of "markedness" that is invoked to encapsulate differing frequency of attestation, whereby it is said that [ʕ] is "marked" relative to [t].

Two factors that have the greatest influence on frequency of attestation are (a) intrinsic phonetic properties and (b) historical precedent. Ejectives are extremely low frequency in Indo-European languages because the proto-language lacked ejectives (I ignore the claim to the contrary), and [p] is low frequency in modern Arabic dialects because Classical Arabic didn't have [p]. However, English has [f] while PIE did not, so languages do develop new sounds.

Factor (a), intrinsic properties, is hard to explain or even establish satisfactorily. A popular idea applied to crosslinguistically low-frequency consonants is that they are "hard to pronounce"; the problem is that this can't be directly measured in an objective way, and seems to reflect the struggle that people have when trying to pronounce a sound that is not in their own language (it's hard). Phonetically based dispreference is the result of aerodynamic, acoustic and articulatory factors. However, it is also hard to separate (a) from (b): I don't find [ʕ] hard in any sense, but it is not part of my native language, and I often elide the consonant when pronouncing Arabic words (names) in an English discourse. It's possible that, through massive social change, Arabic could influence English and we would nativize some words containing [ʕ], so that the consonant would properly become part of the English consonant inventory where it was not one historically. This happened in the case of some Bantu languages of Southern Africa, which adopted click sounds from neighboring Khoisan languages, thus increasing the attestation frequency of clicks.

A final consideration for you is that spelling and pronunciation are different, so that the letter t is both [t] and a spelling component of [θ].

The question of possible influence of syntax, phonology or morphology on consonant frequency depends on what frequency you are speaking of. W.r.t. crosslinguistic frequency, the effect is zero. Token frequency within a language can be influenced by syntax, phonology or morphology, and yes/no frequency can be influenced by phonology (people often say that the lack of [ʕ] in English is a fact encoded in the phonological grammar of English). There is no general way to know in advance what the influence of syntax, phonology or morphology is on token frequency, because you don't know if a language is going to have rules deleting /g/ in some context, or turning /k/ into [g] in some context, either of which will influence token frequency. Syntax and morphology can influence token frequency in case e.g. /k/ figures in widely-used affixes, but again not every language has a ubiquitous affix /s/ or /d/ of the kind that increases the token frequency of these sounds in English. Post hoc, you can compute the percentage of tokens that are attributable to some affix or syntagmeme, but there's no predictive power apart from general predictions about samples from a different corpus.
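The post hoc computation mentioned in the last sentence could look roughly like the sketch below. It is a crude proxy, not a morphological analysis: it works on spelling rather than sounds, and it treats every word that ends in the given letters as carrying the affix, so a word like "bus" would be miscounted as having the "-s" suffix. The function name `suffix_share` and the sample sentence are made up for illustration.

```python
def suffix_share(tokens, consonant, suffix):
    """Fraction of all occurrences of `consonant` that fall inside a word-final `suffix`.

    Assumption: any word ending in `suffix` (and longer than it) carries the affix.
    """
    total = 0
    in_suffix = 0
    for token in tokens:
        word = token.lower()
        total += word.count(consonant)
        if word.endswith(suffix) and len(word) > len(suffix):
            in_suffix += suffix.count(consonant)
    return in_suffix / total if total else 0.0

tokens = "the cats chased the dogs and the dogs barked".split()
print(suffix_share(tokens, "s", "s"))
```

A real measurement would need a phonemic transcription and a morphologically annotated corpus in place of the word-ending test.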

edited 13 hours ago

answered 17 hours ago

user6726


I'm not quite clear on ejectives. Doesn't e.g. "appointment" eventually have an ejective p' as much as the Anlaut starts with a glottal stop? Any unvoiced p'losive has to close the vocal tract at some point, too, so it's partially ejective. You might say "... did/does not recognize" at any rate.

– vectory
10 hours ago

Ejective doesn't just mean "close the vocal tract"; it refers to complete oral and glottal closure and raising of the larynx, which we don't do in English. In Navaho, Amharic, Sotho, Salishan, sure, not in English.

– user6726
9 hours ago

Of course you do, even if it's pulmonic, because if the larynx is closed, pressure from the lungs will push up the larynx. It's not obligatory to close the larynx between a and p in "appointment", and maybe I'm just imagining now that I would, but it stands to reason on grounds of efficiency, because the palate and larynx move together, reflexively, if the palate has to close the nasal air stream to save breath; otherwise p would be nasalized.

– vectory
9 hours ago


Following your argument, I guess the frequency of "n" in German is pushed up by the indefinite article "ein-", case inflections and regular verb inflections in "-(e)n", and the plural marker "-en". The common mnemonic for the most frequent letters in German is "ERNSTL", famous from the Wheel of Fortune game shows, ordered for ease of pronunciation of the mnemonic. One should wonder why the words that are favored by the syntax contain those consonants.

Looking at the Wikipedia article for liquid consonants, we see the claim that they are very frequent. Further down the page we see that the term originally described "the sonorant consonants (/l, r, m, n/) of classical Greek"--three of those matching our "ERNSTL". Mind that, while German "R" is nominally a trill, it is often produced as a mere approximant, or simply elided. There's a lot to say about that, and about the relation between "d" and "n" ("d" is just a non-nasalized "n"), that escapes me at the moment.

We mainly discriminate two points of articulation, front and back: respectively the tip of the tongue and whatever your local accent prefers (uvular for me, rhotic for many Americans). "ng", on the other hand, is a velar nasal, so somewhere in the middle, but we still hear most of it like a dental or alveolar; those in turn are even represented with the same IPA sign; many other IPA symbols are reminiscent of n, too.

The heart of the matter that I'm getting at is that those consonants are preferred where the least effort is expended in speech. Similarly, written language optimizes for ease of writing and represents several sounds with the same symbol. Only when trying to be precise--talking clearly, or writing phonetically--will the difference be highlighted. However, we nevertheless hear or see the difference if we expect it, even if it's hardly there or only hinted at by context.

I'm not sure what that implies for the development of a language. It obviously has not converged to just two different consonants.

Note that [m], another nasal, is one of the earliest sounds a child learns (and one researcher figured that was helped by the most basic lip action a baby gets, sucking on the mammaries). Note variants like "nana", "anna", etc. [p], by contrast, is learned rather late. This alone implies levels of difficulty.

answered 9 hours ago

vectory


