Fifty Shades of Wéi (喂): Pronunciation

"Not that there's anything wǎng (往) with that…."

There's no denying it: Chinese is a language full of homophones. And this profusion of words that sound alike but have different meanings can be confusing. But fear not! In the previous post in this series, I offered some reassurance: Mandarin grammar is easy. In that same spirit of optimism and oversimplification, I will now explain why the daunting abundance of homophones is a price well worth paying given what it buys: a simple system of pronunciation.

My main goal is to explain Mandarin pronunciation informally, so I will avoid linguistic terminology and fine distinctions. Words such as "alveolar", "plosive", "labio-dental", and "velar" occur only in this sentence, so you're past them now. (ht2mp) My subsidiary goal is to harvest corrections, so bring 'em on!

There have been many systems for transcribing Chinese sounds into languages that use the Latin alphabet, but there's no question that the dominant, standard system today is Pinyin. Googling "pinyin chart" in your preferred search engine will yield many examples of the conventional Pinyin table, which is a 2-dimensional grid of syllables. My favorite software for associating these syllables with sounds is the downloadable Pinyin Chart from

For pedagogical reasons, I have rearranged the Pinyin table and annotated it. Here's my cheat sheet as a PDF. And here it is as a JPG:

Pinyin Chart Rearranged

I'll refer to it a few times below.

The Good News: syllables!

Here's why learning to pronounce Mandarin is feasible rather than crippling: there is a very small number of syllables to learn, and each is always pronounced in exactly the same way.

1. Ingredients of a syllable: Every syllable in Mandarin consists of three factors: an initial sound, a final sound, and a tone.

2. Initials: There are 22 initial sounds: the null sound (i.e., the lack of an initial sound) and 21 consonants:

b, p, m, f, d, t, n, l
z, c, s, zh, ch, sh, r
g, k, h, j, q, x

The consonants z, c, zh, ch, sh, r, j, q, and x pose a challenge for the native speaker of English, but later I'll explain a way to think about them that I find useful. The rest are pronounced pretty much the way you'd expect. (The "h" is breathy and wet, like the soft "ch" of "ich" in German, for example, but that's precisely the sort of nuance I'm going to ignore for the most part in this post.)

3. Finals: So every syllable either starts with no initial or with one of those 21 consonants. And every syllable continues with one of 35 vowel finals. A final is a vowel sound which may or may not end in a nasal (n or ng):

a, ai, ao (an, ang)
o, ou (ong)
e, ei (en, eng)
u, ua, uai, uo, ui (uan, uang, un, ueng)
i, ia, iao, ie, iu (ian, iang, in, ing, iong)
ü, üe (üan, ün)

Here, too, some items pose a challenge. But knowing that several of them (ong, iong, ui, un, and iu) are actually contractions helps to make sense of how they're pronounced. More on this later.

In addition to those 35, there's the pirate-"er" sound. Arrrrr!  (Actually, it's like "err" as in "to err is human" when "err" is pronounced like "Burr". Actually, it's somewhere in between!)

4. Do The Math: So then, we have 22 initials (including null) and 36 finals (including 16 nasals and pirate-"er"). Think about what this means: there are only 22 x 36 = 792 possible syllables! And the great news is that not every logically possible syllable occurs in the language. For example, you won't see f+ao or r+uang or p+uo or zh+ie. They don't happen.

Of the 792 sounds in the Cartesian product, how many do actually occur in the language? Only 404. Four Hundred Four! So if you can learn the correct way to pronounce these 404 syllables, then you can utter any Mandarin there is. (By including some interjectional grunts, this guy (Google-cached here) who knows more than I do comes up with 413. Others count 407 or 409 or 412. Whatever.) By way of comparison, English has 26 letters, 5 (or 6 or 7) of them vowels, but the rules of combination allow for as many as 10,000 distinct syllables. Estimates of how many actually occur range from around 2500 to 4000+, an order of magnitude more than in Chinese. And English diphthongs are faithless, and the consonants untrue.
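If you'd rather not trust my arithmetic, here is a minimal Python sketch that builds the Cartesian product from the lists of initials and finals given above. The names `INITIALS`, `FINALS`, `candidate_syllables`, and `nasal_finals` are my own for illustration; the sketch only counts logically possible pairs and knows nothing about which 404 actually occur.

```python
from itertools import product

# The null initial (empty string) plus the 21 consonant initials.
INITIALS = [
    "",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "z", "c", "s", "zh", "ch", "sh", "r",
    "g", "k", "h", "j", "q", "x",
]

# The 35 finals listed earlier, plus the stand-alone pirate "er".
FINALS = [
    "a", "ai", "ao", "an", "ang",
    "o", "ou", "ong",
    "e", "ei", "en", "eng",
    "u", "ua", "uai", "uo", "ui", "uan", "uang", "un", "ueng",
    "i", "ia", "iao", "ie", "iu", "ian", "iang", "in", "ing", "iong",
    "ü", "üe", "üan", "ün",
    "er",
]


def candidate_syllables():
    """Return every logically possible (initial, final) pair."""
    return list(product(INITIALS, FINALS))


def nasal_finals():
    """Return the finals that end in a nasal (n or ng)."""
    return [f for f in FINALS if f.endswith("n") or f.endswith("ng")]


pairs = candidate_syllables()
print(len(INITIALS), len(FINALS), len(pairs))  # 22 36 792
print(len(nasal_finals()))                     # 16
print(("f", "ao") in pairs)                    # True: possible on paper, absent in speech
```

The product comes to 792, so the 404 syllables that occur are roughly half of what the grid would allow.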

The Somewhat Less Good News: tones

404 syllables? A cinch! However, there's a hitch: A speaker– especially one whose language has evolved for thousands of years– cannot say everything we humans need to say if he has at his disposal only 404 syllables. The language deals with this in three ways: multi-syllable words, tonality and homophony.

5. Tones: In addition to having an optional initial and a mandatory final, each syllable in Mandarin (by which I mean the 404 that actually occur) may be pronounced in one of four tones or in a neutral way (sometimes called the "fifth tone" though it's actually a wimpy schwa-like afterthought of a tone). Each syllable may have a different meaning, or multiple meanings, in each tone. Tone changes meaning.

For example, wang1 (wāng) means "to ooze" (汪), wang2 (wáng) may mean "king" (王) or "to die" (亡), wang3 (wǎng) may mean "past" (往) or "network" (网) or "spoked rim" (辋), and wang4 (wàng) may mean "to peer into the distance" (望) or "to forget" (忘) or "presumptuous" (妄) or "flourishing" (旺).
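The two spellings used above (wang3 and wǎng) are interchangeable, and converting one to the other is mechanical. Here's a rough Python sketch of the usual placement convention as I understand it: the mark goes on "a" or "e" if either is present, on the "o" of "ou", and otherwise on the last vowel. The function name `mark_tone` and the table `TONE_MARKS` are made up for this example, and the sketch assumes a single, well-formed syllable as input.

```python
# Marked forms of each vowel, in the order 1st, 2nd, 3rd, 4th tone.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def mark_tone(syllable: str) -> str:
    """Convert numbered pinyin such as 'wang3' into 'wǎng'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number, nothing to do
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone carries no mark

    # Pick the vowel that carries the mark.
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowels:
            return base  # nothing to mark
        idx = vowels[-1]

    marked = TONE_MARKS[base[idx]][tone - 1]
    return base[:idx] + marked + base[idx + 1:]


for numbered in ("wang1", "wang2", "wang3", "wang4", "wei2"):
    print(numbered, "->", mark_tone(numbered))
```

Run on the examples above, this should give wāng, wáng, wǎng, wàng, and wéi.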

Note that the meanings for a syllable from one tone to another may have no etymological connection. This system isn't like the consonantal triad in a Semitic language, which serves as the skeleton of a root and helps to delimit the semantic reach of its words. In Mandarin, the meanings are sometimes related, as with wéi (喂, "hello" when answering the phone) and wèi (喂, "hey!" as in "Hey, you! What are you doing with that post-hole digger?"), but usually they are not.

6. Re-reckon them figgers: The neutral pronunciation occurs for very few syllables, so let's bracket that one out. That leaves four tones across 404 syllables for a grand total of 1616! But the good news here is that not every logically possible syllable+tone combination occurs in the language. In fact, only about 215 syllables occur in all four tones. By one count, which includes the neutrals, there are only 1396 syllables in all:

syllables occurring in 1st tone: 349
syllables occurring in 2nd tone: 282
syllables occurring in 3rd tone: 350
syllables occurring in 4th tone: 375
syllables occurring in neutral tone: 40
Total: 1396
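The figures in that table are easy to re-check. This small Python sketch just redoes the sums; the variable names are mine, and the counts are copied from the table as given, not independently verified.

```python
SYLLABLES = 404  # distinct syllables, ignoring tone

# Per-tone counts copied from the table above.
TONE_COUNTS = {
    "1st": 349,
    "2nd": 282,
    "3rd": 350,
    "4th": 375,
    "neutral": 40,
}

ceiling = SYLLABLES * 4                   # every syllable in all four tones
total = sum(TONE_COUNTS.values())         # what the count actually finds
four_tone_total = total - TONE_COUNTS["neutral"]

print(ceiling)          # 1616
print(total)            # 1396
print(four_tone_total)  # 1356
print(f"{four_tone_total / ceiling:.0%} of the four-tone grid is filled")  # 84%
```

So even before the neutral tone enters the picture, about one slot in six of the syllable-by-tone grid is simply empty.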

Can you learn 404 words? Can you learn the 4 tones? Then you can pronounce more than all the syllables in Mandarin! (Count your blessings. Cantonese has 7 tones….)

7. You already know the tones: Yes, you do:

The first is constant and high-pitched (not absolutely, but in relation to your natural voice): "One is the loneliest number…." or "Shall we dance…."

The second is the sound we native English speakers make at the end of a question: "You want to go where?"

The third descends to a low tone (often with some croaky vocal fry) and then rises a bit, like the sound of skeptical or sarcastic astonishment ("Really?!") or coaxing (to a puppy on a tightrope: "Good… Come get the treat….").

And the fourth descends quickly and sharply: "Stop!"

These sounds (level, interrogative, coaxing/doubting, assertive) occur in the wild in English. YouTube offers many videos that explain these sounds in detail (here are some: 1, 2, 3, 4), so I'll leave it at that.

One fact is worth emphasizing here: in everyday speech, native speakers of Mandarin tend to munge these tones in certain ways. As a result, there's often a discrepancy between what we're told a phrase should sound like and how it actually sounds when uttered. This happens because the standard descriptions of the four tones tell how they sound when a word is spoken in isolation. But words (other than, say, digits in a phone number) are seldom uttered that way.

8. Tone Sandhi: Look up "tone sandhi" for the nuances of how tones change. I'll just mention one fact of life that makes understanding the spoken language much easier: in practice, the third tone (the one that dips and then rises) hardly ever actually dips and rises; instead, it sounds like a low flat tone, a counterpart to the high first tone. If you hit that low beat (a maneuver called the "half 3rd tone") in your pronunciation during phrases, related tone sandhi will take care of themselves. Only when saying a third-tone syllable in isolation should you exaggerate the dip and rise.

What's the upshot? When learning Mandarin, it's better to exaggerate the tones than to neglect them. And it'll be a heck of a lot easier to execute the 3rd tone without sounding like a clown if you treat it as a drop to a low, flat sound in ordinary speech!

The Really Rather Distressing News: disyllables and homophones

Even with 1396 possible syllables, thanks to the 4 tones and the wimpy 5th tone, it just ain't possible to say every little thing that needs sayin', unless you make words out of multiple syllables. So most Chinese words have two syllables (and only 44% of the 1396 monosyllables can stand alone as words).

To expand the semantic horizon still further, Chinese has a high tolerance for homophones.

Classical Chinese was even more notorious in this regard, and the great, playful 20th-century Chinese-American linguist Zhào Yuánrèn demonstrated this tendency in his well-known-for-this-reason poem The Lion-Eating Poet in the Stone Den.  Would it pique your interest to peek at the peak of Chinese homophony? If I raise this issue, will rays of rebuttal raze it to the ground? If I write right, like a wright, and not as a mere rite, will it instill a whit of wit? Shut up? OK, here's the poem:

In a stone den was a poet called Shi, who was a lion addict, and had resolved to eat ten lions.
He often went to the market to look for lions.
At ten o'clock, ten lions had just arrived at the market.
At that time, Shi had just arrived at the market.
He saw those ten lions, and using his trusty arrows, caused the ten lions to die.
He brought the corpses of the ten lions to the stone den.
The stone den was damp. He asked his servants to wipe it.
After the stone den was wiped, he tried to eat those ten lions.
When he ate, he realized that these ten lions were in fact ten stone lion corpses.
Try to explain this matter.

And here's the gimmick of the poem: every single word in it is pronounced "shi" (which sounds vaguely like "sure") in one or another of the tones! Behold as this brave girl reads it aloud:

Are you shí you want to learn this language? Fear not. For the most part, context will clarify which meaning of an utterance is intended. And if context fails, then there's always writing: in general, each meaning has its own character. But that's another story….

Some tips on pronunciation from a novice

Here's the scoop on the 21 initials and the 35 finals. Please note that proper pronunciation of Mandarin will only come with much practice and frequent exposure to native speech. This is a quick and dirty guide from the perspective of a beginner who speaks native English and whose accent is mostly southern Californian.

Easy Initials:

The initials b, p, m, f, d, t, n, l, s, g, k, and h sound like their English counterparts. The "h" is more emphatic and spitty, and there are nuances about "g" and so forth, but with these you're already in the ballpark.

Difficult Initials:

The initials z, c, zh, ch, sh, r, j, q,  and x require some practice. The 'z' sounds like "dz", the sound at the end of the word "suds". The "c" is always a "ts" sound like the end of the word "nuts".

The "zh", "ch", "sh", and "r" are perhaps the most difficult for a native English speaker. One reason I rearranged the Pinyin chart was to emphasize them as a group. They all require the tongue to be held in "retroflex" position, which basically means curled back so that its tip touches (or nearly touches) the hard palate just in front of the place where the soft palate begins– at the top center of the dome, so to speak. With the tongue curled back and the teeth together, make the following sounds:

a "j" sound as in "jump" for "zh"
a "ch" as in "chump" for "ch"
a "sh" as in "shop" for "sh"
a "jh" as in "Jacques" for "r"

It feels odd to do this with a retroflexed tongue, but the way the tongue fills the mouth creates a sound different from what one hears when the tongue is lying limp behind the bottom teeth or touching right behind the top ones.

The "r" deserves special attention. Except in the pirate syllable "er", this letter sounds more like a French "j", with one exception: instead of being produced between the upper and lower teeth, it's produced between the curled-back tongue and the middle or rear of the hard palate. Experiment!

The last tricky initials, "j", "q",  and "x", are also called out on my Pinyin chart remix.  These letters represent the following sounds:

a "dy" similar to the English "j" as in "jeep" = "dyeep"
a "ty" similar to the English "ch" as in "cheap" = "tyeap"
a "sy" similar to the English "sh" as in "sheep" = "syeep"

The trick to saying "sy" and getting a sound like "sh" (and so forth) is that all three of these are pronounced by using the middle of the tongue, not its tip, to touch the roof of the mouth behind the front teeth. Called "the blade" of the tongue, this meaty middle (not too far back from the tip!) softens the sounds in a distinctive way. Bending the middle of the tongue up to hit the roof typically requires anchoring the tip of the tongue behind the bottom teeth.

As my rearranged Pinyin chart makes clear, the zh, ch, and sh are alternatives to the j, q, and x; the former go with finals in a, o, e, or u, while the latter go with finals in i or ü. Remembering that these two sets of difficult initials only overlap in one place (the "i" all alone) makes it a lot easier to use the right one at the right time! As a bonus, the "difficult" initials make relatively few syllables, as the chart also shows.
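The rule of thumb in that last paragraph can be written down as a tiny lookup. This Python sketch encodes only what I said above (zh/ch/sh before finals in a, o, e, or u; j/q/x before finals in i or ü; both before the lone "i"). The names `ZH_GROUP`, `J_GROUP`, and `compatible_groups` are invented for the example, and the function tells you which group is allowed in principle, not whether a given syllable actually occurs.

```python
ZH_GROUP = ("zh", "ch", "sh")
J_GROUP = ("j", "q", "x")


def compatible_groups(final: str):
    """Return the difficult initials that may precede the given final."""
    if not final:
        raise ValueError("a final is required")
    if final == "i":
        # The single overlap: both groups can take the lone "i".
        return ZH_GROUP + J_GROUP
    first = final[0]
    if first in ("i", "ü"):
        return J_GROUP
    if first in ("a", "o", "e", "u"):
        return ZH_GROUP
    raise ValueError(f"unrecognized final: {final!r}")


print(compatible_groups("ang"))  # ('zh', 'ch', 'sh')
print(compatible_groups("iao"))  # ('j', 'q', 'x')
print(compatible_groups("i"))    # ('zh', 'ch', 'sh', 'j', 'q', 'x')
```

Note that the finals here are in their spoken form, so the ü of "jü" must be passed as "ü" even though it is written "ju".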



As for the finals, most are straightforward. "a" is like "pa"; "ai" is like "pie"; "ao" is like "pow". The sound of "o" is tricky; it's like a contraction of "uo" or "wo" in which the first sound is quite muted and the second short. But it's neither the short "o" of the English word "fox" nor the longer "o" of "folks"; it's between them. It sounds a bit like "aw" used to express disappointment: "We're not going to the Festivus party? Awww…." Learn it by ear.


The final "ong", like "iong", does not rhyme with the English pronunciation of "Hong Kong"; instead, it's more like the Anglicized pronunciation of "Jung". The vowel sounds like the double-o in "book" and like the French "e" in "le": -ong, -iong. These finals are actually contractions of "ueng" and "üeng", and the vowel sound is coming from that "e" because the Chinese "e" is very much like the French one; it's the sound from "book" and "good" and "hood" (if you're from southern California!).


The "ei" sounds like "day", but "en" and "eng" and "ueng" sound like "dun" and "dung" and "wung". Knowing that the Chinese "e", like the French "e", sounds like the vowel in "good" makes it easier to grasp why "er" sounds like a pirate's interjection. Even so, it sounds more like "arrr" than "errr".


The "u" sounds like "spook" or "Fool of a Took". The "ua" sounds like "wa", the "uai" like "why", and the "uo" like "whoa".  Potentially confusing is the sound "ui", often butchered on HGTV in poor attempts to pronounce "feng shui". It's a contraction of "uei", so it actually sounds like "way". (So feng shui sounds like the pseudo-English "fung shway" and its vowels rhyme, assonantly, with "good day").

"uan" sounds like "Juan" as in "San Juan" which is the capital of my favorite board game, and "uang" sounds like the English "Wong" in The World of Suzy Wong. Get off my lawn.

"un" is potentially confusing because it's a contraction of "uen" and the diphthong sounds like a syllable and a half– the "oo" sound from "spook" and the French "e". As a result, "dun" sounds a little bit like "do one". Which do you prefer? I'll take the blueone. Who gets these here? These're for you'uns.


The Chinese "i" is usually like a short European "i": the "ee" sound as in the English "fee". But after the difficult initials z, c, s,  zh, ch, sh, and r, the "i" is hardly pronounced; it's just a neutral schwa sound emitted through clenched teeth. Usually, these initials followed by "i" are voiced and make a notable "buzz". Youtube it.

"iu" is potentially misleading; it's a contraction of "iou" and rhymes with "joe".

After an "i" or "y", the Chinese "e" sounds like the "e" in the English word "bed"; simple, short, and not at all French. So "ia" sounds like "ja!" and "iao" like "yow!", but "ie" rhymes with "meh". The "a" in "ian" (and in üan below) also sounds like the "e" in "bed".
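The contractions mentioned in this section (ong, iong, ui, un, iu) fit in a five-entry table. Here's a small Python sketch of that table, using this post's informal description of each one; the names `CONTRACTIONS` and `expand_final` are mine, and the function works on a bare final, not on a whole syllable.

```python
# Contracted spelling -> the fuller form it stands for, per this post.
CONTRACTIONS = {
    "ong": "ueng",
    "iong": "üeng",
    "ui": "uei",
    "un": "uen",
    "iu": "iou",
}


def expand_final(final: str) -> str:
    """Return the uncontracted form of a final, or the final unchanged."""
    return CONTRACTIONS.get(final, final)


for final in ("ui", "un", "iu", "ong", "iong", "ao"):
    print(final, "->", expand_final(final))
```

Reading "shui" as sh + uei or "dun" as d + uen is usually enough to remind me where the hidden "e" sound comes from.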


Finally, the "ü" is just like the French "u" or the German "ü". But on the Pinyin table, the diacritical mark (a diaeresis or umlaut) is omitted from the finals when the initial is "y", "j", "q", or "x" — but not when it's "n" or "l". I'm sure the Pinyin Gurus had a great reason for introducing this confusion, and for all the contractions. On my remixed chart, I've highlighted this omission in yellow wherever it occurs. (I've also noted the contractions at the top of the chart, and I've marked the counterintuitive sounds with an exclamation point along the top.) As for the compound finals in this section, the "a" in the final "üan" (and in ian above) sounds like the "e" in "bed". Astonishingly, the final "ün" sounds like… "wean", a clear case of overweening.
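Since the missing dots trip me up constantly, here's a short Python sketch that puts them back. It assumes the rule exactly as stated above: after "y", "j", "q", or "x", a written "u" at the start of the final is really "ü". The names `HIDDEN_UMLAUT_INITIALS`, `U_SPELLINGS`, and `restore_umlaut` are hypothetical, and the input is assumed to be a single toneless syllable.

```python
# Initials after which the two dots on "ü" are left unwritten.
HIDDEN_UMLAUT_INITIALS = ("y", "j", "q", "x")

# Written finals that stand for ü, üe, üan, and ün after those initials.
U_SPELLINGS = ("u", "ue", "uan", "un")


def restore_umlaut(syllable: str) -> str:
    """Rewrite e.g. 'ju' as 'jü'; leave every other syllable untouched."""
    for initial in HIDDEN_UMLAUT_INITIALS:
        rest = syllable[len(initial):]
        if syllable.startswith(initial) and rest in U_SPELLINGS:
            return initial + "ü" + rest[1:]
    return syllable


for written in ("ju", "xue", "quan", "yun", "nu", "lü", "zhu"):
    print(written, "->", restore_umlaut(written))
```

With "n" and "l" the dots are written out, so "nu" and "lü" pass through unchanged and stay distinct from each other.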

Things to come:

In future installments, I'll talk about the writing system, and I'll link to some Youtube videos that clarify the pronunciation system well. I'll also point out the most useful online resources (such as the wonderful, which allows you to draw a character online and then look up its meaning). In the meantime, search online and you'll find a wealth of resources, many of them much better than mine!


Last 5 posts by David Byron


  1. says

    This has always been the easiest part of Mandarin for me. I tell Chinese speakers that I have good 口应 (kou3 ying1), or vocal style, but no 词汇 (ci2 hui4), or vocabulary–that is, when I remember that word and don't have to resort to 知道的字不太多 (zhidao4 de zi4 bu2tai4duo1), "The words I know are not many," which is just as awkward in Chinese as it sounds in English.

  2. K. Chang says

    Oh, the tongue twister, Chinese style. :D That was fun. I think those old Chinese poets are just trying to be exceeding clever.

  3. AlphaCentauri says

    Actually, it looks like a lot of people do spell it post-hold digger now that I check. It does hold the post, after all. Must be like duck tape, where so many people spelled it "duct tape" assuming it was for taping ducts that it's become the proper name for it.

  4. Blaze Miskulin says

    Oh my god, do I hate that damn 'r' sound! :-)

    I'm currently teaching English in China, so this is very interesting for me.

    One thing I'll add: English is spoken mostly from the chest and throat. Chinese is spoken almost entirely from the mouth (if that makes sense). For example: the word "say". Say it in English, and the vowel drops into the chest. In Mandarin, it never grows past the middle of the mouth. Teaching Chinese speakers to talk from the chest is one of the challenges I face. That and the dreaded 'th' sound. :-) I have just the opposite problem: learning to bring the sounds up out of my chest.

    Thanks to the post, now that I'm officially taking classes in Mandarin, these are a helpful way to review and organize the information.

  5. LTMG says

    Wish I had these articles about Chinese language when I was living in Shanghai and struggling with the very basics. The articles really help me visualize what I should have been attempting rather than participating in the expensive and hardly useful classes I took.

  6. says

    It's generally accepted these days that Cantonese in Hong Kong has 6 tones. Otherwise, a very nice summary of Mandarin pronunciation!

  7. wgering says

    Is it bad that this:

    Googling "pinyin chart" in your preferred search engine

    [Emphasis added]

    is my favorite part of the entire post?

  8. says

    This post was awesome, and epic. I loved reading stuff like this; please keep posting!

    > post hole digger

    A Firefly reference?

  9. Wei Wei says

    My wife, a linguist and native Mandarin Chinese speaker, just viewed the embedded "shi" video and assures me that, while clever, in no way translates to the poem text provided in the article.

  10. says

    @Wei Wei That's not an uncommon reaction. However, have her look at this while bearing in mind that the author relied on some archaic vocabulary:



    (Folks who want to follow along: if you're using Chrome, you can install this excellent plug-in which translates characters on mouse-over.)

    See also: Wikipedia's article and many websites and books that suggest she's (understandably) mistaken. It's impenetrable nonsense to the ear, but linguistically correct in a pedantic way, and illustrates through a reductio one of the pitfalls of transliterating Classical and/or contemporary Chinese.

    Please bear in mind that a version of the poem in standard written Mandarin would still have many homophones, but fewer. Also, some meanings of characters have shifted, etc.– facts about which your wife undoubtedly knows more than I.

  11. says

    @Blaze Miskulin
    Thanks for making that point. I think the distinction you're drawing may be the same one that singers have in mind when they distinguish "head voice" from "chest voice".

    Sounds as if you're positioned for a lot of fun and many adventures. Enjoy!

  12. K. Chang says

    What would be interesting is let a Chinese TTS (text to speech) package render that poem. :D

    I remember my first attempt at pronouncing the English word thermometer, and Yosemite. :)


  13. Careless says

    tu tu tu ="[the] pig pushed [the] closet" in Hokkien. It didn't take me long to give up on Chinese.

  14. zanaga says

    As a linguist (yes I can say that finally, woo graduation!) here's some more encouragement!
    It takes a native speaker of a language roughly eight (8) years to learn their language.
    After all, when did *you* start making sense to your elders?
    Granted, adults have a leg up, but they have other, more minor disadvantages as well. :D Go forth, and be awesome!
    (Very nice article, good bloggist!)

  15. DJ says

    You missed a y in the list of consonants, which usually only ever comes before the umlaut'd u (except then it becomes a normal u).

    Also, when I was a 4-year old kid in Chinese school being taught everything you recapped about pingying, the consonants were grouped the exact way you placed them here. We were all fluent speakers already, though, so the teachers just gave us cute associations for all the sounds and letters (fish for umlaut u, lion for sh, pigeon or song for g, watermelon for x, etc.). Also since we were four, just having us repeat all the sounds over and over while looking at the letters was enough–like how they teach you the alphabet in kindergarden XD