Japanese Phonetics and the Power of Pronunciation

Let me set the scene: Japanese class, second semester. Small university in the middle of nowhere. A meter of snow.

The semester had just started, so my 10 classmates and I were arranged in horseshoe formation around the teacher, who was going through a grammar list and asking everyone simple questions.

The teacher looked at me and, exaggerating every word, asked:

(さみさん!せんせいは かわいいと おもいますか?)
Sami! Do you think that I’m cute?

Phew, easy question. Trying not to look too relieved, I nodded and responded:

(うん。とっても こわい です!)
Yeah. You’re incredibly scary!

The teacher, shocked, made what seemed like a choking sound. A few people began laughing. I, having no idea what just happened, raised my eyebrows about three feet in confusion. My friend quickly spoke up in my defense:

(いや!かわいいです!せんせいは かわいい ですよ!)
No, you’re cute! Really, you’re cute!

That was the moment I learned the value of Japanese phonetics and clear pronunciation. If you finish the article, you’ll have all the information you need to pick out the three differences between こわい (scary) and かわいい (cute).

You won’t always get a chance to explain your way out of a situation like that, so in the name of love, heed my words: I want to help you avoid my mistakes, but to do that we need to talk about sounds. Lots of them.

Improving your accent is simple in theory but it isn’t easy in practice. It takes a lot of work to train your ears and mouth and I can’t do that for you. What I can do is point out where you’re probably making mistakes so that you know what areas you need to improve.

Naturally, if you make more of the right sounds and fewer of the wrong ones, your pronunciation is going to improve. Today I’m going to point out a few of these crucial Japanese phonetic sounds for you.

Why Study Japanese Phonetics?

When I began studying Japanese, I was told that Japanese pronunciation was very easy. The following are a few things I was told within the first few days of class in order to justify why we were spending only one class period on pronunciation:

  • The language is atonal (unlike Mandarin or Vietnamese).
  • Spelling is phonetic and pronunciation is consistent. Words sound like they look and look like they sound. Even someone who’s never studied Japanese before could read a text written in romaji and be understood without trouble (unlike someone studying French, for example).
  • Not only is pronunciation consistent, but it’s also easy for English speakers. English has a lot of vowels, but if you pronounce the vowels like in Spanish, you’ll be just fine.
  • し, I was told, is pronounced just like the word “she” in English. This is my specific bone to pick and I’ll make special note of it in this article.

I think that my experience studying Japanese in the classroom was pretty traditional, so I’m sure that most of you reading this post probably heard similar things. This raises a pretty straightforward question: If Japanese pronunciation is so easy, why would someone devote time to studying the phonetics of Japanese, or what Webster’s English Language Learner Dictionary defines as “the study of speech sounds”?

For a while, I wouldn’t have been able to answer this question. Just as my teacher said, Japanese pronunciation seemed pretty straightforward. I practiced phrases from “Genki” with my Japanese roommate a few times per week and he always understood.

As I improved at Japanese, however, the feeling that something with my pronunciation wasn’t quite right also got stronger. Then, one day, I learned about allophones.

Things to Know About Studying Japanese Phonetics

Allophones and a forewarning

In what might be too simple a definition, allophones occur when a single unit of sound in a language (a phoneme) is actually pronounced in more than one way.

That might sound complicated but it’s actually quite easy to demonstrate. Put your hand in front of your mouth and say “kite” followed by “sky.” Do you notice how when you say “kite,” a puff of air hits your hand but not when you say “sky”? 

This is because the /k/ sound in “kite” is aspirated (aspirated comes from the Latin word aspīrō, meaning “breathe upon”) but the one in “sky” isn’t. They’re actually, technically, two different sounds: aspirated and unaspirated /k/. Some languages use different letters to represent these two sounds but English doesn’t.

The word “allophone” isn’t the perfect word for this case because, well, “she” and し aren’t actually allophones. They’re completely different sounds that are created by physically different means. Producing these two sounds requires different mouth and tongue positions.

That being said, I think that the concept of allophones helps us understand two things:

1. To an untrained ear, “she” and し sound quite similar.

2. They’re not.

If you want to take a deeper dive into this concept, check out the article by John Pasden over at Sinosplice.

To answer my earlier question, then, the value in studying Japanese phonetics is that the sounds of Japanese are indeed not the same as those in English. If we don’t understand what our mouths are doing and therefore what sounds we’re actually making, there’s no way to know whether or not we’re pronouncing Japanese words correctly.

It’s ultimately a decision that’s up to you, but so long as you’re open to learning more, I’d like to spend the rest of the post talking about just how different Japanese and English pronunciation are and how you can begin making progress with improving your Japanese pronunciation.

The International Phonetic Alphabet

Before we move on, I’d like to take a second to introduce you to the International Phonetic Alphabet (IPA), a special alphabet used to accurately represent how words sound in any language. The letter “u” represents different sounds in different languages, but the IPA letter /u/ always represents one sound and one sound only.

Spelling a word out with the IPA allows us to see how words actually sound no matter how they might look.

Go spend a few minutes looking at the IPA pages for English and Japanese. It’s okay if you’ve never used the IPA before and have no idea what sound a letter in the IPA represents. Just take a look. Pick out a sound or two in one language and then try to find it in the other.

My goal with this exercise is to show you that there are sounds in Japanese that aren’t in English, that there are sounds in English that don’t exist in Japanese and lastly, that each language shares a few similar, but not identical, sounds.

This is connected to my ultimate goal for this post, which is simply to point out these major areas of similarity that many textbooks define as “close enough.” After understanding that “close enough” doesn’t mean “identical,” you can decide how much effort you’d like to spend learning about Japanese and English phonetics in order to improve your Japanese pronunciation.

So, let’s get started.

How Japanese Phonetics Can Improve Your Pronunciation

Morae: Don’t Forget About ん

If you’ve tried shadowing speech or ever looked at the basics of Japanese pronunciation, you might have seen that each Japanese mora (a building block of syllables) gets one beat and should be the same length. Very simply put, one mora is basically one kana (excluding small kana like the ょ in ぎょ).

This is simple to understand—for as many kana as there are in a given word, you should clap that many times—but it’s also easy to overlook.

As most Japanese sounds are “consonant + vowel” pairs, the language itself sort of forces you to have a relatively consistent rhythm.

But then there’s ん. Just a quick little nasalized sound that’s really easy to quickly tack on to the vowel before it.

Remember that ん counts as one mora and should be vocalized as such. In other words, 今度 (こんど), a word meaning “this time” or “next time,” should get not two beats (KON-DO) but three: KO-N-DO.
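If it helps to see the clapping rule written out, here’s a minimal Python sketch of it. It assumes the word is written entirely in kana, and the names SMALL_KANA, split_morae and count_morae are made up purely for illustration. Small kana attach to the kana before them; everything else, ん included, gets its own beat.

```python
# Small kana that merge with the kana before them instead of adding a beat.
# Assumption: this set (small y-row kana plus small vowels) is enough for a sketch.
SMALL_KANA = set("ゃゅょャュョぁぃぅぇぉァィゥェォ")


def split_morae(word: str) -> list[str]:
    """Split a kana-only word into morae, one entry per clap."""
    morae: list[str] = []
    for ch in word:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch      # e.g. ぎ + ょ stays a single mora
        else:
            morae.append(ch)     # every other kana, including ん, is one mora
    return morae


def count_morae(word: str) -> int:
    """Number of beats (claps) in a kana-only word."""
    return len(split_morae(word))


print(split_morae("こんど"))   # ['こ', 'ん', 'ど']
print(count_morae("ぎょ"))     # 1
```

Running the same function on こわい and かわいい gives three and four beats respectively, which is one easy way to see that the two words don’t have the same rhythm.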

This is point number one, and it’s such for a reason: Balancing out your morae will immediately improve your Japanese pronunciation.

Plus, unlike the things I’ll talk about in the coming sections, it’s something you can do without actually having to spend time learning new sounds or paying attention to your mouth. This is where you should start if you haven’t really thought about how you sound in Japanese before.

Vowels: Avoid Turning Single Vowels into Diphthongs

If you compared the IPA pages of English and Japanese like I suggested above, then you probably noticed a pretty staggering difference even if you didn’t look very hard: English’s vowel section is huge compared to that of Japanese.

Part of this comes down to the fact that English simply has more vowel sounds than Japanese, but part of it is because English can be pretty sneaky about the diphthong, a sound where there are two vowels in a single syllable. It’ll take a bit of time to pare down your vowel repertoire and find the ones you should be using in Japanese, but mindfully working to eliminate diphthongs is something you can do right now.

For example, take the English word “no.” Say it. Now say it really, really slowly. You should notice that there are two sounds: You begin with an /o/ sound, but by the end of the word you’re making the u sound /ʊ/. You’re essentially saying “nou.”

Now apply this to Japanese. The no sound in の or 楽しい (たのしい) — “fun” isn’t a diphthong. Say the /o/, but stop before you get to the /ʊ/. This goes for all vowel sounds in Japanese. Every あ, い, う, え, お on its own is always going to be a single sound.

I don’t mean to say that you should never put two vowels together. For example, consider the word 買う (かう), “to buy,” which does indeed feature an あ /a/ and the Japanese う sound /ɯ/ right next to one another. But unlike English, Japanese will clearly tell you when to do so.

Vowels, Part 2: Focus on Consistency with Fewer Sounds

Japanese has five vowel sounds:

  • /a/ as in “palm”
  • /e/ as in the first part of the diphthong in “face”
  • /i/ as in “seed”
  • /o/ as in the first part of that diphthong in “go”
  • /ɯ/, a sound that’s similar to the oo sound in “food”

Aside from the fact that /i/ and /ɯ/ become voiceless when surrounded by voiceless consonants, these five vowels are always pronounced the same and the first four even exist in English.

(Note: If a sound is voiceless, it means that your vocal cords don’t vibrate when producing it. This is easier to understand when you feel it, though. Put your fingers on your neck as if you were checking your pulse. Say the phrase “Who are you?” out loud and then whisper the same phrase. Do you notice the difference?)

As there are only five sounds, make sure you’re pronouncing these correctly! And the best way to do this is to practice, practice and more practice.

I personally met with a speech pathologist for pronunciation lessons. He pointed out that one of the most easily accessible and impactful ways to improve one’s pronunciation was to spend time getting the vowels down pat.

Here’s the recommended method to practice sounds if you don’t have a speech teacher to consult:

1. Find a video that features a native Japanese speaker talking and has accurate subtitles.

2. Read a sentence from the subtitles.

3. Listen to the native speaker say it.

4. Re-read the sentence based on what you hear.

5. Sit in front of a mirror and have a tape recorder rolling. Watch your mouth as you speak and listen to the recording. Compare it to the native speaker and notice any differences.

6. Make appropriate changes based on what you noticed and repeat the sentence again.

7. Keep going until you perfect the sentence, then move on to another.

Sometimes, just hearing the correct pronunciation of a given sound is enough to improve your own pronunciation. Other times, you might hear the mistake but not be sure how to fix it.

If you find yourself stuck with the latter problem, you might want to look into hiring a tutor specifically to work on your pronunciation skills. The missing ingredient here is feedback: You need a teacher to show you what you can’t hear for yourself.

Even if you don’t have access to a professional, though, any native speaker can tell you if your recording sounds correct or if something sounds funny even if they aren’t able to explain exactly why.

If you’re not ready for that sort of commitment, I’d also like to share a really well done YouTube series by Fluent Forever that looks at Japanese vowels, including the differences between Japanese and English sounds, in detail.

The first step, after all, is being aware that a difference exists.

Consonants: Differentiating Between Similar Sounds

As I wrote above, while a number of sounds in Japanese are new, none are strikingly new in the way that you simply sit down and say “I’m physically incapable of making this sound” (concluding this week’s episode of “my relationship with a trilled r”). Rather, they’re close enough that I don’t think most people would realize there are differences without actually looking into phonetics (thus, the reason for me writing this post!).

I personally think it helps to see before you hear, so I’d like to share one more IPA chart with you, but this time it’s plotted out more visually. Click on symbols to hear what sound they make.

One idea that’s really important to Japanese pronunciation is palatalization, and while you may not be familiar with the term, it’s something you’re probably comfortable doing without realizing it. Here’s a simple video concerning sound changes in Japanese (in other words, what changes the diacritic markers in は→ば・ぱ are actually representing).

I think you just might get a feel for what “palatalization” means when you see the process in which it’s represented in Japanese—but if not, here’s a video with exercises explaining how to make the palatalized sounds of English.

Crudely put, try pronouncing a /j/ (y as in “you”) at the same time as another consonant, such as /g/ or /b/. This brings me to my first set of consonants.

Palatalized sounds

In the process of learning hiragana and katakana you will have learned that sometimes, “small” kanas can be appended to bigger ones to make new sounds, such as “び + よ = びょ.”

You might have found yourself asking if there was a reason why the little kanas were added to these consonants—b, p, g, k, m and n specifically—and if you did, you were onto something. When you add a small や, ゆ or よ to these consonants, you’re actually representing a palatalized sound.

In other words, Japanese has two sets of these consonants: b and bʲ, p and pʲ, g and gʲ, k and kʲ, m and mʲ and lastly n and ɲ. This isn’t as scary as it sounds because, realize it or not, I’m almost sure that you’ve been making both the normal sound and the palatalized sound correctly.

What I do want to emphasize is something that should hopefully be obvious by now, but it’s going to be important for our next set of sounds: The g sounds in ぎょ and ご aren’t the same.

Try it for yourself: repeat the sounds slowly back and forth. Close your eyes and focus your attention generally on your mouth: where are the sounds coming from? What does your mouth feel like? You should feel that the sound in ぎょ seems to come from a bit “higher” of a place than that of ご.

If you’re struggling, I think it helps to whisper the sounds. Again, ぎょ features a palatalized /g/ while the sound in ご is a plain /g/. Once you find the difference, hold onto it as we move onto the next sound.

What in the h? Three different sounds: /h/, /ç/ and /ɸ/

While は, ひ, ふ, へ and ほ are all transcribed as beginning with the letter h, as you can see in this study, there are actually three different initial consonants here: /h/, /ç/ and /ɸ/.

/ɸ/ is a new yet accessible sound that’ll take a bit of playing with your lips while the sound in ひ, /ç/, is a palatalized variant of the /h/ sound.
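To make the split concrete, here’s a small Python lookup that pairs each h-row kana with the initial consonant described above. It’s only a sketch: the names H_ROW_CONSONANT and h_row_consonant are hypothetical, and treating ひゃ, ひゅ and ひょ the same way as ひ is an assumption based on how ひょ is described in this section.

```python
# Initial consonant (IPA) for each kana in the h-row, as described in the text.
H_ROW_CONSONANT = {
    "は": "h",
    "ひ": "ç",   # palatalized variant of /h/
    "ふ": "ɸ",   # made with both lips, not lip and teeth
    "へ": "h",
    "ほ": "h",
}


def h_row_consonant(mora: str) -> str:
    """Return the IPA initial consonant for an h-row mora such as ほ or ひょ."""
    # Assumption: ひ plus a small ゃ, ゅ or ょ keeps the palatalized consonant.
    if len(mora) == 2 and mora[0] == "ひ" and mora[1] in "ゃゅょ":
        return "ç"
    return H_ROW_CONSONANT[mora]


for mora in ["は", "ひ", "ふ", "へ", "ほ", "ひょ"]:
    print(mora, h_row_consonant(mora))
```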

Go back to ご vs. ぎょ and find the difference in feeling once more, then try again with ほ and ひょ. You should feel a difference in position and if you hold your hand in front of your mouth, you should also notice much less air hitting your hand when you say ひょ.

ふ is the first sound I’ve talked about so far that’ll be completely unfamiliar as neither its consonant nor vowel exists in English: /ɸ/+/ɯ/.

So, what’s the difference between the /f/ (as in “fan”) that we’re all familiar with and /ɸ/ (as in Mt. ふじ — Fuji)?

Going back to that visual IPA chart, we see that the technical term for /f/ is “labiodental fricative” whereas /ɸ/ is “bilabial fricative.” That’s fancy speak for one sound that involves your lower lip touching your upper teeth and another that involves both of your lips but not your teeth.

Pretend that you’re blowing out a candle and freeze in the middle of blowing. Pay attention to how your mouth feels, and then maintain that position while saying ふ. If you’re feeling unsure, check out the videos Glossika has produced comparing /f/ and /ɸ/. Next, check out Wasabi Japanese again and compare your pronunciation to a native speaker’s.

The most difficult word in English is “really”: /r/ vs. /ɾ/

The bold words of this section’s header were uttered by my second semester Japanese teacher, and if you were to look at diagrams of what our tongues do for a few of these sounds, the reason is very apparent. In terms of tongue position, the Japanese /ɾ/ is somewhere in between the English /r/ and /l/.

Tofugu has an entire video about this one sound, but it’s thankfully not a terribly difficult sound to figure out.

Pretend you’re singing a Christmas carol—“la la la la la, la la la la!”—and pay attention to where your tongue is. It should be just above your upper teeth, almost touching them. Now ditch the ugly sweater and sing the beginning of an un-creative cheer—“ra ra ra!”—and again, pay attention to where your tongue is.

Now say la and then without stopping your breath, say r, so we get a nonsense la-err type sound. You should notice that you basically trace a line a few millimeters back from your l tongue position to get to your r tongue position.

Now that you’ve got that figured out, pick a position in the middle and say a few Japanese words that begin with this r sound, like ラーメン — “ramen.” If the sound isn’t an l sound, not quite an r sound but also seems to be somewhere in the middle, you’re on the right track!

What し actually said: /ɕ/, /ʑ/ and /tɕ/

Almost 3,500 words later and we’ve finally come to the eponym, the namesake, of this post. We’ve learned about allophones and played with the bio-mechanics of your mouth and dove headfirst into the IPA all so that I could share with you why learning about Japanese phonetics is important:

The sound in “she” is called a “voiceless postalveolar fricative” and it looks like this: /ʃ/.

The sound in し is called a “voiceless alveolo-palatal fricative” and it looks like this: /ɕ/.

They’re different sounds.

No matter what your textbook or teacher might have said, pronouncing し like “she” will result in you never learning how to make the right sound. Unfortunately, for whatever reason, I really struggled to find good resources about how to pronounce し.

That’s why I’m going to do something that might sound a little bit uncouth at first: I’d like you to check out a few materials aimed at Mandarin speakers.

Keeping in mind things we’ve already talked about, though, I feel comfortable doing so for a very simple reason: Based on what we’ve learned about the IPA we know that we can consider sounds objectively, outside of how a given language might represent them. Therefore, even though you’re learning Japanese and not Mandarin, the fact that you need to learn to make the sound /ɕ/ doesn’t change. This sound exists in both languages.

Plus, for whatever reason, there are tons of high-quality resources aimed at learning Mandarin phonetics. In other words, there are tons of high-quality resources for learning to make /ɕ/ (し) and /tɕ/ (ち); you just need to look under their pinyin labels of x and j instead of their hiragana labels of し and ち.

You can start by watching a video from OLS Mandarin which compares several Mandarin consonant sounds.

What’s important for you to keep in mind is that pinyin x and hiragana し are the same sound: /ɕ/. Additionally, pinyin j and hiragana ち are also the same sound: /tɕ/.

Try to hear the difference between the two sounds, and then look into a few videos that talk about the sounds more specifically like the ones by Yoyo Chinese (for し and for ち). If you’re willing to go a little bit out of your way for a more precise explanation, insofar as that you’ll also have to learn a bit about Mandarin sh and zh, check out the excellent videos by Litao Chinese (for し and for ち).

Unfortunately, the Japanese じ sound (variably, /ʑ/ and /dʑ/)* doesn’t exist in Mandarin, but we can combine what we learned about し with the idea of voicing (saying “Who are you?” aloud vs. in a whisper). The only difference between し and じ is that し is unvoiced (your vocal cords don’t vibrate) while じ is voiced (your vocal cords should vibrate).

Once you figure out how to make the し sound /ɕ/, just play around a little bit—it should be quite natural—and before long, you’ll also be making the じ sound /ʑ/.

(*Note: Remember from earlier how English k actually represents two sounds (kite vs. sky)? Japanese じ does, also. The core of both sounds is /ʑ/, the voiced version of /ɕ/, but the sound is made harder in some words by tacking a /d/ (as in “dog”) onto the front. Pay attention with your ears, but don’t worry too much about it.)

You’ve got to be kidding me right now, man, about this nihongo: n sounds

That was pretty rough. As we’re wrapping up the consonants, I wanted to finish on an easy note that should make you feel pretty smart: There are five different ん sounds in Japanese, and you’re probably making them all correctly.

We’ve already talked about a few of them: The normal n sound /n/ is the one you’ll use most of the time, but before an い sound or the little よ, や, ゆ sounds it becomes a palatalized /ɲ/. If you’ve ever puzzled over the fact that 頑張る (がんばる), “good luck/do your best,” seems to be frequently misspelled as gambaru in some textbooks or phrasebooks, then you’re also familiar with the rule that /n/ becomes /m/ (as in “mom”) before /m/, /b/ (as in “boy”) or /p/ (as in “pot”).

Realize it or not, you probably also pronounce your normal /n/ as a /ŋ/ (the ng sound of -ing in words like “going” or “sing”) when ん comes before a /k/ or /g/ sound.

Lastly, we have /ɴ/, the sound of ん when it’s the last sound occurring before a pause or, as Wikipedia puts it, at the end of an utterance (like in すみません…, “I’m sorry/excuse me…”). This is a two-part sound that involves sending air out through your nose and then blocking it with the back of your tongue.

First, say “hmm” and notice how you can feel the sound in your nose area. This means that it’s a nasal sound. Remember that feeling and do the same thing, but don’t pronounce the h in “hmm.” Your lips are probably closed while doing this; try to open them and make an nn sound, but don’t touch anything with your tongue. With your mouth open like this, you should feel the sound coming from somewhere around your nose.

Now remember this sound and tack it onto the end of すみません so that you’re left with a sort of ongoing humming sound. Part two of the sound involves ending it, and again, we do so by bringing the back of our tongue towards our throat (uvula) to block the flow of air. It’s difficult to put into words, but the sound should feel progressively more difficult to sustain as you feel your tongue go backward. Let it taper off naturally and as you get the hang of it, end the sound at a normal pace.
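The position-based ん rules from this section can be sketched in a few lines of Python. Treat it as a rough illustration rather than a full model: the function name n_sounds and the two trigger sets are made up for this example, the input is assumed to be a single word in hiragana followed by a pause, and the palatalized /ɲ/ case is deliberately left out, so anything not covered by a rule falls back to plain /n/.

```python
# Kana whose consonant is /m/, /b/ or /p/: ん before these becomes /m/.
M_TRIGGERS = set("まみむめもばびぶべぼぱぴぷぺぽ")
# Kana whose consonant is /k/ or /g/: ん before these becomes /ŋ/.
NG_TRIGGERS = set("かきくけこがぎぐげご")


def n_sounds(word: str) -> list[str]:
    """Return the sound chosen for each ん in a hiragana word, in order."""
    sounds: list[str] = []
    for i, ch in enumerate(word):
        if ch != "ん":
            continue
        nxt = word[i + 1] if i + 1 < len(word) else None
        if nxt is None:
            sounds.append("ɴ")   # last sound before a pause
        elif nxt in M_TRIGGERS:
            sounds.append("m")   # がんばる is pronounced "gambaru"
        elif nxt in NG_TRIGGERS:
            sounds.append("ŋ")   # like the ng in "sing"
        else:
            sounds.append("n")   # fallback; /ɲ/ is not modelled here
    return sounds


print(n_sounds("がんばる"))    # ['m']
print(n_sounds("にほんご"))    # ['ŋ']
print(n_sounds("すみません"))  # ['ɴ']
```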

Pitch accent

As if all of that wasn’t enough, there’s more. I’m going to leave this section short and sweet partly because it’s a really complex topic, partly because this post is already really long and partly because I’m still working on pitch accent myself. In a nutshell, though, just like in English, each word in Japanese is emphasized in a certain way.

Simply put, English accomplishes this by stressing certain syllables of a word (CER-tain, not cer-TAIN). Japanese words are all stressed equally, but they follow a few special patterns of high and low pitches. 銀行 (ぎんこう), “bank,” for example, starts with one low-pitched mora followed by three high-pitched morae. Just as it sounds really strange to say cer-TAIN, it also sounds quite off to follow a different pattern of pitch.

There are a few main patterns that words follow, but a word’s pattern isn’t fixed and varies depending on what’s going on in a sentence. Different parts of speech (like adjectives or verbs) follow different loose patterns but there are no hard-set rules; the only way to know a word’s pattern is to check it out in a dictionary. At a very basic level, in the Tokyo/standard dialect of Japanese, there are two fundamental rules:

1. The first two morae of a word won’t have the same level of pitch (if the first mora is high then the second will be low and vice versa).

2. Once the pitch of a word drops, it won’t come back again.
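These two rules are simple enough to check mechanically. Here’s a minimal Python sketch, assuming a pitch pattern is written as a string with one letter per mora (“L” for low, “H” for high); that notation and the function name follows_tokyo_rules are conventions chosen for this example, not something taken from a dictionary format.

```python
def follows_tokyo_rules(pattern: str) -> bool:
    """Check a pitch pattern such as "LHHH" against the two rules above."""
    if not pattern or any(p not in "LH" for p in pattern):
        raise ValueError("pattern must be a non-empty string of 'L' and 'H'")
    # Rule 1: the first two morae can't share the same pitch level.
    if len(pattern) >= 2 and pattern[0] == pattern[1]:
        return False
    # Rule 2: once the pitch drops (H followed by L), it never rises again.
    drop = pattern.find("HL")
    if drop != -1 and "H" in pattern[drop + 2:]:
        return False
    return True


print(follows_tokyo_rules("LHHH"))  # True: the ぎんこう pattern, low then three highs
print(follows_tokyo_rules("HLL"))   # True: starts high, drops once, stays low
print(follows_tokyo_rules("HLH"))   # False: the pitch comes back up after dropping
```

A check like this only tells you whether a pattern is possible under the two rules; it can’t tell you which pattern a given word actually uses, so you still need a dictionary for that.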

Again, if you’re hearing about this for the first time and are freaking out, don’t. Your Japanese hasn’t suddenly been rendered incomprehensible and you don’t necessarily need to go back and relearn all of the 50, 500 or 5,000 words you might know.

What it does mean is that yoU are proBABly talKING like this, which sounds pretty funky.

My suggestion is to spend some time learning about how pitch accent works, learn a handful of really frequent and common words for each pattern so you can get a sense for how each one feels and then just pay attention to the accent of words as you consume media, converse or listen to people talking.

If you’d like to look at this in depth, a guy named Dogen has released a super in-depth series on the topic that’s currently almost 50 episodes long.


Wow, was that a journey. We’ve just covered a ton of information about Japanese phonetics and at this point, you’re probably asking yourself if it’s worth all the effort to figure this stuff out… and honestly, that’s a question for you to answer.

The simple answer is no. You don’t need to figure out all of these sounds and memorize lots of pitch accents; you can completely ignore everything I’ve just written and people will still be able to understand you. Accents are cool: We think that a French accent sounds romantic, after all.

Yours might sound boring to you because you hear it everywhere, but it’s exotic and interesting to others.

However, if your goal is to eventually approach a native level of fluency, learning about Japanese phonetics will get you there. Gambatte, ne!

