TED日本語 - Rupal Patel: Synthetic voices, as unique as fingerprints

Synthetic voices, as unique as fingerprints

Rupal Patel

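Description

Many people with severe speech impairments rely on computers to communicate, but their choice of voices is limited. That is why Stephen Hawking, who is British, speaks with an American accent, and why many people share the same voice, often putting up with one that does not suit them. Speech scientist Rupal Patel wanted to change this. In this talk, she describes how to create unique voices for the voiceless.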

Category

Health and medicine

Disability

Script

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

I'm going to start by playing you a sample of a voice that you may recognize.

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, and that's because there's only a few options available. In the U.S. alone, there are 2.5 million Americans who are unable to speak, and many of whom use computerized devices to communicate. Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why then the same prosthetic voice? It really struck me, and I wanted to do something about this.

I'm going to play you now a sample of someone who has -- two people actually -- who have severe speech disorders. I want you to take a listen to how they sound. They're saying the same utterance.

(First voice)

(Second voice)

You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These are people who had lost their voice later in life. We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. But I thought, there had to be a way to reverse engineer a voice from whatever little is left over.

So we decided to do exactly that. We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay? So first, we know that the voice is changing dramatically over the course of development. Children sound different from teens who sound different from adults. We've all experienced this. Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And that happens in one individual.
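The source-filter idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the project: the source is approximated by a plain impulse train, the filter by a cascade of two-pole resonators, and the formant values in `AH_FORMANTS` are made-up numbers loosely resembling an "ah" vowel. All function names are hypothetical.

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate=16000):
    """Impulse train standing in for voice box vibration at pitch f0_hz."""
    n = round(duration_s * sample_rate)
    period = round(sample_rate / f0_hz)  # samples between pulses
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonator modelling one vocal tract resonance (formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius < 1, so stable
    theta = 2 * math.pi * center_hz / sample_rate        # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vocal_tract(signal, formants, sample_rate=16000):
    """The filter: pass the source through each resonance in turn."""
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Assumed, illustrative (center Hz, bandwidth Hz) pairs for an "ah"-like vowel.
AH_FORMANTS = [(700, 110), (1200, 120), (2600, 160)]

source = glottal_source(f0_hz=200, duration_s=0.05)  # the source
speech = vocal_tract(source, AH_FORMANTS)            # source shaped by the filter
```

Changing `f0_hz` alters the pitch without touching the vowel quality, and changing the formant list alters the vowel without touching the pitch. That separation is what the rest of the talk builds on.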

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. So when I realized that those same cues are also important for speaker identity, I had this idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? Because when we mix them, we can get a voice that's as clear as our surrogate talker -- that's the person we borrowed the filter from -- and is similar in identity to our target talker. It's that simple. That's the science behind what we're doing.
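As a rough sketch of the mixing step, the code below takes pitch and loudness from a target talker's sustained vowel and pairs them with a filter description borrowed from a surrogate. It makes several assumptions that are not from the talk: pitch is estimated with a simple autocorrelation search, loudness is plain RMS, tempo is left out, the "vowel" is a synthetic 200 Hz tone, and the surrogate formant values are invented. `BlendedVoice` and `blend` are hypothetical names.

```python
import math
from dataclasses import dataclass

def estimate_pitch_hz(samples, sample_rate, min_hz=60, max_hz=500):
    """Return the pitch whose lag gives the strongest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def rms_loudness(samples):
    """Root-mean-square amplitude as a crude loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

@dataclass
class BlendedVoice:
    pitch_hz: float   # source characteristic, taken from the target talker
    loudness: float   # source characteristic, taken from the target talker
    formants: list    # filter, borrowed from the surrogate talker

def blend(target_vowel, surrogate_formants, sample_rate):
    """Combine the target's source cues with the surrogate's filter."""
    return BlendedVoice(
        pitch_hz=estimate_pitch_hz(target_vowel, sample_rate),
        loudness=rms_loudness(target_vowel),
        formants=list(surrogate_formants),
    )

# Synthetic stand-in for a target talker's vowel: 200 Hz plus one harmonic.
SAMPLE_RATE = 8000
target_vowel = [
    math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
    for t in range(1600)
]
# Invented (center Hz, bandwidth Hz) pairs standing in for the surrogate's filter.
voice = blend(target_vowel, [(700, 110), (1200, 120)], SAMPLE_RATE)
```

The resulting `voice` carries the target's pitch and loudness together with the surrogate's filter, which is the division of labour the paragraph above describes.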

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

(Video) Voice: Things happen in pairs.

I love to sleep.

The sky is blue without clouds.

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" -- everyone needs to be able to say that -- fish through that database and find all the segments necessary to say that utterance.

(Video) Voice: I love chocolate.
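The voice bank lookup described above might be sketched as follows. This is a toy under stated assumptions: sounds are informal labels rather than a real phonetic transcription, each "snippet" is just a reference of the form (recording index, start, end) instead of audio, the recordings are invented, and units are chosen by a simple greedy longest match. A real system would also weigh how well neighbouring snippets join. `build_voice_bank` and `select_units` are hypothetical names.

```python
def build_voice_bank(recordings, max_unit_len=2):
    """Index every run of 1..max_unit_len sounds found in the recordings."""
    bank = {}
    for rec_id, sounds in enumerate(recordings):
        for start in range(len(sounds)):
            for length in range(1, max_unit_len + 1):
                unit = tuple(sounds[start:start + length])
                if len(unit) == length:  # skip runs cut short at the end
                    # Keep only the first recorded instance of each unit.
                    bank.setdefault(unit, (rec_id, start, start + length))
    return bank

def select_units(target_sounds, bank, max_unit_len=2):
    """Cover the new utterance left to right, preferring longer snippets."""
    plan, i = [], 0
    while i < len(target_sounds):
        for length in range(min(max_unit_len, len(target_sounds) - i), 0, -1):
            unit = tuple(target_sounds[i:i + length])
            if unit in bank:
                plan.append((unit, bank[unit]))
                i += length
                break
        else:
            raise KeyError(f"no recorded snippet covers {target_sounds[i]!r}")
    return plan

# Invented surrogate recordings, written as informal sound labels.
recordings = [
    ["ay", "l", "ah", "v", "t", "uw", "s", "l", "iy", "p"],  # "I love to sleep"
    ["ch", "aa", "p", "ah", "l", "aa", "t"],                 # "chop a lot"
    ["k", "l", "aa", "k"],                                   # "clock"
]
bank = build_voice_bank(recordings)

# A new utterance that was never recorded as a whole: "I love chocolate".
target = ["ay", "l", "ah", "v", "ch", "aa", "k", "l", "ah", "t"]
plan = select_units(target, bank)
```

Each entry in `plan` names a unit and where to find it in the recordings. Playing those snippets back to back is the concatenation step.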

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using. That's not the novel part. What's novel is how we make it sound like this young woman.

This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.

(Video) Samantha: Aaaaaah.

RP: So now, Samantha can say this.

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

RP: Thank you. (Applause)

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. What we've done so far is we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. What I want to share with you next is how I envision taking this work to that next level. I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality.

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity.

So that's the science behind what we're doing. I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking? Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that.

This is what William said: "Never heard me before."

Thank you.

今日皆さんにお話したいのは私たちのあり方を決めるパワフルで基礎的なもの― 「声」についてです私たち一人一人に独特の声紋があり私たちの年齢､体格生活習慣や個性までも映し出しますヘンリー・ワーズワース・ロングフェローは「人の声は心のオルガン（心の臓器）である」と詩でつづりましたスピーチ・サイエンティストである私は発声の仕組みに魅せられこれを人工的に作り出す方法を見つけましたこれを皆さんと共有いたします

まずは皆さんがご存知かもしれない声のサンプルを流します

（音声）スティーヴン・ホーキング：「私が意図することはかなり明確だと思っていました」

お聞きいただいたのはスティーヴン・ホーキング教授の声です皆さんがご存知ないかもしれないのは同じ声をこちらの女の子のような神経疾患で話すことができない子供も使っている可能性があることです実はこのような方々は声の選択肢がごく限られているため同じ声を使っていることがあるのですアメリカだけでも話すことができない人達が 250万人もいますその多くの人達がコミュニケーション手段としてコンピューターを使用します世界規模で数百万の人々が人工音声を使っているのですホーキング教授もその１人でアメリカ訛りの音声を使っていますねこの個性に欠けた合成音声には本当にショックを受けました数年前に障害を持つ人の技術支援に関する会議に参加した時のことです展示ホールに足を入れると小さい女の子から成人男性までそれぞれの機器を使って話しているんですが機器は違えど同じ声でした周りを見回すと私の周りでも同じことが起こっていました文字通り数百人の人達がごく限られた音声を使っていてそれぞれの身体や個性に合っていないんです小さい女の子に成人男性用の義足をあてがうなんて想像できませんよねではなぜ人工音声もそうしないのか？これが大変気に掛かりこの状況を何とかしたいと思ったのです

これからお聞きいただくのは重度の言語障害を患っている２人の音声サンプルですどのように聞こえるかお聞きください同じ内容を発話しています

（第１音声）

（第２音声）話の内容までは分からなかったかもしれませんが２人の個性的な音声はお分かりいただけたでしょう

次に私がやりたかったことはこのように残された発話能力を活かして使用者に合わせてカスタマイズできるテクノロジーつまり彼らのためにカスタマイズできる声を開発することでしたそこで協力者のティム・バンネルに助言を仰ぎましたバンネル博士は音声合成の第一人者で彼がやっているのは事前に録音してあった本人の音声サンプルを用いて音声を復元することで個人用の音声を作っているのです対象となるのは後天性の障害で声を失った人達です生まれながらに言語障害がある人達には「事前に録音した音声サンプル」なんてありませんでも私が考えたのは残されたかすかな声からその人の声を蘇らせることができるはずだと

そこでこれに取り組むことにしたのですアメリカ国立科学財団からわずかな資金援助を受け話者の独特な声の特徴を反映した個人用音声の開発を始めました私たちはこのプロジェクトを “VocaliD”や“vocal I.D.”と名づけました

これから皆さんにこの特注の声がどのように作られ実際の声をお聞きいただく前に音声科学についてのごく簡単な講義をしますいいですか？まず私たちの音声は成長過程において劇的に変化します小さな子供の声は十代の人達と異なりますし成人の人達も異なります皆さんこれを経験しますね２つ目の事実は発声とは皆さんの喉頭から発せられた振動による音源が残りの声道を通過することで起こります皆さんの頭と首の中にあるスペースが振動することで音源をフィルターにかけて母音と子音が発音されるのですつまり音源がフィルターにかかることが発声のメカニズムなのですこれが一人一人に起きているわけです

先ほど申し上げたように私は重い言語障害を患う人達の音源の特性についての理解と研究に長いこと携わってきましたそこで気づいたのは彼らのフィルターに障害があっても音源は調節可能であるということでそれは声のピッチ､大きさ､テンポですこれらはプロソディー（韻律）と呼ばれるもので長年の調査で言語障害者のプロソディーが健在であることを実証してきましたですからこれらの表現が話し手のアイデンティティにも重要だと気づいた時このアイデアを思いついたのですそれは発話させたい人の音源を使い ―これは残っているんですね対象となる人と同じ年齢で同じ体格の人からフィルターを借りてこの明瞭な音声と混ぜたらどうかと考えたのです合成した声はフィルターを借りた代理話者と同じくらい明瞭な声で私たちがターゲットとしている話者のアイデンティティにも類似しているんですこんなに簡単なんですこれが私たちがやっていることの裏にある科学です

ではアイデアが思いついたところでどうやって実際に声を構築したらいいでしょう？まずはフィルターを提供してくれる人を探す必要がありました全然難しいことではないんです提供者になるということは数百から数千の言葉を発声するだけですこの過程はこんな感じです

声：物事は対になって起こります

寝るのが大好きです

雲一つない青い空です

これを３時間から４時間ほど続けますここでのポイントは対象となる人が話したい文章を代理人に言わせるのではなく言葉の中で生じる全ての異なる音の組み合わせを拾っていくことですサンプルが多ければ多いほどより質の良い声を得ることができます収録が終わったら次に必要なのは読まれた文章を解析し言語の要素に分割することです１つの音や２つの音の組み合わせや時には単語全体をデータセットすなわちデータベースに集積していきますこのデータベースを音声バンクと呼びましょう音声バンクのパワフルな点はこの音声バンクから新しい言葉を発声できることで「チョコレートが好き」とかこれは誰でも言いたいですよねデータベースを駆使してその言葉の発声に必要な全ての断片を見つけるのです

声：チョコレートが好きです

これが音声合成です波形接続合成という私たちが使っている手法ですこれは目新しくありませんが新しい点はどうやってこの若い女性が話すような音声にするかです

彼女の名前はサマンサです私が彼女に出会ったのは彼女が９歳の時で私のチームは彼女のための声を構築してきましたまずは代理ドナーを探してサマンサにもいくつかの発声をお願いしました彼女が発声できるのは主に母音だけですが彼女の音源特性を引き出すのには十分な情報でした次のステップは私の６歳の娘が上手く例えています娘は「声を色づかせるために絵の具を混ぜているんだね」ときれいですよねまさにその通りなんですサマンサの声は濃縮された食紅のように彼女の代理ドナーの録音した声に混ぜることでピンク色の声になるのですまさにこんな風に

サマンサ：ああああああ

今ではこんな風に話せます

サマンサ：この声は私だけのもの友達と新しい声で話すのが楽しみ

ありがとう　（拍手）

彼女が最初にこの声を聞いた時の顔いっぱいに広がった優しい笑みはずっと忘れないでしょう世界中には数百万人ものサマンサのような人々がいます数百万ですよ私たちの取り組みはまだまだ始まったばかりですこれまでの取り組みはアメリカ国内で声を提供してくれる人々を数名集めて私たちの初の試みとなる個人用の声の構築に利用していますでもやることは山ほどあります例えばサマンサの代理ドナーは中西部の出身で見ず知らずの他人が声の贈り物をしてくれたのです私が科学者としてとても楽しみなのは研究室でやっていた仕事をついに実用化して実社会に影響を与えることです次に皆さんと共有させていただくのはこの成果をどうやって次のレベルに進めるかです私が考えているのは世界中のあらゆる階層の人々異なる体格や違う年齢層の人々が代理ドナーとなって個性と同じくらい色彩に富んだ声を人々に贈ることですこれを叶えるための第一歩として『VocaliD.org』というウェブサイトを立ち上げました声や専門知識の提供を募るためのサイトで私たちのビジョンをいろいろな形で支援してくれる人たちを集める試みです

献血で他人の命を救うことができますね声を提供することで他人の人生を変えることができますほんの数時間分の代理話者の音声サンプルと声を受け取る人の発声した母音が１つでもあれば独特な声のアイデンティティを作れます

これが私たちがやっている裏にある科学なんですこの仕事にインスピレーションをもたらしてくれた人間的な部分に立ち返ることで締めくくります約５年前のことです私たちが最初に作った声はウィリアムという男の子のためでした母親がこの声を始めて耳にした時「まさにウィリアムの声だもしこの子が話せていたらきっとこんな声だったに違いない」とするとウィリアムが彼の機器でメッセージをタイプするんです私は彼が何を考えているのか思いを馳せました９年間も他人の声を使っていた男の子がついに自分の声を手に入れたのですどんな気分だと思いますか

ウィリアムはこう言いました「自分の声でしゃべったのは初めてだ」

ありがとうございました

（拍手）
