TED Talks

Fake videos of real people -- and how to spot them

Supasorn Suwajanakorn
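Description

Can you reliably spot a fake video in which a famous person appears to say something they never actually said? See how such videos are made in this astonishing talk and demo. As a graduate student, computer scientist Supasorn Suwajanakorn used AI and 3D modeling to create videos of people, synced to audio, that look indistinguishable from the real thing. Learn about the ethical problems and creative possibilities of this technology, and about what is being done to counter its misuse.

Category

Science and technology

Computers

External link: TED | Supasorn Suwajanakorn: Fake videos of real people -- and how to spot them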

Script

Look at these images. Now, tell me which Obama here is real.

(Video) Barack Obama: To help families refinance their homes, to invest in things like high-tech manufacturing, clean energy and the infrastructure that creates good new jobs.

Supasorn Suwajanakorn: Anyone? The answer is none of them.

(Laughter)

None of these is actually real. So let me tell you how we got here. My inspiration for this work was a project meant to preserve our last chance for learning about the Holocaust from the survivors. It's called New Dimensions in Testimony, and it allows you to have interactive conversations with a hologram of a real Holocaust survivor.

(Video) Man: How did you survive the Holocaust?

(Video) Hologram: How did I survive? I survived, I believe, because providence watched over me.

SS: Turns out these answers were prerecorded in a studio. Yet the effect is astounding. You feel so connected to his story and to him as a person. I think there's something special about human interaction that makes it much more profound and personal than what books or lectures or movies could ever teach us.

So I saw this and began to wonder, can we create a model like this for anyone? A model that looks, talks and acts just like them? So I set out to see if this could be done and eventually came up with a new solution that can build a model of a person using nothing but these: existing photos and videos of a person. If you can leverage this kind of passive information, just photos and video that are out there, that's the key to scaling to anyone.

By the way, here's Richard Feynman, who in addition to being a Nobel Prize winner in physics was also known as a legendary teacher. Wouldn't it be great if we could bring him back to give his lectures and inspire millions of kids, perhaps not just in English but in any language? Or if you could ask our grandparents for advice and hear those comforting words even if they're no longer with us? Or maybe using this tool, book authors, alive or not, could read aloud all of their books for anyone interested.

The creative possibilities here are endless, and to me, that's very exciting. And here's how it's working so far.

First, we introduce a new technique that can reconstruct a highly detailed 3D face model from any image without ever 3D-scanning the person. And here's the same output model from different views. This also works on videos, by running the same algorithm on each video frame and generating a moving 3D model. And here's the same output model from different angles.

It turns out this problem is very challenging, but the key trick is that we are going to analyze a large photo collection of the person beforehand. For George W. Bush, we can just search on Google, and from that, we are able to build an average model and iteratively refine it to recover the expression in fine detail, like creases and wrinkles. What's fascinating about this is that the photo collection can come from your typical photos. It doesn't really matter what expression you're making or where you took those photos. What matters is that there are a lot of them. And we are still missing color here, so next, we develop a new blending technique that improves upon a simple averaging method and produces sharp facial textures and colors. And this can be done for any expression.
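
To make the contrast between plain averaging and a weighted blend concrete, here is a minimal sketch in Python. It is not the method from the talk: the weighting rule (a Gaussian falloff on the distance between expression codes), the function names, and the toy data are all assumptions for illustration. It also assumes the photos have already been aligned to the 3D model, so that pixel i refers to the same facial point in every photo.

```python
import math

def simple_average(textures):
    """Per-pixel mean over aligned textures; tends to wash out expression detail."""
    n = len(textures)
    return [sum(t[i] for t in textures) / n for i in range(len(textures[0]))]

def expression_weighted_blend(textures, expressions, target, sharpness=4.0):
    """Per-pixel weighted mean that favors photos whose expression is near `target`.

    Hypothetical weighting: w = exp(-sharpness * squared distance to target).
    """
    weights = []
    for e in expressions:
        d2 = sum((a - b) ** 2 for a, b in zip(e, target))
        weights.append(math.exp(-sharpness * d2))
    total = sum(weights)
    return [
        sum(w * t[i] for w, t in zip(weights, textures)) / total
        for i in range(len(textures[0]))
    ]

# Toy data: three 4-pixel grayscale textures. The two middle pixels are dark
# when a wrinkle is present.
textures = [
    [0.8, 0.2, 0.2, 0.8],  # frowning, wrinkle visible
    [0.8, 0.8, 0.8, 0.8],  # neutral
    [0.8, 0.8, 0.8, 0.8],  # neutral
]
expressions = [[1.0], [0.0], [0.0]]  # 1-D expression code: 1.0 = frown

avg = simple_average(textures)
blend = expression_weighted_blend(textures, expressions, target=[1.0])
```

In this toy case the plain average lightens the wrinkle pixels because two of the three photos are neutral, while the weighted blend for a frowning target keeps them dark.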

Now we have control of a model of a person, and the way it's controlled now is by a sequence of static photos. Notice how the wrinkles come and go, depending on the expression. We can also use a video to drive the model.

(Video) Daniel Craig: Right, but somehow, we've managed to attract some more amazing people.

SS: And here's another fun demo. So what you see here are controllable models of people I built from their internet photos. Now, if you transfer the motion from the input video, we can actually drive the entire party.

(Video) George W. Bush: It's a difficult bill to pass, because there's a lot of moving parts, and the legislative processes can be ugly.

(Applause)
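
Driving several models from one input video can be pictured as copying motion offsets. The sketch below assumes a simple "delta transfer" rule: each target landmark moves by the same amount the corresponding source landmark moved away from the source's neutral face. This rule, the function name, and the landmark values are illustrative assumptions, not the technique used in the demo.

```python
def transfer_expression(source_neutral, source_frame, target_neutral):
    """Apply the source's offset from its neutral face to a target's neutral face."""
    return [
        (tx + (sx - nx), ty + (sy - ny))
        for (nx, ny), (sx, sy), (tx, ty)
        in zip(source_neutral, source_frame, target_neutral)
    ]

# Two landmarks per face: (upper lip, lower lip), as (x, y) pixel positions.
source_neutral = [(50, 80), (50, 84)]
source_frame = [(50, 79), (50, 90)]  # the source opens their mouth

# Hypothetical neutral landmarks for two different target models.
targets = {
    "model_a": [(40, 70), (40, 73)],
    "model_b": [(60, 95), (60, 100)],
}

# One source frame drives every target model.
driven = {
    name: transfer_expression(source_neutral, source_frame, neutral)
    for name, neutral in targets.items()
}
```

Because only offsets are copied, each target keeps its own face geometry while following the source's mouth motion.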

SS: So coming back a little bit, our ultimate goal, rather, is to capture their mannerisms or the unique way each of these people talks and smiles. So to do that, can we actually teach the computer to imitate the way someone talks by only showing it video footage of the person? And what I did exactly was, I let a computer watch 14 hours of pure Barack Obama giving addresses. And here's what we can produce given only his audio.

(Video) BO: The results are clear. America's businesses have created 14.5 million new jobs over 75 straight months.

SS: So what's being synthesized here is only the mouth region, and here's how we do it. Our pipeline uses a neural network to convert input audio into these mouth points.

(Video) BO: We get it through our job or through Medicare or Medicaid.

SS: Then we synthesize the texture, enhance details and teeth, and blend it into the head and background from a source video.

(Video) BO: Women can get free checkups, and you can't get charged more just for being a woman. Young people can stay on a parent's plan until they turn 26.
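
The pipeline described above (audio to mouth points, then texture synthesis, then blending into a source video) can be laid out as a dataflow sketch. Everything here is a stand-in: `audio_to_mouth_points` is a hypothetical placeholder for the trained neural network and simply maps loudness to mouth opening, `render_mouth_texture` fakes texture synthesis, detail and teeth enhancement is omitted, and the frame is a tiny grayscale grid. Only the order of the stages comes from the talk.

```python
def audio_to_mouth_points(audio_window):
    """Placeholder for the neural network: louder audio -> wider mouth opening.

    Returns (upper_lip_row, lower_lip_row) inside the mouth patch.
    """
    loudness = sum(abs(s) for s in audio_window) / len(audio_window)
    opening = round(min(loudness, 1.0) * 2)  # 0, 1 or 2 rows
    return (1, 1 + opening)

def render_mouth_texture(points, height=4, width=3):
    """Placeholder texture synthesis: dark rows between the lips, skin elsewhere."""
    upper, lower = points
    return [[0.1 if upper <= y < lower else 0.7 for _ in range(width)]
            for y in range(height)]

def composite(frame, patch, top, left, alpha=1.0):
    """Alpha-blend the mouth patch into a copy of the source frame."""
    out = [row[:] for row in frame]
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            y, x = top + dy, left + dx
            out[y][x] = alpha * value + (1 - alpha) * out[y][x]
    return out

def synthesize_frame(audio_window, source_frame, mouth_top, mouth_left):
    """Run the three stages for one audio window and one source frame."""
    points = audio_to_mouth_points(audio_window)
    patch = render_mouth_texture(points)
    return composite(source_frame, patch, mouth_top, mouth_left)

source_frame = [[0.7] * 5 for _ in range(6)]  # 6x5 "head and background"
quiet = synthesize_frame([0.0, 0.0, 0.0], source_frame, 1, 1)
loud = synthesize_frame([0.9, -1.0, 0.8], source_frame, 1, 1)
```

With silent audio the mouth stays closed and the frame is unchanged; with loud audio, dark "open mouth" rows appear inside the mouth region while the rest of the source frame is left alone.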

SS: I think these results seem very realistic and intriguing, but at the same time frightening, even to me. Our goal was to build an accurate model of a person, not to misrepresent them. But one thing that concerns me is its potential for misuse. People have been thinking about this problem for a long time, since the days when Photoshop first hit the market. As a researcher, I'm also working on countermeasure technology, and I'm part of an ongoing effort at AI Foundation, which uses a combination of machine learning and human moderators to detect fake images and videos, fighting against my own work. And one of the tools we plan to release is called Reality Defender, which is a web-browser plug-in that can flag potentially fake content automatically, right in the browser.

(Applause)
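
One way to picture "machine learning plus human moderators" is a triage rule: a classifier handles the confident cases and people review the uncertain middle. The sketch below is an assumption about how such a combination could work, not a description of the AI Foundation system or of Reality Defender. The thresholds, labels, and clip names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str    # "likely_fake", "needs_human_review", or "no_flag"
    score: float  # the classifier's fake-probability for this item

def triage(fake_score, flag_above=0.9, clear_below=0.2):
    """Route one item based on a classifier's fake-probability in [0, 1]."""
    if not 0.0 <= fake_score <= 1.0:
        raise ValueError("fake_score must be in [0, 1]")
    if fake_score >= flag_above:
        return Verdict("likely_fake", fake_score)
    if fake_score <= clear_below:
        return Verdict("no_flag", fake_score)
    return Verdict("needs_human_review", fake_score)

# Hypothetical classifier scores for three clips.
scores = {"clip_a": 0.97, "clip_b": 0.55, "clip_c": 0.05}

# Only the uncertain middle band is sent to human moderators.
review_queue = [name for name, s in scores.items()
                if triage(s).label == "needs_human_review"]
```

Tightening or loosening the two thresholds trades moderator workload against the risk of automated mistakes.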

Despite all this, though, fake videos could do a lot of damage, even before anyone has a chance to verify, so it's very important that we make everyone aware of what's currently possible so we can have the right assumption and be critical about what we see.

There's still a long way to go before we can fully model individual people and before we can ensure the safety of this technology. But I'm excited and hopeful, because if we use it right and carefully, this tool can allow any individual's positive impact on the world to be massively scaled and really help shape our future the way we want it to be.

Thank you.

(Applause)
