TED Talks

New video technology that reveals an object's hidden properties

Abe Davis

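Summary

Subtle motions are happening around us all the time, including the tiny vibrations that sound causes in objects. Recent technology has made it possible to pick up those vibrations from video of seemingly motionless objects and to recover sound and speech from them, and Abe Davis takes this one step further. Watch a demo of software that lets you interactively explore the hidden properties of objects using nothing more than ordinary video.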

Categories

Science and Technology

Computers

Transcript

Most of us think of motion as a very visual thing. If I walk across this stage or gesture with my hands while I speak, that motion is something that you can see. But there's a world of important motion that's too subtle for the human eye, and over the past few years, we've started to find that cameras can often see this motion even when humans can't.

So let me show you what I mean. On the left here, you see video of a person's wrist, and on the right, you see video of a sleeping infant, but if I didn't tell you that these were videos, you might assume that you were looking at two regular images, because in both cases, these videos appear to be almost completely still. But there's actually a lot of subtle motion going on here, and if you were to touch the wrist on the left, you would feel a pulse, and if you were to hold the infant on the right, you would feel the rise and fall of her chest as she took each breath. And these motions carry a lot of significance, but they're usually too subtle for us to see, so instead, we have to observe them through direct contact, through touch.

But a few years ago, my colleagues at MIT developed what they call a motion microscope, which is software that finds these subtle motions in video and amplifies them so that they become large enough for us to see. And so, if we use their software on the left video, it lets us see the pulse in this wrist, and if we were to count that pulse, we could even figure out this person's heart rate. And if we used the same software on the right video, it lets us see each breath that this infant takes, and we can use this as a contact-free way to monitor her breathing.
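
The amplification idea can be sketched on a single pixel's brightness over time. This is a minimal illustration, not the MIT motion microscope itself: it assumes a simple linear approach (band-pass filter the pixel's intensity in time, scale the filtered part by a gain, add it back), and the function names, the 0.8 to 3 Hz band, the gain and the synthetic 72 beats-per-minute signal are all illustrative assumptions. "Counting the pulse" then amounts to finding the strongest frequency in the filtered signal.

```python
import numpy as np

def amplify_band(signal, fps, low_hz, high_hz, gain):
    """Boost the part of a pixel's intensity that varies within [low_hz, high_hz].

    Illustrative sketch only: an ideal FFT band-pass, then linear amplification.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    filtered = np.fft.irfft(spectrum * in_band, n=len(signal))
    return signal + gain * filtered, filtered

def rate_per_minute(filtered, fps):
    """Dominant frequency of the band-passed signal, in cycles per minute."""
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    return 60.0 * freqs[np.argmax(spectrum)]

# Hypothetical wrist pixel: 20 s at 30 fps, a faint 1.2 Hz pulse plus sensor noise.
rng = np.random.default_rng(0)
fps = 30
t = np.arange(20 * fps) / fps
pixel = 120.0 + 0.3 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0.0, 0.3, t.size)

amplified, filtered = amplify_band(pixel, fps, low_hz=0.8, high_hz=3.0, gain=50.0)
print(f"estimated heart rate: {rate_per_minute(filtered, fps):.0f} bpm")
print(f"std before: {pixel.std():.2f}, after: {amplified.std():.2f}")
```

The same filtering would be applied independently to every pixel in a real video; the gain only makes the motion visible, while the rate estimate comes from the filtered signal alone.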

And so this technology is really powerful because it takes these phenomena that we normally have to experience through touch and it lets us capture them visually and non-invasively.

So a couple years ago, I started working with the folks that created that software, and we decided to pursue a crazy idea. We thought, it's cool that we can use software to visualize tiny motions like this, and you can almost think of it as a way to extend our sense of touch. But what if we could do the same thing with our ability to hear? What if we could use video to capture the vibrations of sound, which are just another kind of motion, and turn everything that we see into a microphone?

Now, this is a bit of a strange idea, so let me try to put it in perspective for you. Traditional microphones work by converting the motion of an internal diaphragm into an electrical signal, and that diaphragm is designed to move readily with sound so that its motion can be recorded and interpreted as audio. But sound causes all objects to vibrate. Those vibrations are just usually too subtle and too fast for us to see.

So what if we record them with a high-speed camera and then use software to extract tiny motions from our high-speed video, and analyze those motions to figure out what sounds created them? This would let us turn visible objects into visual microphones from a distance. And so we tried this out, and here's one of our experiments, where we took this potted plant that you see on the right and we filmed it with a high-speed camera while a nearby loudspeaker played this sound.

(Music: "Mary Had a Little Lamb")

And so here's the video that we recorded, and we recorded it at thousands of frames per second, but even if you look very closely, all you'll see are some leaves that are pretty much just sitting there doing nothing, because our sound only moved those leaves by about a micrometer. That's one ten-thousandth of a centimeter, which spans somewhere between a hundredth and a thousandth of a pixel in this image. So you can squint all you want, but motion that small is pretty much perceptually invisible. But it turns out that something can be perceptually invisible and still be numerically significant, because with the right algorithms, we can take this silent, seemingly still video and we can recover this sound.

(Music: "Mary Had a Little Lamb")

So how is this possible? How can we get so much information out of so little motion? Well, let's say that those leaves move by just a single micrometer, and let's say that that shifts our image by just a thousandth of a pixel. That may not seem like much, but a single frame of video may have hundreds of thousands of pixels in it, and so if we combine all of the tiny motions that we see from across that entire image, then suddenly a thousandth of a pixel can start to add up to something pretty significant.
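
The pooling argument can be made concrete with a one-dimensional toy. This sketch swaps in a plain gradient-based least-squares shift estimate (the talk does not name the estimator the team used): for a small shift d, each pixel's change in brightness is roughly -d times the local image gradient, so solving for d over every pixel at once averages away much of the per-pixel noise. The stripe texture, noise level and pixel count are illustrative assumptions.

```python
import numpy as np

def estimate_shift(frame0, frame1):
    """Pooled least-squares estimate of a small shift d between two 1-D frames.

    For |d| << 1 pixel, frame1 ≈ frame0 - d * gradient(frame0), so every pixel
    gives one noisy equation in d; solving them jointly averages out noise.
    """
    grad = np.gradient(frame0)
    diff = frame1 - frame0
    return -np.sum(grad * diff) / np.sum(grad * grad)

def stripes(x, shift):
    """A fine stripe texture (period 20 px), moved right by `shift` pixels."""
    return 100.0 * np.sin(2.0 * np.pi * (x - shift) / 20.0)

rng = np.random.default_rng(0)
n_pixels = 500_000   # on the order of one video frame
true_shift = 0.001   # a thousandth of a pixel
noise = 1.0          # per-pixel sensor noise, in intensity units (assumed)
x = np.arange(n_pixels, dtype=float)
frame0 = stripes(x, 0.0) + rng.normal(0.0, noise, n_pixels)
frame1 = stripes(x, true_shift) + rng.normal(0.0, noise, n_pixels)

pooled = estimate_shift(frame0, frame1)
few = estimate_shift(frame0[:10], frame1[:10])
print(f"all {n_pixels} pixels: {pooled:.5f}")  # roughly 0.001
print(f"only 10 pixels: {few:.5f}")
```

With ten pixels the noise in the estimate is far larger than the thousandth-of-a-pixel motion; with hundreds of thousands of pixels it shrinks by roughly the square root of the pixel count, which is why the tiny shift becomes measurable.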

On a personal note, we were pretty psyched when we figured this out. (Laughter) But even with the right algorithm, we were still missing a pretty important piece of the puzzle. You see, there are a lot of factors that affect when and how well this technique will work. There's the object and how far away it is; there's the camera and the lens that you use; how much light is shining on the object and how loud your sound is. And even with the right algorithm, we had to be very careful with our early experiments, because if we got any of these factors wrong, there was no way to tell what the problem was. We would just get noise back. And so a lot of our early experiments looked like this. And so here I am, and on the bottom left, you can kind of see our high-speed camera, which is pointed at a bag of chips, and the whole thing is lit by these bright lamps. And like I said, we had to be very careful in these early experiments, so this is how it went down.

(Video) Abe Davis: Three, two, one, go. Mary had a little lamb! Little lamb! Little lamb!

AD: So this experiment looks completely ridiculous. (Laughter) I mean, I'm screaming at a bag of chips -- (Laughter) -- and we're blasting it with so much light, we literally melted the first bag we tried this on. (Laughter) But ridiculous as this experiment looks, it was actually really important, because we were able to recover this sound.

(Audio) Mary had a little lamb! Little lamb! Little lamb!

AD: And this was really significant, because it was the first time we recovered intelligible human speech from silent video of an object. And so it gave us this point of reference, and gradually we could start to modify the experiment, using different objects or moving the object further away, using less light or quieter sounds. And we analyzed all of these experiments until we really understood the limits of our technique, because once we understood those limits, we could figure out how to push them.

And that led to experiments like this one, where again, I'm going to speak to a bag of chips, but this time we've moved our camera about 15 feet away, outside, behind a soundproof window, and the whole thing is lit by only natural sunlight. And so here's the video that we captured. And this is what things sounded like from inside, next to the bag of chips.

(Audio) Mary had a little lamb whose fleece was white as snow, and everywhere that Mary went, that lamb was sure to go.

AD: And here's what we were able to recover from our silent video captured outside behind that window.

(Audio) Mary had a little lamb whose fleece was white as snow, and everywhere that Mary went, that lamb was sure to go.

AD: And there are other ways that we can push these limits as well. So here's a quieter experiment where we filmed some earphones plugged into a laptop computer, and in this case, our goal was to recover the music that was playing on that laptop from just silent video of these two little plastic earphones, and we were able to do this so well that I could even Shazam our results. (Laughter)

(Music: "Under Pressure" by Queen)

And we can also push things by changing the hardware that we use. The experiments I've shown you so far were done with a high-speed camera that can record video about 100 times faster than most cell phones, but we've also found a way to use this technique with more regular cameras, and we do that by taking advantage of what's called a rolling shutter. You see, most cameras record images one row at a time, so there's a slight time delay between each row, and if an object moves during the recording of a single image, this causes slight artifacts that get coded into each frame of a video. And so what we found is that by analyzing these artifacts, we can actually recover sound using a modified version of our algorithm. So here's an experiment we did where we filmed a bag of candy while a nearby loudspeaker played the same "Mary Had a Little Lamb" music from before, but this time, we used just a regular store-bought camera, and so in a second, I'll play for you the sound that we recovered, and it's going to sound distorted this time, but listen and see if you can still recognize the music.
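
Why a rolling shutter helps can be seen from the capture timestamps alone. This sketch assumes rows are read out top to bottom at a constant rate during some fraction of each frame period, followed by an idle gap; the function name and the 60 fps, 1,080-row and 50% readout figures are assumptions, not numbers from the talk. Each row becomes its own time sample, so within a frame the samples are spaced by the short line delay rather than by the frame period, although the idle gap between frames leaves holes in the recovered signal.

```python
def row_sample_times(num_frames, fps, rows, readout_fraction):
    """Capture time (seconds) of every row in every frame under a rolling shutter.

    Assumes a constant-rate top-to-bottom readout lasting `readout_fraction`
    of each frame period, followed by an idle gap until the next frame.
    """
    if not 0 < readout_fraction <= 1:
        raise ValueError("readout_fraction must be in (0, 1]")
    frame_period = 1.0 / fps
    line_delay = readout_fraction * frame_period / rows
    return [f * frame_period + r * line_delay
            for f in range(num_frames) for r in range(rows)]

times = row_sample_times(num_frames=2, fps=60, rows=1080, readout_fraction=0.5)
line_delay = times[1] - times[0]
print(f"{1 / line_delay:,.0f} row samples per second during readout")
print(f"gap between last row of frame 0 and first row of frame 1: "
      f"{times[1080] - times[1079]:.6f} s")
```

Under these assumptions a 60 fps camera yields 129,600 row samples per second while a frame is being read out, far above the 60 samples per second that whole frames provide.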

(Audio: "Mary Had a Little Lamb")

And so, again, that sounds distorted, but what's really amazing here is that we were able to do this with something that you could literally run out and pick up at a Best Buy.

So at this point, a lot of people see this work, and they immediately think about surveillance. And to be fair, it's not hard to imagine how you might use this technology to spy on someone. But keep in mind that there's already a lot of very mature technology out there for surveillance. In fact, people have been using lasers to eavesdrop on objects from a distance for decades. But what's really new here, what's really different, is that now we have a way to picture the vibrations of an object, which gives us a new lens through which to look at the world, and we can use that lens to learn not just about forces like sound that cause an object to vibrate, but also about the object itself.

And so I want to take a step back and think about how that might change the ways that we use video, because we usually use video to look at things, and I've just shown you how we can use it to listen to things. But there's another important way that we learn about the world: that's by interacting with it. We push and pull and poke and prod things. We shake things and see what happens. And that's something that video still won't let us do, at least not traditionally. So I want to show you some new work, and this is based on an idea I had just a few months ago, so this is actually the first time I've shown it to a public audience. And the basic idea is that we're going to use the vibrations in a video to capture objects in a way that will let us interact with them and see how they react to us.

So here's an object, and in this case, it's a wire figure in the shape of a human, and we're going to film that object with just a regular camera. So there's nothing special about this camera. In fact, I've actually done this with my cell phone before. But we do want to see the object vibrate, so to make that happen, we're just going to bang a little bit on the surface where it's resting while we record this video.

So that's it: just five seconds of regular video, while we bang on this surface, and we're going to use the vibrations in that video to learn about the structural and material properties of our object, and we're going to use that information to create something new and interactive. And so here's what we've created. And it looks like a regular image, but this isn't an image, and it's not a video, because now I can take my mouse and I can start interacting with the object. And so what you see here is a simulation of how this object would respond to new forces that we've never seen before, and we created it from just five seconds of regular video.
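
The talk does not say how the interactive simulation is computed. One standard way to predict how a structure responds to a new push from its observed vibrations is modal analysis: describe the object by a few vibration modes (each with a frequency, a damping ratio and a shape) and model the response to a force as a sum of damped oscillators. The sketch below assumes the modes have already been estimated from the video and are mass-normalized; `Mode`, `impulse_response` and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Mode:
    freq_hz: float      # natural frequency of this mode
    damping: float      # damping ratio zeta, assumed 0 <= zeta < 1
    shape: list[float]  # mass-normalized mode shape, one value per tracked point

def impulse_response(modes, point, impulse, t):
    """Displacement of every tracked point, t seconds after an impulse at `point`.

    An impulse applied at `point` excites each mode in proportion to
    shape[point]; each mode then rings as a damped oscillator, and the
    responses of all modes are added together.
    """
    n_points = len(modes[0].shape)
    displacement = [0.0] * n_points
    for mode in modes:
        omega = 2.0 * math.pi * mode.freq_hz
        omega_d = omega * math.sqrt(1.0 - mode.damping ** 2)  # damped frequency
        # Impulse response of the modal coordinate (unit modal mass).
        q = ((mode.shape[point] * impulse / omega_d)
             * math.exp(-mode.damping * omega * t)
             * math.sin(omega_d * t))
        for i in range(n_points):
            displacement[i] += mode.shape[i] * q
    return displacement

# Two made-up modes for a figure tracked at three points (head, arm, foot).
modes = [Mode(2.0, 0.05, [1.0, 0.5, -0.3]),
         Mode(5.0, 0.05, [0.2, -0.8, 0.6])]
for t in (0.0, 0.05, 0.2, 1.0):
    print(t, [round(d, 4) for d in impulse_response(modes, point=0, impulse=1.0, t=t)])
```

Dragging with the mouse would correspond to applying such forces interactively and stepping the oscillators forward in time; the object's "material properties" enter only through the estimated frequencies, damping and shapes.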

And so this is a really powerful way to look at the world, because it lets us predict how objects will respond to new situations, and you could imagine, for instance, looking at an old bridge and wondering what would happen, how would that bridge hold up if I were to drive my car across it. And that's a question that you probably want to answer before you start driving across that bridge. And of course, there are going to be limitations to this technique, just like there were with the visual microphone, but we found that it works in a lot of situations that you might not expect, especially if you give it longer videos.

So for example, here's a video that I captured of a bush outside of my apartment, and I didn't do anything to this bush, but by capturing a minute-long video, a gentle breeze caused enough vibrations that we could learn enough about this bush to create this simulation. (Applause) And so you could imagine giving this to a film director, and letting him control, say, the strength and direction of wind in a shot after it's been recorded. Or, in this case, we pointed our camera at a hanging curtain, and you can't even see any motion in this video, but by recording a two-minute-long video, natural air currents in this room created enough subtle, imperceptible motions and vibrations that we could learn enough to create this simulation.

And ironically, we're kind of used to having this kind of interactivity when it comes to virtual objects, when it comes to video games and 3D models, but to be able to capture this information from real objects in the real world using just simple, regular video is something new that has a lot of potential.

So here are the amazing people who worked with me on these projects. (Applause)

And what I've shown you today is only the beginning. We've just started to scratch the surface of what you can do with this kind of imaging, because it gives us a new way to capture our surroundings with common, accessible technology. And so looking to the future, it's going to be really exciting to explore what this can tell us about the world.

Thank you.
