Hello everybody, I'm Anders Bo-Petersen. I come from the Danish startup The Eye Tribe.
The Eye Tribe does affordable eye tracking for mass-market products. We've done so for several years now.
We have developed eye tracking technology that can be integrated into laptops and mobile phones, as David just said, and now we're focusing on virtual reality.
And slides would be nice now.
Any minute.
There we go.
So what I'm here to talk about today is having a look at the current input modalities that are in this first generation of virtual reality.
And then try to take that a step further and look at what's coming.
And lastly, talk about where eye tracking in virtual reality fits into all this.
So I've just introduced myself, so let's skip that one.
So what you see here is state of the art in the first generation of virtual reality.
From the top, the HTC Vive, the Oculus Rift, PlayStation VR, and the OSVR.
So when you look at these, you can see one thing that they have in common.
They all have an external tracker to drive their head tracking and their controllers.
Head tracking in first generation virtual reality is really good.
It's better than expected, at least what I expected.
But it does have some drawbacks.
So first of all, there's the notion of having an external tracker.
Whenever you set up your system, you also need to set up external trackers.
These external trackers have a field of view.
If you step outside that field of view, tracking is gone.
Also, these first generation systems are not able to support several users in the same room.
So there's no notion of multiplayer support unless it's over network and people are standing in separate rooms.
Lastly, there's one thing about head tracking that I find incredibly unintuitive.
And that's the fact that you have to move your head to control a reticle, moving it over the buttons that you want to select.
That's not an intuitive UX paradigm, and I think we can do better.
It gets a little bit better when you then get to the handheld controllers.
It is more intuitive to use your hand to do interaction, but these controllers use the same external trackers as the headset tracking,
meaning that you can only use one headset per system.
If somebody steps in front of the external tracker's field of view, then it can't track you anymore.
So again, dependency on an external tracker.
And question is also, again, if interaction with these devices is intuitive.
When you hold a handheld controller, point it at something, and press a button to pick something up, is that intuitive?
I think we can do better.
Just so you know, this is the state of the art in the first generation.
Could you maybe please turn off sound? Not for me, but the videos.
This is pretty interesting because this is the up-and-coming Oculus Touch showing multiplayer features in this first generation of VR.
I just said that that's impossible, but that's because this is Toybox by Oculus,
where two people are standing in two separate rooms with each their system and using the system.
So in that sense, multiplayer support is possible.
So instead of looking at the current generation input modalities, let's have a look at what's next.
As always, the dear people of Silicon Valley are inventing new stuff, creating cheaper, better sensors all the time.
Virtual reality is going to earn a lot of people a lot of money.
That means that several smaller companies are looking into all these sensors and the newest versions of them,
seeing how they use less power, how they're becoming cheaper.
And all these companies are building incredible products because they want a piece of the pie.
Let's have a look at some of those products.
So without a doubt, a big sub-industry in virtual reality will be social VR.
Many people are doing great work there, AltspaceVR, for instance, but stuff is missing.
You can move around in VR right now and hold the controllers, so you're able to do limited body language.
Imagine having a system that is able to track your facial expressions also, track when you're sad, track when you're happy.
If you're glad to see somebody in virtual reality, how would you show that?
Several companies are working on that using, as you see here, either pressure pads inside the mask
or simply having a camera underneath the mask that can film your mouth.
And with that, we can have facial expressions in virtual reality, which looks like this.
Let's try that again, which looks like this.
So imagine having this in next generation VR and being able to show the people that you meet in virtual reality
that you're actually happy to see them.
Something else that's up and coming is support of hands and gestures.
Imagine a headset that's able to see what's in front of it.
So in close vicinity, it'll be able to see your hands as you hold them in front of the device.
But why is that smart?
Well, I was just arguing that interacting using head tracking or handheld controller is not intuitive.
Picking up something with your fingers or interacting with your hands is the most intuitive way to interact with things that a human being can do.
Therefore, I strongly believe that what we see here is something that will be in the next generation VR.
I'll get more into why later.
The last thing that I want to talk about before we move to eye tracking is spatial mapping.
So imagine having a mask that's able to see the environment around it.
Imagine that there'll be no need for having an external tracker or several mounted in the room in which we use virtual reality,
because instead of the environment knowing the mask, the mask knows the environment.
That's sort of inside-out head tracking.
Of course, this is needed in augmented reality.
In augmented reality, we'll need to know the surfaces in the room in order to be able to place the augmented layer on top of all those surfaces,
placing a cup on a table, etc., or as in this example, playing Minecraft in the living room.
For those that have not yet seen the video,
this is an example of the HoloLens doing its spatial mapping.
The grid that you see every time the guy in the video does a gesture is a visualization of the point map that this headset generates real time, all the time.
That's a truly amazing product.
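As a toy illustration of what an application could do with such a real-time point map, here is a sketch that finds the dominant horizontal surface in a point cloud and snaps a virtual object onto it. This is a hypothetical, simplified approach (a height histogram), not how the HoloLens actually processes its spatial map:

```python
import numpy as np

def dominant_surface_height(points, bin_size=0.05):
    """From a spatial-mapping point cloud (N x 3, with y pointing up),
    estimate the height of the dominant horizontal surface, e.g. a table
    or the floor, by histogramming point heights and picking the fullest bin."""
    bins = np.round(points[:, 1] / bin_size).astype(int)
    values, counts = np.unique(bins, return_counts=True)
    return float(values[np.argmax(counts)]) * bin_size

def place_object(points, x, z):
    # Snap a virtual object to the detected surface at horizontal position (x, z).
    return np.array([x, dominant_surface_height(points), z])
```

A real system would fit oriented planes (e.g. with RANSAC) instead of assuming axis-aligned surfaces, but the idea is the same: the mask's own map of the room tells the app where things can rest.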
So predictions for the next generation of virtual reality.
I predict that we're going to see some notion of depth sensors in the next generation VR.
Whether or not it'll just be hand tracking or if we'll have full spatial mapping, that's unclear.
So I'm going to go ahead and start with the first one.
Google has a project called Tango,
and they have been running it for several years.
That's a project that does depth sensing inside mobile phones.
I know that the hand tracking video I showed a few slides back
is from a company called Pebbles Interfaces.
That company was acquired by Oculus.
All in all, all the big boys in the Valley are focusing on these technologies and on these sensors,
so I think we're going to see something in virtual reality very soon.
If we're going to have depth sensors or even spatial mapping inside the masks,
that means that, as I said, there will be no more need for external devices tracking you.
It will be the other way around.
The headsets will track the environment.
That's a very interesting thought, because that brings me to the last topic.
The industry wants to get rid of cables.
Imagine not being tied down to a super powerful desktop when you use virtual reality.
I know several companies are focusing on doing computer backpacks and whatnot at the moment,
but imagine that the cables go away altogether
because the processing units and the batteries are inside the mask.
Wouldn't that be hella cool?
And that brings me to the last topic here, and that's eye tracking in VR,
because getting rid of those cables is something that eye tracking can help with.
Let me explain why.
But before doing so, a quick run-through of what eye tracking actually is.
Many of the technologies I just showed you rely in some way on cameras of some sort,
some on depth sensors combined with cameras. Eye tracking is no different.
To do eye tracking, you need a camera and a light emitter.
On top of that, you need software that detects where the pupil is,
and based on that, you can do gaze estimation,
so estimating where you look at all times.
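The pipeline just described can be sketched in code. This is a simplified, hypothetical illustration of regression-based gaze calibration, mapping pupil-glint difference vectors from the camera to screen coordinates, and not The Eye Tribe's actual algorithm:

```python
import numpy as np

def fit_gaze_mapping(pupil_glint_vecs, screen_points):
    """Fit a quadratic polynomial mapping from pupil-glint difference
    vectors (camera space) to gaze points (screen space), using the
    samples collected during a calibration routine."""
    x, y = pupil_glint_vecs[:, 0], pupil_glint_vecs[:, 1]
    # Design matrix with quadratic terms: [1, x, y, x*y, x^2, y^2]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    # Least-squares fit, one column of coefficients per screen axis
    coeffs, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return coeffs  # shape (6, 2)

def estimate_gaze(coeffs, pupil_glint_vec):
    """Map a single pupil-glint vector to an estimated (screen_x, screen_y)."""
    x, y = pupil_glint_vec
    feats = np.array([1.0, x, y, x * y, x**2, y**2])
    return feats @ coeffs
```

In a headset this runs per eye, per frame, on the images from the cameras behind the lenses.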
In virtual reality, that amounts to placing one or two cameras inside a mask.
You can see here on the image that the camera is placed on the top.
It reflects in a mirror and is able to see your eyes,
and infrared light emitters project light into your eyes, creating reflections.
And you get a frame rate between 300 and 400 frames per second.
But let's not focus on all those technical details.
Let's instead talk about what eye tracking can do in virtual reality.
Naturally, eye tracking in virtual reality brings another level of intuitive interaction.
When somebody presses a button in real life or in virtual reality, they look at it first.
Whenever somebody picks up a ball or interacts with things in general, they look at it first.
So with eye tracking, we open up for a totally new way of interacting in virtual reality.
And to add on that, eye tracking works well with some of the technologies that I just showed.
So the level of precision that's currently in the hand tracking I showed you is not always perfect.
Imagine combining eye tracking with that input modality
so that the system knows that you are looking at the ball before you try grabbing it.
Then we would have a system that with higher success rate would be able to predict the user's intent.
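One way such gaze-plus-hand intent prediction could work is to score candidate objects by how well the gaze ray aligns with them and how near the hand is. A hypothetical sketch, where the weights and the scoring functions are illustrative assumptions:

```python
import numpy as np

def predict_target(gaze_origin, gaze_dir, hand_pos, objects,
                   w_gaze=0.7, w_hand=0.3):
    """Score candidate objects by gaze-ray alignment and hand proximity,
    and return the name of the most likely interaction target."""
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    best_name, best_score = None, -np.inf
    for name, pos in objects.items():
        to_obj = pos - gaze_origin
        dist = np.linalg.norm(to_obj)
        gaze_score = float(gaze_dir @ (to_obj / dist))  # cosine of gaze angle
        hand_score = 1.0 / (1.0 + np.linalg.norm(pos - hand_pos))
        score = w_gaze * gaze_score + w_hand * hand_score
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Even with imprecise hand tracking, the gaze term disambiguates which object the grab was aimed at.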
Eye tracking in virtual reality also adds something else that's interesting.
Getting back to the social VR.
Imagine, like I said, meeting other people in social VR and being able to see who or what they look at.
Think about that for a moment.
Another thing that eye tracking brings to virtual reality is the notion of auto configuration.
If you've tried virtual reality, you know that it involves manually setting the interpupillary distance.
There's also the eye relief that needs to be set for many people,
people wearing glasses especially.
Imagine being able to just put on a mask and the notion of the mask doing all these things automatically
because it knows the distance between your pupils.
Also think about what this means in regards to 3D rendering.
Take a 3D rendering engine that many people here are using.
Imagine that Unity would be able to set the distance between the virtual cameras inside the 3D world automatically, driven by eye tracking.
That would cause less nausea for most users.
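As a minimal sketch of that auto-configuration idea, the measured IPD could feed straight into the stereo camera separation. The clamping range here is an assumed plausible human range, not a specification from any headset:

```python
def stereo_camera_offsets(ipd_mm, min_mm=52.0, max_mm=78.0):
    """Given an eye-tracker-measured interpupillary distance in millimetres,
    return the left/right virtual-camera x offsets in metres from the head
    centre.  Implausible measurements are clamped to a sane human range."""
    ipd_mm = max(min_mm, min(max_mm, ipd_mm))
    half_m = ipd_mm / 1000.0 / 2.0
    return (-half_m, half_m)
```

In Unity terms, this is the kind of value that would drive the camera's stereo separation instead of a manually entered slider setting.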
The product that we're presenting also has a unique authentication method,
meaning that when you put on the mask, we are able to identify you,
log you into the system right away, and load your customized settings.
In a multi-user scenario, this is interesting, but it's also interesting in general in regards to simply having a secure system.
I'd like to add that our authentication method is based on real eyes,
meaning that you cannot hold up a photo in front of the system and have it authenticate.
That's simply impossible using our system.
Lastly, I want to talk about the big one in regards to eye tracking in virtual reality.
How many in here have heard about foveated rendering?
So, foveated rendering is quite the buzzword in the industry,
so I'll try to give the most basic explanation of what it is now.
Foveated rendering is the process of rendering a 3D scene in super high quality,
only where the user is looking.
That means that a scene is rendered in high quality in the user's focal area,
while in the user's peripheral vision, all details in regards to resolution, quality, light calculations, and whatnot
are turned down to a lower setting, and the user will not even be able to notice it.
They'll not be able to notice it because it's in their peripheral vision,
and it actually gives a far more natural experience in virtual reality.
The very positive side of this is not necessarily the fact that the experience becomes better.
It's the fact that we theoretically can reduce the load on the GPU between four and eight times.
Factor eight, people.
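That four-to-eight-times figure can be sanity-checked with a back-of-the-envelope model: approximate the view as concentric rings, each shaded at a lower resolution scale. The three-ring setup below is purely illustrative, but its theoretical speedup lands at 6.4x, inside the stated range:

```python
import math

def foveated_speedup(rings):
    """rings: list of (outer_radius, resolution_scale) pairs, innermost
    first, radii given as fractions of the display radius.  Returns the
    theoretical shading speedup versus rendering everything at full
    resolution, approximating the field of view as a circle."""
    cost, prev_r = 0.0, 0.0
    for outer_r, scale in rings:
        ring_area = math.pi * (outer_r**2 - prev_r**2)
        cost += ring_area * scale**2   # shaded pixel count scales quadratically
        prev_r = outer_r
    full_cost = math.pi * prev_r**2    # the whole view at scale 1.0
    return full_cost / cost
```

For example, a full-resolution fovea out to 25% of the radius, half resolution out to 50%, and quarter resolution beyond that gives `foveated_speedup([(0.25, 1.0), (0.5, 0.5), (1.0, 0.25)])` = 6.4. Real pipelines won't hit the theoretical number, but it shows where the headroom comes from.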
This is, of course, very, very, very important for the industry
because the industry right now is struggling with having powerful enough GPUs
to even render in the two screens in the first generation products I was telling you about a minute ago.
Next generation headsets, they want to put in two 4K screens in the masks.
No system right now is able to run that.
Okay, there will be systems out there capable of running that,
but no normal person's machine would be able to run that
without something that would critically reduce the load on the GPU,
and that is exactly what foveated rendering does.
So if the industry wants the high resolutions in order to reduce nausea in the virtual reality experiences,
if they want to reach a stage where they can build in the processing units together with battery in the mask themselves
and get rid of cables, they will need foveated rendering.
And our technology enables that.
So summing up, eye tracking brings a new input modality.
It can converge with all the other existing input modalities and simply create a better user experience.
We have automated configuration of the headset.
You put it on, it configures itself, logs you in, and loads your user-specific settings.
And lastly, foveated rendering will hopefully help shorten the path to getting rid of cables,
but for now it would be able to bring you, the users, faster frame rates and better experiences.
And now I believe we're going to see if the demo gods will allow a live demo,
but before, while I set this up, I have a fallback slide that shows foveated rendering in action.
There we go.
So what you see here is a normal 3D scene running and in a second foveated rendering will be turned on.
So inside the center circle now, the scene is rendered in normal quality
and as we move to the outer circles, quality is reduced.
Right there to the naked eye, when you see the full FOV,
it's pretty obvious that the resolution in the outer circles is lower,
but inside the mask, when eye tracking is running, you will not be able to notice.
And I'm just told that the feed is not working from this computer,
so instead, if this talk interests any of you, come meet me at the table in the break
and you can have a go at the system.
That's all I had. Thank you very much.
That's technology, right? I know it's not working every time,
but like you said, you can try it outside and I think that's a really good idea.
Just tell me this foveated rendering, when will that be possible?
When can I try this at home?
So first of all, when it comes, you'll not be able to notice
because it will be a built-in feature, right?
But I think without mentioning anything critical, it will come very, very soon
because right now we are at a stage where the processing power of the GPUs
cannot follow the desires of the industry in regards to getting higher resolutions inside the mask.
I just mentioned 4K inside the mask, but the industry wants to go all the way up to 16K
because that's where you want to go if you want to have screens so close to your eyes
without pixelation happening.
So it's going to happen and several key persons in the industry have been out and said
that we'll never get high resolutions in the mask unless we have foveated rendering.
Okay. We're looking forward to that.
Anders, thank you.