How MR Works In Serious Sam VR

Introduction
This article explains the work it took to implement mixed reality in our game using our own engine. Apart from building on the approaches already taken by other teams, we developed some new techniques that solve latency issues and make camera calibration really simple for the user. We haven’t found anyone describing such solutions to these common problems, so our implementation should be interesting both to people new to the subject and to people who have already done mixed reality and want to improve their approach.
 
How it all started for us
Mixed reality had already proven to be a great way to demonstrate a VR experience outside of VR. Naturally, we wanted to use it to promote Serious Sam VR: The Last Hope. We started implementing mixed reality two weeks before the EGX 2016 show so we could be featured on the Vive stand, capture some promotional footage, and show off our game in all its glory with the physical minigun controller.
 
The people from HTC already had some experience with mixed reality video, and they gave us the requirements for their mixed reality mixing and recording setup. We were supposed to render multiple views from the game: the foreground view, the background view, etc. All of that, plus the chroma-keyed player, would be mixed in the open-source software OBS [1], where videos could be captured. This approach was described in a great article by Kert Gartner [2], and that was our starting point for research and implementation.
 
Since we use our own Serious Engine, it was fairly straightforward to implement separate rendering of the foreground and background views from a spectator camera that could be controlled by a Vive controller (we went for a moving spectator camera from the start). As soon as we did that, we knew we weren’t happy with the approach, since it suffered from multiple problems:
 
* There would be sorting issues when enemies, projectiles and particle effects got close to the player.
* Performance would suffer, since we had to render the scene more times than necessary (twice for the eyes, plus at least twice more for the foreground and background spectator views).
* We couldn’t light the player with in-game lighting.
 
We knew that in order to solve these problems we had to find a way to put the player inside the game instead of relying on mixing software. For that to work, we needed to capture the video and feed it into our own engine. After some more research, we found that we weren’t the first to come up with that approach: a great article by Shaun McCabe [3] described exactly what we wanted to do.
 
Getting the player inside the game
In [3], Shaun described how they used the example code from Microsoft Media Foundation [4] to capture the video, so we implemented video capture in our engine the same way. It worked out nicely with the Logitech HD WEBCAM C270 we had in our office. The video quality was not good enough for promo recording, but it was a start for getting a prototype working.
 
The video was uploaded to the GPU, where a shader performed chroma keying the same way it is done in OBS (because people were already experienced with the OBS parameters).
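For readers implementing this themselves, here is a rough per-pixel sketch of OBS-style chroma keying. Our actual implementation is a GPU shader; the function name and the `similarity`/`smoothness` thresholds below are illustrative, merely mirroring the kind of parameters the OBS filter exposes:

```python
def chroma_key_alpha(r, g, b, key=(0.0, 1.0, 0.0),
                     similarity=0.4, smoothness=0.08):
    """Alpha in [0, 1]: 0.0 means fully keyed out (pure green screen)."""
    def to_cbcr(r, g, b):
        # BT.601-style chroma components; luma is ignored so that
        # shadows on the green screen still key out.
        cb = -0.169 * r - 0.331 * g + 0.500 * b
        cr = 0.500 * r - 0.419 * g - 0.081 * b
        return cb, cr

    cb, cr = to_cbcr(r, g, b)
    kcb, kcr = to_cbcr(*key)
    # Distance from the key color in the chroma plane
    dist = ((cb - kcb) ** 2 + (cr - kcr) ** 2) ** 0.5
    # Smooth ramp between "key color" and "definitely foreground"
    t = (dist - similarity) / max(smoothness, 1e-6)
    return min(max(t, 0.0), 1.0)
```

In the real shader this runs per pixel on the video texture, and the resulting alpha drives both blending and the alpha test on the player rectangle described below.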
 
To make the player from the video visible inside the game, we used a simple technique: rendering a single camera-facing rectangle that was alpha keyed (matching the chroma keying). That rectangle is centered on the player’s position in the scene (determined by the tracked headset). Parts of the scene that aren’t around the player are cut away (the garbage matte technique) – very useful in our makeshift studio, where the green screen didn’t cover much of the scene.
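Positioning such a camera-facing rectangle is standard billboard math. A minimal sketch (the function name and the rectangle dimensions are hypothetical, not engine code):

```python
import math

def billboard_corners(player_pos, camera_pos, world_up=(0.0, 1.0, 0.0),
                      width=2.0, height=2.5):
    """Four corners of a rectangle centered on the player, facing the camera."""
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def add(a, b): return tuple(x + y for x, y in zip(a, b))
    def scale(v, s): return tuple(x * s for x in v)
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    def normalize(v):
        l = math.sqrt(sum(x * x for x in v))
        return tuple(x / l for x in v)

    to_cam = normalize(sub(camera_pos, player_pos))  # rectangle normal
    right = normalize(cross(world_up, to_cam))       # horizontal axis
    up = cross(to_cam, right)                        # vertical axis
    half_w = scale(right, width / 2.0)
    half_h = scale(up, height / 2.0)
    # Counter-clockwise as seen from the camera
    return [add(add(player_pos, scale(half_w, sx)), scale(half_h, sy))
            for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))]
```

The chroma-keyed video frame is then mapped onto this quad, and the garbage matte simply clips its texture coordinates to the region the green screen actually covers.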

Streamlining the camera calibration
Even in the early prototype version, we were really frustrated with the process of manual camera calibration – tweaking the offset parameters (heading, pitch and banking angles, as well as the linear offset) to match the position of the physical camera (the one providing the video feed) with the in-game spectator camera (the one rendering the game world). To make it easy on us, and especially on the people at the trade shows, we came up with an approach that doesn’t require any manual tweaking.
 
The idea was simple (though it took a while to mature): while viewing yourself in the camera feed, mark five points displayed on the screen by placing the Vive controller so the center of its hole matches the crosshair on the screen, then press the trigger button to take the sample (see image below).

This is repeated standing closer to the camera and then farther away from it. The method yields five rays in 3D space (each ray goes from the far point through the near point taken for the same on-screen sample) for five known positions in screen space. This information is then used to find the camera position, FOV and rotation angles (relative to the attached controller, or relative to the scene origin if no extra controller is used). The math involved will be described in a separate article, as this approach should be useful to many developers.
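The full derivation belongs to that separate article, but the first step can be sketched. Each screen point contributes two controller samples (near and far) defining a ray that points toward the physical camera; one standard way to recover the camera position is then a least-squares intersection of all five rays. A sketch assuming numpy (all names are ours for illustration, not the engine’s):

```python
import numpy as np

def sample_rays(near_points, far_points):
    """One ray per screen point: from the far sample through the near sample."""
    rays = []
    for near, far in zip(np.asarray(near_points, float),
                         np.asarray(far_points, float)):
        d = near - far                      # points toward the camera
        rays.append((far, d / np.linalg.norm(d)))
    return rays

def least_squares_intersection(rays):
    """Point minimizing the summed squared distance to all rays.

    For a ray (o, d) with unit d, the perpendicular offset of p is
    M (p - o) with projector M = I - d d^T; summing over all rays
    gives the normal equations (sum M_i) p = sum M_i o_i.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in rays:
        M = np.eye(3) - np.outer(d, d)
        A += M
        b += M @ o
    return np.linalg.solve(A, b)
```

Once the camera position is known, the FOV and rotation angles follow from where the five known screen-space points must project, which is the part covered in the follow-up article.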
 
This method was a lifesaver, especially for the people who had to set it up at the show and had struggled to get even a near match with manual tweaking. Done correctly, by precisely aligning the controller with the on-screen points, it yielded a perfect match in less than a minute! The calibration results can then be checked by simply comparing the size, position and orientation of the Vive controllers rendered in game with the Vive controllers in the video.

Fixing the video delay
Apart from camera calibration, we encountered another interesting problem: in-game weapons and controllers were rushing ahead of the player’s hands in the video. This was not a problem in sessions involving the physical minigun, since the in-game weapon was not displayed in mixed reality and the physical minigun perfectly followed the physical player. But we knew we wanted to record promotional videos of players wielding the in-game weapons, and that this would be most useful for players who wanted to stream the game once it was out (not everyone has our physical minigun).

Due to the inherent delay of capturing the video and uploading it to the GPU, there was a noticeable lag between the player’s hands and the controllers or weapons in game, and that didn’t look good. It looked even worse when the camera (with its attached controller) was moving. To fight this, we first came up with the idea of delaying the controller input so it matches the video feed. That, of course, made the game feel weird to the player, so instead we delayed only the controller/weapon placement in the mixed reality view! This way the weapons perfectly matched the player’s movements in the video stream, and it appeared as if the player was really holding the weapons (at least the ones with realistic dimensions). And the delay (around 60 milliseconds in our final setup) was not big enough to notice that bullets and projectiles are not exactly matched to the delayed weapons.
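The idea above can be sketched with a small pose history buffer (names and the fixed 60 ms figure are ours for illustration): gameplay keeps consuming the live pose, while the mixed reality view renders the newest pose that is at least one video delay old.

```python
from collections import deque

class MixedRealityPoseDelay:
    """Replays controller poses ~60 ms late, for the MR spectator view only."""

    def __init__(self, delay_seconds=0.060):
        self.delay = delay_seconds
        self.history = deque()  # (timestamp, pose) pairs, oldest first

    def push(self, timestamp, pose):
        """Record the live pose each frame; gameplay still uses it directly."""
        self.history.append((timestamp, pose))

    def sample(self, now):
        """Newest pose at least `delay` seconds old, or None if none yet."""
        target = now - self.delay
        delayed = None
        while self.history and self.history[0][0] <= target:
            delayed = self.history.popleft()[1]
        return delayed
```

A production version would interpolate between the two poses bracketing the target time rather than snapping to the nearest older sample, but even this coarse form conveys why the delayed weapons line up with the video feed.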

Recently we learned about a method that uses a stereo depth camera [8], which makes it possible to add depth to the player in mixed reality. Now we can’t wait to get our hands on such a camera so we can do even better!

 

Source: Croteam
