Microsoft has released a paper detailing its Room2Room project, a holographic video calling endeavor which first emerged in January.
Some weeks ago we first learned about Room2Room, a Microsoft Research project that uses Kinect depth cameras and projectors to transfer life-size 3D captures of interlocutors into each other’s rooms.
While current videoconferencing applications like Skype and FaceTime are limited to 2D, flat-screen, reduced-scale interaction, Room2Room projects a full rendering of a remote participant into the viewer’s physical environment, allowing for increased use of nonverbal cues (such as pointing, posture or gaze) and more natural interaction to complete tasks.
Microsoft Research has now published a first draft paper on the current state of the research, along with an explanatory video showing how Room2Room works in detail.
For Room2Room, Microsoft researchers collaborated with scientists from the University of Wisconsin-Madison and the University of Southern California.
“We have developed Room2Room to enable remote participants, represented as life-size virtual copies projected into each other’s physical environment, to engage in real-time, co-present interaction. Our system does not require participants to wear any specialized equipment, it enables them to move freely and view each other from different angles with correct perspective, and implicitly gives them a common reference space where they can interact naturally using nonverbal cues.”
Researchers involved state they could achieve a better 3D effect through the use of shutter glasses and stereo projectors, but chose to do without wearable devices that could impede nonverbal cues and interactions.
The paper describes the task of capturing people and objects as a three-step process.
- Background acquisition captures the depth texture of the static room, without people or other non-stationary objects inside.
- Foreground extraction identifies objects closer to the camera by comparing the captured depth with the stored background depth.
- During 3D reconstruction, the 3D geometry and appearance of the captured persons and objects are rendered on the client’s end using a GPU shader. A textured mesh of the user is created from all available data and finally projected into the room.
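The foreground-extraction step above amounts to a per-pixel comparison between the live depth frame and the stored background depth. A minimal sketch of that idea, assuming depth frames arrive as numpy arrays of millimeter values (the threshold and array shapes are illustrative, not figures from the paper):

```python
import numpy as np

def extract_foreground(depth, background, threshold_mm=50, invalid=0):
    """Return a boolean mask of pixels measurably closer to the camera
    than the stored background depth -- a simplified stand-in for the
    paper's foreground-extraction step."""
    # Depth sensors report 0 (or another sentinel) where no reading exists.
    valid = (depth != invalid) & (background != invalid)
    # A pixel is foreground if it sits at least `threshold_mm` in front
    # of the static background behind it.
    return valid & (depth < background - threshold_mm)

# Toy example: a flat wall at 3000 mm with a person-sized blob at 1500 mm.
background = np.full((4, 4), 3000, dtype=np.int32)
depth = background.copy()
depth[1:3, 1:3] = 1500
mask = extract_foreground(depth, background)
```

In the toy example the mask is true only for the four "blob" pixels; the real system would additionally filter sensor noise and feed the masked depth into the mesh-reconstruction shader.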
As Room2Room streams not just 2D video signals but also depth and color information in real time, the technology requires substantial network bandwidth. As the Microsoft researchers point out, the color textures are very large due to their high resolution and need to be compressed with JPEG algorithms.
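To see why compression is unavoidable, a quick back-of-the-envelope calculation helps. The resolution, frame rate, and compression ratio below are illustrative assumptions, not figures from the paper:

```python
def stream_bandwidth_mbps(width, height, fps, bytes_per_pixel=3,
                          compression_ratio=1.0):
    """Rough per-stream bandwidth in megabits per second for a color
    texture stream, given a hypothetical average compression ratio."""
    raw_bytes_per_sec = width * height * bytes_per_pixel * fps
    return raw_bytes_per_sec * 8 / compression_ratio / 1e6

# Uncompressed 1080p RGB at 30 fps: ~1493 Mbps -- far beyond typical links.
raw = stream_bandwidth_mbps(1920, 1080, 30)

# With an assumed 10:1 JPEG ratio, the same stream drops to ~149 Mbps.
jpeg = stream_bandwidth_mbps(1920, 1080, 30, compression_ratio=10)
```

Even the compressed figure excludes the depth stream, which is why the researchers treat bandwidth as a central engineering constraint.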
While the system is currently limited to one-on-one interactions in limited physical layouts, Microsoft’s goal is to expand it to allow for more interacting parties in larger spaces.
“Rather than relying on technologies which take the user out of their environment (e.g., collaborative VR solutions), we pursue a vision which inserts remote participants into the user’s environment, exploiting the environment’s affordances, and emulating the experience of face-to-face conversation.”
SOURCE: Microsoft Research.