Understanding Extended Reality Technology & Data Flows: XR Functions
This post is the first in a two-part series on extended reality (XR) technology, providing an overview of the technology and associated privacy and data protection risks.
Click here for FPF’s infographic, “Understanding Extended Reality Technology & Data Flows.”
I. Introduction
Today’s virtual (VR), mixed (MR), and augmented (AR) reality environments, collectively known as extended reality (XR), are powered by the interplay of multiple sensors, large volumes and varieties of data, and various algorithms and automated systems, such as machine learning (ML). These complex relationships enable functions like gesture-based controls and eye tracking, without which XR experiences would be less immersive or unable to function at all. However, these experiences often depend on sensitive personal information, and the collection, processing, and transfer of this data to other parties may pose privacy and data protection risks to both users and bystanders.
This blog post analyzes the XR data flows that are featured in FPF’s infographic, “Understanding Extended Reality Technology & Data Flows.” This post focuses on some of the functions that XR devices support today and may support in the near future, analyzing the kinds of sensors, data types, data processing, and transfers to other parties that enable these functions. The next blog post will identify some of the privacy and data protection issues that XR technologies raise.
II. XR Functions
XR devices use several sensors to gather personal data about users and their surroundings. Devices may also log other types of data: data about a person's location derived from GPS signals, nearby cell towers, or other connected devices around it; data about the device's hardware and software; and usage and telemetry data. Devices utilize this data, and may transfer it further, to enable a variety of functions, which are the technologies that power use cases. For example, eye tracking is a function that enables the use case of optimized graphics.
A. Mapping and Understanding the User’s Environment
Sensors on XR devices may work in tandem to collect various kinds of data—such as surrounding audio, the device’s acceleration, orientation, and environment depth data—to map and find objects in the user’s physical environment. Mapping the space entails constructing three-dimensional representations of the user’s environment in order to accurately situate users and content within a virtual space. Understanding the user’s environment involves identifying physical objects or surfaces in the user’s space to help place virtual content. These functions may enable shared experiences and other use cases.
To map and identify objects in the user's environment, XR devices collect data across a number of sensors, such as microphones, cameras, depth sensors, and inertial measurement units (IMUs), which measure movement and orientation. When a sensor experiences a performance problem or certain sensor data is unavailable, the device may rely on other sensors, using a less accurate proxy or fallback for the missing data. For example, if the photons emitted by a depth sensor fail to return a reading for part of the scene, the device may use an AI system to fill the sensory gap by inferring depth from the image pixels closest to the area the sensor targeted.
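As a rough, hypothetical illustration of this kind of fallback (not any particular device's actual pipeline), the Python sketch below fills gaps in a depth map by borrowing the nearest valid reading; in practice, devices may use learned models that are far more sophisticated.

```python
import numpy as np

def fill_depth_gaps(depth_map: np.ndarray) -> np.ndarray:
    """Fill missing depth readings (zeros) with the nearest valid value.

    Illustrative stand-in for the learned models a device might use to
    infer depth where the sensor's emitted photons returned no reading.
    """
    filled = depth_map.copy()
    valid = np.argwhere(depth_map > 0)      # coordinates with a real reading
    missing = np.argwhere(depth_map == 0)   # coordinates the sensor missed
    for y, x in missing:
        # Borrow the reading from the closest measured pixel.
        distances = np.linalg.norm(valid - np.array([y, x]), axis=1)
        nearest = valid[np.argmin(distances)]
        filled[y, x] = depth_map[tuple(nearest)]
    return filled

# Toy 4x4 depth map in meters; 0.0 marks a failed reading.
depth = np.array([
    [1.2, 1.2, 0.0, 1.3],
    [1.1, 0.0, 0.0, 1.3],
    [1.1, 1.1, 1.2, 1.2],
    [1.0, 1.1, 1.1, 1.2],
])
print(fill_depth_gaps(depth))
```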
Once the device has gathered data through its sensors, the device and XR applications may need to further process this data to map and identify objects in the user’s physical space. The kind of processing activity that occurs depends on the features a developer wants its application to have. A processing activity that often occurs after a device collects sensor data is sensor fusion, in which algorithms combine data from various sensors to improve the accuracy of simultaneous localization and mapping (SLAM) and concurrent odometry and mapping (COM) algorithms. SLAM and COM map users’ surrounding areas, including the placement of landmarks or map points, and help determine where the user should be placed in the virtual environment. Some types of XR, including certain MR applications, leverage computer vision AI systems to identify and place specific objects within an environment. These applications may also use ML models to help determine where to place “dynamic” virtual content—virtual objects that respond to changes in the environment caused by adjustments to the user’s perspective. These mapping and object identification functions may also allow for shared experiences. For example, in a theoretical pet simulation, multiple users could toss a virtual ball against a building wall for a virtual puppy to catch.
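To make the idea of sensor fusion concrete, here is a minimal, hypothetical sketch of a complementary filter, one simple way to blend a fast-but-drifting IMU heading estimate with a slower, drift-free camera-based estimate before the result feeds tracking algorithms like SLAM or COM. Actual XR tracking stacks use considerably more elaborate filters; the numbers below are illustrative only.

```python
import numpy as np

def fuse_orientation(imu_yaw: float, camera_yaw: float, alpha: float = 0.98) -> float:
    """Complementary filter: trust the fast-but-drifting IMU estimate most of the
    time and nudge it toward the slower, drift-free camera-based estimate."""
    return alpha * imu_yaw + (1.0 - alpha) * camera_yaw

# Simulated headings (radians): the IMU drifts, the camera estimate is noisy but unbiased.
true_yaw = 0.50
imu_yaw = true_yaw + 0.10                           # accumulated gyroscope drift
camera_yaw = true_yaw + np.random.normal(0.0, 0.02)  # visual estimate with small noise

fused = fuse_orientation(imu_yaw, camera_yaw)
print(f"fused heading estimate: {fused:.3f} rad (true {true_yaw:.3f})")
```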
While XR devices generally do not send mapping and environmental sensor data to other parties, including other users, there are a few exceptions. For example, raw sensor data may be transmitted to XR device manufacturers to improve existing device functions, such as the placement and responsiveness of virtual content that users interact with. An XR device may also process and relay users' location information, such as approximate or precise geolocation data, to enable shared experiences within the same physical space. For instance, two individuals in a public park could interact with each other's virtual pets in an AR app, with each player's device recognizing the placement of both the virtual pets and the other player in the park. In other situations, certain parties can observe processed sensor and other data, such as an application developer or an entity controlling the external server that enables an application's multiplayer functionality. Therefore, the nature of the data, as well as device and application features, may affect who can access XR data.
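As a simplified sketch of how two devices in the same park could agree on where a shared virtual pet sits, the example below (hypothetical coordinates, reduced to 2D) expresses the pet's position relative to a shared anchor and converts it into each device's local frame.

```python
import numpy as np

def pose_matrix(x: float, y: float, theta: float) -> np.ndarray:
    """2D homogeneous transform describing a device's view of a shared anchor."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

# Each headset estimates the same park-bench anchor in its own coordinate frame.
anchor_in_device_a = pose_matrix(2.0, 1.0, 0.0)
anchor_in_device_b = pose_matrix(-1.0, 3.0, np.pi / 2)

# Player A places the virtual pet 0.5 m in front of the anchor (anchor coordinates).
pet_in_anchor = np.array([0.5, 0.0, 1.0])

# Express the pet's position in each device's frame via the shared anchor.
pet_in_a = anchor_in_device_a @ pet_in_anchor
pet_in_b = anchor_in_device_b @ pet_in_anchor
print("pet in device A frame:", pet_in_a[:2])
print("pet in device B frame:", pet_in_b[:2])
```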
B. Controller and Gesture-Based Interactions with the Environment
Some XR technologies gather and process sensor data to enable controller- and gesture-based interactions with physical and virtual content, including other users. Gesture-based controls allow users to interact with and manipulate virtual objects in ways that more closely reflect real-world interactions. Most devices use IMUs and outward-facing cameras combined with infrared (IR) or LED light systems to gather data about the controller's position, such as its linear acceleration and rotational velocity, as well as optical data about the user's environment. Some manufacturers are introducing new tracking systems that compensate for these methods' blind spots, such as when the controllers move outside the cameras' view. When visual information about a controller's position becomes unavailable, IMU data may act as a fallback or proxy for determining controller location. For gesture-based controls, devices gather data about the user's hands through outward-facing cameras.
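The sketch below illustrates, in simplified form, the controller fallback described above: it integrates IMU acceleration readings to keep estimating the controller's position while optical tracking is unavailable. The values and update rate are hypothetical, and real systems correct the accumulated drift once the cameras reacquire the controller.

```python
import numpy as np

def dead_reckon(position: np.ndarray, velocity: np.ndarray,
                acceleration: np.ndarray, dt: float):
    """Integrate IMU linear acceleration to keep estimating a controller's
    position while it is outside the cameras' view. Drift accumulates, so the
    estimate is corrected once optical tracking returns."""
    velocity = velocity + acceleration * dt
    position = position + velocity * dt
    return position, velocity

# Last optically tracked state before the controller left the cameras' view.
pos = np.array([0.0, 1.0, 0.3])   # meters
vel = np.array([0.2, 0.0, 0.0])   # meters/second

# 100 ms of IMU samples at 1 kHz while the controller is occluded.
for _ in range(100):
    accel = np.array([0.0, 0.0, -0.1])   # example gravity-compensated reading
    pos, vel = dead_reckon(pos, vel, accel, dt=0.001)

print("estimated controller position:", pos.round(4))
```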
XR technologies use algorithms and ML models to provide controller- and gesture-based controls. In controller-based systems, algorithms use data about the controller’s position to detect and measure how far away the controllers are from the user’s head-mounted display (HMD). This allows the user’s “hands” to interact with virtual content. For example, MR or VR maintenance training could allow a mechanic to practice repairing a virtual car engine before performing these actions in the real world. Gesture-based controls utilize ML models, specifically deep learning, to construct 3D copies of the user’s hands by processing images of their physical-world hands and determining the location of their joints. The 3D copies may be sent to developers to enable users to manipulate and interact with virtual and physical objects in applications through pointing, pinching, and other gestures.
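As a hypothetical example of what an application can do with the resulting 3D hand model, the sketch below assumes a hand-tracking model has already produced a few joint positions and detects a pinch gesture by checking whether the thumb and index fingertips nearly touch. The joint names, coordinates, and threshold are illustrative only.

```python
import numpy as np

# Hypothetical output of a hand-tracking model: 3D joint positions (meters)
# in the headset's coordinate frame.
hand_joints = {
    "thumb_tip": np.array([0.02, -0.01, 0.30]),
    "index_tip": np.array([0.03, -0.01, 0.31]),
    "wrist":     np.array([0.00, -0.10, 0.35]),
}

def is_pinching(joints: dict, threshold_m: float = 0.02) -> bool:
    """Report a pinch when the thumb and index fingertips nearly touch."""
    gap = np.linalg.norm(joints["thumb_tip"] - joints["index_tip"])
    return gap < threshold_m

if is_pinching(hand_joints):
    print("pinch detected: grab the virtual object under the fingertips")
else:
    print("no pinch: hand is open")
```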
C. Eye Tracking and Authentication
Eye tracking and other eye-based data power a variety of XR use cases, such as user authentication, optimized graphics, and expressive avatars. An XR device can use data from the user's eyes to authenticate the person using the device, ensuring that the right user profile, with its unique settings, applies during a session. To this end, devices may use inward-facing IR cameras to gather information about the user's eyes, such as retina or iris data. ML models can then use the collected eye data to determine whether the person is who they claim to be.
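A minimal sketch of how such a comparison might work, assuming the ML model reduces an eye scan to a numeric embedding, appears below; the embedding size, similarity measure, and threshold are illustrative, not any vendor's actual scheme.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(live_embedding: np.ndarray, enrolled_embedding: np.ndarray,
                 threshold: float = 0.9) -> bool:
    """Accept the user if the live eye scan's embedding is close enough to the
    template captured at enrollment."""
    return cosine_similarity(live_embedding, enrolled_embedding) >= threshold

rng = np.random.default_rng(0)
enrolled = rng.normal(size=128)                       # template stored at device setup
live = enrolled + rng.normal(scale=0.1, size=128)     # fresh scan, slight noise
impostor = rng.normal(size=128)                       # someone else's eye

print("owner accepted:   ", authenticate(live, enrolled))
print("impostor accepted:", authenticate(impostor, enrolled))
```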
Now and in the future, XR devices will increasingly feature eye tracking to optimize graphics. Graphics quality can affect a user's sense of immersion, presence, and embodiment in XR environments. One technology that can enhance graphics in XR environments is dynamic foveated rendering, or eye-tracked foveated rendering (ETFR), which tracks a user's eyes to reduce the resolution rendered in the periphery of the HMD's display. This allows the device to display the user's focal point in high resolution, reduce processing burdens on the device, and lower the chance of motion sickness by reducing latency, one of its causes. XR devices may also facilitate better graphics by measuring the distance between a user's pupils, or interpupillary distance (IPD), which affects the crispness of the images on the headset display. For example, in a virtual car showroom, a device may utilize the above technologies to enhance the detail of the car feature that the user is looking at and ensure that objects appear at the correct scale.
To determine the parts of the HMD display that should be blurred and to help focus the lenses, a device may use inward-facing IR cameras to gather data about the user's eyes. Some XR devices may use ML models, such as deep learning, to process eye data and predict where a user is looking. These predictions determine which parts of the display are rendered at reduced resolution. In the future, XR devices may use algorithms to more accurately measure the distance between a user's pupils, further improving the crispness of the graphics that appear on the HMD's display.
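As an illustrative sketch only, the example below shows one way a predicted gaze point could be turned into a per-pixel rendering-quality decision, with full resolution near the gaze point and progressively lower quality toward the periphery; the regions and quality levels are hypothetical.

```python
import numpy as np

def shading_quality(pixel_xy: np.ndarray, gaze_xy: np.ndarray) -> float:
    """Return a 0-1 rendering quality for a pixel: full resolution near the
    predicted gaze point, progressively lower toward the periphery."""
    distance = np.linalg.norm(pixel_xy - gaze_xy)   # in normalized screen units
    if distance < 0.10:
        return 1.0      # foveal region: render at native resolution
    if distance < 0.30:
        return 0.5      # mid-periphery: half resolution
    return 0.25         # far periphery: quarter resolution

gaze = np.array([0.60, 0.40])   # predicted gaze point (normalized coordinates)
for pixel in [np.array([0.62, 0.41]), np.array([0.40, 0.40]), np.array([0.05, 0.90])]:
    print(pixel, "->", shading_quality(pixel, gaze))
```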
Eye tracking is a subset of a broader category of XR data collection: body tracking. Body tracking captures eye movements (described above), facial expressions, and other body movements, which can help create avatars that reflect a user's reactions to content and expressions in real time. An avatar is a person's representation in a virtual or other artificial environment. Avatars are already featured in popular shared VR experiences, such as VRChat and Horizon Worlds, but several factors limit their realism. Today's avatars typically do not reflect all of a user's nonverbal responses and may lack certain appendages, like legs. Going forward, an avatar may mirror a user's reactions and expressions in real time, enabling more realistic social, professional, and personal interactions. For example, in a VR comedy club, a realistic avatar may display facial and other body movements to more effectively deliver or react to a punchline.
To depict a user’s reactions and expressions on their avatar, XR technologies need data about the eyes, face, and other parts of the user’s body. A device may use IMUs and internal- and outward-facing cameras to capture information about the user’s head and body position, gaze, and facial movements. XR devices may also use microphones to capture audio corresponding with certain facial movements, known as visemes, as a proxy for visuals of the user’s mouth when the latter is unavailable. For instance, the sound of laughter may cause an avatar to show behavior associated with the act of laughing.
As with the other functions, XR devices may need to transmit body-based data to other parties, which may use algorithms to process the collected data and create expressive avatars. XR devices may transmit data about a user's avatar, such as gaze and facial expression information, to app developers. Developers may use ML models, including deep learning, to process information about the user's eyes and face, detecting the face and drawing conclusions about where the user is looking and the expression they are making. For facial movements, a deep learning model may analyze each video frame featuring the user's face to determine which expressions their facial movements correspond to. These expressions are then displayed on the avatar. Devices may then share the avatar with central servers so that other users can view and interact with it.
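The sketch below outlines this per-frame loop in simplified, hypothetical form: a stand-in classifier labels each frame's expression, and a placeholder function applies the result to the avatar that the server would then share with other users.

```python
import random

EXPRESSIONS = ["neutral", "smile", "surprise", "frown"]

def classify_expression(frame) -> str:
    """Stand-in for a deep learning model that labels the expression in one
    video frame of the user's face. Here it simply picks a label at random."""
    return random.choice(EXPRESSIONS)

def update_avatar(expression: str) -> None:
    """Placeholder for applying the detected expression to the user's avatar
    and sharing the updated avatar state with a central server."""
    print(f"avatar now shows: {expression}")

# Process a short stream of (fake) camera frames, one expression decision per frame.
for frame in range(5):
    update_avatar(classify_expression(frame))
```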
In addition to avatar creation and operation, future XR technologies may monitor gaze and pupil dilation, motion data, and other information derived from the user's body to generate behavioral insights. XR technology may be capable of using sensor data generated in response to stimuli and interactions with content to make inferences about users' interests, as well as their physical, mental, and emotional conditions. When combined with information processed by other sensors, such as brain-computer interfaces (BCIs), these body-derived data points could contribute to more granular individual profiles and insights into the user's health. In a medical XR application, for example, doctors could use gaze tracking to help diagnose certain medical conditions. However, other parties may use these functions to infer other highly sensitive information, such as a user's sexual orientation, which could harm individuals.
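As a purely illustrative example of how such inferences could be drawn (not a description of any actual product), the sketch below flags objects as likely interests when the user's cumulative gaze dwell time on them crosses a threshold.

```python
from collections import defaultdict

def infer_interests(gaze_samples: list[tuple[str, float]], min_seconds: float = 2.0):
    """Sum how long the user's gaze dwelled on each labeled object and flag the
    ones that exceed a threshold as likely interests."""
    dwell = defaultdict(float)
    for target, seconds in gaze_samples:
        dwell[target] += seconds
    return [target for target, total in dwell.items() if total >= min_seconds]

# (object looked at, seconds of gaze) pairs a device or app might log during a session.
samples = [("sports_car", 1.5), ("billboard_ad", 0.4),
           ("sports_car", 1.2), ("pedestrian", 0.3)]
print(infer_interests(samples))   # ['sports_car']
```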
III. Conclusion
XR technologies often rely on large volumes and varieties of data to power device and application functions. Devices often feature several sensors to gather this data, such as outward-facing cameras and IMUs. To enable different kinds of XR use cases, including shared experiences, devices may utilize ML models that process data about the user and their environment and transmit this data to other parties. While the collection, processing, and transmission of this data may be integral to immersive XR experiences, it can also create privacy and data protection risks for users and bystanders. The next blog post analyzes some of these risks.
Read the next blog post in the series: Privacy and Data Protection Risks to Users and Bystanders