We begin with a brief look at the landscape of existing file formats. As of this publication, there are no other vendor-neutral storage formats for body tracking data. In its Kinect Studio software, Microsoft provides a facility to record body tracking data using Kinect sensors, but the format is proprietary and can only be played back using Kinect Studio on Windows operating systems. The OpenNI project as well as Stereolabs (the vendor for the ZED series of depth sensors) both have their own recording file formats, but they are geared towards full RGB video data, not body tracking data. There are specialized motion capture formats, Biovision Hierarchy being a popular example, but their format specifications are not open and their underlying assumptions on technical aspects like frame rate consistency do not necessarily translate to the body tracking data storage use case.
To solve this issue, we have developed a new file format for body tracking data with the following quality criteria in mind:
Our resulting design for a file format is closely inspired by the Wavefront OBJ format for 3D vertex data. It is a text-based format that can be viewed and generally understood in a simple text editor. See Figure 3 for an example of what a PoseViz body tracking recording looks like.
It contains a header section that supports metadata in a standardized format. The body of the file consists of a sequence of timestamped frames, each signifying a moment in time. Each frame can contain one or more person records. Each person record has a mandatory ID (intended for following the same person across frames within one recording) and an X/Y/Z position. Optionally, the person record can contain whatever data the sensor provides, most commonly a numbered list of key body points with their X/Y/Z coordinates. Frames and person records may also be extended with additional fields derived from post-hoc data interpretation and enrichment. As an example, the ZED 2 sensor does not provide an engagement value like the Kinect does, but a similar value could be calculated per frame from the key point data and reintegrated into the recording for later visualization.
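The structures described above can be sketched as a minimal in-memory model. Note that the class and field names below are illustrative assumptions, not the actual PoseViz syntax; refer to Figure 3 for the real text-based layout.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of a PoseViz recording; names are
# illustrative only and do not reflect the actual file syntax.

@dataclass
class PersonRecord:
    person_id: int                                  # mandatory: stable ID across frames
    position: tuple                                 # mandatory: (x, y, z)
    keypoints: list = field(default_factory=list)   # optional: [(x, y, z), ...] per key body point
    extras: dict = field(default_factory=dict)      # post-hoc enrichment, e.g. engagement

@dataclass
class Frame:
    timestamp: float                                # the moment in time this frame captures
    persons: list = field(default_factory=list)     # one or more PersonRecords

@dataclass
class Recording:
    metadata: dict                                  # header section: standardized metadata
    frames: list = field(default_factory=list)      # ordered sequence of timestamped frames

rec = Recording(metadata={"sensor": "ZED 2"})
frame = Frame(timestamp=0.0)
frame.persons.append(PersonRecord(person_id=1, position=(0.5, 0.0, 2.1)))
rec.frames.append(frame)
```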
The PoseViz file format has been adjusted and evolved over the year in which we have been recording body tracking data in our deployment setup. It is expected to evolve further as compatibility with more sensors, body models, and use cases is added. We seek collaborators from the research community who would be interested in co-steering this process.
With the file format chosen, there needs to be software that accesses sensor data in real time and converts it into PoseViz data files. In our toolset, this function is performed by the Tracking Server, named for its purpose of providing tracking data to consuming applications. Its current implementation is a Python program that can interface with various sensor APIs. It serves two central use cases:
The Tracking Server can persist its recordings to the file system for later asynchronous access, or it can provide a WebSocket stream to which a PoseViz client can connect across a local network or the internet to view body tracking sensor data in real time. As for sensor interfaces, it can currently fetch body tracking data from Stereolabs ZED 2 and ZED 2i cameras via the ZED SDK (other models from the same vendor are untested) or from generic video camera feeds using Google's MediaPipe framework and its BlazePose component. An interface for Kinect sensors using PyKinect2 is under development.
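The consuming side of the WebSocket stream can be sketched as follows. The actual wire format of the Tracking Server is not specified here; this sketch assumes one JSON object per message, mirroring the frame and person record structure of the file format, and all field names are assumptions.

```python
import json

# Hypothetical decoder for one streamed frame message. The real
# Tracking Server wire format may differ; this assumes one JSON
# object per WebSocket message with illustrative field names.

def parse_frame_message(message: str) -> dict:
    """Decode one streamed frame into a plain dict and sanity-check it."""
    frame = json.loads(message)
    # Minimal plausibility checks, as a viewer client might apply:
    if "timestamp" not in frame:
        raise ValueError("frame message lacks a timestamp")
    for person in frame.get("persons", []):
        if "id" not in person or "position" not in person:
            raise ValueError("person record lacks mandatory ID or position")
    return frame

sample = '{"timestamp": 1656633600.0, "persons": [{"id": 7, "position": [0.4, 0.0, 2.3]}]}'
frame = parse_frame_message(sample)
```

A real-time client would wrap this decoder in a WebSocket receive loop (for example, using a WebSocket client library) and hand each decoded frame to the renderer.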
In our current deployment setup, we have the Tracking Server running in automatic recording mode as a background process. Between July 2022 and June 2023, it generated approximately 40 GB of body tracking recordings across our two semi-public ZED 2 sensor deployments at the University of the Bundeswehr Munich.
The Tracking Server has not yet been publicly released.
During our first experiments with capturing body tracking data, we quickly noticed that the capturing process cannot be meaningfully evaluated without a corresponding visualization component to check recorded data for plausibility. The PoseViz software (not affiliated with the Python module of the same name by István Sárándi) is the result of extending our visualization prototype into a relatively full-featured tool that offers a variety of useful views of body tracking data.
Because it is designed to replay body tracking recordings, the PoseViz graphical user interface is modeled on video player applications, with a combined play/pause button, a progress bar showing the timeline of the current file, and a timestamp showing the current position on the timeline as well as the total duration (see Figure 4). The file can be played at its actual speed using the play button, or it can be skimmed by dragging the progress indicator.
PoseViz can be used to open previously recorded PoseViz files, or it can open a WebSocket stream provided by a Tracking Server to view real-time body tracking data. The current viewport can be exported as a PNG or SVG file at any time.
Users can individually enable or disable several render components, including joints (body key points), bones (connections between joints), each person's overall position as a pin (with or without rotation), the sensor at its true position together with its field of view (provided it is known), and 2D walking trajectories and estimated gaze directions. We are working on a feature to display a 3D model of the spatial context of a specific sensor deployment.
The default camera is a free 3D view that can be rotated around the sensor position. In addition, the camera can be switched to the sensor view (position and orientation fixed to what the sensor could perceive) or to one of three orthographic 2D projections.
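An orthographic 2D projection of the kind mentioned above amounts to dropping one coordinate axis. The sketch below assumes Y is the vertical axis, which is a common graphics convention rather than a fact stated here; a top-down (floor-plan) view then keeps only X and Z.

```python
# Orthographic top-down projection: discard the vertical axis.
# Assumption (not stated in the text): Y is "up" in sensor space.

def project_top_down(points_3d):
    """Map (x, y, z) points to (x, z) floor-plane coordinates."""
    return [(x, z) for (x, y, z) in points_3d]

# Two example body positions at different heights project onto the floor:
flat = project_top_down([(1.0, 1.7, 2.0), (0.0, 0.9, 3.5)])
# flat == [(1.0, 2.0), (0.0, 3.5)]
```

The other two orthographic views would analogously drop the X or Z axis.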
These capabilities are geared towards initial explorations of body tracking data. Researchers can use this tool to check their recordings for quality, identify sensor weaknesses, or look through recordings for interesting moments.
For most research questions surrounding body tracking, more specific analysis tools will need to be developed to inquire into specific points of interest. For example, if a specific gesture needs to be identified or statistical measures are to be taken across a number of recordings, this is outside the scope of PoseViz and a bespoke analysis process is needed. However, post-processed data may be added to PoseViz files and visualized in the PoseViz viewport; for example, we have done this for post-hoc engagement estimations (displayed through color shifts in PoseViz).
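Such a post-hoc enrichment step could, for instance, derive a per-frame engagement proxy from key point data. The actual engagement estimation used in our deployment is not specified here; the sketch below uses one plausible proxy (how squarely the shoulder line faces the sensor), and all names and conventions in it are assumptions.

```python
import math

# Hypothetical engagement proxy from two shoulder key points.
# Assumptions: sensor at the origin looking along +Z; a person facing
# the sensor has a shoulder line spanning the X axis. This is an
# illustrative stand-in, not the metric used in our deployment.

def engagement_proxy(left_shoulder, right_shoulder):
    """Return 1.0 when the person faces the sensor head-on, 0.0 when side-on."""
    dx = right_shoulder[0] - left_shoulder[0]
    dz = right_shoulder[2] - left_shoulder[2]
    # Angle of the shoulder line relative to the X axis, in [0, pi/2]:
    angle = math.atan2(abs(dz), abs(dx))
    return 1.0 - angle / (math.pi / 2)

facing = engagement_proxy((-0.2, 1.4, 2.0), (0.2, 1.4, 2.0))   # shoulders span X
side_on = engagement_proxy((0.0, 1.4, 1.8), (0.0, 1.4, 2.2))   # shoulders span Z
```

A value computed this way per frame could be written back into the recording as an additional person record field and rendered as a color shift.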
PoseViz can be used in any modern web browser.1
In this article we have described the PoseViz file format for body tracking data, the Tracking Server for recording body tracking events, and the PoseViz visualization software for playing back recorded body tracking data. Each of these components can only be tested in conjunction with the others, which is why they have to evolve side by side.
This toolset is currently in use for the HoPE project (see Acknowledgements) and is seeing continued improvement in this context. We feel that it has reached a stage of maturity where external collaborators could feasibly make use of it in their own research contexts. It is still far from being a commercial-level drop-in solution, but making use of this infrastructure (and contributing to its development) may save substantial resources compared to implementing a full custom toolset. Potential collaborators are advised to contact the author.
The intended next step for the toolset is an expert evaluation. Researchers who have previously worked with body tracking data will be interviewed about their needs for visualization tools, and they will have an opportunity to test the current version of PoseViz and offer feedback for future improvements.
Thank you to Jan Schwarzer, Tobias Plischke, James Beutler, and Maximilian Römpler for their feedback and contributions regarding PoseViz and body tracking data recording in general.
This research project, titled “Investigation of the honeypot effect on (semi-)public interactive ambient displays in long-term field studies,” is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 451069094.