Exploring Mobility Behavior Around Ambient Displays Using Clusters of Multi-dimensional Walking Trajectories
#Abstract
Spatial information has become crucial in ambient display research and helps to better understand how people behave in a display's vicinity. Walking trajectories have long been used to uncover such information, and tools have been developed to capture them anonymously and automatically. However, more research is needed on the level of automation during mobility behavior analyses. In particular, working with depth-based skeletal data still requires significant manual effort to, for instance, determine walking trajectories that are similar in shape. To improve this situation, we adopt both agglomerative hierarchical clustering and dynamic time warping in this research. To the best of our knowledge, neither algorithm has so far found application in our field. Using a multi-dimensional data set obtained from a longitudinal, real-world deployment, we demonstrate the applicability and usefulness of this approach. In doing so, we contribute insightful ideas for future discussions on methodological development in ambient display research.
#CCS Concepts
• Human-centered computing → Ubiquitous and mobile computing design and evaluation methods.
#Keywords
Ambient displays, walking trajectories, agglomerative hierarchical clustering, dynamic time warping, mobility behavior
#ACM Reference Format:
Jan Schwarzer, Julian Fietkau, Laurenz Fuchs, Susanne Draheim, Kai von Luck, and Michael Koch. 2023. Exploring Mobility Behavior Around Ambient Displays Using Clusters of Multi-dimensional Walking Trajectories. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA ’23), April 23–28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3544549.3585661
#1. Introduction
With the emergence of the post-desktop era, spatial information has become crucial in designing and evaluating ubiquitous technology [3]. Unsurprisingly, research on large and interactive displays, or ambient displays, has increasingly begun to consider interaction in a deployment's wider context [17]. Motivated by advances in motion-sensing hardware such as time-of-flight cameras, research has begun leveraging computer vision (e.g., [18]) and depth-based data approaches (e.g., [8]) to investigate how people move through the space in front of real-world deployments. In doing so, more can be learned about, for instance, interacting and non-interacting users [17] as well as real users and simple passersby [14]. Essentially, studying spatial aspects enables the creation of novel tools, methods, frameworks, and theories for future research [3].
Ambient displays are typically investigated through an understanding of their audience's behavior. Behavior, here, refers to a performance of some kind, such as people approaching a display or moving around it [17]. Research on ambient displays has long investigated low-level human activities using walking trajectories (shape-based patterns showing where and when people walk, typically as a top-down 2D projection) as an analytical tool [10]. Walking trajectories have proven so valuable that systems to capture them anonymously and automatically have been proposed (e.g., [14, 17]). These systems are more cost-effective than manual observation [8], easily adapted to other deployments, and readily integrated with other methodologies [17]. Yet few tools currently exist to track mobility patterns in the wider space [4], and new tools for doing so have been called for [3]. Studying ambient displays in the wild remains a challenging endeavor [8, 18], and this includes studying their audience behavior [4].
Our research is driven by a fundamental interest in identifying viable means that can aid the automatic exploration of said behavior and, simultaneously, meaningfully extend the existing repertoire of qualitative (e.g., observations) and quantitative methods (e.g., interaction logs) for evaluations in long-term field deployment studies. Recently, related research has begun developing tools that, to varying extents, distill walking trajectories from depth-based skeletal data (e.g., [8, 14]). Skeletal data contains anonymous multi-dimensional information on one or more joint locations tracked chronologically in 3D space. It is more compact and efficient than depth images [9] and, by its nature, preserves the privacy of passersby [8]. While skeletal data has been used in past research, it has so far usually served as supporting evidence [8] and has required significant manual effort [8, 14]. The analysis process in particular, as we found, requires a higher level of automation. Consequently, tasks such as readily determining similarities in mobility behavior or easily deducing dominant patterns in a data set are challenging from the outset. We argue that this lack of automation prevents skeletal data from being used effectively in long-term field deployment research, for instance as a useful addition to existing methods such as interaction logs, interviews, and observations. To improve this situation, in this research we adopt two well-known algorithms from time series analysis in tandem: agglomerative hierarchical clustering (AHC) [11] and dynamic time warping (DTW) [19]. The combination of both algorithms has recently received increasing attention, in part due to its superior clustering performance compared to related approaches such as k-means [2].
Based on a real-world data set, we demonstrate here how these algorithms were successfully applied to automatically cluster walking trajectories similar in shape and discuss how this approach can be helpful in future endeavors.
#2. Background
Our research concentrates on recent work considering ambient displays embedded in a wider context. Initially, we seek to understand how these studies used walking trajectories as an analytical tool for mobility behavior. Then, attention is drawn to the challenges associated with skeletal data and how the underlying analysis process can be automated.
#2.1. Walking trajectories: a display for mobility behavior
To explore mobility behavior, researchers in the mid-2010s started using camera sensors. For example, Williamson and Williamson [17] leveraged walking trajectories to investigate pedestrian traffic around a public display installation. With the help of motion detection techniques, walking trajectories were extracted to conduct behavioral analyses of non-interacting and interacting users. Their inquiry was geared towards learning more about how technology changes public spaces. A few years later, Williamson and Williamson [18] took on the challenge of how experimenter roles affect evaluations of ambient displays in the wild. Walking trajectories acted as a tool to visualize pedestrian traffic in different observer evaluation setups. The authors considered pedestrian tracking essential for quantifying the observer effect in public evaluations. As a further example, Elhart et al. [4] were motivated by the lack of accurate, low-cost tools to track audience mobility. Their custom tool, similar to that of Williamson and Williamson [17], used walking trajectories to distill how people approached their display installation and how they moved through the space in front of it. According to Elhart et al. [4], the ability to capture mobility patterns such as walking trajectories is vital for evaluations. In a similar manner, Dalton et al. [3] showcased a tablet-based app able to record live mobility behavior in order to promote a better understanding of interaction in the environment. However, instead of relying on camera technology, the app provided a user interface for manually sketching walking trajectories with a finger or stylus. The authors argue that automatic approaches have their limitations when it comes to indoor location finding, whereas the app allowed for placing the behavior location at the center of observation.
In contrast to the previous studies, Monastero and McGookin [10] used actual floor projections of walking trajectories to visualize other people's presence and activities. Their research aimed to investigate how people's social awareness is affected by displaying the mobility patterns of others in situ. They found that many uses of their floor projections were related to sociality and concluded that the projections enhanced both curiosity and the connection with the lived environment.
Finally, studies focusing on skeletal data to distill mobility behavior are scarcer. To our knowledge, the studies by Mäkelä et al. [8] and Schwarzer et al. [14] are the only recent examples embarking on this endeavor. Mäkelä et al. [8] introduced a custom process to gather and scrutinize skeletal data and described in detail the individual steps required to turn raw data into interpretable pieces of information. Walking trajectories were applied to, for instance, determine the entry and exit directions of passersby. Schwarzer et al. [14], on the other hand, conducted an investigation of spatial and temporal audience behavior. Illustrations of walking trajectories were used extensively to, for example, contrast real users and simple passersby as well as highlight areas with strong user engagement.
#2.2. Automating the analysis process based on skeletal data
Existing research [8, 14] illustrates in great detail how to analyze skeletal data from the standpoint of potential research questions or assumptions about the data. These approaches, however, require becoming thoroughly familiar with a data set, which involves considerable manual effort such as laboriously going through the data by hand. At the methodological level, techniques are required that allow researchers to examine a data set without pre-existing knowledge and that can rapidly provide insights at first glance. Such techniques would help with tasks such as gathering knowledge about mobility behavior more quickly or designing interviews and observations more effectively in mixed-methods research.
The situation described has led us to explore ways to automate the analysis process in our research. Ultimately, we turned our attention to time series clustering, and AHC in particular. Time series clustering is one of the most important and useful means of analyzing walking trajectories [15], and shape-based clustering algorithms that build on distance measures such as DTW have been gaining popularity recently [7]. AHC and DTW have been successfully applied in combination to analyze trajectories relating to a variety of problems, such as the flyways of birds [6], household electric load curves [2], or heat exchanges during a cooling season [19]. Research has demonstrated that this combination can outperform related approaches such as k-means, k-medoids, and Gaussian mixture models [2]. In addition, AHC's hierarchical structure enables domain experts to choose the level at which clusters make sense [6], while DTW improves clustering performance compared to distance measures such as Euclidean, Manhattan, or cosine distance [2]. Given these recent developments, we decided to adopt both algorithms in the present research.
#3. Method
An existing skeletal data set is used in this study [14]. Below, we initially elaborate on this data set and, then, draw attention to the AHC and DTW algorithms. Lastly, we explain the ways in which the algorithms were evaluated.
#3.1. The skeletal data set
The data set originates from a multi-year deployment of two custom ambient displays in a professional agile software development environment. It incorporates information on more than 30,000 passersby, gathered with two Microsoft Kinect v2 cameras over 4.5 months in 2017. In total, the data set consists of over 23,000 individual text file records. People's mobility behavior is captured in the individual frames of a record, which were tracked chronologically in a 2D space (excluding values from the y-axis). A frame consists of a timestamp, a body tracking ID, a record ID, an inverted x-coordinate (from the camera's point of view), a z-coordinate, and the value of the .NET framework's engaged property. Following a definition by Shivanasab et al. [15], the term walking trajectory thus refers here to a two-dimensional shape that follows the x and z coordinates of these frames.
We chose to use a subset of this data set because our primary focus in this work is the algorithmic approach. Specifically, we selected records that were tracked with one of the two aforesaid Kinect sensors and that represented single-user cases, resulting in a total of 9,425 records. To extract meaningful mobility behavior, however, we assumed that people needed to be tracked for a sufficient period of time. Thus, we increased the minimum number of frames required for a record to 63 frames (roughly 2 s), leaving a total of 3,523 records (circa 37%, 40 MB in size). Each of these records was assigned a unique identifier and an image file depicting the record's walking trajectory. We left the raw data as is and did not apply any filters.
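The record selection can be illustrated with a minimal sketch. It is not the authors' code: the `Frame` fields mirror the frame contents listed above, but their names, the in-memory layout, and grouping frames by record ID are assumptions, and parsing the original text files is omitted because their exact format is not described here. The 63-frame threshold corresponds to about 2.1 s if one assumes the Kinect v2's body-tracking rate of 30 frames per second.

```python
from dataclasses import dataclass

import numpy as np

# Minimum record length from Section 3.1 (about 2.1 s, assuming 30 frames per second).
MIN_FRAMES = 63


@dataclass
class Frame:
    # Hypothetical in-memory representation of one frame; field names are illustrative.
    timestamp: float
    body_id: int
    record_id: int
    x: float  # inverted x-coordinate (camera's point of view)
    z: float
    engaged: bool


def to_trajectory(frames):
    """Return an (n, 2) array of (x, z) positions ordered by timestamp."""
    ordered = sorted(frames, key=lambda f: f.timestamp)
    return np.array([[f.x, f.z] for f in ordered], dtype=float)


def select_records(records, min_frames=MIN_FRAMES):
    """Keep only records (record_id -> list of frames) with at least min_frames frames."""
    return {rid: to_trajectory(frames)
            for rid, frames in records.items()
            if len(frames) >= min_frames}
```

Each resulting array is the two-dimensional walking trajectory that the distance measure in Section 3.2 operates on.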
#3.2. Shape-based clustering algorithm
As will be illustrated in more detail below, we chose DTW because it enabled us to assess individual walking trajectories in terms of shape (dis-)similarities, while AHC allowed us to segment them into coherent groups.
#3.2.1. Dynamic time warping
DTW is an elastic measure and, in contrast to lock-step measures such as Euclidean distance, can deal with temporal drifts in time series (see Figure 1) [1]. It computes a non-linear correspondence between the elements of two time series [9] and thus allows for determining similarity in shape [2]. Along with Euclidean distance, DTW is one of the most commonly applied similarity measures in time series clustering [1, 7]. We followed a study by Riofrío et al. [12] to implement DTW in this research. In brief, the DTW algorithm works as follows. Considering the two-dimensional time series $X$ and $Y$ with $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_m)$, where $x_i, y_j \in \mathbb{R}^K$, a distance matrix $D$ with the size of $n \times m$ elements is initially created. For each element $D(i, j)$, the distance is computed using the equation in (1) for $K$-dimensional data (in our case: $K = 2$) [5].

$$D(i, j) = \sqrt{\sum_{k=1}^{K} \left(x_{i,k} - y_{j,k}\right)^2} \qquad (1)$$
Next, the DTW algorithm calculates a cost matrix $C$ that allows us to determine the alignment costs between two time series. To this end, the equation in (2) is utilized, which computes these costs for each element $C(i, j)$, starting from $C(1, 1) = D(1, 1)$.

$$C(i, j) = D(i, j) + \min\{C(i-1, j-1),\ C(i-1, j),\ C(i, j-1)\} \qquad (2)$$

The cost matrix helps with finding the so-called warping path $W = (w_1, \ldots, w_L)$, which is the path with the lowest alignment costs. It is obtained by moving through the cost matrix in reverse order (i.e., from $C(n, m)$ back to $C(1, 1)$). Its individual positions are then applied to the distance matrix $D$ to yield the overall alignment costs (i.e., $\sum_{l=1}^{L} D(w_l)$). Subsequently, the alignment costs in $W$ are averaged as shown in equation (3), where $L$ refers to the total number of elements in the warping path.

$$\mathrm{DTW}(X, Y) = \frac{1}{L} \sum_{l=1}^{L} D(w_l) \qquad (3)$$

The lower the value, the more similar two time series are considered in their shape. Finally, after the distances between all pairs of walking trajectories have been computed according to equation (3), the resulting pairwise distance matrix is passed to the AHC algorithm.
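The three DTW steps can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation (which followed Riofrío et al. [12]): it assumes Euclidean local distances between (x, z) positions (Eq. 1), the common step pattern of diagonal, vertical, and horizontal moves (Eq. 2), and that ties during backtracking are broken in favor of the diagonal step. The function names `dtw_mean_cost` and `pairwise_dtw` are hypothetical.

```python
import numpy as np


def local_distances(a, b):
    """Eq. (1): Euclidean distance between every frame of a (n x K) and b (m x K)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def dtw_mean_cost(a, b):
    """Return (mean local distance along the warping path, warping path)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = local_distances(a, b)
    n, m = d.shape
    # Eq. (2): cost matrix padded with a row/column of infinity, so that
    # c[i, j] corresponds to C(i, j) in 1-based notation and C(1, 1) = D(1, 1).
    c = np.full((n + 1, m + 1), np.inf)
    c[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c[i, j] = d[i - 1, j - 1] + min(c[i - 1, j - 1], c[i - 1, j], c[i, j - 1])
    # Backtrack from C(n, m) to C(1, 1), always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda cell: c[cell])
        path.append((i - 1, j - 1))
    path.reverse()
    # Eq. (3): average the local distances along the warping path.
    return float(np.mean([d[p] for p in path])), path


def pairwise_dtw(trajectories):
    """Symmetric matrix of Eq. (3) distances between all trajectories (input for AHC)."""
    n = len(trajectories)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_mean_cost(trajectories[i], trajectories[j])[0]
    return dist
```

For example, a trajectory that pauses for one extra frame at its start aligns with its non-pausing counterpart at a mean cost of 0, whereas a lock-step Euclidean comparison could not even pair two sequences of different lengths. Because the mean in Eq. (3) depends on the length of the recovered path, `pairwise_dtw` computes each pair once and mirrors it to keep the matrix symmetric.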
#3.2.2. Agglomerative hierarchical clustering
AHC is a variant of hierarchical clustering that creates a binary, rooted hierarchy of clusters from the bottom up, meaning every data point (in our case, every walking trajectory) is initially a cluster of its own [1]. It is able to work with time series of arbitrary shapes and avoids the problem of poor initial clusters [2]. In a nutshell, the algorithm starts from the distance matrix computed by the DTW algorithm and then iteratively merges the two clusters with the shortest distance, updating the distance matrix according to a linkage criterion, until all elements have been merged into a single cluster [2]. AlMahamid and Grolinger [2] recently achieved the best clustering performance with the UPGMA (unweighted pair group method with arithmetic mean) linkage criterion in AHC; hence, we decided to use it in our research as well. The UPGMA algorithm was implemented following the equation given in (4).

$$d(R, P \cup Q) = \frac{|P|\, d(R, P) + |Q|\, d(R, Q)}{|P| + |Q|} \qquad (4)$$

Here, each distance $d(R, P \cup Q)$ between a cluster, for instance $R$, and a new cluster $P \cup Q$, formed by merging clusters $P$ and $Q$, is the result of proportionally averaging the distances $d(R, P)$ and $d(R, Q)$, weighted by the numbers of walking trajectories $|P|$ and $|Q|$ in each cluster. At each iteration, the computed cluster distances are stored temporarily so that they can ultimately be included in a dendrogram relating every cluster to one another.
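A naive AHC with the UPGMA update can be sketched as follows. It is a minimal, quadratic-memory illustration of the size-weighted averaging, not the authors' implementation; the helper names `upgma` and `cut` are hypothetical. It takes any square, symmetric distance matrix (for example, the pairwise DTW matrix from Section 3.2.1) and records the merge history from which a dendrogram could be drawn. For real data, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` is an established UPGMA implementation.

```python
import numpy as np


def _pair(i, j):
    """Canonical (smaller id, larger id) key for a pair of cluster ids."""
    return (i, j) if i < j else (j, i)


def upgma(dist):
    """Naive AHC with UPGMA linkage on a square, symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, merge_distance, new_size);
    the cluster created by the s-th merge (0-based) receives the id n + s.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    sizes = {i: 1 for i in range(n)}  # every trajectory starts as its own cluster
    d = {(i, j): float(dist[i, j]) for i in range(n) for j in range(i + 1, n)}
    merges = []
    new_id = n
    while len(sizes) > 1:
        # Merge the two closest clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Eq. (4): size-weighted average distance from every other cluster k
        # to the new cluster formed by a and b.
        for k in sizes:
            d[(k, new_id)] = (size_a * d[_pair(k, a)] + size_b * d[_pair(k, b)]) / (size_a + size_b)
        d = {key: val for key, val in d.items() if a not in key and b not in key}
        sizes[new_id] = size_a + size_b
        merges.append((a, b, d_ab, size_a + size_b))
        new_id += 1
    return merges


def cut(merges, n, n_clusters):
    """Flat labels obtained by stopping after n - n_clusters merges (a dendrogram cutoff)."""
    members = {i: [i] for i in range(n)}
    for step, (a, b, _, _) in enumerate(merges[: n - n_clusters]):
        members[n + step] = members.pop(a) + members.pop(b)
    labels = [0] * n
    # Number clusters by their smallest member for a stable, readable labeling.
    for label, group in enumerate(sorted(members.values(), key=min)):
        for i in group:
            labels[i] = label
    return labels
```

A cutoff such as 12, 55, or 75 clusters then corresponds to `cut(merges, n, 12)` and so on. Because UPGMA merge distances never decrease, stopping after a fixed number of merges is equivalent to cutting the dendrogram at a height.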
#3.3. Evaluation
Evaluating extracted clusters without assigned data labels is challenging, and there is still no universally accepted technique, either visual or numerical, for doing so [1]. However, as Aghabozorgi et al. [1] note, labeling by a human judge can capture an algorithm's strengths and shortcomings as ground truth in practice. Because AHC offers strong visualization capabilities [1], we chose to assess the performance of our implementation visually in the context of this research. To this end, we randomly selected a subset of 352 records (roughly 10%) from the 3,523 records described in Section 3.1. We did so primarily to keep manual effort (e.g., comparing walking trajectories by hand) and overall computational costs to a reasonable minimum. As central visual tools for our assessment, we used the AHC algorithm's dendrogram (see Figure 2) and visualizations of walking trajectory clusters (see Figure 4).
#4. Results
The dendrogram in Figure 2 suggests that the data set contains one large, homogeneous group of walking trajectories. Specifically, many trajectories were merged at around the smallest computed distance of 0.40. Only a few stand in stark contrast to this group: in one instance, the distance reaches 371.75, roughly 929 times that value. Apart from this group, no other comparably homogeneous group is observable in the data set.
In light of this observation, we then experimented with different cutoff levels in the dendrogram. To better demonstrate how we proceeded during analysis and how the algorithm successfully distilled mobility behavior patterns, we ultimately defined three cutoff levels: 12, 55, and 75 clusters (see Figure 2). While this decision is not conclusive in a mathematical sense, we made it for two reasons. On the one hand, with a total of 12 clusters, intra-cluster distances were notably reduced, by about one-sixth or more, and this cutoff level allowed us to present a mixture of clusters as an example (see Figure 4). On the other hand, we wanted to vividly illustrate the algorithm's ability to isolate the dominant mobility behavior mentioned before, which the cutoff levels of 55 (see Figure 3b) and 75 (see Figure 3c) allowed us to do in an exemplary manner.
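Since the cutoff levels above were chosen by visual judgment, a numerical criterion could help compare candidate cutoffs; the silhouette coefficient is one such measure and is named in the limitations as planned future work. The sketch below is not part of the reported evaluation: `mean_silhouette` is a hypothetical helper that computes the mean silhouette coefficient directly from a precomputed distance matrix (such as the DTW matrix) and a flat labeling. scikit-learn's `silhouette_score` with `metric="precomputed"` is an established implementation of the same measure.

```python
import numpy as np


def mean_silhouette(dist, labels):
    """Mean silhouette coefficient computed from a precomputed distance matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False  # exclude the trajectory itself
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = dist[i, same].mean()  # mean distance to the own cluster
        # Mean distance to the nearest other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != own)
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())
```

Values close to 1 indicate compact, well-separated clusters. One would compute the score for each candidate cutoff on the same distance matrix and compare the results; note that singleton clusters, such as potential outliers, contribute a score of 0.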
Our analysis led to the following conclusions. First and foremost, the AHC algorithm was able to identify and group two-dimensional walking trajectories based on their (dis-)similarity in shape. In our example, it became evident that, as intra-cluster distances gradually decreased, the algorithm isolated the selected mobility behavior increasingly well (see Figure 3). The initial observation from the dendrogram was thus substantiated, as the algorithm correctly merged the many walking trajectories recorded near people's main walking path to the left of both display installations, the area in which both Kinect sensors tracked the most people [14]. Second, at any given cutoff level, the algorithm can suggest potential outliers and clusters requiring further examination, as shown in the example in Figure 4. Outliers, meaning one or more walking trajectories with rather large inter-cluster distances, can be quickly identified and may point to unique, novel, or uncommon behavior. For the data set at hand, outliers are, for example, clusters C10 and C11, for which we found no other instances similar in shape in the data set. By contrast, cluster C1 indicates that a higher cutoff level would be necessary to unveil the patterns underlying this cluster.
#5. Discussion
To the best of our knowledge, the present study is the first in its field to adopt both AHC and DTW to perform a data-driven identification of patterns in skeletal data. By demonstrating both the viability and usefulness of this approach, we contribute to the methodological framework for quantitative analyses of mobility behavior in an ambient display's wider context. Specifically, we add to existing research [8, 14] by proposing an approach able (a) to assist in evaluating skeletal data without any pre-existing knowledge and (b) to automatically suggest (dis-)similarities in people's mobility behavior based on walking trajectory characteristics. In our view, fully or partially automated clustering is an indispensable tool for drawing conclusions from larger data sets, which may involve many thousands of individual records. Such conclusions can, for instance, guide the design of interviews and observations at the outset of mixed-methods field deployment studies. Similarly, during later stages of a research endeavor, clustering information can be used to cross-validate findings from other methods. Throughout our research, we found that both the dendrogram and the visualizations at different cutoff levels greatly helped us become familiar with the data set at hand more quickly. While our approach is not a replacement for visual analysis by human experts, we show that it is of value in tandem with, or as a precursor to, manual analysis. Looking ahead, research like ours may help address understudied issues such as how usage changes over time in longitudinal deployments [10]. Furthermore, while we worked with one specific format of skeletal data provided by the Kinect camera, the AHC algorithm can readily be applied to location tracking data from other depth cameras (e.g., Stereolabs ZED 2). In such instances, adjustments only need to be made in selecting the correct axes as input parameters.
Skeletal data, as provided by the Kinect sensor, may also contain information beyond joint coordinates (e.g., user engagement). Such information can likewise be used in the AHC algorithm to cluster the data accordingly.
Finally, we draw attention to the limitations of this research. First, AHC and DTW have a computational complexity of $O(n^2)$ and $O(l^2)$, respectively, where $n$ denotes the number of walking trajectories and $l$ their length [1, 16]. Neither algorithm deals effectively with very large data sets [1, 7]; hence, we are examining ways to optimize their computational runtimes. Previous research [16], for instance, recommends using a warping window parameter $w$ in DTW, which lowers its overall complexity to $O(l \cdot w)$. Second, walking trajectories by themselves are ambiguous, as they leave the reason for a movement open to interpretation [10]. Therefore, qualitative methods such as interviews may additionally be required to cross-validate observations. Third, the quality of our evaluation depended greatly on our subjective judgment. To address this issue mathematically, we plan to incorporate measures such as the within-cluster sum of squares and the silhouette coefficient, which will allow us to determine the optimal number of clusters [2, 19]. Fourth, our evaluation was conducted with only one particular data set; further examinations with other data sets are required to substantiate the algorithm's clustering performance. Fifth, we utilized solely two-dimensional data of cases in which one person was tracked, so methods still need to be developed to process skeletal data containing information about multiple people. Lastly, our random selection of approximately 10% of the data during evaluation may have, to a lesser or greater extent, affected the results as well (e.g., instances of patterns may have been underrepresented in the subset). Once the above-mentioned optimizations of the algorithms' complexity are in place, we plan to work with larger data sets.
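The effect of a warping window can be illustrated with a band constraint in the style of the Sakoe-Chiba band, one common form of such a window. This form is our choice for illustration, and [16] may define the parameter differently. The hypothetical function `dtw_cost_windowed` fills only the cells of the cost matrix from Eq. (2) with |i - j| <= w and returns the accumulated cost C(n, m) together with the number of cells evaluated. Recovering the warping path and averaging it as in Eq. (3) is omitted for brevity.

```python
import numpy as np


def dtw_cost_windowed(a, b, window):
    """Accumulated DTW cost C(n, m) restricted to the band |i - j| <= window.

    Returns (cost, number of cost-matrix cells evaluated).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach cell (n, m)
    c = np.full((n + 1, m + 1), np.inf)  # cells outside the band stay infinite
    c[0, 0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        # Only about 2w + 1 columns per row are visited instead of all m.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])  # Eq. (1): Euclidean distance
            c[i, j] = local + min(c[i - 1, j - 1], c[i - 1, j], c[i, j - 1])  # Eq. (2)
            cells += 1
    return float(c[n, m]), cells
```

For two trajectories of length l, the number of evaluated cells drops from l² to at most l·(2w + 1). The trade-off is that alignments requiring temporal shifts larger than w are excluded, so the window should be wide enough for the drifts expected in the data.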
#6. Conclusion
This research offers guidance on how to practically increase the level of automation during the analysis of multi-dimensional skeletal data. It proposes using AHC, with DTW as a distance measure, for this purpose. Walking trajectories distilled from a real-world skeletal data set were algorithmically merged into coherent clusters, each displaying (dis-)similarities in the underlying mobility behavior. Having demonstrated the approach's usefulness, we argue that it is a viable addition to the repertoire of existing qualitative and quantitative methods for longitudinal ambient display research in the wild. We see its main strengths in markedly reducing the manual effort necessary during analysis and in helping researchers become familiar with a data set more quickly. In the future, we will focus primarily on optimizing the computational complexity of the algorithms, evaluating pre-processing steps such as z-normalization, and extending the approach to suggest the optimal number of clusters.
#Acknowledgments
This research is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—project number 451069094.