Methods and Tools for Supporting (Semi-)Automated Evaluation in Long-Term In-the-Wild Deployment Studies
Abstract
Human-computer interaction increasingly focuses on the long-term evaluation of in-the-wild deployments. With this trend, however, understanding usage behavior becomes more challenging. Due to the highly repetitive manual labor involved, existing methods such as in-situ observations and manual video analysis do not scale to such evaluations. Automated approaches (e.g., based on body tracking cameras) have recently been suggested to capture usage behavior in long-term evaluations more efficiently. Still, these approaches may not be the only way to move the field forward from here. This workshop gathers and reflects on the current state of the art regarding this trend and outlines perspectives for future research. The contributions cover, among other topics: methods and tools for data collection, noise and errors in sensor data, the correlation of automated observations with ground truth data, and the augmentation of sensor data with field work (e.g., interviews) to contextualize findings.
Keywords
in-the-wild deployments, long-term evaluation, automated data processing, ambient displays, mixed methods, workshop
1. Background
When designing and evaluating technology in human-computer interaction (HCI) research, the increasing complexity and ubiquity of technological artifacts go hand in hand with the emerging need to take entire socio-technical systems into account. Methodologies for collecting, combining, and analyzing data are also maturing. For example, using quantitative and qualitative usage behavior data in tandem (i.e., mixed methods) is becoming more common in in-the-wild (ITW) deployment studies.
Nowadays, this development manifests itself in an increasingly practice-based perspective on HCI research [13], which has already become established in the field of Computer-Supported Cooperative Work (CSCW) [25]. Here, a practice describes collective patterns of interaction that are reproduced in specific contexts [25]. At the core of these approaches is the understanding of technology as a flexible entity in an equally flexible environment, from which concrete practices form over time [11].
One area where we can observe this development is ambient display research [1, 2, 9, 18, 20]. Here, research questions have started to arise that can only be meaningfully investigated in the field. These questions encompass topics such as user behavior (e.g., walking paths or interaction phases), user experience, acceptance (e.g., with respect to privacy or data protection), and the social impact of new technologies [1]. In investigating these questions, the ecological validity of the collected data is crucial, i.e., whether data was collected in a realistic environment reflecting authentic usage behavior. There is a need to develop a better understanding of how people, their physical environment, and the use of technology interrelate [16].
Unsurprisingly, a recent trend in this field is to increasingly augment and automate the processes of data collection (e.g., by using optical sensors such as 3D cameras) and analysis (e.g., by applying algorithms for pattern discovery) in longitudinal field deployment studies. Questions revolve around, for example, the impact of the presence of interactive systems on user walking paths, how different interaction techniques attract potential users, or how people engage with such systems. In essence, these studies are motivated by learning more about the spatial, temporal, and social behavior of users. The central assumption is that long-term sensor data, on the one hand, adequately complements touch interaction logs (i.e., in terms of cost-effectiveness and richness) and, on the other hand, makes both passive and active use explicit in a more holistic way. However, it remains to be seen whether this methodological choice will prove successful in the long run and, if so, how it affects the HCI community in a broader sense (e.g., regarding an overarching research design). Field deployment research is known for its continually changing environmental conditions, such as contextual variables (e.g., team structures and room layouts) or the information demand of the target audience (e.g., the introduction of new tools). While these dynamics do indeed point to significant research design challenges, they simultaneously underline the necessity to intensify the dialogue on methodological guidance for our community.
This workshop aims to answer two fundamental questions:
- What is the current state of the art in automated data processing for evaluation in HCI field deployment studies?
- How does this knowledge need to be advanced practically (e.g., development of new tools) and methodologically (e.g., introduction of new means for data analysis)?
In addition, the workshop is intended to initiate more exchange and collaborative work in the field – contributions to tool chains, the use of tools from other groups, and the collaborative development of tools.
2. Related Work
Roughly a decade ago, Alt et al. [2] introduced different kinds of research questions and how to address them methodologically in ambient display research. To this day, obtaining insights in this field has mainly relied on two types of methods: first, short-term observations (i.e., the whole spectrum from participant observation to video analysis to surveys) and second, interaction logs (such as touch gestures). Interaction logs were long considered the only data sources that allow deducing statements about usage over a longer period of time [4]. Recently, we found that the field nevertheless lacks rigorous procedures to enable a methodology-driven collection and analysis of data [21]. Studies were found to be more likely to use individual data collection methods and less likely to see them as part of an overarching research process (e.g., considering how different methods interconnect).
Recent developments increasingly target the challenge of (automatically) examining user behavior per se in greater detail (e.g., [9, 23]). Studies have criticized such systems for not being understood as part of a broader context (ibid.). Fundamentally, the study of user behavior is considered complex, which has often resulted in a reliance on manual observations and ethnographic research in the past. Therefore, a discernible trend in these more recent efforts is to successively augment and automate the processes of data collection (e.g., by leveraging 3D cameras) and analysis (e.g., through algorithmic solutions). Such research is motivated by gaining more in-depth knowledge about the spatial and temporal behavior of users in close proximity to a display installation. The goal is to gain complementary information about content transitions, presentation times, and interactions. To date, however, there are only a few studies that follow this path [9].
The study by Williamson and Williamson [23] identifies several questions to explore in this now emerging research focus. These questions revolve around, for instance, the impact of an ambient display's presence on user walking paths or how different interaction techniques attract potential users. In addition, the following studies identify data that might be of particular interest in future studies:
- Michelis and Müller [17]: observation of audience behavior revealed recurring behavioral patterns, such as glancing at a first display while passing it, moving the arms to cause some effect, then directly approaching one of the following displays and positioning oneself in its center. This was often followed by positioning oneself in the center of the other displays to explore the possibilities of the different effects, and sometimes by taking photographs or videos. From these observations, a framework of interaction with gesture-based public display systems was deduced.
- Elhart et al. [9]: capture the spatial and temporal behavior of an audience; time in front of a display; a heat map of the distances of passers-by; integration with web analytics (using pheme); presence, distance (changing interaction zones), a counter (number of people in the scene), and dwell time (time spent in front of a display) – a minimal sketch of such measures follows after this list.
- Wouters et al. [24]: how people interact with a system passively stimulates others to observe, approach, and engage in an interaction as well.
- Azad et al. [3]: investigate behavior on and around large shared displays; an observational field study initially, then a controlled experiment on territoriality including three basic zones of interpersonal space: the personal, peri-personal, and extra-personal; different movement formations – e.g., (a) simultaneous (several people) without connection, (b) led staggered, (c) led line, and (d) led with two leaders, with some interacting and others actively watching. The question arises whether it is worthwhile to identify these constellations from body tracking data.
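To make measures like those of Elhart et al. [9] more concrete, the following minimal sketch classifies distance samples from a body tracking camera into interaction zones and derives an approximate dwell time per person. The zone thresholds, the data layout, and all names are our own assumptions for illustration, not taken from the cited work.

```python
# Hypothetical zone thresholds in meters (illustrative, not from Elhart et al. [9]).
ZONES = [(1.0, "interaction"), (3.0, "notification"), (float("inf"), "ambient")]

def zone_for(distance_m):
    """Map a distance sample to the innermost matching zone."""
    for threshold, name in ZONES:
        if distance_m <= threshold:
            return name

def dwell_times(samples):
    """samples: list of (timestamp_s, person_id, distance_m) tuples.
    Returns, per person, the time span (in seconds) between the first and the
    last sample observed in the interaction zone - a rough dwell time proxy."""
    first_seen, last_seen = {}, {}
    for t, person, dist in samples:
        if zone_for(dist) == "interaction":
            first_seen.setdefault(person, t)
            last_seen[person] = t
    return {p: last_seen[p] - first_seen[p] for p in first_seen}

# Example: person "a" lingers in front of the display, person "b" only passes by.
samples = [(0, "a", 4.2), (1, "a", 0.8), (7, "a", 0.9), (2, "b", 2.5)]
print(dwell_times(samples))  # {'a': 6}
```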
In addition to the automatic collection of data, there is work envisioning other methodological aspects. For example, Claes et al. [7] compared findings from an ITW study and a controlled ITW study (i.e., a merging of the qualities of both lab-based and ITW studies) of an ambient display installation. For the latter, the authors proactively invited participants to an open study on interactive installations, while for the former they just observed what interaction naturally occurred in the field. In both cases, structured interviews were performed, and it was concluded that an ITW study was better suited to identify quantitative indications of actual user engagement, whereas a controlled ITW study yielded more valuable insights into why these trends were happening. Overall, when evaluating more complex interaction techniques, a controlled ITW study was found to offer a viable alternative.
3. Toward Automatic Evaluation of In-the-Wild Deployment Studies
In our work, we heavily build on quantitative data (i.e., body tracking and interaction data) as a foundation to guide our research and enrich incremental findings through thorough contextualization with qualitative insights. We believe that only using both kinds of methods in tandem can bring forth sound conclusions regarding how users really behave around display installations. Mäkelä et al. [18] recently provided a good overview of what data is usually available in ambient display deployment studies and how to process it: both body tracking data from a camera and interaction data from the display software itself are the pillars of this overview. Data is processed, combined, and fed into variables that are defined for particular research questions. We have implemented this view in our research. As part of that, we developed a new data format for storing body tracking data [10] as well as an application for the Elastic Stack to store both interaction and body tracking data [19].
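Purely as an illustration of this setup, the sketch below shows what a single body tracking frame could look like and how it might be stored in an Elasticsearch index via the REST API. The field names, the index name, and the server URL are assumptions made for this example; the actual data format is documented in [10] and the Elastic Stack application in [19].

```python
import json
from datetime import datetime, timezone

import requests  # plain REST calls; the official Elasticsearch client would work as well

ES_URL = "http://localhost:9200"   # assumed local Elastic Stack instance
INDEX = "bodytracking-frames"      # hypothetical index name

def make_frame(display_id, bodies):
    """Build one body tracking frame document; all field names are illustrative."""
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "display_id": display_id,
        "body_count": len(bodies),
        "bodies": [
            {"tracking_id": b["id"], "x": b["x"], "y": b["y"], "z": b["z"]}
            for b in bodies
        ],
    }

def index_frame(frame):
    """Store a frame as a document in Elasticsearch and return its document id."""
    resp = requests.post(f"{ES_URL}/{INDEX}/_doc", json=frame, timeout=5)
    resp.raise_for_status()
    return resp.json()["_id"]

if __name__ == "__main__":
    frame = make_frame("lobby-display-1", [{"id": 42, "x": 0.3, "y": 0.0, "z": 2.1}])
    print(json.dumps(frame, indent=2))
    # index_frame(frame)  # requires a running Elasticsearch instance
```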
Our methodological stance is motivated by the issue of lacking comparability. Without contextualization, it is still a challenge to compare two intervals of interaction data to, for instance, determine whether a new feature changes the usage of the display (e.g., by averaging interaction counts) and, if so, how. There is arguably a large variety of context factors that influence the overall interaction process. Examples are holidays, remote work, changing team structures, the current information demand, and so on. In contrast, including data about what is actually happening in front of an ambient display enables us to draw a more holistic picture of an interaction. We are able to be very specific about conversion rates of users, as Michelis and Müller [17] describe them, to distinguish between real users and mere passers-by, and to shed light on both subtle and direct interaction.
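As a simple illustration of such conversion rates in the sense of Michelis and Müller [17], the following sketch computes how large a share of people progresses from one interaction phase to the next. The phase labels and counts are made up for this example.

```python
def conversion_rates(counts):
    """counts: mapping from interaction phase to the number of people observed
    in it, in funnel order. Returns the conversion rate between adjacent phases."""
    phases = list(counts)
    return {
        f"{a} -> {b}": counts[b] / counts[a] if counts[a] else 0.0
        for a, b in zip(phases, phases[1:])
    }

# Hypothetical daily counts derived from body tracking and interaction logs.
daily_counts = {"passing_by": 250, "viewing": 60, "interacting": 12}
print(conversion_rates(daily_counts))
# {'passing_by -> viewing': 0.24, 'viewing -> interacting': 0.2}
```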
In our view, existing work such as the study by Mäkelä et al. [18] lacks some crucial parts for grasping interaction as a full concept and consequently fails to provide answers on how to empower researchers toward this goal. To name a few aspects:
- The possibility to readily visualize body tracking data to identify relevant situations (e.g., people aggregating in front of a display as described by the honeypot effect).
- Algorithmic means to easily search for patterns in huge amounts of collected data over time.
- Methodological suggestions on how to include insights from the context gathered through, for instance, interviews and observations.
- Answers on how to cope with the inherent dynamics of ITW studies in an overarching research design.
In the following, we provide some more in-depth elaborations on these ideas and summarize them in Figure 1.
3.1. Exploration
In reality, we often find ourselves having to determine the right data for addressing a particular research question. We regularly engage in weighing the pros and cons of individual data collection methods to unveil new insights. While in some instances we have clear ideas in mind throughout this exploration process, in other situations we find filtering by certain parameters useful and, with these parameters in mind, look at specific situations and their underlying data. A practical example is one of our research projects in which we are investigating the honeypot effect in more detail. Here, we first filter the body tracking data for situations to be elaborated on. Filters can be, but are not limited to, aspects such as the ones described by Azad et al. [3]: How many people enter a scene from the left, the right, or the front? How many people slow down or start interacting? With regard to the honeypot effect, we look at situations where initially only one person is standing in front of a display installation and others then join this person. Next, we try to identify patterns in the underlying body tracking data to, ultimately, find other occurrences algorithmically. While we can obtain one or many instances of the honeypot effect quantitatively this way, we are then required to provide some context for these instances to give them meaning.
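To make this filtering step more tangible, the sketch below shows one possible way to flag candidate honeypot situations, i.e., intervals in which a single person is present in front of the display and is subsequently joined by others. It assumes that the body tracking data has already been reduced to per-frame sets of visible person IDs; the function name and the threshold are illustrative, not part of our actual tooling.

```python
def honeypot_candidates(frames, min_alone_s=5.0):
    """frames: list of (timestamp_s, set_of_person_ids), ordered by time.
    Yields (alone_since, joined_at) for intervals in which exactly one person
    was present for at least `min_alone_s` seconds before others joined."""
    alone_since, alone_person = None, None
    for t, people in frames:
        if len(people) == 1:
            person = next(iter(people))
            if alone_person != person:
                alone_since, alone_person = t, person
        elif len(people) > 1 and alone_person in people:
            if alone_since is not None and t - alone_since >= min_alone_s:
                yield (alone_since, t)
            alone_since, alone_person = None, None
        else:
            alone_since, alone_person = None, None

# Example: person 7 is alone from t=0 until person 8 joins at t=12.
frames = [(0, {7}), (6, {7}), (12, {7, 8}), (20, set())]
print(list(honeypot_candidates(frames)))  # [(0, 12)]
```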
3.2. Context data
Context data is required for interpreting what can really be seen in body tracking and interaction data. As said before, it can make a difference whether we are looking at a data set collected during holidays or at a time when the needs for information within a company change. Context data can be, but is not limited to:
- What is displayed on the screen – this can change quite quickly, such as in our case, where individual data items are shown for 10 seconds.
- What functionality does the interactive display offer – this also changes over time (on the order of weeks rather than seconds).
- What is the weather like today?
- Is today a weekend, a bank holiday, a term break, etc.?
- Has an announcement been made to potential users?
- Has the display been used for demonstration purposes?
- Was the display shut down?
This list of context information can be expanded to include more complex aspects such as organizational work processes, for instance, in agile software development teams. Here, questions arise such as: Which sprint are the teams currently working on? When is the next release scheduled? When is the next on-site team meeting? What is the status of the individual teams? Post-COVID questions also emerge, such as how the hybridity of work processes can be included in the understanding of the context and in data processing. In hybrid work situations, the actors are exposed to the duality of the work space (i.e., both the physical and the digital space exist simultaneously as communication and interaction spaces) [15].
Another type of context data is the location of an installation. If we collect data from several screens, it might be interesting to document factors relating to the location for every screen separately (e.g., to determine whether the data is complementary or comparable). Context data can also be obtained automatically from calendars or (historical) services like weather services, but it can also be part of research projects in the form of interviews or the documentation of additional observations. We generally try to adhere to a procedure of writing laboratory journals indicating special events and times that might be interesting for interpreting usage data later on. Last but not least, it is worth mentioning that, as Dourish [3] vividly describes, the meaning of a specific context is by definition flexible and in constant negotiation among its participants. We therefore have to regularly review the initial understanding of context during a study to tie it back to the initial goal definition or adapt it to the research process if necessary.
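To illustrate how such journal entries could be tied back to usage data, the following sketch tags each logged interaction event with the context annotations that were active at its timestamp. The journal structure and all entries are hypothetical.

```python
from datetime import datetime

# Hypothetical laboratory journal: context annotations with validity intervals.
journal = [
    {"from": datetime(2023, 7, 24), "to": datetime(2023, 8, 4), "note": "term break"},
    {"from": datetime(2023, 8, 1, 10), "to": datetime(2023, 8, 1, 12), "note": "demo for visitors"},
]

def annotate(events, journal):
    """events: list of dicts with a 'timestamp' key.
    Attaches all journal notes whose interval covers the event timestamp."""
    for event in events:
        event["context"] = [
            entry["note"] for entry in journal
            if entry["from"] <= event["timestamp"] <= entry["to"]
        ]
    return events

events = [{"timestamp": datetime(2023, 8, 1, 11), "type": "touch"}]
print(annotate(events, journal))
# -> the touch event is tagged with both 'term break' and 'demo for visitors'
```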
4. Possible Future Work and Questions
The written contributions for this workshop cover what has been addressed in the previous sections: Rohde et al. [19] describe an infrastructure for interaction logging. Fietkau [10] showcases a toolset for logging and visualizing body tracking data. Koch et al. [12] document a long-term ITW deployment of multiple public screens. Cabalo et al. [6] and Lacher et al. [14] propose and test two different approaches for analyzing body tracking data for determining engagement or attention. Buhl et al. [5] report on a limited-time gamification study to check whether such a change in the application leads to different user behavior.
Below are some open questions and needs that are raised in the workshop papers or that emerge from the bigger picture formed by the collective contributions:
- There is a strong need to incorporate context data to better interpret the interaction process.
- It should be possible to create performance indicators for ambient displays from body tracking and interaction data (a minimal sketch of such indicators follows after this list).
- The suitability of body tracking data for pinpointing underlying patterns of user behavior (e.g., through machine learning techniques or algorithmic approaches) still needs to be assessed.
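As a minimal sketch of what such performance indicators could look like, the example below combines a passer-by count (from body tracking) with interaction session lengths (from interaction logs) into a small per-day report. The indicator definitions and the input values are assumptions for illustration only.

```python
from statistics import mean

def daily_indicators(passersby, session_lengths_s):
    """passersby: number of people detected on a given day (body tracking data);
    session_lengths_s: interaction session lengths in seconds (interaction logs).
    Returns a few illustrative performance indicators for that day."""
    return {
        "passers_by": passersby,
        "sessions": len(session_lengths_s),
        "sessions_per_100_passers_by": 100 * len(session_lengths_s) / passersby if passersby else 0.0,
        "avg_session_length_s": mean(session_lengths_s) if session_lengths_s else 0.0,
    }

print(daily_indicators(passersby=180, session_lengths_s=[35, 120, 15]))
# -> roughly 1.67 sessions per 100 passers-by, average session length of about 57 s
```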
We have attempted to address some of these pressing issues in our field in Figure 1. In the workshop, we aim to discuss this preliminary methodological blueprint and thereby revise it in a meaningful way.
A further important issue to be discussed in the workshop concerns the research data management (e.g., how to manage the collection and storage of interaction logs and qualitative data like interviews), including long-term data storage and making data accessible to others in future studies.
Finally, another interesting topic, which is closely related to long-term ITW deployments, is the “sustainability” of IT research in practice. Nowadays, research in applied computing requires researchers to engage deeply in the field (e.g., with practitioners) in order to design innovative IT artifacts and understand their appropriation. The problem that has not been solved so far is what happens when the research project is completed (see, for example, Simone et al. [22] for a broader discussion on this matter).
5. Organizers
As a research group, the workshop organizers are currently working on the DFG-funded research project “Investigation of the honeypot effect on (semi-)public interactive ambient displays in long-term field studies.” 1 They are eager to extend their internal discussions beyond the project's scope and to exchange insights with the broader community.
Michael Koch is a professor for HCI at University of the Bundeswehr Munich, Germany. His main interests in research and education are cooperation systems, i.e., bringing collaboration technology to use in teams. In the past decades, he has worked on several projects in the field of public displays and has conducted multiple long-term field studies in this domain.
Julian Fietkau is a post-doc researcher in HCI at University of the Bundeswehr Munich. His recently concluded doctoral project involved the design and evaluation of public displays of different kinds to support older adults in outdoor activities.
Susanne Draheim is a post-doc researcher and Managing Director of the Research and Transfer Centre “Smart Systems” at Hamburg University of Applied Sciences. She has an academic background in sociology, educational sciences, and cultural sciences. She works on datafication & qualitative social research methods, companion technology, and digital transformation.
Jan Schwarzer is a post-doc researcher in the Creative Space for Technical Innovations (CSTI) group at Hamburg University of Applied Sciences, working on long-term evaluations of user behavior around ambient displays deployed in authentic environments. Recently, he has concentrated on algorithmic approaches to distill underlying patterns from quantitative usage behavior data.
Kai von Luck is a professor for computer science at Hamburg University of Applied Sciences and the Academic Director of the CSTI group. His background in artificial intelligence informs and enriches his work on ambient displays and tangible interfaces.
1: https://gepris.dfg.de/gepris/projekt/451069094