Concern with “big data” and “data at scale” has pervaded many conversations in the past few years. Conversations orient around what “big data” really means, around the relationship of “small data” to “big data”, and around the relationship between “local” data and “global”, aggregate data. In the EPIC context, conversations veer towards anxiety regarding the relationship of ethnographic practice to quantitative data and whether the future of ethnography as an area of interest is in jeopardy. Will all questions regarding people, “users”, and markets be answered by brute-force number crunching?
The answer is absolutely NO.
It’s important to note that the ethnography community has a long-standing relationship with analysis of, and interpretation through, quantitative data at all scales and granularities. “Big” data and “data at scale” are not new. Stealing a snippet of Samuelle Carlson and Ben Anderson’s paper title from 2007, we know “there are many kinds of data”. Further, we know that data is neither good nor bad. It’s what one does with the data that matters. It’s how one understands and works with the benefits and the curses, the strengths and the limitations, of the data in hand that makes for “good” or “bad”.
Following up on that introduction, my perspective for EPIC2017’s call to think about Perspectives is that there has never been a better time for an ethnographic embrace and a reconfiguration of what it means to render meaning into big and small data. We have an opportunity to resist taking data as given, an opportunity to bring an ethnographic lens to the collection, management and curation of data, and an opportunity to make sense of what’s intentionally and artifactually collected from people’s interactions with digital devices and services. Only by looking for meaning in the data traces, the data “fumes”, will we be able to understand what is of value to people, and be able to create lasting services that people value. To be able to do this well, to do this better than we are currently doing it, we need better tools for dealing with data at all scales and granularities, from collection to curation to manipulation to analysis to the drawing of defensible insights and conclusions. We also need innovative techniques and tools to better support data triangulation focused on producing high-quality interpretive work. And that’s what I hope we’ll discuss at EPIC2017.
I’d like to situate this key point in my own experiences to give some grounding for the perspective I just shared.
I’ve been working in corporations, specifically in the technology industry, for over 20 years. Much of my work has focused on looking at how digital tools are used in everyday lives. My early work focused on communication and collaboration in graphical and textual virtual worlds, followed by studies of collaboration tools and their effect on work tasks and work relationships, with a specific focus on cross-cultural collaboration. From there I worked with colleagues on creating tools for the support of communities of practice, interest, circumstance, and convenience, addressing how such communities emerge, grow, and sustain their own cultural coherence(s). More recently, I’ve focused on everyday use of consumer-facing technologies and services. Throughout this journey, a perennial theme has been to understand how the “online” and “offline” are woven together, to understand the digital and the tangible as they are threaded together in people’s management and maintenance of their lives.
Since the mid- to late-1990s when I worked with colleagues in the UK, I’ve been a strong enthusiast for and advocate of data triangulation, of mingling data from multiple sources at many levels of granularity: from eye gaze data, to online activity traces garnered from mobile and desktop service interaction in behavioral logs, to data collection from sensors, to observations of offline embodied actions, to spoken explanations. In this approach, I consider my collaborators and me to be “circumstantial activists”, and our approach has always been “multi-sited”, in the sense George Marcus articulated in the paper “Ethnography in/of the world system: The emergence of multi-sited ethnography.” I’ve also always balked at the division of data into qualitative and quantitative, believing that behind every quantitative measure is a qualitative judgement imbued with a set of situated agendas. As Martyn Hammersley concluded in What’s Wrong with Ethnography?, “the distinction between qualitative and quantitative is of limited use and, indeed, carries some danger”. The essence of ethnography is the mixing of the qualitative and the quantitative, and the embracing and connecting of very different representations from disparate sources at multiple levels of granularity. It is in the connections, and in the seeking of deeper insight(s), that we form a picture.
And now is the time for us all to push for and illustrate how an ethnographic lens on data analysis, summarization and triangulation is the future of picture painting. In my world, where we strive to understand not just how and when people use technologies, but also why, an ethnographic perspective on multi-faceted data is the path forward to bringing not just good products and services, but valued and valuable ones.
Why now? Because we are at a time and place where data about people’s behaviors online and offline is plentiful and in many formats.[1] While there is valid frustration that large datasets, aggregates, and correlations across variables of scientific interest do not reveal nuance or people’s individuality and difference effectively, the tools for exploring data and rendering it visible and interrogable are getting better. We now have the means to visualize, manipulate, and interpret data, see relationships, and “think the previously unthinkable”, to borrow a term from Bret Victor, an expert on visualization in support of reasoning, who presented the idea in a talk at the MIT Media Lab.
As part of our evolution, we need to establish and foster deeper relationships with our colleagues in the data sciences, statistics, engineering, and data visualization, and to be more effective in communicating what we’d like to achieve. Two papers have been particularly helpful for me when talking to such collaborators about the kinds of “tools for reasoning” I have in mind. The first is from EPIC2009, and the other, although not an EPIC paper, is one that I have discussed with many regular EPIC attendees, and one that is still informing my own work and the work of my research team.
The first is “Numbers Have Qualities Too: Experiences with Ethnomining” by Ken Anderson, Dawn Nafus, Tye Rattenbury, and Ryan Aipperspach. In this paper the authors discuss the ways in which datasets can be interrogated and reviewed with an ethnographic perspective. They propose we conduct analyses which complement data mining (in which the data are taken as given and interpreted with a view to summarization) but focus on an exploratory practice of data investigation with a view to finding out the meaning behind the data, for example, the intent behind the actions that are rendered into behavioral logs. They call this practice “ethnomining”. This concept so beautifully described one aspect of the work I’d been doing around the time the authors were proposing this approach that I wantonly used the term to intrigue and invite my data science and engineering colleagues to join with me in analysis of data from a dating site (Yahoo! Personals, which was a popular site at the time). The work we did then complemented work with Elizabeth Goodman, which had been more traditionally ethnographic, with interviews, observations, and site analyses, and which was presented and published in an EPIC2008 paper.
Similarly, the combination of participant observation, extended interviews, and online experimentation with visualizations of activity data led to a successful, multi-sited ethnographic investigation into the practices of DJs who were using a webcasting service to extend their reach to new audiences and distant fans. That work, done with David Ayman Shamma, Nikhil Bobb and Matt Fukuda, led to a series of observations about the value and meaning of certain kinds of online performance that are still relevant today. We were also able to recommend some technical innovations that live on long after the webcasting platform in question was shut down. In another example, Andy Brooks and I conducted a series of interviews with concierges and guests of low- to high-end hostels and hotels to highlight the ways in which online search accomplished through the typing of words into a search text box is so unlike getting recommendations from an expert. Working in partnership with engineers in search and product managers in the Yahoo! Locals team, we trawled through search logs, reviewed aggregate search statistics, and discussed the Yahoo! Locals team’s product roadmaps for assumptions about information seeking behaviors. We then conducted targeted interviews with concierges and guests. Our research led to findings that rendered visible the disjunctures between search requests in a text field and asking someone for geographically and socially situated information. The work illuminated opportunities and led to design guidelines for “social” and “dialogic” search tools focused on supporting exploratory information seeking and finding.
The second paper that has been a key reference for me draws on some favourite theorists and practitioner perspectives from our broader community. In “Trace Ethnography: Following coordination through documentary practices,” R. Stuart Geiger and David Ribes (2011) combine participant-observation with analysis of system logs to reveal the patterns and practices of people contributing to Wikipedia. The authors take these “thin” evidentiary traces to amplify and “thicken” their understanding of the processes of contribution and coordination, and in the process turn data traces that are typically only used for system debugging or quantitative summarizations of site activity into the substrate of rich qualitative analyses. The authors discuss how edit logs on a site like Wikipedia can be revelatory of the social dynamics within a group, revealing tensions and clashes that are only visible in this metadata. Taking up Marcus’ idea of following the actors, the authors invoke Latour and show how the activity log traces don’t “stem from some inherent documentary quality, but rather because they are produced and circulated within a highly standardized sociotechnical infrastructure of documentary practices”. The authors talk of “interviewing” the Wikipedia database. It is perhaps here that the idea of an ethnographically focused log forensics, with the intent of revealing social and cultural formations, is most evident.
These ideas of ethnomining and of reading the logs and following the traces, and of interviewing databases, triangulated with more “traditional” ethnographic methods like interviewing and participant observation, have been very powerful in my work and in the work of my teams. An example of this kind of trace analysis in action comes from my time at eBay, where Michael Gilbert and I combined detailed behavioral log trace analysis with data visualizations of account holders’ search practices.[2] Interviews revealed the shopping habits and patterns of consumers looking for bargains. The insights from this work would not have been possible with interviews alone, nor from purely studying behavioral logs, nor from aggregates like “daily actives” summaries. Insights ranged from markers to look for in order to reveal shared accounts to opportunities for social shopping. Our analyses convinced our product counterparts to think beyond “the user” as a single entity, perhaps a single person, and instead conceptualize a social entity, for example multiple people on one account, or a single person with multiple accounts trying to maintain boundaries between social roles (e.g., personal and work personae). More than convincing though, we offered a revisioning of personalization algorithms for product recommendations, and some concrete digital traces to look out for when trying to differentiate account types for which alternative kinds of service provision would be appropriate. We understood that the absence of a focus on personhood in algorithmic “personalization”, with its emphasis on abstractions and relations among system-knowable variables, was leading to poor conclusions about what people want and need.
This foundational work led to a successful research partnership with a number of colleagues, entitled Putting the Person into Personalization. This effort took a critical look at what data are available and what data would be needed to describe a fully functioning, socially integrated person, acting in multiple social contexts. We combined research methods and data types to reveal nuance and to render visible what was missing in the behavioral logs and data traces. We showcased how, as Abbott has written, “our normal methods…attribute causality to the variables…rather than to agents; variables do things, not social actors”. We also looked into “missing data”, including “missing persons”, and showed the ways in which inferences drawn from aggregate data did not apply to all, or even to any, of the individuals making up the aggregate. Again, following Abbott, we observed that individual variability in such aggregate data, instead of being seen as an opportunity for further analysis and deeper personalization, is dismissed as error. The “data dialect” of data science for recommendation and merchandising did not recognize this as meaningful information.
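To make concrete how an inference drawn from an aggregate can fail for every individual in it, here is a minimal sketch in Python. The three people, the two variables, and all the numbers are invented purely for illustration; they are not drawn from the eBay work or any real dataset. The pooled trend runs upward while each person's own trend runs downward.

```python
from statistics import mean


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = mean(xs), mean(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical observations: (sessions per week, purchases per week).
# Names and values are assumptions made up for this sketch.
observations = {
    "person_a": [(1, 5), (2, 4), (3, 3)],
    "person_b": [(4, 8), (5, 7), (6, 6)],
    "person_c": [(7, 11), (8, 10), (9, 9)],
}

# Pool everyone together, as an aggregate analysis would.
pooled = [point for points in observations.values() for point in points]
aggregate_slope = slope(pooled)

# Then look at each person separately.
individual_slopes = {
    person: slope(points) for person, points in observations.items()
}

print("aggregate slope:", aggregate_slope)
for person, value in individual_slopes.items():
    print(person, "slope:", value)
```

In this toy case the aggregate says "more sessions, more purchases", yet that holds for none of the three people. Treating the per-person pattern as error would discard exactly the variability that deserves a closer look.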
In the projects mentioned above and in ongoing projects, there are many questions an ethnographic perspective can pose, including:
- What was the “field” or “fields”?
- What has been sampled and investigated?
- When is an apparent relationship in the data a coincidence rather than a consequence? Or, to put it another way, when have we mistaken correlation for causation?
- What is missing? Where are the holes in the data? Who is missing or not represented?
- When is an outlier a signal, not just an “outlier”? When is an outlier a signpost to something we’ve missed or are missing?
- How can we design the data we need in order to derive the understanding that is required to bring the best value for people?
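Two of these questions, who is missing and when an outlier is a signal, can be asked of a dataset quite directly. The following Python sketch is one way to do so, under stated assumptions: the participant ids, the weekly search counts, and the helper name `flag_for_follow_up` are all hypothetical, and the cutoff of 3.5 on a median-based score is a common rule of thumb rather than anything prescribed by the work described here.

```python
from statistics import median


def flag_for_follow_up(values_by_id, threshold=3.5):
    """Return ids whose value sits far from the median.

    Uses a median-based (modified z) score so that one extreme value
    does not hide itself by inflating the spread. Flagged ids are
    candidates for a conversation, not rows to delete.
    """
    values = list(values_by_id.values())
    center = median(values)
    spread = median(abs(v - center) for v in values)
    if spread == 0:
        return []
    return [
        key
        for key, v in values_by_id.items()
        if 0.6745 * abs(v - center) / spread > threshold
    ]


# Hypothetical study: six recruited participants, five of whom
# appear in the behavioral logs. All values are invented.
recruited = {"p01", "p02", "p03", "p04", "p05", "p06"}
weekly_searches = {"p01": 12, "p02": 15, "p03": 11, "p05": 14, "p06": 140}

# Who is missing, not represented in the logs at all?
missing = sorted(recruited - set(weekly_searches))

# Who looks unlike the rest and may be a signpost to something missed?
flagged = flag_for_follow_up(weekly_searches)

print("missing from logs:", missing)
print("flag for follow-up:", flagged)
```

The point of the sketch is the output's purpose: both lists are prompts for further fieldwork. An absent participant may have opted out, used a shared account, or been dropped by the logging pipeline, and an unusually heavy searcher may be several people behind one login.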
Data is a material for understanding, not a given from which we deduce that which lies latent within the data, waiting to be revealed. Data analyses should be more than incremental refinement on what is already known. We should work with data to challenge what we know, and to actively seek surprise. It’s incumbent upon us in this community to push for grounded meaning from data. Data can lead to information; indeed, the engineering and data science of turning data into information—the merging, joining and combining across datasets—is technically very challenging, and the generation of summary data and fascinating inferences staggeringly impressive. But, to go from information to knowledge we need an understanding of what is meaningful by understanding people in context. We also need to remember that, whatever datasets are involved and no matter how many kinds of analyses are done, there will be multiple perspectives on what’s most meaningful or matters most. So, to go from knowledge to action requires wisdom and an ethical stance. And central to ethnographic practice, especially when it comes to the world of technology, is a perspective that ethics cannot live in technology alone, even if our technologies are intentionally or unintentionally always imbued with ethical and moral codes. Ethics lives in social systems, and can only be explored in collaboration with people centrally involved. An ethnographic understanding will tell us who is in the conversation and who is not. An ethnographic perspective will help us unpack what data dialects are needed in the conversation.
This is where innovation is needed. To reiterate my earlier point, we need better tools to analyse and work with data. We need better techniques for unraveling provenance, for assessing data quality, and for translation work between different data formats. We need to develop deeper literacy with existing data dialects and to develop new ones. We need to understand how different, possibly interrelated, disciplines conceive of, manipulate, and interpret the same data and datasets.
One of the greatest benefits of communities like the EPIC community is that our collective insight is larger than any specific paper, and our collective connections into many kinds of industries give us an opportunity to create synergies and cross boundaries. Many of us are industry researchers. As such, we are in a powerful position and we need to be proactive. We have access to data and datasets that are just not available within the academy. As Marcus said, “Multi-sited research is designed around chains, paths, threads, conjunctions, or juxtapositions of locations in which the ethnographer establishes some form of literal, physical presence, with an explicit, posited logic of association or connection among sites that in fact defines the argument of the ethnography.” Ask yourself: Where will your next paths and presences be? Ask yourself: How will you contribute to the creation of a truly ethnographic data dialect?
The practice of ethnography and the ethnographic perspective have never been more important, never more needed, and never more brimming with potential to influence industry and the technosphere.
Enjoy EPIC 2017!
Notes
Acknowledgements: I’d like to thank ken anderson and Jennifer Collier Jennings for comments on an early draft of this blog post.
1. We need to think carefully and ethically about the use of data from online activities. In that regard too, we are at a very interesting and important moment, with the General Data Protection Regulation (GDPR) in Europe set for implementation in May 2018.
2. Created by Andy Edmonds, then of eBay, now of Adobe
References
Abbott, A. 1992. From Causes to Events: Notes on Narrative Positivism. Sociological Methods and Research 20:428-55.
Anderson, K.; Nafus, D.; Rattenbury, T.; Aipperspach, R. Numbers Have Qualities too: Experiences with Ethno‐Mining. EPIC Proceedings, Chicago, IL, USA, 30 August–2 September, 2009; pp. 123–140.
Carlson, S. and Anderson, B. (2007), What Are Data? The Many Kinds of Data and Their Implications for Data Re-Use. Journal of Computer-Mediated Communication, 12: 635–651. doi:10.1111/j.1083-6101.2007.00342.x
Churchill, Elizabeth F., 2013. Putting the Person back into Personalization. Interactions 20, 5 (September 2013), 12-15. DOI: https://doi.org/10.1145/2504847
Churchill, Elizabeth F. and Sarma, Atish Das. 2014. Data Design for Personalization: Current Challenges and Emerging Opportunities. In Proceedings of the 7th ACM international conference on Web search and data mining (WSDM ’14). ACM, New York, NY, USA, 693-694. DOI=http://dx.doi.org/10.1145/2556195.2556211
Geiger, R. Stuart and Ribes, David. 2011. Trace Ethnography: Following Coordination through Documentary Practices. In Proceedings of the 2011 44th Hawaii International Conference on System Sciences (HICSS ’11). IEEE Computer Society, Washington, DC, USA, 1-10. DOI=http://dx.doi.org/10.1109/HICSS.2011.455
Hammersley, Martyn. What’s Wrong with Ethnography? Routledge, 2013.
Marcus, George E. Ethnography in/of the World System: The Emergence of Multi-sited Ethnography. Annual Review of Anthropology 24.1 (1995): 95-117.
Shamma, D.A.; Churchill, E.; Bobb, N.; Fukuda, M. Spinning Online: A Case Study of Internet Broadcasting by DJs, Communities & Technology, 06/25/09, University Park, PA (2009).
Victor, Bret. “Media for Thinking the Unthinkable.” April 4th, 2013. Accessed September 10th, 2017.
Google is an EPIC2017 Sponsor.