Slow Searching: Research

Tuesday, February 10, 2015

Predicting the Future

Yesterday I attended a Microsoft Research meeting on artificial intelligence, where, in addition to discussing a number of grand challenges in AI, attendees celebrated Eric Horvitz's recent AAAI Feigenbaum Prize by writing down our predictions for artificial intelligence in 2050. It was really interesting to read everyone's reflections on what the world might look like in 35 years.

My 2050 prediction: The nature of information work will change drastically. Many information work tasks will be done by intelligent systems, with people only providing input to fill in the gaps that the system cannot perform. Most information workers will either be unskilled or highly specialized, and the work they do will often be decoupled from the larger task to which they contribute. Each task subcomponent that requires human input will be matched to the worker who is optimally qualified to address it, and only the context necessary to complete the subtask will be provided. Because information tasks will be modularized and self-contained, many information workers will be able to integrate paid work into their personal lives, working in their spare moments from home when it is convenient instead of in large chunks from the office. Improved efficiency and automation will mean either that they will be able to work less or that fewer information workers will be needed.

Our emerging ability to transform large tasks into microtasks will drive this change. People once believed a skilled craftsman was needed to build a car, but then they figured out how to decompose the task into repeatable subcomponents that could be completed by unskilled workers and, later, robots. Similarly, while many believe that complex information tasks can only be performed by skilled information workers, we are discovering it is possible to pull out the repeatable subcomponents from these tasks to be performed by the task owner (selfsourcing), the crowd (crowdsourcing), and, eventually, artificial intelligence. The transformation of information work into microwork will change when and how people work, and enable individuals and automated processes to efficiently and easily complete tasks that currently seem challenging.

Now it's your turn. What is your prediction for artificial intelligence in 2050?

Monday, December 22, 2014

2014 Research in Review

This post summarizes the research I published in 2014. The work divides roughly into three components covering: 1) slow search, 2) crowdsourcing, and 3) face-to-face social interaction.

Slow Search
We live in a world where the pace of everything from communication to transportation is getting faster. In recent years a number of "slow movements" have emerged that advocate for reducing speed in exchange for increasing quality. These include the slow food movement, slow parenting, slow travel, and even slow science. Building on these movements we have been exploring the concept of slow search, where search engines use additional time to provide a higher quality search experience than is possible given conventional time constraints.

Paper Summary: A Crowd of Your Own

A Crowd of Your Own: Crowdsourcing for On-Demand Personalization

Peter Organisciak, Jaime Teevan, Susan Dumais, Robert C. Miller, Adam Tauman Kalai

HCOMP 2014 (Notable Paper)

A lot of my research explores personalization. Personalization is a way for computers to support people’s diverse interests and needs by providing content tailored to the individual. However, despite clear evidence that personalization can improve our information experiences, it remains very hard for computers to actually understand individual differences.

A Formula for Academic Papers: Related Work

The Related Work section of an academic paper is often the section that graduate students like writing the least. But it is also one of the most important sections to nail as the paper heads out for review. The Related Work section serves many purposes, several of which relate directly to reviewing:

The person handling the submission will use the referenced papers to identify good reviewers,
Reviewers will look at the references to confirm that the submission cites the appropriate work,
Everyone will use the section to understand the paper's contributions given the state of existing research, and
Future researchers will look to the Related Work section to identify other papers they should read.

Data Banks

Each of us individually creates a huge amount of data online. Some of this data we create explicitly, such as when we make webpages or public-facing profiles, write emails, or author documents. But we also create a lot of data implicitly as a byproduct of our interactions with digital information. This implicit data includes the search queries we issue, the webpages we visit, and our online social networks.

The data we create is valuable. We can use it to understand more about ourselves, and services can use it to personalize our experiences and understand people’s information behavior in general. But despite the fact that we are the ones who create the data, much of it is not actually in our possession. Instead, it resides with companies that provide us with online services in exchange for it. A handful of powerful companies have a monopoly on our data.

Definition of monopoly: the exclusive possession or control of the supply or trade in a commodity or service

Definition of data monopoly: the exclusive possession or control of the supply or trade in an individual’s personal data

Evidence from Behavior

Doug Oard at the Information School at the University of Maryland is teaching an open online course on information retrieval this fall (INST 734). Above is the brief cameo lecture I recorded using Office Mix for the segment on Evidence from Behavior.

Friday, May 23, 2014

Related Work: Find It If You Can

Find It If You Can: A Game for Modeling Different Types of Web Search Success Using Interaction Data
Mikhail Ageev, Qi Guo, Dmitry Lagun, Eugene Agichtein
SIGIR 2011

A lot of recent search research makes use of large-scale query log analysis. This poses a challenge for researchers who are not affiliated with a major search engine, because there are very few public logs available to the research community. Find It If You Can, by Ageev et al., provides a nice example of how information retrieval researchers might gather their own search logs. And in doing so, it also addresses a persistent challenge with using logs: the fact that while we know what users are doing, we don't actually know what they are thinking.

The Dangers of Sharing Log Data

A lot of my research relies on analyzing behavioral log data, including query logs (example: personal navigation), web browser logs (example: web revisitation patterns), social media logs (example: #TwitterSearch), IM logs (example: impact of availability state), and GPS traces (example: trajectory-aware search). Behavioral logs provide a picture of human behavior as seen through the lens of the system that captures and records user activity.

However, behavioral logs can also provide a picture of a specific individual, and as such raise privacy concerns. Would you be willing to share your query history with me? I search for fairly mundane things, but even so, there’s no way I’d share an unfiltered version of my queries with you. As a result, despite good intentions, several companies that have tried to make behavioral logs available to the research community have ended up in hot water. The two best known examples are AOL and Netflix.

Public Behavioral Logs

Large-scale behavioral log analysis allows us to do many things, including build better search engines, predict health epidemics, and support communities through crises. But the logs themselves can be hard to come by. This post covers some of the different types of behavioral logs that are publicly available for study.

Turning Our Personal Devices into Social Devices

Our smartphones allow us to connect with the rest of the world from anywhere. Ironically, however, they also tend to disconnect us from the world immediately around us. But rather than fight the creep of technology into our social spaces, there is the opportunity to reframe our personal devices as social devices.

As we start using our phones to interact with the people who are near us, there are a number of unique aspects of our interaction that mobile applications can capitalize on. Co-located collaborators can see each other, talk with each other, and share surrounding context. This is why it is easier to have a conversation with someone while driving if the other person is in the car with you and not on the phone. A person in the passenger seat can pause when the road requires your attention. A person on the phone can’t see when your attention gets diverted or the road conditions change.

A Formula for Academic Papers: Introduction

The Introduction to a paper is a place for you to tell the story of the research that is presented. That story is not what you did to complete the research, but rather why the work is interesting. And while the research you are writing about in a paper might be part of a larger story (e.g., your thesis), the paper’s story is also not necessarily that larger story. Instead, it is the story that frames just the current work and its contributions as clearly as possible. The goal is to capture the reader’s attention, provide context for the included research, and set expectations for what is to come.

A simple, reliable Introduction outline is:

Related Work: Ensemble

Ensemble: Exploring Complementary Strengths of Leaders and Crowds in Creative Collaboration
Joy Kim, Justin Cheng, Michael S. Bernstein
CSCW 2014

This paper looks at crowd-supported creative writing. It presents a system, called Ensemble, that supports structured creative writing. There are a number of examples of massively collaborative writing, including:

Ensemble: Structured creative writing, with the author leading crowd workers in the task.
The Collabowriters: An experiment in collaborative novel writing. Users write short sentence candidates for the next sentence in the novel, and then vote on which one is the best.
FoldingStory: A group storytelling game.
Massively Distributed Authorship of Academic Papers: An experiment in collaborative academic writing by Bill Tomlinson and 29 others for alt.chi at CHI 2012.
Soylent: A crowd-powered word processor that uses Mechanical Turk workers to help writers proofread and shorten their document.

Microwork as a Way to Collaborate

Recently I have been exploring selfsourcing as a way to help someone perform a single large, overwhelming information task by breaking it down into hundreds of easy-to-perform microtasks. Many of the microtasks that make up our personal information tasks need to be completed by the task owner, because they require personal knowledge or context. But not all do. In this way, the approach makes it easy to share aspects of a task with others in a way that is not easy to do for traditional complex tasks. For example, if the process of creating a photobook is broken down into subtasks, different family members can perform these subtasks to create a coherent book. Likewise, multiple colleagues could simultaneously create a single presentation if the process allowed them to brainstorm ideas on a topic and label them in parallel.
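To make the photobook example a little more concrete, here is a minimal sketch of how microtasks might be split between the task owner and helpers. Everything in it is an assumption for illustration: the `Microtask` record, its `needs_owner` flag, the `assign` function, and the placeholder names are hypothetical and do not describe an actual selfsourcing system. Microtasks that need personal knowledge stay with the owner, and the rest are handed out round-robin to the helpers.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Microtask:
    description: str
    needs_owner: bool  # hypothetical flag: requires the owner's personal knowledge


def assign(tasks, owner, helpers):
    """Map each person to the list of microtask descriptions they should perform."""
    assignments = {person: [] for person in [owner, *helpers]}
    # Shareable microtasks rotate among helpers; with no helpers, the owner does them all.
    shared_pool = cycle(helpers or [owner])
    for task in tasks:
        person = owner if task.needs_owner else next(shared_pool)
        assignments[person].append(task.description)
    return assignments


# Illustrative photobook microtasks (made up for this sketch).
photobook = [
    Microtask("pick favorite photos", needs_owner=True),
    Microtask("flag blurry photos", needs_owner=False),
    Microtask("group photos by event", needs_owner=False),
    Microtask("write captions", needs_owner=True),
    Microtask("crop photos to fit the page", needs_owner=False),
]

plan = assign(photobook, owner="owner", helpers=["helper_a", "helper_b"])
for person, todo in plan.items():
    print(person, todo)
```

Round-robin is just the simplest possible policy. A real system would presumably also consider who is available and who is best suited to each microtask.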

Micro-Habits

In several posts recently, I have discussed the potential benefits of transforming personal information tasks into microtasks (called selfsourcing). It seems picking up habits can likewise be facilitated by transforming the desired behavior change into micro-habits.

According to B.J. Fogg (via an NPR report I listened to recently), there are three steps you should follow if you want to pick up a new habit:

Small: Choose a habit to pick up that only takes a few seconds to complete.
Routine: Place this new micro-habit within an existing routine.
Celebrate: Physically celebrate the completion of your new habit whenever you do it.

WSDM 2014 Trip Report

Last week I attended the Seventh ACM International Conference on Web Search and Data Mining (WSDM 2014) in New York City. The conference hotel was located right in Times Square, and I enjoyed visiting Cornell Tech, eating delicious food, and catching up with college friends. I also enjoyed attending sessions. WSDM is single track, which means the research being presented isn’t always directly relevant to everyone, but conference attendees have a shared experience and get exposed to research they might not otherwise see.

Making a Task Interruption-Friendly

We are interrupted every 11 minutes. We are interrupted by incoming email, the phone ringing, and colleagues stopping by our office. We even interrupt ourselves sometimes, to go read Facebook or browse the web. You very well may not make it through this post without an interruption. Research suggests that it is hard to recover from an interruption, with it taking us up to 15 minutes to return to focused activity. Because we are interrupted so often, we never achieve full efficiency, and interruptions cause a significant loss in productivity.

Selfsourcing Example: Presentation Creation

In addition to playing around with the selfsourcing of photo organization, I am also building a desktop selfsourcing application that supports brainstorming and creates a presentation from the results. A slide from a selfsourced presentation (on the topic of selfsourcing) is shown above.

Selfsourcing Example: Photo Organization

People have amassed large archives of digital photographs that have significant value to them but are difficult to use because they lack structure. For example, I have thousands of pictures of my four boys that I never get to enjoy because they sit in an unorganized mess of flat folders labeled with dates. To help people make better use of their photographs, Dan Liebling and I are developing a mobile selfsourcing application that supports the creation of a photobook from an unorganized set of photographs. I can’t wait to use it!

CSCW 2014 Trip Report

CSCW 2014 in Baltimore, MD was a lot of fun. There was so much good work presented that I often had a hard time choosing which sessions to attend. However, the common themes I paid attention to while there were:

Selfsourcing

Peer review is an important part of the scientific process. It is used to identify high quality research and establish norms and common practices within the scientific community. But it is also my least favorite part of my job. Reviewing seems to require long uninterrupted periods of effort to make meaningful progress, and it feels too overwhelming to get started unless I absolutely have to get the task done. I find it almost impossible to review an article without a deadline looming over my head.

You probably have tasks like this, too. Perhaps you find it hard to sit down to write a blog post, or to edit an academic article, or to make a scrapbook with the thousands of photos you’ve taken. The task is very important to you, but actually getting it done is almost impossible.

Good news: We can fix this!
