IBM Journal of Research and Development

Issue 3.4 • May–June 2012

This is Watson

In 2007, IBM Research took on the grand challenge of building a computer system that could compete with champions at the game of Jeopardy!. In 2011, the open-domain question-answering system dubbed Watson beat the two highest-ranked players in a nationally televised two-game Jeopardy! match. This special issue provides a deep technical overview of the ideas and accomplishments that positioned our team to take on the Jeopardy! challenge, build Watson, and ultimately triumph. It describes the nature of the question-answering challenge represented by Jeopardy! and details our technical approach. The papers herein describe and provide experimental results for many of the algorithmic techniques developed as part of the Watson system, covering areas including computational linguistics, information retrieval, knowledge representation and reasoning, and machine learning. The papers offer component-level evaluations as well as measurements of each component's end-to-end contribution to Watson's overall question-answering performance.

Displaying Results 1 - 19 of 19
  • Front Cover

    Publication Year: 2012 , Page(s): C1
  • Table of Contents

    Publication Year: 2012 , Page(s): 1 - 2
  • Introduction to “This is Watson”

    Publication Year: 2012 , Page(s): 1:1 - 1:15
    Cited by:  Papers (5)

    In 2007, IBM Research took on the grand challenge of building a computer system that could compete with champions at the game of Jeopardy!™. In 2011, the open-domain question-answering (QA) system, dubbed Watson, beat the two highest ranked players in a nationally televised two-game Jeopardy! match. This paper provides a brief history of the events and ideas that positioned our team to take on the Jeopardy! challenge, build IBM Watson™, and ultimately triumph. It describes both the nature of the QA challenge represented by Jeopardy! and our overarching technical approach. The main body of this paper provides a narrative of the DeepQA processing pipeline to introduce the articles in this special issue and put them in context of the overall system. Finally, this paper summarizes our main results, describing how the system, as a holistic combination of many diverse algorithmic techniques, performed at champion levels, and it briefly discusses the team's future research plans.
  • Question analysis: How Watson reads a clue

    Publication Year: 2012 , Page(s): 2:1 - 2:14
    Cited by:  Papers (2)

    The first stage of processing in the IBM Watson™ system is to perform a detailed analysis of the question in order to determine what it is asking for and how best to approach answering it. Question analysis uses Watson's parsing and semantic analysis capabilities: a deep Slot Grammar parser, a named entity recognizer, a co-reference resolution component, and a relation extraction component. We apply numerous detection rules and classifiers using features from this analysis to detect critical elements of the question, including: 1) the part of the question that is a reference to the answer (the focus); 2) terms in the question that indicate what type of entity is being asked for (lexical answer types); 3) a classification of the question into one or more of several broad types; and 4) elements of the question that play particular roles that may require special handling, for example, nested subquestions that must be separately answered. We describe how these elements are detected and evaluate the impact of accurate detection on our end-to-end question-answering system accuracy.
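The focus and lexical answer type (LAT) detection described in this abstract can be illustrated with a minimal sketch. The rule below, the clue, and all names are hypothetical illustrations, not Watson's actual detection rules (which operate over full parses rather than surface patterns):

```python
import re

def detect_focus_and_lat(clue: str):
    """Toy focus/LAT detector using a single surface pattern.

    In a "this X" construction, "this X" is the focus (the phrase that
    refers to the answer) and X is the lexical answer type (LAT).
    """
    m = re.search(r"\bthis ([a-z]+)\b", clue.lower())
    if m:
        return {"focus": "this " + m.group(1), "lat": m.group(1)}
    return {"focus": None, "lat": None}

clue = "In 1901 this president was shot while attending the Pan-American Exposition."
print(detect_focus_and_lat(clue))
# {'focus': 'this president', 'lat': 'president'}
```

A real system layers many such detectors and classifiers and resolves their conflicts; this shows only the shape of one rule.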
  • Deep parsing in Watson

    Publication Year: 2012 , Page(s): 3:1 - 3:15
    Cited by:  Papers (2)

    Two deep parsing components, an English Slot Grammar (ESG) parser and a predicate-argument structure (PAS) builder, provide core linguistic analyses of both the questions and the text content used by IBM Watson™ to find and hypothesize answers. Specifically, these components are fundamental in question analysis, candidate generation, and analysis of passage evidence. As part of the Watson project, ESG was enhanced, and its performance on Jeopardy!™ questions and on established reference data was improved. PAS was built on top of ESG to support higher-level analytics. In this paper, we describe these components and illustrate how they are used in a pattern-based relation extraction component of Watson. We also provide quantitative results of evaluating the component-level performance of ESG parsing.
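The idea of building predicate-argument structure on top of a slot-grammar-style parse can be sketched as follows. The miniature parse representation and the sentence are invented for illustration; ESG's actual data structures are far richer:

```python
# Hypothetical miniature parse: (word, head_index, slot) triples.
# head_index is 1-based into this list; 0 means the word is the root.
parse = [
    ("invented", 0, "top"),     # root predicate
    ("Edison", 1, "subj"),      # logical subject of word 1 ("invented")
    ("phonograph", 1, "obj"),   # logical object of word 1 ("invented")
]

def to_pas(parse):
    """Collapse a slot-annotated parse into predicate-argument tuples."""
    words = [w for w, _, _ in parse]
    pas = {}
    for w, head, slot in parse:
        if slot in ("subj", "obj"):
            pred = words[head - 1]          # look up the governing predicate
            pas.setdefault(pred, {})[slot] = w
    return pas

print(to_pas(parse))
# {'invented': {'subj': 'Edison', 'obj': 'phonograph'}}
```

Downstream components can then match against the normalized PAS rather than raw word order, which is the benefit the abstract describes.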
  • Textual resource acquisition and engineering

    Publication Year: 2012 , Page(s): 4:1 - 4:11
    Cited by:  Papers (3)

    A key requirement for high-performing question-answering (QA) systems is access to high-quality reference corpora from which answers to questions can be hypothesized and evaluated. However, the topic of source acquisition and engineering has received very little attention so far. This is because most existing systems were developed under organized evaluation efforts that included reference corpora as part of the task specification. The task of answering Jeopardy!™ questions, on the other hand, does not come with such a well-circumscribed set of relevant resources. Therefore, it became part of the IBM Watson™ effort to develop a set of well-defined procedures to acquire high-quality resources that can effectively support a high-performing QA system. To this end, we developed three procedures, i.e., source acquisition, source transformation, and source expansion. Source acquisition is an iterative development process of acquiring new collections to cover salient topics deemed to be gaps in existing resources based on principled error analysis. Source transformation refers to the process in which information is extracted from existing sources, either as a whole or in part, and is represented in a form that the system can most easily use. Finally, source expansion attempts to increase the coverage in the content of each known topic by adding new information as well as lexical and syntactic variations of existing information extracted from external large collections. In this paper, we discuss the methodology that we developed for IBM Watson for performing acquisition, transformation, and expansion of textual resources. We demonstrate the effectiveness of each technique through its impact on candidate recall and on end-to-end QA performance.
  • Automatic knowledge extraction from documents

    Publication Year: 2012 , Page(s): 5:1 - 5:10
    Cited by:  Papers (5)

    Access to a large amount of knowledge is critical for success at answering open-domain questions for DeepQA systems such as IBM Watson™. Formal representation of knowledge has the advantage of being easy to reason with, but acquisition of structured knowledge in open domains from unstructured data is often difficult and expensive. Our central hypothesis is that shallow syntactic knowledge and its implied semantics can be easily acquired and can be used in many areas of a question-answering system. We take a two-stage approach to extract the syntactic knowledge and implied semantics. First, shallow knowledge from large collections of documents is automatically extracted. Second, additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge. In this paper, we describe in detail what kind of shallow knowledge is extracted, how it is automatically done from a large corpus, and how additional semantics are inferred from aggregate statistics. We also briefly discuss the various ways extracted knowledge is used throughout the IBM DeepQA system.
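The second stage the abstract describes, inferring semantics from aggregate statistics over shallow frames, can be sketched in a few lines. The frames and counts below are a toy stand-in for a large extracted corpus:

```python
from collections import Counter

# Hypothetical shallow frames extracted from text: (subject, verb, object).
frames = [
    ("scientist", "invent", "device"),
    ("scientist", "invent", "process"),
    ("company", "invent", "device"),
    ("scientist", "invent", "device"),
]

# Aggregate statistics: which subject types does "invent" prefer?
subj_counts = Counter(s for s, v, o in frames if v == "invent")
total = sum(subj_counts.values())
prefs = {s: c / total for s, c in subj_counts.items()}
print(prefs)
# {'scientist': 0.75, 'company': 0.25}
```

Even this crude selectional preference ("inventors are usually people, not companies") is the kind of implied semantics that can then score candidate answers.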
  • Finding needles in the haystack: Search and candidate generation

    Publication Year: 2012 , Page(s): 6:1 - 6:12
    Cited by:  Papers (1)

    A key phase in the DeepQA architecture is Hypothesis Generation, in which candidate system responses are generated for downstream scoring and ranking. In the IBM Watson™ system, these hypotheses are potential answers to Jeopardy!™ questions and are generated by two components: search and candidate generation. The search component retrieves content relevant to a given question from Watson's knowledge resources. The candidate generation component identifies potential answers to the question from the retrieved content. In this paper, we present strategies developed to use characteristics of Watson's different knowledge sources and to formulate effective search queries against those sources. We further discuss a suite of candidate generation strategies that use various kinds of metadata, such as document titles or anchor texts in hyperlinked documents. We demonstrate that a combination of these strategies brings the correct answer into the candidate answer pool for 87.17% of all the questions in a blind test set, facilitating high end-to-end question-answering performance.
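One of the metadata-based strategies the abstract mentions, proposing retrieved documents' titles as candidate answers, can be sketched as below. The passages and titles are invented for illustration:

```python
# Hypothetical retrieved passages carrying document-title metadata.
passages = [
    {"title": "Nikola Tesla", "text": "He developed the alternating current system."},
    {"title": "Alternating current", "text": "AC was championed by Nikola Tesla."},
]

def generate_candidates(passages):
    """Title-of-document candidate generation: propose each retrieved
    document's title as a potential answer, preserving order and
    de-duplicating."""
    seen, candidates = set(), []
    for p in passages:
        if p["title"] not in seen:
            seen.add(p["title"])
            candidates.append(p["title"])
    return candidates

print(generate_candidates(passages))
# ['Nikola Tesla', 'Alternating current']
```

Recall matters more than precision at this stage: a wrong candidate can be demoted later, but an answer never generated can never be ranked first.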
  • Typing candidate answers using type coercion

    Publication Year: 2012 , Page(s): 7:1 - 7:13

    Many questions explicitly indicate the type of answer required. One popular approach to answering those questions is to develop recognizers to identify instances of common answer types (e.g., countries, animals, and food) and consider only answers on those lists. Such a strategy is poorly suited to answering questions from the Jeopardy!™ television quiz show. Jeopardy! questions have an extremely broad range of types of answers, and the most frequently occurring types cover only a small fraction of all answers. We present an alternative approach to dealing with answer types. We generate candidate answers without regard to type, and for each candidate, we employ a variety of sources and strategies to judge whether the candidate has the desired type. These sources and strategies provide a set of type coercion scores for each candidate answer. We use these scores to give preference to answers with more evidence of having the right type. Our question-answering system is significantly more accurate with type coercion than it is without type coercion; these components have a combined impact of nearly 5% on the accuracy of the IBM Watson™ question-answering system.
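The core move, scoring whether a candidate *can be coerced* into the desired type rather than filtering by type lists, can be sketched as follows. The two "sources" here are toy lookup tables standing in for the taxonomies and gazetteers the paper draws on:

```python
def type_coercion_score(candidate, lat, sources):
    """Average the verdicts of several type-checking sources.

    `sources` maps a source name to a function(candidate, lat) returning
    a score in [0, 1]; unknown pairs get a neutral 0.5.
    """
    scores = [check(candidate, lat) for check in sources.values()]
    return sum(scores) / len(scores)

# Two hypothetical sources backed by invented lookup tables.
taxonomy = {("France", "country"): 1.0, ("Paris", "country"): 0.0}
gazetteer = {("France", "country"): 1.0, ("Paris", "country"): 0.1}
sources = {
    "taxonomy": lambda c, t: taxonomy.get((c, t), 0.5),
    "gazetteer": lambda c, t: gazetteer.get((c, t), 0.5),
}

for cand in ("France", "Paris"):
    print(cand, type_coercion_score(cand, "country", sources))
# France scores higher than Paris for the LAT "country"
```

Because the scores are soft evidence rather than hard filters, a mistyped candidate is penalized but not eliminated, which suits Jeopardy!'s long tail of answer types.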
  • Textual evidence gathering and analysis

    Publication Year: 2012 , Page(s): 8:1 - 8:14
    Cited by:  Papers (1)

    One useful source of evidence for evaluating a candidate answer to a question is a passage that contains the candidate answer and is relevant to the question. In the DeepQA pipeline, we retrieve passages using a novel technique that we call Supporting Evidence Retrieval, in which we perform separate search queries for each candidate answer, in parallel, and include the candidate answer as part of the query. We then score these passages using an assortment of algorithms that use different aspects and relationships of the terms in the question and passage. We provide evidence that our mechanisms for obtaining and scoring passages have a substantial impact on the ability of our question-answering system to answer questions and judge the confidence of the answers.
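The Supporting Evidence Retrieval pattern, one query per candidate with the candidate appended, run in parallel, can be sketched as below. The corpus and the naive `search` function are invented; the real system queries full-text indices:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy two-passage corpus standing in for Watson's text resources.
corpus = [
    "Edison invented the phonograph in 1877.",
    "Tesla pioneered alternating current.",
]

def search(query):
    """Naive stand-in search: return passages containing every query term."""
    return [p for p in corpus if all(t in p for t in query.split())]

def supporting_evidence(question_terms, candidates):
    """Run one search per candidate in parallel, with the candidate
    appended to the query, as Supporting Evidence Retrieval does."""
    queries = [f"{question_terms} {c}" for c in candidates]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(search, queries))
    return dict(zip(candidates, results))

evidence = supporting_evidence("phonograph", ["Edison", "Tesla"])
print(evidence["Edison"])  # ['Edison invented the phonograph in 1877.']
print(evidence["Tesla"])   # [] -- no passage supports Tesla + phonograph
```

The key difference from ordinary retrieval is that the candidate itself is in the query, so each returned passage is evidence *for that specific hypothesis*.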
  • Relation extraction and scoring in DeepQA

    Publication Year: 2012 , Page(s): 9:1 - 9:12
    Cited by:  Papers (1)

    Detecting semantic relations in text is an active problem area in natural-language processing and information retrieval. For question answering, there are many advantages of detecting relations in the question text because it allows background relational knowledge to be used to generate potential answers or find additional evidence to score supporting passages. This paper presents two approaches to broad-domain relation extraction and scoring in the DeepQA question-answering framework, i.e., one based on manual pattern specification and the other relying on statistical methods for pattern elicitation, which uses a novel transfer learning technique, i.e., relation topics. These two approaches are complementary; the rule-based approach is more precise and is used by several DeepQA components, but it requires manual effort, which allows for coverage on only a small targeted set of relations (approximately 30). Statistical approaches, on the other hand, automatically learn how to extract semantic relations from the training data and can be applied to detect a large number of relations (approximately 7,000). Although the precision of the statistical relation detectors is not as high as that of the rule-based approach, their overall impact on the system through passage scoring is statistically significant because of their broad coverage of knowledge.
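The manual-pattern side of this pair can be sketched with a single invented rule. The paper's rule-based extractors match parse structures, not surface regexes, so the pattern below is only an illustration of the precision/coverage trade-off:

```python
import re

# One hypothetical manual pattern for the authorOf relation.
AUTHOR_OF = re.compile(
    r"(?P<author>[A-Z][a-z]+(?: [A-Z][a-z]+)*) wrote (?P<work>.+?)\."
)

def extract_author_of(sentence):
    """Return an (author, relation, work) triple, or None if no match."""
    m = AUTHOR_OF.search(sentence)
    return (m.group("author"), "authorOf", m.group("work")) if m else None

print(extract_author_of("Herman Melville wrote Moby-Dick."))
# ('Herman Melville', 'authorOf', 'Moby-Dick')
```

A hand-written rule like this is precise on the sentences it anticipates but blind to paraphrases ("Moby-Dick, Melville's novel, ..."), which is exactly why the statistical detectors with their ~7,000-relation coverage complement it.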
  • Structured data and inference in DeepQA

    Publication Year: 2012 , Page(s): 10:1 - 10:14

    Although the majority of evidence analysis in DeepQA is focused on unstructured information (e.g., natural-language documents), several components in the DeepQA system use structured data (e.g., databases, knowledge bases, and ontologies) to generate potential candidate answers or find additional evidence. Structured data analytics are a natural complement to unstructured methods in that they typically cover a narrower range of questions but are more precise within that range. Moreover, structured data that has formal semantics is amenable to logical reasoning techniques that can be used to provide implicit evidence. The DeepQA system does not contain a single monolithic structured data module; instead, it allows for different components to use and integrate structured and semistructured data, with varying degrees of expressivity and formal specificity. This paper is a survey of DeepQA components that use structured data. Areas in which evidence from structured sources has the most impact include typing of answers, application of geospatial and temporal constraints, and the use of formally encoded a priori knowledge of commonly appearing entity types such as countries and U.S. presidents. We present details of appropriate components and demonstrate their end-to-end impact on the IBM Watson™ system.
  • Special Questions and techniques

    Publication Year: 2012 , Page(s): 11:1 - 11:13

    Jeopardy!™ questions represent a wide variety of question types. The vast majority are Standard Jeopardy! Questions, where the question contains one or more assertions about some unnamed entity or concept, and the task is to identify the described entity or concept. This style of question is representative of a wide range of common question-answering tasks, and the bulk of the IBM Watson™ system is focused on solving this problem. A small percentage of Jeopardy! questions require a specialized procedure to derive an answer or some derived assertion about the answer. We call any question that requires such a specialized computational procedure, selected on the basis of a unique classification of the question, a Special Jeopardy! Question. Although Special Questions per se are typically less relevant in broader question-answering applications, they are an important class of question to address in the Jeopardy! context. Moreover, the design of our Special Question solving procedures motivated architectural design decisions that are applicable to general open-domain question-answering systems. We explore these rarer classes of questions here and describe and evaluate the techniques that we developed to solve these questions.
  • Identifying implicit relationships

    Publication Year: 2012 , Page(s): 12:1 - 12:10

    Answering natural-language questions may often involve identifying hidden associations and implicit relationships. In some cases, an explicit question is asked by the user to discover some hidden concept related to a set of entities. Answering the explicit question and identifying the implicit entity both require the system to discover the semantically related but hidden concepts in the question. In this paper, we describe a spreading-activation approach to concept expansion, backed by three distinct knowledge resources for measuring semantic relatedness. We discuss how our spreading-activation approach is applied to address these questions, exemplified in Jeopardy!™ by questions in the “COMMON BONDS” category and by many Final Jeopardy! questions. We demonstrate the effectiveness of the approach by measuring its impact on IBM Watson™ performance on these questions.
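A one-step version of spreading activation for a "COMMON BONDS"-style question can be sketched as below. The relatedness graph and its weights are invented; the paper draws them from three real knowledge resources:

```python
from collections import defaultdict

# Hypothetical relatedness graph: entity -> {related concept: weight}.
graph = {
    "trumpet": {"brass": 0.9, "jazz": 0.6},
    "tuba": {"brass": 0.8, "marching band": 0.5},
    "trombone": {"brass": 0.9, "slide": 0.7},
}

def common_bond(entities):
    """One-step spreading activation: activate each entity's neighbors
    and return the concept with the highest summed activation."""
    activation = defaultdict(float)
    for e in entities:
        for concept, weight in graph.get(e, {}).items():
            activation[concept] += weight
    return max(activation, key=activation.get)

print(common_bond(["trumpet", "tuba", "trombone"]))  # brass
```

The hidden concept is never mentioned in the clue; it emerges only because activation from all three entities converges on the same node.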
  • Fact-based question decomposition in DeepQA

    Publication Year: 2012 , Page(s): 13:1 - 13:11

    Factoid questions often contain more than one fact or assertion about their answers. Question-answering (QA) systems, however, typically do not use such fine-grained distinctions because of the need for deep understanding of the question in order to identify and separate the facts. We argue that decomposing complex factoid questions is beneficial to QA systems, because the more facts that support an answer candidate, the more likely it is to be the correct answer. We broadly categorize decomposable questions into two types: parallel and nested. Parallel decomposable questions contain subquestions that can be evaluated independently of each other. Nested questions require decompositions to be processed in sequence, with the answer to an “inner” subquestion plugged into an “outer” subquestion. In this paper, we present a novel question decomposition framework capable of handling both decomposition types, built on top of the base IBM Watson™ QA system for Jeopardy!™. The framework contains a suite of decomposition rules that use predominantly lexico-syntactic features to identify facts within complex questions. It also contains a question-rewriting component and a candidate re-ranker, which uses machine learning and heuristic selection strategies to generate a final ranked answer list, taking into account answer confidences from the base QA system. We apply our decomposition framework to the particularly challenging domain of Final Jeopardy! questions, which are found to be difficult even for qualified Jeopardy! players, and we show a statistically significant improvement in the performance of our baseline QA system.
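The nested case, answer the inner subquestion first, then substitute its answer into the outer one, can be sketched as follows. The lookup-table "QA system" and both subquestions are invented stand-ins for the base Watson pipeline:

```python
def answer(question):
    """Stand-in for the base QA system, backed by a toy lookup table."""
    kb = {
        "Which state has capital Austin": "Texas",
        "Which river borders Texas to the south": "Rio Grande",
    }
    return kb.get(question)

def answer_nested(inner, outer_template):
    """Nested decomposition: answer the inner subquestion, then plug
    its answer into the outer subquestion and answer that."""
    inner_answer = answer(inner)
    return answer(outer_template.format(inner_answer))

print(answer_nested(
    "Which state has capital Austin",
    "Which river borders {} to the south",
))
# Rio Grande
```

Parallel decomposition differs only in that the subquestions are answered independently and their candidate lists are merged, with agreement across subquestions boosting a candidate's confidence.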
  • A framework for merging and ranking of answers in DeepQA

    Publication Year: 2012 , Page(s): 14:1 - 14:12
    Cited by:  Papers (1)

    The final stage in the IBM DeepQA pipeline involves ranking all candidate answers according to their evidence scores and judging the likelihood that each candidate answer is correct. In DeepQA, this is done using a machine learning framework that is phase-based, providing capabilities for manipulating the data and applying machine learning in successive applications. We show how this design can be used to implement solutions to particular challenges that arise in applying machine learning for evidence-based hypothesis evaluation. Our approach facilitates an agile development environment for DeepQA; evidence scoring strategies can be easily introduced, revised, and reconfigured without the need for error-prone manual effort to determine how to combine the various evidence scores. We describe the framework, explain the challenges, and evaluate the gain over a baseline machine learning approach.
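The essence of the final stage, map each candidate's vector of evidence scores to a confidence with a learned model and rank by it, can be sketched with a logistic model. The weights, bias, candidates, and feature values below are made up for illustration, not learned from Watson's data:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def rank(candidates, weights, bias):
    """Score each candidate's evidence vector with a linear model and
    return (confidence, name) pairs sorted best-first."""
    scored = [
        (logistic(bias + sum(w * f for w, f in zip(weights, feats))), name)
        for name, feats in candidates
    ]
    return sorted(scored, reverse=True)

candidates = [
    ("Beethoven", [0.9, 0.7]),  # [passage score, type-coercion score]
    ("Mozart", [0.4, 0.8]),
]
weights, bias = [2.0, 1.0], -1.5  # hypothetical, hand-picked values

for conf, name in rank(candidates, weights, bias):
    print(f"{name}: {conf:.2f}")
```

The confidence output matters as much as the ranking: in Jeopardy!, Watson buzzes in only when the top answer's confidence clears a threshold.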
  • Making Watson fast

    Publication Year: 2012 , Page(s): 15:1 - 15:12
    Cited by:  Papers (2)

    IBM Watson™ is a system created to demonstrate DeepQA technology by competing against human champions in a question-answering game designed for people. The DeepQA architecture was designed to be massively parallel, with an expectation that low latency response times could be achieved by doing parallel computation on many computers. This paper describes how a large set of deep natural-language processing programs were integrated into a single application, scaled out across thousands of central processing unit cores, and optimized to run fast enough to compete in live Jeopardy!™ games.
  • Simulation, learning, and optimization techniques in Watson's game strategies

    Publication Year: 2012 , Page(s): 16:1 - 16:11
    Cited by:  Papers (1)

    The game of Jeopardy!™ features four types of strategic decision-making: 1) Daily Double wagering; 2) Final Jeopardy! wagering; 3) selecting the next square when in control of the board; and 4) deciding whether to attempt to answer, i.e., buzz in. Strategies that properly account for the game state and future event probabilities can yield a huge boost in overall winning chances, when compared with simple rule-of-thumb strategies. In this paper, we present an approach to developing and testing components to make said strategy decisions, founded upon development of reasonably faithful simulation models of the players and the Jeopardy! game environment. We describe machine learning and Monte Carlo methods used in simulations to optimize the respective strategy algorithms. Application of these methods yielded superhuman game strategies for IBM Watson™ that significantly enhanced its overall competitive record.
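The Monte Carlo flavor of this work can be sketched with a drastically simplified Final Jeopardy! wagering model. Every modeling assumption here is invented (a single opponent who always bets everything and answers correctly half the time); the paper's simulation models are far more faithful:

```python
import random

def win_prob(my_score, opp_score, wager, correct_prob, trials=20000):
    """Monte Carlo estimate of winning a toy Final Jeopardy!.

    Assumes one opponent who wagers everything and is correct 50% of
    the time; `correct_prob` is our own estimated accuracy.
    """
    random.seed(0)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(trials):
        mine = my_score + wager if random.random() < correct_prob else my_score - wager
        opp = opp_score * 2 if random.random() < 0.5 else 0
        wins += mine > opp
    return wins / trials

# Compare two wagers for a leader at 20000 vs. 18000 with 85% accuracy.
for wager in (0, 16001):
    print(wager, win_prob(20000, 18000, wager, 0.85))
# the covering wager (16001) wins clearly more often than standing pat
```

Optimizing over many simulated games like this, rather than applying a fixed rule of thumb, is what the abstract means by simulation-based strategy optimization.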
  • In the game: The interface between Watson and Jeopardy!

    Publication Year: 2012 , Page(s): 17:1 - 17:6

    To play as a contestant in Jeopardy!™, IBM Watson™ needed an interface program to handle the communications between the Jeopardy! computers that operate the game and its own components: question answering, game strategy, speech, buzzer, etc. Because Watson cannot hear or see, when the categories and clues were displayed on the game board, they were also sent electronically to Watson. The program also monitored signals generated when the buzzer system was activated and when a contestant successfully rang in. If Watson was confident of its answer, it triggered a solenoid to depress its buzzer button and used a text-to-speech system to speak its response. Since it did not hear the host's judgment, it relied on changes to the scores and the game flow to infer whether its answer was correct. The interface program had to use what were sometimes conflicting events to determine the state of the game without any human intervention.

Aims & Scope

The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology and engineering of information systems.

Meet Our Editors

Editor-in-Chief
Clifford A. Pickover
IBM T. J. Watson Research Center