Human beings and the computer systems they design generally operate best with information sources that are organized. Information in these sources is typically stored in a predefined format, facilitating its search, retrieval, and analysis. In real life, however, for every source of structured information (such as a database of purchasing records), there are many sources of unstructured information (such as natural language documents, still images, and video files). It is estimated that 80 to 85 percent of all corporate information is unstructured, and with the growth of the Internet and corporate intranets, the volume and heterogeneity of this information have increased prodigiously.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.