Image Understanding (IU) is the process of understanding the content of images so that visual tasks can be automated by computers.

A visual task is any activity that relies on vision. Usually the "input" to the activity is an image or a sequence of images, and the "output" may be decisions, descriptions, actions, or reports.

There are several reasons why computers can be more suitable than humans for certain visual tasks:

  • Fatigue - Tasks which suffer if the human gets tired or loses concentration.
    Examples are: industrial inspection, video-based security systems.
  • Too Expensive - Tasks which require specialized training, resulting in human resources that are rare and costly. Often, the visual task accounts for only a portion of that training.
    Examples are: medical screening for tumors, intelligence gathering from satellite imagery.
  • Quantification - Tasks that humans do poorly because visual items need to be measured accurately.
    Examples are: progression of disease, efficacy of medication, growth of cracks in welds, number of specific cells on a microscope slide.
  • Excessive Data - Tasks which involve too much data for humans to handle effectively.
    Examples are: counting the potholes in highways, inspecting every bottle in a bottling plant, counting blood cells.
  • Too Dangerous - Tasks which put humans into dangerous situations are likely candidates for automation.
    Examples are: power plant inspection, inspection of oil production platforms.

The technical challenge is to make the computer understand the content of the images and act accordingly. In order to solve the problems involved, we usually operate in a specific "domain" of interest. Typically, a domain has named objects and characteristics that can be used to make decisions or perform actions. There is a wide gap between the nature of a digital image (essentially arrays of numbers) and a description of its content. It is the bridging of this gap that has kept researchers busy over the last two decades in the fields of Artificial Intelligence, Scene Analysis, Image Analysis, Image Processing, and Computer Vision. We group these fields under the term Image Understanding.

In order to link raw image data to image understanding, low-level and intermediate-level processing stages are introduced. Low-level processing usually involves preprocessing, such as noise and distortion reduction, and emphasizes certain important aspects of the imagery. Then, at the intermediate level, the image is segmented. Typically, the segments are blobs, edges, lines, corners, regions, etc. The segments are usually free of domain information: they are not specifically objects or entities of the domain of understanding, but they carry spatial, geometric and other general types of information. It is this intermediate information that can be analyzed in terms of the domain in order to extract features and classify the content:

Figure: The image processing pipeline.
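
The low and intermediate levels can be illustrated with a minimal sketch. It is not the method of any particular system: it assumes a grayscale image stored as a list of rows, uses a 3x3 median filter as one possible noise-reduction step, and uses thresholding plus 4-connected grouping as one possible way to obtain blobs. The function names, the threshold and the test image are all hypothetical.

```python
from collections import deque
from statistics import median


def median_filter(image):
    """Low level: 3x3 median filter that suppresses isolated noise pixels."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            # The window is clipped at the image border
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(window)
    return out


def segment_blobs(image, threshold):
    """Intermediate level: group bright, 4-connected pixels into blobs."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] <= threshold:
                continue
            # Breadth-first search collects one connected bright region
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            # Domain-free description: geometry only, no object names yet
            blobs.append({
                "area": len(pixels),
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
                "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
            })
    return blobs


# Hypothetical 8x8 test image: one 3x3 bright square plus one noise pixel
image = [[0] * 8 for _ in range(8)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 200
image[6][6] = 200

raw_blobs = segment_blobs(image, threshold=128)
clean_blobs = segment_blobs(median_filter(image), threshold=128)
```

Without filtering, the noise pixel becomes a blob of its own; after the median filter only the larger region survives. Note that the filter also rounds off the corners of the square, which is a typical trade-off of this kind of preprocessing. The resulting blob descriptions contain only geometry, so they can be interpreted later in terms of whatever domain is of interest.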

Various techniques are used for understanding the content of the image. One example is "model matching", where stored geometric descriptions of objects of the domain are matched with features extracted from the images. Another example is classification using neural networks.
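
As a rough illustration of model matching, the sketch below compares the features of one extracted segment with a small table of stored geometric descriptions and returns the closest named object. Everything in it is an assumption made for the example: the domain objects ("washer", "bolt"), the two features (area and aspect ratio), the distance measure and the acceptance limit. Real systems typically use much richer models.

```python
from math import hypot

# Hypothetical domain models: each named object is described by simple geometry
MODELS = {
    "washer": {"area": 300.0, "aspect_ratio": 1.0},
    "bolt": {"area": 450.0, "aspect_ratio": 4.0},
}


def match_model(features, models=MODELS, max_distance=0.5):
    """Return the name of the closest stored model, or None if none is close enough."""
    best_name, best_dist = None, max_distance
    for name, model in models.items():
        # Relative differences keep area and aspect ratio on comparable scales
        d_area = (features["area"] - model["area"]) / model["area"]
        d_aspect = (features["aspect_ratio"] - model["aspect_ratio"]) / model["aspect_ratio"]
        dist = hypot(d_area, d_aspect)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Features of three hypothetical segments delivered by the intermediate level
round_part = match_model({"area": 310.0, "aspect_ratio": 1.1})
long_part = match_model({"area": 440.0, "aspect_ratio": 3.8})
unknown_part = match_model({"area": 50.0, "aspect_ratio": 10.0})
```

The acceptance limit matters as much as the distance measure: a segment that resembles no stored model is reported as unmatched (`None`) rather than being forced onto the nearest object name.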

Techniques are called "bottom-up" when the primary direction of processing flow is from lower abstraction levels (images) to higher levels (objects), and conversely "top-down" when the processing is guided by expectations from the domain.


Last modified: 20.06.2006 by Per Kristen Fredlund Copyright © 2002-2006  Trollhetta AS