Image Understanding (IU) is
the process of interpreting the content of images so that computers can
automate visual tasks.
A visual task is any activity that relies on vision.
Usually the "input" to the activity is an image or a sequence of images, and
the "output" may be decisions, descriptions, actions, or reports.
There are several reasons why computers can be more
suitable than humans for visual tasks:
- Fatigue - Tasks which suffer if the human
gets tired or loses concentration.
Examples are: industrial inspection, video-based security systems.
- Too Expensive - Tasks which require
specialized training, resulting in human resources that are rare and costly.
Often, the visual task represents only a portion of the experts' training.
Examples are: medical screening for tumors, intelligence gathering from
satellite imagery.
- Quantification - Tasks that humans do poorly
because visual items need to be measured accurately.
Examples are: progress of disease, efficiency of medication, growth of cracks
in welds, number of specific cells on a microscope slide.
- Excessive Data - Tasks which involve more
data than humans can process effectively.
Examples are: counting the potholes in highways, inspection of every bottle
in a bottling plant, counting of blood cells.
- Too Dangerous - Tasks which put humans into
dangerous situations are likely candidates for automation.
Examples are: power plant inspection, inspection of oil production platforms.
The technical challenge is to make the
computer understand the content of the images and act accordingly. In order to
solve the problems involved, we usually operate in a specific "domain" of interest. Typically, in a
domain there are named objects and characteristics that can be used to make
decisions or perform actions. Obviously, there is a wide gap between the nature of
a digital image (essentially arrays of numbers) and the description of the
content. It is the bridging of
this gap that has kept researchers busy over the last two decades in the
fields of Artificial Intelligence, Scene Analysis, Image Analysis, Image
Processing, and Computer Vision. We summarize these fields into the term Image
Understanding.
In order to make the link between raw image data and
image understanding, low and intermediate levels of processing are introduced.
Low-level processing usually involves preprocessing: noise and distortion are reduced, and certain important aspects
of the imagery are emphasized. Then, at the intermediate level, the image is
segmented. Typically, these segments are
blobs, edges, lines, corners, regions, etc. The segments are usually
free of domain information - they are not specifically objects
or entities of the domain of understanding, but they contain spatial,
geometric and other general types of information. It is this intermediate information that can be
analyzed in terms of the domain in order to extract features and classify the
content:
Figure: The image processing pipeline.
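The three levels described above can be illustrated with a minimal sketch. It is not a definitive implementation: the function names (`smooth`, `segment`, `describe`, `pipeline`), the 3x3 mean filter, the fixed threshold, and the choice of blobs as the only segment type are all assumptions made for illustration. An image is represented simply as a list of rows of numbers.

```python
from collections import deque


def smooth(image):
    """Low level: 3x3 mean filter that reduces isolated noise."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the neighbours that lie inside the image.
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out


def segment(image, threshold):
    """Intermediate level: group 4-connected pixels above threshold into blobs."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] <= threshold:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


def describe(blob):
    """Domain-free geometric features of one blob (no object names yet)."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return {"area": len(blob),
            "height": max(ys) - min(ys) + 1,
            "width": max(xs) - min(xs) + 1}


def pipeline(image, threshold):
    """Raw pixels -> smoothed image -> blobs -> feature dictionaries."""
    return [describe(blob) for blob in segment(smooth(image), threshold)]
```

Note that the output of `pipeline` still contains no domain information: it is a list of areas and bounding-box sizes, which a later stage has to interpret in terms of the objects of the domain. In this sketch the smoothing step also tends to suppress an isolated bright pixel, so it does not show up as a spurious blob.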
Various techniques are used for understanding
the content of the image. One example
is "model matching", where stored geometric descriptions of objects of
the domain are matched with features extracted from the images; another example
is classification using neural networks.
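As a rough illustration of model matching, the sketch below compares the features of one extracted segment against stored geometric descriptions and returns the name of the closest model. Everything specific here is an assumption: the `MODELS` table, the object names "washer" and "bolt", the two features (area and aspect ratio), the relative Euclidean distance, and the `max_distance` cut-off are hypothetical choices, not a prescribed method.

```python
import math

# Hypothetical domain models: expected geometric features per named object.
MODELS = {
    "washer": {"area": 9.0, "aspect": 1.0},
    "bolt": {"area": 12.0, "aspect": 4.0 / 3.0},
}


def match(features, models, max_distance=0.25):
    """Return the name of the best-matching model, or None if nothing is close.

    The distance is the Euclidean norm of the relative differences, so that
    features with different units (area, aspect ratio) are comparable.
    """
    best_name, best_dist = None, math.inf
    for name, model in models.items():
        dist = math.sqrt(sum(((features[key] - expected) / expected) ** 2
                             for key, expected in model.items()))
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject the match when even the closest model is too far away.
    return best_name if best_dist <= max_distance else None
```

For example, `match({"area": 9.5, "aspect": 1.0}, MODELS)` selects the hypothetical "washer" model, while a segment with an area of 40 is rejected because no stored description is close enough.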
Techniques are
called "bottom-up" when the primary direction of flow of processing is
from lower abstraction levels (images) to higher levels (objects), and
conversely "top-down" when the processing is guided by expectations
from the domain.