Data Science
> Frontier's 95 Poster Paper

Iterative Parallel Region Growing on the MasPar MP-2

		      James C. Tilton
		NASA Goddard Space Flight Center
		       Mail Code 935
		    Greenbelt, MD 20771
		       (301) 286-9510
		 tilton@chrpisis.gsfc.nasa.gov
Abstract. At Frontiers' 90 we reported on our port of an Iterative Parallel Region Growing (IPRG) algorithm onto the MasPar MP-1 from the now decommissioned Massively Parallel Processor (MPP) at the NASA Goddard Space Flight Center. Since then we improved the efficiency of our implementation, the MasPar MP-1 has been upgraded to a MasPar MP-2, and we have implemented additional dissimilarity criteria for region growing. We have also further investigated the use of the algorithm in Earth science applications. After a brief background section, we describe the IPRG algorithm, and give some implementation considerations together with timing results from different implementations. Finally we show segmentation results for a Landsat Thematic Mapper image.

Background

Early approaches to region growing had the disadvantage that the regions produced depended on the order in which portions of the image were processed. But Schachter, et al (1979) suggest that implementing region growing as "an iterative parallel process" would overcome the order dependence problem. An ideal solution to this problem would be an iterative parallel process that would merge at each iteration the single most similar pair of spatially adjacent regions over the entire image. This is the approach suggested by Tilton and Cox (1983) and by Beaulieu and Goldberg (1989). However, this ideal approach is impractical for all but very small images, even when implemented on a massively parallel computer.

A compromise solution to this problem is to merge a selected set of region pairs, in parallel, each iteration. The problem reduces to finding an effective approach for finding some best set of region pairs to merge at each iteration. Our solution is to define subimages centered on each region (the subimages may overlap), and to merge the most similar pair of regions within each subimage each iteration. The specification of the subimages is given in the description of the Iterative Parallel Region Growing (IPRG) algorithm given in a later section.

IPRG Algorithm

The basic outline of the IPRG algorithm has changed very little from our earliest work (see Tilton and Cox, 1983; Tilton, 1984; Tilton 1988; and Tilton, 1991b). The basic outline is as follows:

  1. Initialize by labeling each pixel as a separate region.
  2. Merge all spatially adjacent pixels that have identical feature values.
  3. Calculate a (dis)similarity criterion between each pair of spatially adjacent regions.
  4. Merge pairs of regions that meet the merge constraints.
  5. Check for convergence. If converged, stop. Otherwise return to step 3.
The key elements of the IPRG algorithm are the similarity criterion (step 3), the merge constraints (step 4) and the convergence criterion (step 5). We will describe each of these elements in turn in the following sections.

Similarity Criterion

In general, the best similarity criterion depends on the application the resulting segmentations will be used for, and on the characteristics of the image data. Nevertheless, we have developed and studied a few general purpose similarity criterion for use with the IPRG algorithm, and it is of general interest to determine how the IPRG algorithm behaves with these criterion. As the IPRG algorithm is used in more applications, we expect that other similarity criterion tailored to specific applications will be developed.

The original IPRG algorithm (then called SCC, see Tilton and Cox, 1983) used a similarity criterion based on a combination of a pair of statistical hypothesis tests. Both tests consider whether a pair of regions come from the same probability distribution. The first test checks the mean values of the regions, assuming unequal region variances, and the second test checks the variances of the regions. The current MasPar implementation of IPRG does not employ this similarity criterion. How to define a variance for a one pixel region is one reason this criterion is not currently implemented. The earlier implementation of IPRG could use this criterion without complication since it started from 2x2 pixel regions rather than 1 pixel regions.

The current MasPar implementation of IPRG uses simpler similarity criteria (actually, dissimilarity criteria) based on minimizing the increase in mean square error and on minimizing the change in image entropy as regions grow. (These are dissimilarity criteria rather than similarity criteria because their values increase with increasing dissimilarity between a pair of regions.) For a complete description of these dissimilarity criteria see Tilton (1991b).

Baraldi and Parmiggiani (1994b) propose a dissimilarity criterion based on "vector degree of match." They construct their Normalized Vector Distance (NVD) as follows: They define the factors s1 and s2 based on the match, respectively, between the modulus and the angle of the two vectors. These factors are constructed such that s1 = 1 and s2 = 1 signify vector equivalence. Both s1 and s2 range from 0 to 1. They then define s as the product of s1 and s2. Finally, NVD is defined as 1-s.

Schoenmakers (1994) proposes a dissimilarity criterion based on Euclidean spectral distance, defined simply as (for regions i and j):
Cij = [(µ1i - µ1j)2 + ... + (µci - µcj)2]0.5 where c is the number of channels.

The current IPRG implementation has an option to select any of the dissimilarity criterion discussed above, viz. Euclidean Spectral Distance, Normalized Vector Distance, band average Normalized Mean Squared Error, band maximum Normalized Mean Squared Error, and Entropy Difference (summed over bands).

Both Baraldi and Parmiggiani (1994b) and Schoenmakers (1994) note that their dissimilarity criterion are not influenced by the size of the regions being compared. In contrast, the Mean Squared Error and Entropy based criterion explicitly contain reference to region size in their formulation such that merges of small regions into larger ones are favored. We will see how this bias against small regions affects the segmentation results in the "Segmentation Results" section below.

Merge Constraints

The original IPRG algorithm simply merged the single most similar pair of spatially adjacent regions over the entire image in each iteration (Tilton and Cox, 1983). It is more efficient, however, to perform parallel merges of a set of region pairs each iteration. The set of regions pairs merged each iteration are selected through merge constraints. The IPRG merge constraints are defined with respect to (possibly) overlapping subimages centered on each region. Before we can further discuss merge constraints, we must first define these subimages and merge constraint levels.

Subimages. A subimage with respect to a particular region can be defined recursively as follows: A level 0 subimage for any region is the empty set. A level 1 subimage, with respect to a region, is the region itself. A level 2 subimage, with respect to a region, is the level 1 subimage, with respect to that region, plus all regions that are spatially adjacent to the level 1 subimage. Finally, a level n subimage, with respect to a region, is the level n-1 subimage, with respect to that region, plus all regions that are spatially adjacent to the (n-1)st level subimage.

Merge Constraint Levels. Merge constraint level n signifies that merges are constrained to be the best merge within the union of the level n subimages with respect to each of the potentially merging region pairs. Merge constraint level n-0.5 signifies merges are constrained to the best merge within the level n subimage with respect to only one of the potentially merging region pairs. (By the definition of a level n subimage, the level n-1 subimage of a region is contained within the level n subimage of an adjacent region. Thus, a n-0.5 merge constraint level, is equivalent to constraining to the best merge within the union of the level n subimage for one region and the level n-1 subimage for the other region. The notation "n-0.5 merge constraint level" thus refers to the average level of the two subimages controlling the merging process.)

Employing merge constraint level 0.5 in the IPRG algorithm is equivalent to performing the best merge for each region in each iteration, without regard to what the best merge is for any neighboring region. This means that any particular region can be involved in more than one merge each iteration: its best merge plus the best merge of any other neighboring region. A combination of a 0.5 merge constraint level with an ad hoc global threshold is equivalent to the merge constraint used by an earlier version of the IPRG algorithm (see Tilton, 1988).

The merge constraint level 1.0 is equivalent to performing the pairwise mutually best merges for all pairs of spatially adjacent regions (with ties broken arbitrarily). The IPRG algorithm can also employ merge constraint levels higher than 1.0, in addition to levels 0.5 and 1.0. The higher the merge constraint level, the closer IPRG comes to matching the scheme of one best merge per iteration over the whole image. However, a merge constraint level of 1.0 is often high enough to give reasonable results.

Convergence Criterion

A satisfactory general convergence criterion has not been developed for the IPRG algorithm. This is an area of active research. One can avoid using an automatic convergence criterion by running the IPRG algorithm until the entire data set has been merged into one all-encompassing region (while recording the edge map at each step), and then back-track through the hierarchical segmentation obtained from this process and select the most satisfactory segmentation based on visual criteria. This approach is useful only if the segmentation obtained a one particular iteration is satisfactory for the entire image. Generally, however, no one iteration produces a segmentation that is totally satisfactory for the entire image. While a particular iteration may produces the most satisfactory segmentation in one part of the image, another iteration will most likely produce the most satisfactory segmentation in another part of the image. Ideally, one would select the segmentation from one iteration on one part of the image, and from another iteration in another part of the image. We have experimented with an interactive tool, called HSEGEXP, which facilitates this exploration of the hierarchical segmentation produced by the IPRG algorithm (Tilton, 1991a).

This approach is generally too tedious to use except for relatively small images (say, 256-by-256). This approach can be made somewhat less tedious by recording the hierarchical segmentation at selected points in the region growing process. We have experimented with recording the segmentations after the IPRG algorithm converged at sets of increasing dissimilarity criterion values. See the Segmentation Results section below.

Implementation Considerations

The IPRG algorithm was originally using "cut-and-stack" virtualization on the MasPar MP-2 SIMD computer. This type of virtualization was used only because it was the only virtualization available at the time the IPRG algorithm was first implemented. Unfortunately, it is a very inefficient virtualization for the IPRG program, since the IPRG program relies heavily on nearest neighbor data movement. For pixels lying across a cut from a neighbor, a data movement across the entire parallel array is necessary to access information from that neighbor under cut-and-stack virtualization.

The hierarchical virtualization supported by MasPar is a much better virtualization for the IPRG program. Under this virtualization, a pixel's nearest neighbor is always either in the same processor or in a physically adjacent processor. We used MasPar's Image Processing Library and Pointwise Routine Generator to implement the IPRG program under this virtualization.

Some comparative timing results:

                     Comparative Timing Results
   (Merge Control Level = 1.0, band average Mean Squared Error)

                                   Virtualization          Speed-up
     Data Set           Cut-and-Stack      Hierarchical     Factor
-------------------     -------------      ------------    --------

2048-by-2048 pixels
7 spectral band        5 hours 28 minutes   42 minutes       7.8   
(down to about 
3300 regions)

1024-by-1024 pixels
7 spectral bands 	 5576 seconds	   969 seconds       5.7
(down to 1 region)     (about 1.5 hours)  (about 16 minutes)

368-by-468 pixels
7 spectral bands	  656 seconds      150 seconds       4.4
(down to 1 region)

512-by-512 pixels
4 spectral bands	  781 seconds	   195 seconds       5.0
(down to 1 region)
The IPRG program can produce (at the user's option) region label maps, region mean value image, and/or a complete segmentation history (in the form of a hierarchical edge map). To make it possible to store the complete hierarchical segmentation history in byte arrays (limited to 255 levels), the IPRG program performs edge map compaction. This compaction is done such that the segmentation order is preserved, though the absolute iteration at which a particular merge occurred is lost. The original implementation of this edge map compaction was very inefficient, partly because it was a separate subroutine that was called only when the edge map levels reached 255. A new very efficient method of edge map compaction has now been implemented which compacts the edge map "on the fly" during each iteration.

Until recently, edge map compaction was thought to be very important for IPRG segmentation of moderate to large images. Edge map was required in order to effectively inspect the hierarchical segmentation and select the appropriate final segmentation. However, recently a new approach was devised in which the IPRG segmentation is run to convergence at a sequence of dissimilarity function threshold values, and the hierarchical edge map is stored only at these points (typically 10 to 20 hierarchical levels). Obviously, edge map compaction is not required with this new approach.

Nevertheless we report here the substantial improvement in processing speeds with the new edge map compaction method:

                     Comparative Timing Results
   (Merge Control Level = 1.0, band average Mean Squared Error)

		          Old edge           New edge      Speed-up
     Data Set          map compaction     map compaction    Factor
-------------------    --------------     --------------   --------

2048-by-2048 pixels
7 spectral bands       18 hours 42 min.    1 hour 48 min.    10.4   
(down to 1 region)

1024-by-1024 pixels
7 spectral bands 	 969 seconds         841 seconds     1.15
(down to 1 region)    (about 16 minutes) (about 14 minutes)

368-by-468 pixels
7 spectral bands	 150 seconds	     113 seconds     1.33
(down to 1 region)

512-by-512 pixels
4 spectral bands	 195 seconds	     153 seconds     1.27
(down to 1 region)
The combined effect of the efficiency improvements in edge map compaction and virtualization are as follows: An approximate 80 time speed-up was seen for a 2048-by-2048 pixel 7-band image (current absolute runtime: 1 hour, 48 minutes). A speed-up of 6.7 times was seen for a 1024-by-1024 pixel 7-band image (current absolute runtime: 14 minutes). A speed-up of 5.1 times was seen for a 512-by-512 pixel 4-band image (current absolute runtime: 2.5 minutes). Absolute run-times are given for processing down to 1 region. Specifying an earlier convergence will give faster execution times.

Segmentation Results

The figures show segmentations (with merge control level 1.0) for three of the dissimilarity criteria: band maximum Mean Squared Error, band average Mean Squared Error, Entropy difference, Normalized Vector Distance (aka Vector Degree of Match), and Euclidean Spectral Distance. Segmentations for 16 different convergence thresholds are shown for each criteria. The convergence thresholds were chosen to give a sequence of segmentations with roughly the same number of regions for corresponding segmentations for each convergence criteria. The image data is a Landsat Thematic Mapper image from June 1991 from over Portugal (provided by Eurimage via R. Schoenmakers (1994)). The image size is 256-by-256 pixels with 6 spectral bands. In the figures, the region boundaries are shown in white overlaying a red-green-blue composite image using bands 5, 4 and 3, respectively.

(NOTE: The images linked to below have been lost. Sorry!)

Image segmentation sequence for band maximum Mean Squared Error: 65507 regions, 32549 regions, 19846 regions, 9801 regions, 5193 regions, 2619 regions, 1310 regions, 618 regions, 324 regions, 197 regions, 99 regions, 53 regions, 27 regions, 13 regions, 7 regions, and 1 region.

Image segmentation sequence for band average Mean Squared Error: 65507 regions, 32609 regions, 19647 regions, 9802 regions, 5257 regions, 2603 regions, 1302 regions, 657 regions, 326 regions, 197 regions, 99 regions, 52 regions, 26 regions, 13 regions, 8 regions, and 1 region.

Image segmentation sequence for Entropy difference: 65507 regions, 32937 regions, 19660 regions, 9872 regions, 5221 regions, 2627 regions, 1317 regions, 657 regions, 333 regions, 198 regions, 98 regions, 53 regions, 26 regions, 13 regions, 7 regions, and 1 region.

Image segmentation sequence for Normalized Vector Distance (Vector Degree of Match): 65507 regions, 33020 regions, 19301 regions, 9798 regions, 5289 regions, 2627 regions, 1329 regions, 660 regions, 328 regions, 197 regions, 94 regions, 50 regions, 23 regions, 14 regions, 7 regions, and 1 region.

Image segmentation sequence for Euclidean Spectral Distance: 65507 regions, 32732 regions, 19802 regions, 9843 regions, 5227 regions, 2612 regions, 1318 regions, 651 regions, 326 regions, 197 regions, 98 regions, 52 regions, 26 regions, 13 regions, 7 regions, and 1 region.

The results clearly show the bias of the Mean Squared Error based dissimilarity criterion against small regions. However, which dissimilarity criterion is best still depends on the particular application being pursued.

References

Baraldi, A. and F. Parmiggiani. 1994a. "Segmentation Driven by an Iterative Pairwise Mutually Best Merge Criterion," submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

Baraldi, A. and F. Parmiggiani. 1994b. "Region Growing Based on the Vector Degree of Match," submitted to IEEE Transactions on Geoscience and Remote Sensing.

Beaulieu, J.-M. and M. Goldberg. 1989. "Hierarchy in Picture Segmentation: A Stepwise Optimization Approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 2, pp. 150-163.

Schachter, B. J., L. S. Davis and A. Rosenfeld. 1979. "Some Experiments in Image Segmentation by Clustering of Local Feature Values," Pattern Recognition, Vol. 11, No. 1, pp. 19-28.

Schoenmakers, R. P. H. M. 1994. "Improved Best Merge Region Growing," draft of Chapter 6 of a Ph. D. Thesis, University Nijmegen, The Netherlands.

Tilton, J. C. 1984. "Multiresolution Spatially Constrained Clustering of Remotely Sensed Data on the Massively Parallel Processor," Digest of the 1984 International Geoscience and Remote Sensing Symposium, Strasbourg, France, Aug. 27-30, 1984, pp. 661-666.

Tilton, J. C. 1988. "Image Segmentation by Iterative Parallel Region Growing with Applications to Data Compression and Image Analysis," Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, Fairfax, VA, Oct. 10-12, 1988, pp. 357-360.

Tilton, J. C. 1991a. "A Tool for Interactive Exploration of a Hierarchical Segmentation," Proceedings of the 1991 International Geoscience and Remote Sensing Symposium, Helsinki, Finland, June 3-6, 1991, pp. 1099-1101.

Tilton, J. C. 1991b. "Experiences using TAE-Plus Command Language for an Image Segmentation Program Interface," Proceedings of the TAE Ninth Users' Conference, New Carrollton, MD, Nov. 5-7, 1991, pp. 297-312.

Tilton, J. C. and S. C. Cox. 1983. "Segmentation of Remotely Sensed Data using Parallel Region Growing," Digest of the 1983 International Geoscience and Remote Sensing Symposium, San Francisco, CA, Aug. 31 - Sept. 2, 1983, pp. 9.1-9.6.

Willebeek-LeMair, M. and A. P. Reeves. 1988. "Region Growing on a Highly Parallel Mesh-Connected SIMD Computer," Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, Fairfax, VA, Oct. 10-12, 1988, pp. 93-100.