Part of the draft C++ standard, STL provides the
framework for building generic, highly reusable algorithms
and data structures
Alexander Stepanov
In every programming language, there's a need for various
data structures, such as vectors, lists, and associative
arrays. Programmers also need fundamental algorithms -- for
sorting, searching, and copying -- defined for the data
structures. It has long been lamented that C++ doesn't
provide a good set of standard data structures.
But at last this problem has been remedied. The Standard
Template Library is a framework of data structures (called
containers in STL) and algorithms accepted as part
of the draft C++ standard. A reference implementation of STL
has bee n put into the public domain by Hewlett-Packard (it
can be downloaded from butler.hpl.hp.com), and a growing
number of commercial vendors are now shipping STL.
In the short time since its release, STL has generated
many emotional -- and conflicting -- assessments. On one
hand, for example, Bjarne Stroustrup of Bell Laboratories
calls it a "large, systematic, clean, formally sound,
comprehensible, elegant, and efficient framework." On the
other hand, Pamela Seymour of Leiden University writes that
"STL looks like the machine language macro library of an
anally retentive assembly language programmer."
Goal: Generality + Efficiency
STL is not an attempt to impose yet another standard on a
suffering humanity. And it was not designed by or for a
committee. It is the result of over 15 years of research in
generic programming that I've done in different places, with
different collaborators, and in different programming
languages. I did this research with a concrete goal in mind:
to find a way to write algorithms in the most general way,
but in such a way that their abstractness would not impose
any performance penalty.
What do I mean by "in the most general way"? Simply that
an algorithm works on all data types for which it makes
sense. For example, a linear-search algorithm is written in
the most general way if it can search any data structure for
which the operations of looking at data, going to the next
data element, and indicating the end of the search range are
defined. So, it should work for an array, a singly linked
list, a doubly linked list, a file, and even a binary tree.
An algorithm should also work for portions of such
structures. For example, you might want to search half a
list or sum the set of elements in an array that are n
spaces apart (i.e., a stride).
What do I mean when I say that an algorithm does not
"impose any performance penalty"? In other words, how do you
know that a generic algorithm is efficient? An algorithm is
called rel atively efficient if it's as efficient
as a nongeneric version written in the same language, and
it's called absolutely efficient if it's as
efficient as a nongeneric assembly language version.
For many years, I tried to achieve relative efficiency in
more advanced languages (e.g., Ada and Scheme) but failed.
My generic versions of even simple algorithms were not able
to compete with built-in primitives. But in C++ I was
finally able to not only accomplish relative efficiency but
come very close to the more ambitious goal of absolute
efficiency. To verify this, I spent countless hours looking
at the assembly code generated by different compilers on
different architectures.
I found that efficiency and generality were not
mutually exclusive. In fact, quite the reverse is true.
If a component is not efficient enough, it usually means
that it's not abstract enough. This is because efficiency
and abstractness both require a clean, orthogonal design. A
similar phenomenon occurs in mathematics: Making a proof
more abstract makes it more concise and elegant.
Orthogonal Component Space
The past 25 years have seen attempts to revolutionize
programming by reducing all programs to a single conceptual
primitive. Functional programming, for example, made
everything into a function; the notions of states,
addresses, and side effects were taboo. Then, with the
advent of object-oriented programming (OOP), functions
became taboo; everything became an object (with a state).
STL is heavily influenced by both functional programming
and OOP. But it's not a single-paradigm library; rather,
it's a library for general-purpose programming of von
Neumann computers.
STL is based on an orthogonal decomposition of component
space. For example, an array and a binary search should not
be reduced to a single, fundamental notion. The two are
quite different. An array is a data structure -- a
component that holds data. A binary search is an
algorithm -- a component that performs a computation on data
stored in a data structure. As long as a data structure
provides an adequate access method, you can use the
binary-search algorithm on it. Only by respecting the
fundamental differences of arrays and binary searches can
efficiency and elegance be simultaneously achieved.
Iterators
The key to STL is the notion of iterators ,
which are generalized pointers that provide a glue for
connecting algorithms and data structures. STL is indeed
retrograde in its disregard of the current academic dogma
suggesting that pointers are evil. Instead of hiding
pointers behind value semantics, it makes them the
corner-stone of the design. The decision to bring pointers
back into the realm of respectability was based on a simple
fact: Most things in programming resemble pointers in that
they identify a location of data. For instance, Internet
addresses, SCSI addresses, and file descriptors all function
as pointers.
Consider the task of printing a list of productive
employees (see the listing "Printing
Names of Productive Employees" ). The employees' names
are stored in a vector , an STL version of a
one-dimensional dynamic array. To print the names of
productive employees, you use the STL function
remove_copy_if() , which scans the range of
elements from its first argument up to, but not including,
its second argument and copies those that do not satisfy a
predicate (its fourth argument) into positions starting from
its third argument. (For most people, the code is clearer
than the explanation.) The functions begin() and
end() return iterators pointing to the first
element and past the last element in the vector,
respectively. (STL requires that for every container, the
number of valid iterators pointing to it is one greater than
the number of elements in the container.) The STL component
ostream_iterator provides an iterator-like
interface to an output stream.
It's imp ortant to note that if you later decide to put
employees' names in a list instead of in a vector, you do
not have to change anything except the declaration of the
variable all . The remove_copy_if()
function works for vectors, lists, deques, and sets
(which are all STL components), as well as for any
user-defined container that provides STL-conforming
iterators. It also works for regular C arrays.
Iterator Categories
STL classifies iterators into five categories: input,
output, forward, bidirectional, and random-access. These
iterator categories are sets of requirements for operations
that are supported by concrete iterator types. An important
experimental discovery I made was that hundreds of different
practical algorithms can be written in terms of these
abstract categories.
STL specifies a set of valid expressions for each
category's iterators, as well as precise semantics for each
iterator's usage. For example, given that i is a
value of a typ e that belongs to a bidirectional iterator
category, if ++i is defined, then -- (++i) == i
. STL also prescribes certain complexity requirements
for these expressions. Users are thereby guaranteed that
algorithms written in terms of these abstract interfaces
will work effectively.
Different algorithms require different kinds of
iterators, and different algorithms are needed to perform
different operations on different data structures. STL uses
a novel language technique that selects the right algorithm
at compile time, depending on the iterator category.
Generic Algorithms
The listing "An
STL Implementation of remove_copy_if() "
illustrates how STL deals with iterators. What's most
striking is the fact that it looks just like regular C code;
only the signature is different. In fact, I've found that C
programmers find it quite easy to start programming in STL
even when they don't know C++, because the underlying idioms
are alrea dy familiar to them. The fact that all the
iterator categories are abstracted from pointers ensures
that there is an efficient implementation for them.
In computer science it's important to base abstractions
on efficient models. In other words, I believe that
remove_copy_if() is efficient because it generates
good code when used with plain C arrays. In fact, if you use
remove_copy_if() with STL function objects rather
than with pointers to functions, as I did in the listing "Printing
Names of Productive Employees," you can obtain code that
is often just as efficient as hand-written assembly code.
The Future
It is my hope that STL will prove to be the beginning of
a long process of developing systematic catalogs of highly
parameterized software components. The ANSI/ISO C++ standard
committee saw the promise of STL and provided a conduit
through which generic programming could reach working
programmers.
I'd like to use this opp ortunity to advocate the
creation of an industrywide consortium for developing new
generic components. No single company can accumulate the
algorithmic expertise that is needed for such an activity.
And it is in everybody's interest that all the fundamental
algorithms and data structures be universally and
inexpensively available.
ACKNOWLEDGMENTS
I would like to express my gratitude to Bjarne
Stroustrup, Ross Towle, Jim Dehnert, Suresh Srinivas, Mary
Loomis, and Larry Rosler for reviewing this article.
vector<Employee> all;
bool is_manager(const Employee& x) {
return x.title == manager }
...
remove_copy_if(
all.begin(),
all.end(),
ostream_iterator<Employee>(cout),
is_manager);
template <clas
s InputIterator, class OutputIterator, class Predicate>
OutputIterator remove_copy_if(InputIterator first, InputIterator last,
OutputIterator result, Predicate pred) {
while (first != last) {
if (!prod(*first)) *result++ = *first;
++ first; }
return result; }
Alexander Stepanov is a member of the technical
staff at Silicon Graphics, Inc. (Mountain View, CA), where
he works on C++ libraries and tools. Prior to joining SGI,
he worked for HP Labs, AT&T Bell Labs, Polytechnic
University (Brooklyn, NY), and GE R&D. He has worked in
such areas as generic software libraries, storage systems,
OSes, compilers, and path-planning algorithms for robots. He
can be contacted on the Internet at mailto:stepanov@mti.sgi.comor
on BIX c/o "editors."