Data Base Administration and Data Communications
Administration (DBA/DCA) review the preliminary system and data
base designs to become familiar with data and processing
requirements. From these specifications, DBA/DCA will devise
a method of implementation that will be reviewed with management
in the next activity. This process can be extensive. For that
reason, a Detail estimate/schedule is initially prepared only
for Activity A. By the end of the activity, DBA/DCA will
understand the design requirements well enough to prepare a
complete Detail estimate/schedule for the remaining activities
of Phase 4 (B - E).
In contrast with the logical models, the EPDBM is quite
heterogeneous, not only because it may have to represent the
structures of various DBMS products, conventional files and even
manual files, but also because of the need to cope with existing
structures which may or may not be represented in the logical
models.
Data Base Administration is faced with the following
hardware/software choices for each newly created record type
within a new or existing ELDBM Object:
- Manual File
- Conventional Computer File
- Hierarchical DBMS
- Network DBMS
- Relational DBMS
The following sections discuss the situations in which each
selection is preferable.
MANUAL FILES
There are numerous situations where manual files are the
best selection. Signed documents, contracts, and many other
kinds of paper documents are still stored in most enterprises
and will remain in use until we become a completely paperless
society (if that ever happens). Although we should not document
every piece of paper in a Data Base Model, some are of great
importance for the conduct of business. If they were not
included in these models, the business intelligence provided
would be grossly incomplete.
Documents can be defined as inputs or outputs depending on
how they are used. If they are used to collect data (as is the
case for most forms), they are regarded as inputs. If they are
printed reports, they are regarded as outputs. There may be
occasions where a form is used as a "turnaround document."
In other words, it may be used to collect data and then be
presented as a report. In this situation, the form/report is
documented as both an input and an output.
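The classification rule above can be sketched in a few lines. This is only an illustration: the names `DocumentRole`, `classify_document`, and the two flags are hypothetical and do not come from the methodology itself.

```python
from enum import Enum


class DocumentRole(Enum):
    INPUT = "input"
    OUTPUT = "output"
    BOTH = "input and output"  # a "turnaround document"


def classify_document(collects_data: bool, presented_as_report: bool) -> DocumentRole:
    """Classify a manual document by how it is used (hypothetical helper)."""
    if collects_data and presented_as_report:
        return DocumentRole.BOTH      # documented as both an input and an output
    if collects_data:
        return DocumentRole.INPUT     # e.g., most forms
    if presented_as_report:
        return DocumentRole.OUTPUT    # e.g., printed reports
    raise ValueError("document neither collects data nor is presented as a report")
```

A form that is filled in and later returned as a report would be passed as `classify_document(True, True)` and documented under both roles.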
CONVENTIONAL COMPUTER FILES
In the absence of a DBMS, this is the only computer-based
choice, but even if a DBMS is available, one may choose to use
conventional files for a variety of reasons, depending on the
characteristics of the applications involved. In general, they
can be sequential, indexed or direct access, with various
options that depend on the particular operating system being
used.
If a response time of hours or more is acceptable to all
applications that use a record, and relationships with other
records are of little or no use, batch processing of sequential
files may be the most economical solution. For example, many
large commercial banks process hundreds of thousands of
transactions a day that come in the mail or through Clearing
Houses. There is no point in providing response times of
seconds to these transactions (which would be near the limit of
the fastest DBMS with currently available hardware) since they
have a daily processing cycle.
At the other extreme, some real time control applications
requiring split-second response times must also use
conventional files, in this case flat direct access files,
since available DBMS products are slower.
These are just some of the situations where the use of
conventional files is preferred to a DBMS, even though
conventional files do not offer the same support for sharing,
data independence, reliability and other benefits of DBMS
technology.
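The two extremes described above can be expressed as a rough screening rule. The sketch below is an assumption-laden illustration, not part of the methodology: the thresholds, the `RecordUsage` fields, and the function name are all hypothetical, and a real decision would weigh many more factors.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only to illustrate the two extremes.
BATCH_THRESHOLD_SECONDS = 3600.0    # "a response time of hours or more"
REAL_TIME_THRESHOLD_SECONDS = 1.0   # "split-second response times"


@dataclass
class RecordUsage:
    required_response_seconds: float  # tightest response time any application needs
    uses_relationships: bool          # do applications follow relationships to other records?


def suggest_storage(usage: RecordUsage) -> str:
    """Return a first-cut suggestion for one record type."""
    if usage.required_response_seconds < REAL_TIME_THRESHOLD_SECONDS:
        return "conventional file: direct access"
    if (usage.required_response_seconds >= BATCH_THRESHOLD_SECONDS
            and not usage.uses_relationships):
        return "conventional file: sequential, batch processed"
    return "consider a DBMS"
```

For the bank clearing example, `suggest_storage(RecordUsage(24 * 3600, False))` falls into the sequential batch case.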
HIERARCHICAL DBMS
Hierarchical DBMS packages have been very popular for many
years, probably because much of human endeavor is hierarchical,
especially if viewed within a limited environment. Most basic
(operational level) information systems can use trees as the
basic data structure. It is when higher level integration is
required for control or policy level information systems that
the restrictions of this structure appear. Since hierarchical
products are the oldest on the market, their manufacturers have
had ample time for fine tuning. This puts them among the fastest
of the DBMS types, and in some high transaction rate
applications it makes them the only choice, provided that the
limitations of the hierarchical structure are acceptable.
IBM's IMS is the foremost representative of the
hierarchical DBMS packages and the one most widely used. Its
terminology will therefore be used in our descriptions of the
mapping from the EPDBM to a hierarchical DBMS.
The main price to be paid in using such a DBMS is the
additional program complexity needed to handle multiple
hierarchies, which will always be present in an integrated
environment.
NETWORK DBMS
A network DBMS is nearly as fast as a hierarchical DBMS and
has extra flexibility in providing relationships. A network is
clearly more complex than a hierarchy, but simpler than a large
number of interconnected hierarchies which would be required for
an integrated Enterprise Data Base.
The CODASYL DBTG specification, which is closely followed by
a number of commercial DBMS products, enhances portability of
applications among a variety of different machines, which may be
of paramount importance in some situations.
Another major advantage of this approach is the ability to
explicitly specify and enforce properties related to referential
integrity (through set Membership clauses).
Networks are appropriate for operational as well as control
and policy level applications due to the ability to "navigate"
through access paths and find related records. However, this
places the burden of selecting access paths on the programmer,
increasing complexity and reducing data independence.
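To make the owner/member idea concrete, here is a toy model of a single set type. It is loosely inspired by CODASYL-style sets but is not DBTG syntax or any product's API; the class, its methods, and the CUSTOMER/ORDER example are hypothetical.

```python
class NetworkDB:
    """Toy model of one set type: CUSTOMER owns ORDER (hypothetical names)."""

    def __init__(self):
        self._sets = {}  # owner key -> list of member records

    def store_owner(self, owner_key):
        self._sets.setdefault(owner_key, [])

    def store_member(self, owner_key, member):
        # Membership rule: a member may only be stored under an existing owner.
        if owner_key not in self._sets:
            raise KeyError(f"no owner {owner_key!r}; member rejected")
        self._sets[owner_key].append(member)

    def erase_owner(self, owner_key):
        # Refuse to erase an owner that still has members.
        if self._sets[owner_key]:
            raise ValueError(f"owner {owner_key!r} still has members")
        del self._sets[owner_key]

    def members_of(self, owner_key):
        # Navigation: the program itself chooses the access path
        # (find the owner, then walk its members).
        return list(self._sets[owner_key])
```

Note that the calling program must know that orders are reached through their customer. If that access path changed, the program would have to change with it, which is the loss of data independence mentioned above.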
RELATIONAL DBMS
The advantages of the relational approach have been well
publicized and are very real. Among the most important are:
- Enhanced Data Independence - the freedom from having to follow
access paths in searching for data greatly reduces program
complexity, increasing productivity. Even more important, it
allows changes to these access paths to be transparent to the
programs that use them.
- Conceptual Soundness and Simplicity - the relational approach
involves a simple data structure, easily understandable by any
user, and operators that behave in a well defined way. This
simplifies communication with end-users, allowing them to
directly access the data base through ad-hoc queries.
The advantages are great in terms of usability and
productivity for both end-users and DP professionals, and they
are enhanced by the standardization of the SQL language. The
price to be paid is increased resource utilization, a small one
in most practical cases. In terms of performance, the
requirements of all but very high volume transaction oriented
applications can usually be satisfied.
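The data independence point can be illustrated with a small sketch using Python's built-in `sqlite3` module as a stand-in for a relational DBMS. The tables, column names, and data are invented for the example. The query names only the data it wants, not an access path, so adding an index later changes nothing in the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id), "
    "amount REAL)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])

# An ad-hoc query: it states what is wanted, not how to reach it.
QUERY = """
SELECT c.name, SUM(o.amount)
FROM customer AS c JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY c.name
"""


def totals():
    return conn.execute(QUERY).fetchall()


before = totals()
# Change the physical access path; the query text is untouched.
conn.execute("CREATE INDEX orders_by_customer ON orders (customer_id)")
after = totals()
```

Both `before` and `after` hold the same rows; only the way the DBMS may choose to retrieve them has changed.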
HARDWARE SELECTION
In addition to the software choices, final hardware
decisions must be made at this point. A wide range of
alternatives is available in general, but in practice one is
usually confined to existing hardware except in very large and
important projects. We will not discuss these alternatives in
detail here, but just mention that dedicated database machines
and networks of mini, micro and/or mainframe computers are
among them. Data
communications and distributed data base hardware and software
may be required and must be fully cost justified at this point.
Selection of the proper technology for a particular
implementation is a technically complex task, particularly in an
integrated environment, since one has to balance the potentially
conflicting performance requirements of various applications.
In many cases, Data Base Administration must compromise
performance of some applications in order to optimize others.
The political implications of this may be far reaching and are
certainly harder to deal with than the technical issues.
The decisions on what to optimize must be taken, if not
with the approval of all parties involved, at least with their
knowledge. Few things can be more disastrous than a sudden
performance degradation on a user's system caused by the
integration or optimization of some other system.
Very often, the best technical solution for a new
implementation conflicts with the existing DBMS. Although it is
not unusual to have more than one DBMS in an installation, Data
Base Administration should avoid excessive proliferation of
different packages not only because extra effort is needed to
coordinate them, but also because of the additional expertise
that has to be developed and maintained to support the different
DBMS packages.
The logical structures in the ELDBM can be implemented
through any DBMS. Therefore, once the ELDBM is in place, it can
serve as a guide to which systems are to be developed or
redesigned. This constitutes a good opportunity to standardize
and move towards the use of as few different DBMS packages as
possible.
WHO SHOULD PARTICIPATE?
Data Base Administration and Data Communications
Administration are primarily responsible for formulating the
Enterprise Physical Data Base Design. The effort required to
devise a method of physical implementation should not be
underestimated. This may require considerable thought and
research. Because of this, DBA/DCA may seek assistance from
a variety of functions as part of this process. Those who
may lend assistance include:
- Data Engineering - who can explain objects,
views, and relationships in the Enterprise Logical Data Base
Model (ELDBM).
- Systems Engineering - who can explain objects,
views, and relationships in the Application Logical Data Base
Model (ALDBM), along with processing requirements for an
information system. For example, Systems Engineering should
be able to anticipate transaction volume, processing speed
(as derived from Frequency/Offset/Response Time), proposed
processing method (e.g., interactive, batch, manual
processing), file processing (Create/Update/Reference),
file volatility and hit ratio, etc.
- Software Engineering - who can propose file
access methods and anticipate program work file requirements.
- DP Operations - who participates in an advisory
capacity, particularly in hardware planning.
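Two of the figures Systems Engineering is asked to supply, file volatility and hit ratio, can be computed from simple run statistics. The sketch below assumes the conventional file-design definitions (hit ratio as records accessed per run divided by records in the file; volatility as additions plus deletions divided by records in the file). These definitions and the class name are assumptions and are not stated in this section.

```python
from dataclasses import dataclass


@dataclass
class FileRunStatistics:
    """Counts for one processing run against one file (hypothetical structure)."""
    records_in_file: int
    records_accessed: int
    records_added: int
    records_deleted: int

    def hit_ratio(self) -> float:
        # Assumed definition: share of the file touched in one run.
        if self.records_in_file == 0:
            return 0.0
        return self.records_accessed / self.records_in_file

    def volatility(self) -> float:
        # Assumed definition: additions and deletions relative to file size.
        if self.records_in_file == 0:
            return 0.0
        return (self.records_added + self.records_deleted) / self.records_in_file
```

Under these assumptions, a low hit ratio tends to favor direct or indexed access, while a high one tends to favor sequential processing.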