PRIDE®-DBEM
Data Base Engineering Methodology
An integral part of Information Resource Management (IRM)


CONTENTS

"A Data Base is all of the data needed to support the information requirements of an enterprise, regardless of where used or how stored. By this definition, all companies have a data base; some are managed, most are not."
- Bryce's Law

    BUSINESS PURPOSE

    The "PRIDE"-Data Base Engineering Methodology (DBEM) is an important part of an overall philosophy of Information Resource Management (IRM) as defined by "PRIDE". This involves the development and control over all of the resources required to produce information. Whereas the "PRIDE"-Enterprise Engineering Methodology (EEM) is principally concerned with developing an Enterprise Information Strategy, methodologies such as DBEM and the "PRIDE"-Information Systems Engineering Methodology (ISEM) are concerned with actually creating the data and system resources needed to produce information.

    The intent of any of the "PRIDE" methodologies, including DBEM, is to define the business environment as to "Who" is to perform "What," "When," "Where" and "Why" (the "5-W's"). As a result, it is used to convert a heterogeneous operating environment into a homogeneous environment. This improves communications and promotes cooperation and teamwork throughout an enterprise. Better organization and discipline also enhance the ability to build quality products and make effective use of resources. In addition to the 5-W's, the methodology provides "How" to perform the work by providing a variety of techniques and tools deployed throughout the methodology. A methodology, therefore, resembles an assembly line where work is performed in manageable stages.

    DBEM is a generic and universally applicable approach for building any type of data base, regardless of industry, type of application, software language/technique, or data base tool. A Data Base Management System (DBMS) is not a prerequisite for DBEM. The methodology is based on tried and proven approaches that are so fundamental to sound data base design that tailoring to individual development requirements is not only unnecessary but highly undesirable.

    CONCEPTS & PHILOSOPHIES  

    INTRODUCTION

    Data is one of the resources needed to produce information. This implies that Data exhibits distinctively different characteristics than Information. Information represents the intelligence or insight needed to support business actions and/or decisions. A data element by itself is meaningless. It is used to define the facts and events about a business. Data identifies and describes the objects of importance to the enterprise, such as products, orders, customers, vendors, parts, billings, payments, shipments, etc. It is also used for quantitative purposes in measurements and calculations. A single information requirement, thereby, represents an assemblage of these business facts presented in a specific context and time frame. In this respect, data represents the raw material needed to produce information. Obviously, one data element can support many information requirements. Because of this, it is necessary to manage data like any other resource.

    Data is the binding force behind information systems. The only way systems communicate, either internally or externally to other systems, is through data. For this reason alone, data must be controlled to promote sharing between information systems.

    The management of data as a resource begins with a corporate attitude and disposition, not with an elaborate set of tools. Data Resource Management requires the same perspective as managing the parts department of a manufacturing company. The objective is twofold: to classify and standardize resources so they may be shared by multiple applications, and to control the collection, storage and distribution of resources to minimize overhead. Both are concerned with the efficient and cost-effective use of resources.

    Under DBEM, a data base is defined as all the data required to produce information, regardless of where the data is used or how it is stored. From this perspective, all companies have a data base. In fact, the day a company begins to conduct business is the day its systems and data base are born. Both evolve with the business over time as the company's information needs change.

    Data Resource Management is another area where corporate management has abdicated its responsibilities to technicians who have turned a simple concept into an esoteric technical practice. As computer technology evolved, several physical file management techniques were introduced to manage data on the computer, most notably the Data Base Management System (DBMS). The intent of the DBMS is to physically store and retrieve data for use in computer programs. In fact, the term "DBMS" is a misnomer since it only deals with data on a direct access device (disk). It does not deal with data on other devices, such as tape files, card files, manual files, etc. Therefore, it does not manage the entire data base, only a portion of it. Nor does it do any logical file management, which is more important than the physical file management.

    Although the DBMS was originally designed to permit the sharing of data among applications, this has seldom been implemented. Due to a lack of management discipline, the DBMS is one of the most abused and misapplied products in the industry. It is typically used as nothing more than an elegant file access method, not as a tool for integrating systems.

    What this points out is that companies have been taking a tool oriented or physical approach to managing data. Despite the considerable investment in DBMS technology over the last 25 years, very few companies have realized a managed data base environment. Why? Primarily due to management's failure to recognize and treat data as a reusable resource. Data Resource Management is a "materials management" issue, nothing more, nothing less.

    Imagine a manufacturing company without a materials management function. Under this scenario, engineers would design products without consideration for the other products marketed by the company. Each product would be designed with a unique set of parts. Inevitably, many duplicate parts would be designed. Without some form of coordination, there would be significant overhead and waste from collecting and storing redundant parts. Also, because no formal control mechanism existed to track parts, implementing changes to parts consistently throughout a product line would be a haphazard endeavor.

    This is exactly the situation that occurs during traditional systems development. Each analyst and programmer is permitted to design data bases unique to their application. The result: rampant data redundancy throughout the organization. There is a natural tendency for an analyst to do only what is best for their individual assignment, not necessarily what is best for all corporate applications. This is why a neutral third party is required, to coordinate and standardize data resources on a corporate basis. This is the mission of the Data Resource Management function.  

    DATA BASE CONSTRUCTS

    Data resources can be organized into a generic and universal structure. The basic building block is the data element itself, the representation of an individual fact or an event. A collection of one or more data elements is a record and one or more records make up a file. As previously mentioned, a data base represents all of the data used to produce information, regardless of where used or how stored.
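
    The element/record/file hierarchy described above can be sketched in a few lines of Python. This is only an illustration: the class names, the validation rules, and the sample "Customer" record are assumptions made for the example, not part of DBEM itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """An individual fact or event, e.g. "Customer Number"."""
    name: str


@dataclass
class Record:
    """A collection of one or more data elements."""
    name: str
    elements: list

    def __post_init__(self):
        if not self.elements:
            raise ValueError("a record needs at least one data element")


@dataclass
class File:
    """One or more records."""
    name: str
    records: list

    def __post_init__(self):
        if not self.records:
            raise ValueError("a file needs at least one record")


# Hypothetical sample data: one record in one file.
customer_number = DataElement("Customer Number")
customer_name = DataElement("Name")
identification = Record("Customer Identification", [customer_number, customer_name])
customer_file = File("Customer", [identification])
```

    Nothing in the sketch says how the file is stored; a DBMS table, a flat file, or a manual card file could all be described by the same three constructs.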

    BASIC CONSTRUCTS

    All data resources are structured in this generic manner. Terms such as "schema," "sub-schema," "segments," "tracks," "cylinders," "sectors," "tables," "arrays," "tuples," "data stores," etc., (all of which deal with particular computer techniques and tools), can all be translated into the basic constructs mentioned above.

    The organization of data serves two purposes: one is to logically describe the "objects" used to manage and operate the business, and the other is to express how data will be physically stored. The differences between logical and physical are substantial; there will not necessarily be a direct relationship between the two.

    LOGICAL/PHYSICAL CHARACTERISTICS

    There is not necessarily a one-to-one relationship
    between logical and physical

    Physical files may differ considerably from logical files. Here, the file represents a particular way of physically storing data. Data may be physically stored in a variety of files, such as an indexed file, a "flat" file, a DBMS file, etc. Even manual files follow this model with the exception they also store inputs and/or outputs (both of which consist of records and data elements).

    Unlike the logical file that is organized according to a unique data identifier ("primary basic grouping"), the physical file does not require any specific organization and can use any sort/access key desired. Ultimately, it depends on the file management technique or tool being used.

    The logical view of data is the basis for all physical data base design, regardless of the file management technique or tool selected. The physical files must ultimately carry out the intentions of the logical files in terms of what data must be stored, the dependencies between data, and volume. As a matter of fact, all DBMS packages can implement these logical views, regardless of whether they have a hierarchical, network, relational, or object-oriented structure.

    LOGICAL TO PHYSICAL RELATIONSHIPS

    Again, there are substantial differences between logical and physical files. Perhaps the most noticeable difference is the logical file will remain relatively static while the physical file will change dynamically, based on advances in technology. One of the most important reasons for defining data resources logically is to seek data independence from the physical environment, thus allowing any physical implementation without disrupting systems.  

    OBJECT CONCEPT

    In its simplest terms "objects" represent "things" used in the operation of an enterprise, such as Products, Parts, Customers, Employees, Vendors, Orders, Shipments, Billings, Payments, etc. Objects are initially identified when the business is defined under the Enterprise Engineering Methodology (EEM) and passed to DBEM for further definition.

    Objects are uniquely identified by a single data element which is referred to as the "primary basic grouping." For example, "Customer Number" is used to uniquely identify a "Customer" object; "Shipment Number" identifies a "Shipment"; "Order Number" identifies an "Order"; etc. This means that AN OBJECT IS AN OBJECT WHEN A UNIQUE DATA ELEMENT HAS BEEN ESTABLISHED TO IDENTIFY IT. Further, the "primary basic grouping" becomes the common bond that relates data to one another. For example, all customer related data will be "grouped" around the "Customer Number," shipping related data around "Shipment Number," etc.

    Not all numbers and codes will necessarily represent objects. For example, to a post office or shipping company, a Postal Territory is extremely important and requires management. As such, a "Zip/Postal Code" data element is created to uniquely identify each territory. However, to other companies, a Postal Territory is of little concern or importance; it is not an object they must deal with in the operation of their organization. In this situation, "Zip/Postal Code" is used for nothing more than descriptive purposes about "Customers," "Vendors," "Employees," etc. This means AN OBJECT MUST BE RELATED TO AT LEAST ONE BUSINESS FUNCTION THAT IS RESPONSIBLE FOR ITS CONTROL. IF SUCH A BUSINESS FUNCTION DOES NOT EXIST, IT IS NOT AN OBJECT.

    As another test of the validity of an "object," consider how the data element is assigned (its "source"). If the data element is assigned from an external source, then in all likelihood the object is not valid. To illustrate, "Zip/Postal Code" is assigned by the Post Office (not by the average company). "Postal Territory," thereby, is a pertinent object for the Post Office, but not for the average company.  
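
    The three tests of an object described in this section (a unique identifying data element, a controlling business function, and an internally assigned identifier) can be written down as a simple checklist. The sketch below is an illustration only; the class, the field names, and the "Mail Routing" business function are hypothetical. The third test is treated as a hard failure here, although the text only says an externally assigned identifier makes the object invalid "in all likelihood."

```python
from dataclasses import dataclass, field


@dataclass
class CandidateObject:
    name: str
    identifier: str = ""                  # unique data element; empty if none exists
    controlling_functions: list = field(default_factory=list)
    assigned_internally: bool = False     # does the enterprise assign the identifier itself?


def failed_tests(candidate):
    """Return the object tests that the candidate does not pass."""
    failures = []
    if not candidate.identifier:
        failures.append("no unique identifying data element")
    if not candidate.controlling_functions:
        failures.append("no business function responsible for its control")
    if not candidate.assigned_internally:
        failures.append("identifier is assigned by an external source")
    return failures


# The same candidate seen from two different enterprises.
post_office_view = CandidateObject(
    "Postal Territory", "Zip/Postal Code", ["Mail Routing"], True
)
average_company_view = CandidateObject(
    "Postal Territory", "Zip/Postal Code", [], False
)
```

    Under these assumptions, failed_tests(post_office_view) returns an empty list, while the average company's view fails the second and third tests, which matches the "Zip/Postal Code" discussion above.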

    TYPES OF OBJECTS: FACTS AND EVENTS

    There are basically two types of "objects": facts and events. Factual objects represent tangible things such as Products, Parts, Employees, and Vendors. In contrast, event related objects are intangible things that are date/time related, for example: Order, Shipment, Billing, Purchase, etc. An event represents some form of interaction between two or more factual objects. To illustrate, a "Customer" (fact) places an "Order" (event) for a "Product" (fact).

    One nuance that distinguishes facts from events is the type of data associated with each. "Names" and "Locations" can be associated with facts; "Dates" and "Times" with events. For example, "Product Name" makes sense, "Product Date" does not. "Order Date" makes sense, "Order Name" does not.

    Since logical files are used to model "objects," they contain only primary data elements to identify, describe and quantify the object. Generated data is not included in logical files since it can be generated based on the primary values. (This is another example of the differences between logical and physical files; in some situations, it is practical to store generated data elements).  

    UNDERSTANDING "VIEWS"

    An "object" is divided into "views" to represent different perspectives about an object. Data elements could conceivably be grouped into one large record, but this would not provide any significant insight into objects and data. Grouping data into distinctly separate views simplifies our perspectives on data. The objective is to uniquely identify an occurrence of data so that it is not confused with another (for example, "Quantity Shipped" versus "Quantity Ordered"). There are three types of "views" (logical records) associated with "objects":

    1. IDENTIFICATION VIEW - normally consists of a control number or code to identify the object followed by a name or date; for example:

    EMPLOYEE OBJECT - Employee Number & Name
    PURCHASE OBJECT - Purchase Order Number & Date

    There may be additional data elements used in this view. However, the point is, all objects have one identification view, regardless of whether they are facts or events. This implies that if the Identification View is deleted, then the other views in the object are also deleted, as well as pertinent relationship views in other objects.

    2. CHARACTERISTICS VIEW - consists of data that is not used to identify a separate object, nor create a relationship to other objects. Instead, it is used to describe the internal characteristics of an object; for example, a "Customer" object may have two types of characteristic views:

    ADDRESS - Including "Customer Number," "Type Address" (a code to identify an address type for Shipping/Billing/Mailing purposes), "Address," "City," "State," "Zip/Postal Code," and "Country."

    CONTACT - Including "Customer Number," "Contact Number" (to identify each customer contact), "Name," "Telephone Extension," "Internal Mail Code/Drop."

    This implies that an address and a contact are not strong enough to represent separate objects by themselves. Rather, they are describing various aspects of the customer object.

    An Object can have one or more of these views (or none). Factual objects will typically have Characteristics Views; Event related objects normally do not.

    3. RELATIONSHIP VIEW - consists of data that is used to establish relationships between objects. For example, in a Shipping Record, there may be data elements to relate "Quantity Shipped" to a Shipment ("Shipment Number"), to a Product ("Product Number"), and to a Customer ("Customer Number"). This implies relationships between:

    CUSTOMER ------------- SHIPMENT ------------- PRODUCT

    Event related objects will have at least one Relationship View; Factual objects normally have none. There cannot be more than two facts associated with a single event/view.

    The relationship view connects with identification views (not characteristic views) through record-to-record relationships. To illustrate:

    CUSTOMER                    SHIPMENT                    PRODUCT
    IDENTIFICATION <------->>   RELATIONSHIP   <<------>>   IDENTIFICATION
    RECORD                      RECORD                      RECORD
                     (1:M)                       (M:M)

    These relationships require clarification to describe the nature of each relationship, as denoted by the arrow notation. In the example above, a Customer may have many Shipments, but a Shipment pertains to a specific Customer (this is a one-to-many relationship (1:M)). Also, a Shipment may consist of many Products, and a Product can be used in many Shipments (many-to-many relationship (M:M)).

    These relationships and the basic grouping of the view establish the constraints of the data base and are essential in physical data base design.  

    BASIC GROUPING CONCEPT

    What distinguishes the different views of an object is the "basic grouping" of the logical record. The term "basic grouping" refers to the indicative data elements used to uniquely identify a view. It also represents a dependency between data elements in a particular context (it is how they are "grouped" into separate views). For example, in a Customer Object, it is used to segregate address data, from credit data, from customer contact data, etc. Perhaps it is easier to think of the "basic grouping" as the key to a logical record (not a physical record). Because the intent is to uniquely identify data, the "basic grouping" consists only of "Indicative" data (not "Descriptive" or "Quantitative").

    There may be up to two parts in a single basic grouping:

    PRIMARY BASIC GROUPING - Since views are used to describe objects, they must all be defined with the one data element used to uniquely identify the overall object; this will be a primary/indicative/object-oriented data element. The Primary Basic Grouping will be the basis to sort views into objects (e.g., all Product related Records will be put in the Product File, all Customer related Records in the Customer File, etc.).

    SECONDARY KEYS - These are used to distinguish either Characteristic Views or Relationship Views.

    For Relationship Views where it is necessary to relate one object to another (such as an Order to a Customer and to a Product) additional primary/indicative/object-oriented data elements may be added to the basic grouping. These data elements are sometimes referred to as the "Foreign Keys" to another object.

    The concept of "Referential Integrity" (as commonly referred to in the industry) is concerned with the logical consistency between views or records. It requires that every occurrence of a foreign key has a corresponding occurrence as a primary basic grouping in an identification view. As in the "Customer/Shipment/Product" example previously mentioned, "Customer Number" and "Product Number" can only be used as a Secondary Key as long as they are also used as a Primary Basic Grouping in other identification views.
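
    A referential integrity check of this kind can be sketched as follows. The sample data, the dictionary layout, and the function name are assumptions made for illustration; the rule itself (every foreign key must exist as a primary basic grouping in some identification view) is the one stated above.

```python
# Identification views, keyed by their primary basic grouping.
customers = {"C-100": "Acme Hardware"}
products = {"P-7": "Widget", "P-9": "Gasket"}

# Shipment relationship view. Its basic grouping is
# (Shipment Number, Customer Number, Product Number); Quantity Shipped is the data.
shipment_lines = [
    {"shipment": "S-1", "customer": "C-100", "product": "P-7", "qty_shipped": 12},
    {"shipment": "S-1", "customer": "C-100", "product": "P-9", "qty_shipped": 3},
    {"shipment": "S-2", "customer": "C-999", "product": "P-7", "qty_shipped": 1},
]


def integrity_violations(lines, customers, products):
    """List every foreign key that has no matching identification view."""
    violations = []
    for line in lines:
        if line["customer"] not in customers:
            violations.append((line["shipment"], "Customer Number", line["customer"]))
        if line["product"] not in products:
            violations.append((line["shipment"], "Product Number", line["product"]))
    return violations


violations = integrity_violations(shipment_lines, customers, products)
```

    Here the only violation is shipment S-2, whose "Customer Number" C-999 has no Customer identification view.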

    As an aside, since a "Relationship View" typically applies to event related objects, the secondary key should normally consist of object-oriented identifiers related to factual objects. This will be used to bridge factual objects through the event object (see "Other Object Considerations").

    Characteristic Views also require Secondary Keys. However, the intent here is not to establish a relationship to another object, but rather to establish a separate view within an object (an internal relationship). Under this approach, "view identifiers" are used to segregate data into separate records. For example, "Type Address" is used to distinguish address related data from other data.

    The sequencing of the basic grouping is extremely important for three reasons:

    1. It is the principal criterion for combining logical records into logical files (based on the primary basic grouping data element).

    2. It is the principal criterion for establishing relationships between logical records (based on secondary keys).

    3. As we will see when we discuss "Descriptive" and "Quantitative" data, the "basic grouping" gives meaning to the data.

    Because of its importance, the "basic grouping" must be established in a prescribed format. Unlike a key in a physical record, which can be set to whatever is convenient, the basic grouping must be assigned as:

    1. The Primary Basic Grouping first.

    2. Secondary Keys last.
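
    The prescribed sequence, and its use in combining logical records into logical files, can be sketched as follows. The record names and helper functions are hypothetical; the only rule taken from the text is that the primary basic grouping comes first and secondary keys follow.

```python
from collections import defaultdict


def basic_grouping(primary, secondary=()):
    """Return the basic grouping in the prescribed order: primary first, secondary keys last."""
    return (primary, *secondary)


# Hypothetical logical records and their basic groupings.
logical_records = {
    "Customer Identification": basic_grouping("Customer Number"),
    "Customer Address": basic_grouping("Customer Number", ["Type Address"]),
    "Customer Contact": basic_grouping("Customer Number", ["Contact Number"]),
    "Product Identification": basic_grouping("Product Number"),
    "Shipment Identification": basic_grouping("Shipment Number"),
    "Shipment Line": basic_grouping("Shipment Number", ["Customer Number", "Product Number"]),
}


def group_into_files(records):
    """Combine logical records into logical files by their primary basic grouping."""
    files = defaultdict(list)
    for record_name, grouping in records.items():
        files[grouping[0]].append(record_name)
    return dict(files)


logical_files = group_into_files(logical_records)
```

    Because the primary basic grouping is always the first entry, the "Shipment Line" record lands in the Shipment file even though it also carries "Customer Number" and "Product Number" as secondary keys.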
     

    OTHER OBJECT CONSIDERATIONS

    Factual Objects will typically relate to other Factual Objects through Events. This is quite common and natural. As facts or events are being defined, the analyst should challenge what peripheral facts and events are involved.

    [Figure: Facts & Events]

    The "PRIDE" concept of "Objects" is an advanced refinement over the "Entity/Relationship" model as commonly referred to in the industry. The differences are subtle but significant:

    1. "Entities" typically represent only "facts" and have trouble depicting "event" relationships. Under DBEM, an "event" is just another "object," thus simplifying the establishment of relationships.

    2. "Entities" are typically scattered with no cohesive bond to unite common entities. For example, there is no one point where a global view of a customer can be found. In contrast, the "Object" concept uses the primary basic grouping of the various views to group compatible logical records into a single logical file, thus providing a total picture of the object.

    Coincidentally, the "PRIDE" concept of "Objects" is compatible with what the industry refers to as "object-oriented" data bases and programming. These are techniques that were developed independent of DBEM, yet are complementary.  

    APPLICATION VERSUS ENTERPRISE

    It would be easy to say there are just two types of data base models, logical and physical. However, there is another perspective that adds a different dimension to this, and that is how data is viewed from an "enterprise" versus an "application" perspective.

    An "application" view refers to the data used in a specific system. In terms of the logical model, it represents the "local" data used to describe objects for a particular Information System. It represents only those data elements required to satisfy the information needs for a particular application. Obviously, this will not necessarily be the "global" view of the object, which is the intent of the "enterprise" view. In other words, the "application" view will usually be a subset of the "enterprise" view of data.

    "APPLICATION VERSUS ENTERPRISE"
    APPLICATION VIEW OF A CUSTOMER    ENTERPRISE VIEW OF A CUSTOMER
    ------------------------------    -----------------------------
    Customer Number                   Customer Number
    Name                              Name
    Credit Rating                     Credit Rating
                                      Contact Number
                                      Title
                                      Telephone
                                      Address Code
                                      Address
                                      City
                                      State/Province
                                      Zip/Postal Code

    The "enterprise" view represents a complete picture of the object, with all of the data required to satisfy all applications, not just one. Under this arrangement, there may be multiple "application" views of objects, but only one "enterprise" view of an object. In fact, it is quite common to have many different "application" views of an object. One system may require certain data elements about a customer object while another requires a totally different set of data elements to describe a customer. These legitimately separate views of the customer, as defined by Systems Engineering during design, are coordinated through the enterprise view of the customer as controlled by the Data Engineering function.


    When a system is designed into sub-systems with logical files, the "enterprise" data base is adjusted to accommodate the "application" data base. If the objects encountered in the system are new to the enterprise, then new enterprise views must be defined. Initially, the application and enterprise views of an object are identical. As new applications are introduced with different views of the same object, then the enterprise view is modified accordingly by Data Engineering.

    This application/enterprise relationship highlights the fact that data base design is an evolutionary process. Other data base design techniques typically take a "revolutionary" approach by trying to identify all of the data requirements for the entire company at one time. Obviously, the problem with this approach is that it becomes an enormous and unmanageable data base design project with questionable results. Whereas the evolutionary approach naturally synchronizes the data base with all of the applications, the revolutionary approach develops a data base that will not necessarily match the applications.

    Under the evolutionary approach, the corporate data base will expand and contract naturally as the business and applications change. Consequently, excessive or unnecessary data definitions will be avoided.  
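
    One way to picture this evolutionary behavior is to treat the enterprise view of an object as the combination of all current application views. The sketch below assumes two hypothetical systems ("Credit Checking" and "Shipping") and a simple order-preserving merge; it is not a prescribed DBEM procedure.

```python
def enterprise_view(application_views):
    """Merge application views of one object, keeping first-seen order of data elements."""
    merged = []
    for view in application_views.values():
        for element in view:
            if element not in merged:
                merged.append(element)
    return merged


# Hypothetical application views of the Customer object.
application_views = {
    "Credit Checking": ["Customer Number", "Name", "Credit Rating"],
    "Shipping": [
        "Customer Number", "Name", "Address Code", "Address",
        "City", "State/Province", "Zip/Postal Code",
    ],
}

customer_enterprise_view = enterprise_view(application_views)

# If the Shipping system is retired, the enterprise view contracts accordingly.
remaining = {k: v for k, v in application_views.items() if k != "Shipping"}
contracted_view = enterprise_view(remaining)
```

    Each application view remains a subset of the enterprise view, and the enterprise view never carries a data element that no application uses.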

    THE FOUR DATA BASE MODELS

    The variables of logical versus physical and application versus enterprise result in four data base models:

    • The Application Logical Data Base Model (ALDBM) represents all of the primary data elements needed to satisfy the information requirements of a single application. In other words, all of the data needed to describe the objects pertinent to a given information system. The ALDBM defines the logical files used in a single system. It also represents a subset of the Enterprise Logical Data Base Model.

    • The Enterprise Logical Data Base Model (ELDBM) represents the primary data elements used to describe all objects in an enterprise, not just that data used in a single system. It represents all logical files in the corporate data base.

    • The Enterprise Physical Data Base Model (EPDBM) represents how the data in the ELDBM is physically stored in files. The corporate data base can be either centralized or distributed. A variety of file management techniques can be used to store the data, e.g., computer files, manual files, etc. The EPDBM, therefore, defines all of the physical files in the corporate data base.

    • The Application Physical Data Base Model (APDBM) represents subsets of the EPDBM used to fulfill a specific application. It satisfies the data requirements of the ALDBM and denotes the physical files used in the system.

    As can be readily seen, there are corresponding relationships between the four data base models; the enterprise view must satisfy the application view and the physical must implement the logical. This type of data resource definition provides for the integrity of the data base by assuring that only those resources required to serve changing business information needs are maintained. In summary, it assures the corporate data base will be correctly synchronized with all of the various applications.

    The relationships between the four data base models can be rather extensive. Diagrams using boxes and arrows are fine for expressing simple relationships, but this is seldom the case. Instead, a set of matrices with horizontal rows and vertical columns is a much more convenient and simpler approach for expressing these relationships. These data base relationship matrices are similar in intent and format to those mentioned in Enterprise Engineering. Data base matrices are used to express relationships between:

    • Application Logical Records
    • Enterprise Logical Records
    • Enterprise Physical Records
    • Application Physical Records
    • Application Logical Records to Enterprise Logical Records
    • Enterprise Logical Records to Enterprise Physical Records
    • Enterprise Physical Records to Application Physical Records
    • Application Logical Records to Application Physical Records
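
    As a rough illustration of such a matrix, the sketch below relates application logical records (rows) to enterprise logical records (columns). The record names, the "Order Entry" system, and the helper functions are hypothetical, and the layout is only one of many ways a matrix could be held in a program.

```python
def relationship_matrix(rows, columns, related):
    """Build a row-by-column matrix; True marks a relationship."""
    return {row: {col: (row, col) in related for col in columns} for row in rows}


def unreferenced_columns(matrix, columns):
    """Columns that no row relates to, e.g. enterprise records unused by this application."""
    return [c for c in columns if not any(cells[c] for cells in matrix.values())]


def render(matrix, columns):
    """Render the matrix as text, one line per row, with X marking a relationship."""
    lines = []
    for row, cells in matrix.items():
        marks = " ".join("X" if cells[c] else "." for c in columns)
        lines.append(f"{marks}  {row}")
    return "\n".join(lines)


# Hypothetical application logical records (rows) and enterprise logical records (columns).
rows = ["Customer Identification (Order Entry)", "Order Identification (Order Entry)"]
columns = ["Customer Identification", "Customer Address", "Order Identification"]
related = {(rows[0], columns[0]), (rows[1], columns[2])}

matrix = relationship_matrix(rows, columns, related)
unused = unreferenced_columns(matrix, columns)
```

    In this example, "Customer Address" is an enterprise logical record that the hypothetical Order Entry application does not use, which is visible as an empty column.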

    One by-product of data base modeling is that it provides for a complete description of how data is used throughout a company, regardless of where used or how stored. It tracks where data is collected, stored and retrieved. This type of inventory control implements one of the basic missions of resource management.  

    DATA AS A RESOURCE

    Like a component in a product, data is a reusable resource that can be shared between applications. To avoid redundancy and to verify the integrity of the component, data must be defined with a high degree of precision. Otherwise, it will be virtually impossible to check for duplication. In order to maintain its "cleanliness," data must be specified and classified in the same manner as any other part in a product.

    How data is defined will dictate how it is used. If management cannot see what constitutes an element of data or how it is derived, its validity, currency and accuracy will be highly suspect. Ultimately, users will not be able to trust it. They will not be able to tell whether they can base decisions and actions on information created from data whose meaning or source they do not understand. This type of specification traditionally is defaulted to obscure and inflexible program source code. If it is not visible, it is not reusable. This is why data elements like "Net Pay" are difficult to maintain, because their logical calculations reside within source code and are not maintained separately. The definition of data should be used as the specification for programming. A program, therefore, is nothing more than a mechanism to carry out intentions of how the data should be manipulated and processed.
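
    To illustrate keeping a definition such as "Net Pay" visible outside of application logic, the sketch below stores each data element's description and derivation in one shared table, and a small generic routine carries it out. The formula shown (gross pay minus total deductions), the element names "Gross Pay" and "Total Deductions," and the structure of the table are assumptions for the example only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DataDefinition:
    name: str
    description: str
    derived_from: tuple = ()                      # source data elements, if generated
    formula: Optional[Callable[..., float]] = None


# Hypothetical definitions, maintained separately from any one program.
DEFINITIONS = {
    "Gross Pay": DataDefinition("Gross Pay", "Total earnings before deductions."),
    "Total Deductions": DataDefinition("Total Deductions", "Sum of all amounts withheld."),
    "Net Pay": DataDefinition(
        "Net Pay",
        "Earnings payable after deductions.",
        derived_from=("Gross Pay", "Total Deductions"),
        formula=lambda gross, deductions: gross - deductions,
    ),
}


def generate(name, primary_values):
    """Return a primary value as stored, or derive a generated value from its definition."""
    definition = DEFINITIONS[name]
    if definition.formula is None:
        return primary_values[name]
    inputs = [generate(source, primary_values) for source in definition.derived_from]
    return definition.formula(*inputs)


net_pay = generate("Net Pay", {"Gross Pay": 2000.0, "Total Deductions": 450.0})
```

    With the sample values, net_pay is 1550.0. The point of the design is that any program needing "Net Pay" consults the same definition instead of embedding its own calculation.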

    LOGICAL VERSUS PHYSICAL DATA

    There is more to defining a data element than providing a cryptic program label, yet this is all that is commonly considered by the average programmer. This is far too vague and inconsistent to assure any precision in data definition. There are actually two aspects to be considered when defining data: its logical meaning and its physical implementation.

    A data element can have only one logical definition but can have one or more physical implementations. If a data element is an expression of a single fact or an event, it is important that it be explicitly defined so it will not be confused with another. If there is a genuine difference in interpretation of the meaning of data between users, then more than one data element is involved.

    Although standardization of data's physical characteristics is an objective, there can be multiple physical representations of data. For example, there can be several legitimate ways to represent "Calculated Delivery Date":

    December 11, 2004
    11 DEC, 2004
    DEC-11-04

    In this example, the data element has a singular logical definition, "The calculated date when a delivery is due." All that differs is how the data element is physically represented. What this points out is the physical characteristics of data may vary from one application to another.
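
    The distinction can be sketched in code: one logical value, several physical representations. The function names are hypothetical, and the month names are spelled out explicitly so the result does not depend on locale settings.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def long_form(d):
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


def day_first_form(d):
    return f"{d.day} {MONTHS[d.month - 1][:3].upper()}, {d.year}"


def short_form(d):
    return f"{MONTHS[d.month - 1][:3].upper()}-{d.day:02d}-{d.year % 100:02d}"


# One logical data element: "The calculated date when a delivery is due."
calculated_delivery_date = date(2004, 12, 11)

representations = [
    long_form(calculated_delivery_date),
    day_first_form(calculated_delivery_date),
    short_form(calculated_delivery_date),
]
```

    The three strings reproduce the representations listed above, yet all of them come from the same underlying value.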

    Obviously there are significant differences between how a data element is logically and physically defined. Systems Engineers and Data Engineers primarily deal with the logical. Software Engineers and Data Base Administrators work with the physical definition. Data Resource Management must govern both.  

    THE ANATOMY OF A DATA ELEMENT

    A well defined data description contains vital intelligence for establishing relationships between data elements, and for constructing logical records and files. Superficial or inaccurate data definitions will produce erroneous results. For example, logical data base design is totally dependent on the precise and accurate definition of data. The objective when defining data, therefore, is to prove that the data element is unique and non-redundant, thus promoting sharing and re-using resources. This, in turn, leads to system integration.

    As mentioned, there are two aspects to data definition: Logical and Physical. This narrative will describe both with emphasis on Logical Characteristics.

    LOGICAL CHARACTERISTICS

    A. NAME AND DEFINITION

    Each data element has a proper name by which it is commonly referred to in business (not necessarily how it is used in programming), such as "Employee Number," "Address," or "Net Pay." To eliminate confusion, the name should not be redundant with another item.

    The data element also has a textual description expressed in a Webster or Oxford style dictionary format. The description provides the meaning of the data element and should be expressed in the terminology of the business.

    B. TYPES OF DATA

    There are three types of data elements: Indicative, Descriptive, and Quantitative.

    INDICATIVE data is used to uniquely identify an object in part or in full. "Uniqueness" is an inherent property of indicative data so that it can be used to clearly differentiate occurrences of an object. This is why control numbers and codes are typically used as indicative data, as opposed to names. Names can be too vague. For example, there may be more than one employee named "John Smith." Without some form of qualifier, it is virtually impossible to distinguish one "John Smith" from another. Consequently, an "Employee Number" is assigned to uniquely identify each employee.

    In most major corporations, "names" are treated as descriptive data due to the volume of occurrences (many employees, many products, many customers, many parts, etc.). However, in smaller enterprises, where there is not a high volume of occurrences, "names" are a much more effective means for controlling occurrences. For example, at a small produce market, the "Produce" object is uniquely identified by name, not by number.

    The point is, numbers and codes offer better control in a major business, yet in a smaller organization, names may be more practical and easier to use.

    Indicative data is used to identify either a whole Object, or a View within an Object. The difference between the two is significant. Data elements such as "Part Number," "Product Number," "Shipment Number," and "Billing Number" are strong enough to identify a whole object by themselves (facts or events). Other data elements may not be strong enough to represent a separate object and are subordinate to the object-oriented identifiers. For example, "Contact Number" may be used to represent an individual person within a "Customer." In this situation, a "Contact" object is not strong enough to be independent from a "Customer" (the company does not manage "Contacts," it manages "Customer Contacts" instead). Under this scenario, "Contact Number" is subordinate to "Customer Number" and, as such, is used to represent a view of a customer.

    "View Identifiers" are typically encountered on "Characteristics Views" and are used in situations where there are multiple occurrences (repeating groups) of the same data element. As in the "Customer Contact" example, a customer may have many people within it to contact. To distinguish each person, a "Contact Number" is devised to uniquely identify each person. This is similar to a body of text where a "Line Number" is used to differentiate each line of text.

    A "View Identifier" is not used on an "Identification View" of an object since the view implies a single occurrence of a data element, not multiple. For example, an Employee will only have one name, an Order will only have one date, etc.

    A "View Identifier" is typically not used on a "Relationship View" since object-oriented identifiers should be strong enough to uniquely identify each occurrence of a data element.

    DESCRIPTIVE data consists of alphanumeric characters that are not strong enough to identify an object, but convey important business facts about an object, such as names, addresses, text, codes, etc.

    QUANTITATIVE data deals with numeric values that are either calculated or are calculable. Measurements and computations are typical examples: "Net-Pay," "Quantity Ordered," "Elapsed Time," "Percent of Gross," etc.

    It is sometimes difficult to differentiate Quantitative data from Indicative data. Indicative data will often use numeric values for identification purposes, such as "Invoice Number," "Purchase Order Number," "Customer Number," etc. However, it would be a mistake to use these numbers for quantitative purposes (aside from counting the number of occurrences).

    Descriptive data should be expressed in the simplest of terms, allowing its dependency on Indicative data (the basic grouping) to express its meaning. For example:

    INDICATIVE DATA   +   DESCRIPTIVE DATA   =   MEANING
    Customer Number       Name                   Customer Name
    Product Number        Name                   Product Name
    Vendor Number         Address                Vendor Address
    Employee Number       Address                Employee Address

    This approach results in a minimal number of primary data definitions. The alternative would be to define many names, many addresses, and many dates. Data elements such as "Employee Name" should be defined only in the absence of a control number/code (such as "Employee Number"); in this situation, "Employee Name" is indicative, not descriptive. Names should automatically trigger the analyst to consider the facts being represented, and the indicative data elements related to them.
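    The pairing above can be sketched in a few lines of Python. This is only an illustration: the function name and the assumption that every control number's name ends in "Number" are hypothetical and not prescribed by the methodology.

```python
def qualified_name(indicative: str, descriptive: str) -> str:
    """Derive a meaning such as 'Customer Name' from 'Customer Number' + 'Name'.

    Assumption: the indicative element's name ends in ' Number', and the
    object name is whatever precedes that suffix.
    """
    suffix = " Number"
    if not indicative.endswith(suffix):
        raise ValueError(f"not a control number: {indicative!r}")
    return f"{indicative[:-len(suffix)]} {descriptive}"


# Only two generic descriptive elements ("Name", "Address") are defined;
# their meaning comes from the indicative element they depend on.
pairs = [
    ("Customer Number", "Name"),
    ("Product Number", "Name"),
    ("Vendor Number", "Address"),
    ("Employee Number", "Address"),
]
for indicative, descriptive in pairs:
    print(f"{indicative} + {descriptive} = {qualified_name(indicative, descriptive)}")
```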

    This discussion reflects the fact that the nature of the business will dictate "data type." Not all numbers and codes will necessarily be indicative; it is based on the "object" or "view" being identified. This also highlights the fact that "type" refers to the logical nature of data exclusively; not how it will be physically used. After all, descriptive and quantitative data can be used as a physical sort/access key just as well as any indicative data element can.

    C. FORMS OF DATA

    Data comes in two forms: Primary and Generated.

    PRIMARY data refers to data in its virgin state; as introduced to the system from an external source (such as a person or department). "Source" defines who is responsible for entering the data to a system, and who has ultimate authority for the definition of the data element. Depending on how well the data element is defined, it may have either one or many "sources." For example, "Customer Name" may have one source, the Customer Services area. Conversely, a generalized data element, such as "Name," may have many sources depending on circumstances.

    GENERATED data refers to data that relies on other data elements in order to produce the necessary result. This type of data can involve elaborate calculations and algorithms (e.g., DD-1 + DD-2 = DD-3). "Net Pay," "Balance Amount" and "Percent Complete" are some examples of calculated data.
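    A minimal sketch of the DD-1 + DD-2 = DD-3 case follows. The sample values, the `GENERATED` table and the `value_of` helper are assumptions made for illustration; the idea shown is simply that a generated element is computed from the elements it relies on rather than entered from an external source.

```python
# Primary data: entered from an external source (sample values are made up).
record = {"DD-1": 120, "DD-2": 30}

# Generated data: each entry names the rule that derives it from other elements.
GENERATED = {
    "DD-3": lambda rec: rec["DD-1"] + rec["DD-2"],
}


def value_of(name, rec):
    """Return a primary value as stored, or derive a generated value on demand."""
    if name in rec:
        return rec[name]
    return GENERATED[name](rec)


print(value_of("DD-1", record))  # primary, returned as stored
print(value_of("DD-3", record))  # generated, computed from DD-1 and DD-2
```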

    Another form of generated data is a "Group" data item that represents a specific string of data elements in an assigned format. For example, "Credit Card Number" typically consists of "Financial Institution ID," "Bank Region Number," "Bank Branch Office ID," and "Account Number." There are many other examples of "group" data, such as telephone numbers, product identification codes, public utility account numbers, etc.
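    The "Credit Card Number" example can be sketched as a concatenation of its component data elements. The component names come from the text; the field widths and sample values below are hypothetical, chosen only to make the sketch runnable.

```python
# Hypothetical component widths; the text names the components, not their sizes.
CREDIT_CARD_LAYOUT = [
    ("Financial Institution ID", 4),
    ("Bank Region Number", 2),
    ("Bank Branch Office ID", 4),
    ("Account Number", 6),
]


def build_group(layout, values):
    """Concatenate primary data elements into one group value."""
    parts = []
    for name, width in layout:
        value = values[name]
        if len(value) != width:
            raise ValueError(f"{name} must be {width} characters")
        parts.append(value)
    return "".join(parts)


def split_group(layout, group_value):
    """Recover the primary data elements from a group value."""
    result, position = {}, 0
    for name, width in layout:
        result[name] = group_value[position:position + width]
        position += width
    return result


components = {
    "Financial Institution ID": "4532",
    "Bank Region Number": "01",
    "Bank Branch Office ID": "0077",
    "Account Number": "123456",
}
card_number = build_group(CREDIT_CARD_LAYOUT, components)
print(card_number)
print(split_group(CREDIT_CARD_LAYOUT, card_number))
```

    Because the group value can always be rebuilt from its components, only the components need to be treated as primary data.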

    It is a common misconception that group data elements should be used for basic groupings in logical records; THEY SHOULD NOT! Group data is used as a convenient means to describe dependencies between primary data elements. As such, a group data element provides tremendous insight into objects and views. For example, consider the objects and views associated with "Telephone Number."

    Observe the dependencies between the three views. Each has an impact on the others. Should the first view be deleted, the second and third views will also be deleted. From this perspective, the basic grouping defines dependencies and eliminates the problem of multiple occurrences.

    Data elements such as "Telephone Number" and "Credit Card Number" should only be defined as group items if they truly represent a concatenation of indicative data elements representing objects. For example, "Telephone Number" is a valid group item to identify a "Communications Area" and its views for a telephone company. But if "Communications Area" is not a pertinent object to your business, there is little point in defining it as a group item. Instead, it is a simple primary value.

    Group data may not be pertinent for logical records and files, but in certain situations it can be a convenient sort/access key for physical files.

    D. DATA DEPENDENCIES

    Aside from data-to-data relationships required to produce generated data, another form of data dependency is required when defining primary data. The purpose for defining this dependency is to supply additional meaning to subordinate descriptive and quantitative data definitions. For example, it is reasonable to assume that there is a relationship between a "Customer Number" data element and a "Name" data element - "Name" DEPENDS on "Customer Number" to give it meaning ("Customer Name").

    HOW A PRIMARY DATA ELEMENT IS DEFINED IS BASED ON THE DATA ELEMENTS IT DEPENDS ON.

    Logical data dependencies must be explicitly defined. This intelligence will be required for creating logical records ("Views").

    A data element may require a dependency on more than one data element. For example, a "Quantity Ordered," may depend on "Order Number," "Product Number," and "Customer Number" to give it meaning and uniquely identify a single occurrence of "Quantity Ordered."

    Descriptive and quantitative data have dependencies with indicative data elements:

    • They will require one or more "object" type data elements.

    • They may require one "view identifier" type data element to uniquely identify an occurrence of data.

    Dependencies between indicative data elements should also be established to reflect how "view identifiers" depend on superior "object identifiers" (for example, "Contact Number" to "Customer Number").

    Dependencies between primary indicative data elements that are prohibited include:

    1. An "object" to "object" data element relationship is prohibited since this will be defined through logical records.

    2. A "view identifier" to "view identifier" data element relationship is prohibited since this would create an abnormal logical data base design.
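    The dependency rules above lend themselves to a simple check. The sketch below is one possible reading of those rules, not a definitive implementation; the class, constants and messages are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

OBJECT_ID = "object identifier"
VIEW_ID = "view identifier"
DESCRIPTIVE = "descriptive"
QUANTITATIVE = "quantitative"


@dataclass
class DataElement:
    name: str
    kind: str
    depends_on: list = field(default_factory=list)


def check_dependencies(element):
    """Return a list of rule violations for one primary data element."""
    problems = []
    objects = [d for d in element.depends_on if d.kind == OBJECT_ID]
    views = [d for d in element.depends_on if d.kind == VIEW_ID]

    if element.kind in (DESCRIPTIVE, QUANTITATIVE):
        if not objects:
            problems.append("needs at least one object identifier")
        if len(views) > 1:
            problems.append("may depend on at most one view identifier")
    elif element.kind == OBJECT_ID:
        if objects:
            problems.append("object-to-object dependency is prohibited")
    elif element.kind == VIEW_ID:
        if views:
            problems.append("view-to-view dependency is prohibited")
        if not objects:
            problems.append("view identifier needs a superior object identifier")
    return problems


customer_number = DataElement("Customer Number", OBJECT_ID)
order_number = DataElement("Order Number", OBJECT_ID)
product_number = DataElement("Product Number", OBJECT_ID)
contact_number = DataElement("Contact Number", VIEW_ID, [customer_number])
quantity_ordered = DataElement(
    "Quantity Ordered", QUANTITATIVE,
    [order_number, product_number, customer_number],
)
orphan_name = DataElement("Name", DESCRIPTIVE)  # no dependency defined yet

for element in (contact_number, quantity_ordered, orphan_name):
    print(element.name, check_dependencies(element))
```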

    E. PROBLEMS IN DATA DEFINITION

    Defining data elements is not always easy. Quite often a data element's assignment may be different than its business purpose. For example, a financial institution may require "Mother's Maiden Name" from an account holder. In reality, the financial institution is not so much interested in the mother as it is in establishing a unique "Security Password" to validate the account holder in case of an emergency. "Security Password," therefore, is the data element; "Mother's Maiden Name" is how it is assigned. Another common example is "Social Security Number" as used in the United States. This is a number used by the federal government to identify each citizen for retirement benefits. Many companies will use the number to uniquely identify each employee as opposed to inventing a separate numbering convention. In this situation, the data element is actually "Employee Number," but the number is assigned by the U.S. Social Security Administration.

    Another common problem is to create an indicative item that is bound to a physical input/output as opposed to an object. The classic examples here are "Check Number" and "Deposit Slip Number" as used in banking. In reality, a check or deposit slip is a physical input for recording a "Debit" or "Credit." Obviously, there are other ways of creating "Debits" and "Credits," particularly with electronic banking (automatic funds transfer for example). Under this scenario, checks and deposit slips are not used; therefore, "Check Number" and "Deposit Slip Number" are invalid. The actual data elements are "Debit Number" and "Credit Number."

    PHYSICAL CHARACTERISTICS

    The physical definition of data is perhaps easier to comprehend by the average programmer and data base administrator. It includes such things as:

    • Length - defines the maximum number of characters that can be assigned to a data element.

    • Class - defines the type of characters used to express a data element, e.g., alphabetic, numeric, alphanumeric, signed numeric, etc.

    • Justification - defines the alignment of data within a field when the number of characters is less than the length of the receiving field, e.g., left, right, around the decimal point.

    • Fill Character - defines the character to complete a field when the data item is shorter than the maximum length, e.g., blank, zero, X, etc.

    • Void Character - defines the character to be used when a data item's value is unknown or non-existent, e.g., blank, zero, X, etc.

    • Unit of Measure - defines the representation of numeric data, e.g., area, volume, weight, length, time, energy rate, money, etc.

    • Precision - defines for numeric data the number of significant digits in a number.

    • Scale - defines for numeric data the placement of the decimal point.

    • Base - defines for numeric data the radix used for representing the number in programming, e.g., decimal, binary, octal, hexadecimal, etc.

    • Mode - defines the format (and type) of a data element for programming, e.g., fixed point integer, floating point, double precision floating point, complex, binary, packed decimal, polar coordinates, etc.

    • Picture - defines how the data element is expressed for programming. It is typically based on length, class, precision and scale.

    • Program Label - defines the proper name of the data element as it will be referred to in a programming language, such as COBOL, FORTRAN, PL/1, Assembler, C, Pascal, ADA, etc. One data element may have many program labels.

    • Validation Rules - defines specific values which the data element may assume. For example, Yes/No, specific codes or numbers to be used, editing/syntactical rules, etc.
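    A few of the physical characteristics listed above (length, class, justification, fill character and void character) can be sketched as a small record type. This is an assumption-laden illustration: the class name, field names and the two sample definitions are hypothetical, and the remaining characteristics are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalDefinition:
    length: int            # maximum number of characters
    data_class: str        # e.g. "alphabetic", "numeric", "alphanumeric"
    justification: str     # "left" or "right"
    fill_character: str    # pads values shorter than `length`
    void_character: str    # used when the value is unknown or non-existent

    def render(self, value):
        """Format a value into a fixed-length field."""
        if value is None:
            return self.void_character * self.length
        text = str(value)
        if len(text) > self.length:
            raise ValueError(f"{text!r} exceeds length {self.length}")
        if self.data_class == "numeric" and not text.isdigit():
            raise ValueError(f"{text!r} is not numeric")
        if self.justification == "left":
            return text.ljust(self.length, self.fill_character)
        return text.rjust(self.length, self.fill_character)


# Hypothetical definitions for two data elements.
city = PhysicalDefinition(10, "alphanumeric", "left", " ", " ")
quantity_ordered = PhysicalDefinition(6, "numeric", "right", "0", "0")

print(repr(city.render("Tampa")))
print(repr(quantity_ordered.render(42)))
print(repr(quantity_ordered.render(None)))
```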

    Although Systems Engineering is primarily concerned with logical specifications, they will also provide assistance in gathering physical specifications, particularly when they format inputs and outputs.  

    DATA TAXONOMY & DOMAINS

    The management of any resource requires the development of a classification system. Financial resources are typically arranged according to a chart of accounts; material and human resources are categorized by type. In science, everything from chemical elements to the animal kingdom is organized according to a class structure. The purpose is to uniquely identify common elements, to distinguish one from another, and to eliminate redundancy. In all instances, classification is based on the inherent characteristics of the element.

    A Data Taxonomy is a hierarchical structure that separates data into specific classes of data based on common characteristics. The taxonomy represents a convenient way to classify data to prove that it is unique and without redundancy. This includes both primary and generated data elements.

    CLASSIFYING DATA
    The objective is to eliminate redundancies
    and promote sharing/integration

    DOMAIN - Elements with similar characteristics

    The lowest level in the classification hierarchy represents what is commonly referred to as the "domain" of a collection of data elements, one or more, with common characteristics. For example, "text" related data elements would be in one domain, "weights" in another, "percentages" in another, "monetary values" in another, etc.

    The domain also defines the standard physical characteristics and values the data may assume. For example, we could establish that all "location" values are alphanumeric, left justified, with blank fill and void characters. In other words, data elements such as "Address," "City," and "State" should assume these physical characteristics for consistency.

    If a data element does not have the standard logical and physical characteristics, it must belong to another "domain." In the situation where a data element may have only one logical definition, but multiple physical definitions, its primary physical definition must first conform to the Domain standards before it can be deviated from in an application record. In other words, the primary physical representation of "Unit Cost" is expressed as an eight character numeric to conform to the "currency" domain. However, in one application, a user desires the data element be expressed as a ten character numeric. It is the same logical data element with just another form of physical expression.
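    The "Unit Cost" example can be sketched as follows. The dictionary layout, the function names and the application name "Application A" are hypothetical; the sketch only illustrates the rule that the primary physical definition must conform to the domain while an application record may deviate from it.

```python
# Domain standard taken from the "Unit Cost" example: eight-character numeric.
CURRENCY_DOMAIN = {"class": "numeric", "length": 8}


def conforms_to_domain(domain, physical):
    """True when the physical definition meets every domain standard."""
    return all(physical.get(key) == value for key, value in domain.items())


def define_element(name, domain, primary, variants=None):
    """Register one logical element with a primary physical definition
    and optional application-specific physical variants."""
    if not conforms_to_domain(domain, primary):
        raise ValueError(f"primary definition of {name!r} violates its domain")
    return {"name": name, "primary": primary, "variants": dict(variants or {})}


unit_cost = define_element(
    "Unit Cost",
    CURRENCY_DOMAIN,
    primary={"class": "numeric", "length": 8},
    # "Application A" is a hypothetical name for the application in the text.
    variants={"Application A": {"class": "numeric", "length": 10}},
)

print(conforms_to_domain(CURRENCY_DOMAIN, unit_cost["primary"]))
print(conforms_to_domain(CURRENCY_DOMAIN, unit_cost["variants"]["Application A"]))
```

    The ten-character variant does not meet the domain standard, yet it is still recorded against the same logical data element rather than defined as a new one.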

    With a classification system in place, data elements can then be uniquely and consistently defined. When this is done, we then have a basis for checking data redundancy. Also, when a data element has been properly specified in this manner, it becomes rather simple to locate it again for use in other applications.

    Classifying data helps to fulfill one of the major objectives of Data Resource Management: to eliminate redundancy and promote the re-use of resources in applications.  

    SUMMARY OF MAJOR DBEM CONCEPTS

    • DATA IS A RESOURCE THAT MUST BE MANAGED AND CONTROLLED LIKE ANY OTHER RESOURCE. SYSTEMS COMMUNICATE THROUGH DATA.

    • THE MISSION OF DATA RESOURCE MANAGEMENT IS TO STANDARDIZE AND CONTROL DATA RESOURCES IN THE MOST COST-EFFECTIVE MEANS POSSIBLE.

    • BASIC CONSTRUCTS - Data Base, Files, Records, Data Elements
    LOGICAL                                              PHYSICAL
                                   --------------
    Represents an "object"         |    FILE    |        Represents how data is
    of the business.               --------------        stored.
                                         |
                                   --------------
    Represents views of            |   RECORD   |        Represents an area
    the "object."                  --------------        within the File.
                                         |
    Represents an individual       --------------        Represents an individual
    element about an object.       |DATA ELEMENT|        element within a record.
                                   --------------

    There is not necessarily a one-to-one relationship between logical files and physical files. However, the logical is used to design the physical.

    • THERE ARE TWO TYPES OF "OBJECTS": Facts and Events.

      • FACTS are Name/Location oriented (tangible things).

      • EVENTS are Date/Time oriented (intangible actions) and represent some form of interaction between two or more factual objects.

    • AN OBJECT HAS ONE DATA ELEMENT USED TO UNIQUELY IDENTIFY THE OVERALL OBJECT; THIS IS REFERRED TO AS THE "Primary Basic Grouping."

    • A FACTUAL OBJECT TYPICALLY RELATES TO ANOTHER FACTUAL OBJECT THROUGH AN EVENT:
    CUSTOMER ------------- ORDER ------------- PRODUCT

    • THREE TYPES OF VIEWS WITHIN AN OBJECT:

      • IDENTIFICATION VIEW - all objects will have one.

      • CHARACTERISTIC VIEW - describes an object and is typically associated with a factual object.

      • RELATIONSHIP VIEW - establishes a relationship between two or more objects. Will typically apply to event related objects.

      • ONLY PRIMARY DATA IS STORED IN A LOGICAL RECORD; GENERATED DATA CAN BE DERIVED FROM PRIMARY DATA.

    • BASIC GROUPING: The key to a logical record. It is used...

      • - As the principal criteria for combining logical records into logical files (based on the primary basic grouping data element).

      • - As the principal criteria for establishing relationships between logical records (based on secondary keys).

      • - To give meaning to descriptive and quantitative data.

    • THE ASSIGNMENT OF THE BASIC GROUPING INCLUDES:

      • - The Primary Basic Grouping; a primary/indicative data element used to identify an object in its entirety.

      • - Secondary Key, to either:

        1. Establish a relationship to other objects (a Foreign Key); object-oriented data elements are thereby used.

        2. Identify a specific view within an object (a qualifying key to note a single occurrence of data). A view-identifier data element is used.

    • THERE ARE FOUR DATA BASE MODELS:

      1. ALDBM - Application Logical Data Base Model - all of the primary data needed to satisfy the information requirements of a system.

      2. ELDBM - Enterprise Logical Data Base Model - all of the primary data needed to satisfy all of the applications (the global view).

      3. EPDBM - Enterprise Physical Data Base Model - represents how all corporate data is physically stored.

      4. APDBM - Application Physical Data Base Model - represents how data for a single system is physically stored. It also represents a subset of the EPDBM.

    • DATA HAS ONE LOGICAL DEFINITION, BUT CAN HAVE MORE THAN ONE PHYSICAL REPRESENTATION.

    • TYPES OF DATA:

      • INDICATIVE - to uniquely identify and control objects, in part or in full. This will include data elements to either identify a whole object or a single view.

      • DESCRIPTIVE - to describe objects.

      • QUANTITATIVE - numeric values used in calculations.

    • FORMS OF DATA:

      • PRIMARY - data assigned from a user area; outside a system.

      • GENERATED - data derived from other data values, either through calculation or through grouping (concatenated data).

    • ONLY PRIMARY/INDICATIVE DATA CAN BE USED IN THE BASIC GROUPING OF A LOGICAL RECORD. Group data elements cannot.

    • DATA TAXONOMY - a hierarchical structure used to classify data elements. The intent is to eliminate redundancy.

    • DOMAIN - the lowest level in the Data Taxonomy. A collection of data elements exhibiting common characteristics.
     

    DATA RESOURCE MANAGEMENT: THE FUNCTION

    The scope of Data Resource Management is much more encompassing than most people envision. It represents a large investment by a company. This is partially the reason why few companies have succeeded in this area; they simply fail to comprehend the magnitude of the function and its importance.

    One of the mistakes historically made when implementing Data Resource Management is that the task has been delegated to technicians. The "tool approach" to managing data has emphasized physical data base considerations, but little has been accomplished on the logical side. This is the principal reason why the proliferation of DBMS technology has been so widespread. Another symptom of the problem is the use of the term 'Data Base Administrator,' which clearly denotes a technical task.

    The management of data requires centralized control to coordinate the use of resources. This is not to suggest a centralized data base. A company could have a distributed data base spread throughout various locations. It simply means the function is best served through a focal point. This is no different than the function of materials management which is concerned with coordinating the use of all parts, regardless of where used or how stored.

    A centralized Data Resource Management function can better promote and control the exchange of data between applications than a decentralized function which would only have a partial view of the corporate data base. Centralization would be able to maintain the data base models more effectively than if they were left to separate operating units. It would also be able to enforce data base design standards on a more consistent basis.

    One of the more controversial subjects in Data Resource Management pertains to the "ownership" of data. There are those who suggest that data "belongs" to the various users or departments of a company. This is like saying money is the property of the sales or accounting departments, not the company. Data belongs to the enterprise as a whole and not to any single person or department. Of course, how data is accessed should be controlled and safeguarded, just as we would do with any other resource. This is a responsibility for Data Resource Management to perform.

    The concepts of "end-user computing," "Data Mining" and the "Information Center" are totally dependent on the integrity of the data base. Without effective Data Resource Management, they would not be possible. Users could access erroneous or unauthorized data which could present serious financial and security problems to a company.  

    RESPONSIBILITIES OF DATA RESOURCE MANAGEMENT

    There are essentially seven responsibilities associated with the Data Resource Management function:

    1. To eliminate data definition redundancy.

    This does not mean the elimination of duplicate data, only the elimination of duplicate definitions. In many instances, it may be more practical to physically store redundant data in various locations. This decision is based on what is effective for fulfilling the needs of an application.

    Data should be defined properly one time and then re-used as often as there is an application need for it. This is accomplished by classifying data resources and controlling the four data base models.

    2. To satisfy the data needs of all applications and promote data sharing.

    Here again, data base is defined as all of the data in an organization required to produce information, regardless of where used or how stored. In any given organization, there is a finite number of primary data elements, not infinite. The logical corporate data base will keep expanding until this objective is reached. When this finally occurs, the company is in an enviable position. Users and the systems staff will be able to implement new information requirements simply by adjusting timing and creating new combinations of data.

    3. To design the data base to be easy to maintain and modify.

    The four integrated data base models provide invaluable assistance in this regard. They have the ability to expand and contract as the business and applications change. They also provide the means to scope and isolate problem areas in the corporate data base.

    Mapping the four models is best done by using matrices to express the extensive relationships.
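    As a rough sketch of such a matrix, the fragment below cross-references objects to the applications that use them. The object names come from the CUSTOMER/ORDER/PRODUCT example in the summary; the application names and helper functions are hypothetical, and a real matrix could just as well map logical records to physical files.

```python
def build_matrix(rows, columns, links):
    """Return a dict-of-dicts matrix; an 'X' marks a relationship."""
    matrix = {row: {column: "" for column in columns} for row in rows}
    for row, column in links:
        matrix[row][column] = "X"
    return matrix


def shared_rows(matrix):
    """Rows related to more than one column, i.e. candidates for sharing."""
    return [
        row for row, cells in matrix.items()
        if sum(1 for mark in cells.values() if mark) > 1
    ]


objects = ["Customer", "Order", "Product"]
applications = ["Order Processing", "Billing"]  # hypothetical application names
links = [
    ("Customer", "Order Processing"),
    ("Order", "Order Processing"),
    ("Product", "Order Processing"),
    ("Customer", "Billing"),
    ("Order", "Billing"),
]

matrix = build_matrix(objects, applications, links)
for row, cells in matrix.items():
    marks = "  ".join(f"{cells[column] or '-':^16}" for column in applications)
    print(f"{row:<10}{marks}")
print(shared_rows(matrix))
```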

    4. To design the physical data base in the most efficient and cost effective means possible.

    Data Resource Management should be concerned with all file management tools and techniques, not just one. Indexed sequential files, sequential files, etc. are just as important as any DBMS file. The function should be just as concerned with the organization and filing techniques of manual files as it is with computer files. Unfortunately, this is not the situation in most companies today. The tools and techniques selected should be based on such things as required processing speed, anticipated transaction volume, security, and other performance considerations.

    5. To design the data base to be independent from applications and physical environments.

    Data Resource Management must be careful not to impose a dependency on particular hardware or software. In some situations, this may not be avoidable. In this event, conversion options should be explored and planned. Obsolescence in the areas of storage devices and techniques can militate against data base planning and create future problems. Also, Data Resource Management should avoid any short term solutions that may create long term problems. This applies to designing a Data Base to meet some particular application need without consideration for the overall needs of the other applications. The Data Resource Manager should have as an objective the complete independence of the data base from hardware, system software and applications.

    6. To cooperate with other IRM functions.

    Data Resource Management is a function that does not operate in isolation. It works closely with the other functions associated with the disciplines of information systems engineering and enterprise engineering. For example, the application logical data base design developed by Systems Engineering is checked for accuracy.

    The specification of the logical designs will result in a physical implementation requiring hardware and software which may not exist or will require modification. Data Resource Management must consider this carefully and advise Systems Engineering accordingly. When determining project costs, Project Management must be advised by Data Resource Management of the pertinent data base costs. Also, the application physical data base design must be delivered to Software Engineering prior to programming.

    For Enterprise Engineering, Data Resource Management must create skeletal definitions of the objects required to operate and manage the enterprise.

    7. To control all data resources.

    In order to provide an accurate accounting of all resources, a "bill of materials" type of system is required to catalog, classify, and cross-reference components to where they are used. This is the intent of an Information Resource Manager (IRM), a software tool used to inventory and track the use of organizational resources, systems resources, and data resources. This type of tracking could be performed manually, but this would create a large and cumbersome trail of paper.

    Acting as a Bill of Material Processor (BOMP), an IRM provides tremendous analytical capabilities, particularly in the area of "impact analysis," which permits a user to evaluate the effect of a change to a resource as it applies to the other information resources in an enterprise. It can also be used to maintain documentation on the various resources. As changes are made, the documentation is automatically updated. Because of the extensive resource intelligence contained in the IRM, it can be used to drive multiple physical data base management systems.
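    The "impact analysis" idea can be sketched as a where-used lookup over a bill-of-materials style cross-reference. The class below is a hypothetical illustration and does not describe how any actual IRM product is built; the resource names are sample data.

```python
from collections import defaultdict, deque


class ResourceCatalog:
    """A tiny where-used cross-reference between information resources."""

    def __init__(self):
        self._used_in = defaultdict(set)  # component -> resources that use it

    def add_usage(self, component, used_in):
        self._used_in[component].add(used_in)

    def impact_of_change(self, component):
        """Return every resource reachable through where-used links."""
        affected, queue = set(), deque([component])
        while queue:
            current = queue.popleft()
            for parent in self._used_in.get(current, ()):
                if parent not in affected:
                    affected.add(parent)
                    queue.append(parent)
        return affected


catalog = ResourceCatalog()
catalog.add_usage("Customer Number", "Customer Record")
catalog.add_usage("Customer Number", "Order Record")
catalog.add_usage("Customer Record", "Customer File")
catalog.add_usage("Customer File", "Billing System")

# Which resources would a change to "Customer Number" touch?
print(sorted(catalog.impact_of_change("Customer Number")))
```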

    Because the IRM represents the central location for a company's information knowledge, it becomes a tool used in enterprise engineering and information systems engineering, as well as data base engineering.

    Selling the concept of Data Resource Management to corporate executives is a relatively simple task, as long as it is communicated in management terms, not technical jargon. Data Resource Management will not be successful as long as executives view it as a technical function. However, if management understands and accepts the fact that Data Resource Management is simply another form of materials management, a vital part of the "Information Factory" concept, they will understand and support an effective Data Resource Management function.  

    METHODOLOGY CONSTRUCTION/NAVIGATION

    The Data Base Engineering Methodology (DBEM) consists of an assembly of six phases detailing what is to be accomplished and by whom. Each phase consists of a defined set of activities (a total of 24); each activity consists of a series of operations or tasks to be performed. All phases, activities and tasks produce tangible deliverables that can be reviewed and checked. These deliverables substantiate adherence to the methodology and permit the measurement of progress. Both formal and informal review points are contained throughout the methodology, which provide for effective dialog between management and data base engineers.

    DBEM emphasizes design correctness and the production of a quality product. The first phase is essentially used to plan the DBEM project. The remaining phases map the four data base models as mentioned earlier. The final phase (6) is used to evaluate the DBEM project.

   
    [Figure: "PRIDE"-DBEM Data Base Engineering Methodology - Methodology Concept Diagram]

    PHASE 1 - "Data Base Study & Evaluation"

    A DBEM project is normally initiated in conjunction with an EEM or ISEM related project. During the Enterprise Engineering Methodology, objects are initially identified that relate to business functions. DBEM can use these objects as the basis for initiating a tentative set of Enterprise Logical objects. EEM may also identify maintenance projects where certain parts of the data base may need to be documented. However, most DBEM projects are performed in support of the Information Systems Engineering Methodology. Although DBEM projects are normally performed separately from ISEM projects, it may be desirable from a manageability point-of-view to conduct a combined ISEM/DBEM project, where phases from both methodologies are used.

    In all instances, Phase 1 is used to plan the project and identify the parts of the four data base models affected by the project, either new or existing.

    Information requirements and preliminary object definitions resulting from EEM and/or ISEM are used as the basis for creating or modifying a preliminary design of the data base. Any existing data base models are referenced at this time.

    The rough design becomes the basis for developing a project plan, estimate and schedule for the DBEM project. This type of analysis is used not only in relation to DBEM, but also in relation to any ISEM projects it may support.

    The formal deliverable resulting from this phase is the "Data Base Study & Evaluation Report" containing the project scope, requirements, approach and project evaluation considerations. The report includes a "Data Base Concept Diagram," a free-form graphical rendering of the proposed data base design.

    A formal review of the phase is conducted where the results of the phase are discussed. Based on this review, the project may proceed as proposed, be discontinued, or be revised as required.

    After Phase 1, the project proceeds according to its scope. For ISEM related projects, the next step will be Phase 2. For EEM related projects, Phase 2 may be circumvented and the project will proceed to Phase 3.
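
    The routing rule above can be expressed as a small function. This is only an illustrative sketch: the function name, the string codes for the originating methodology, and the bypass_phase_2 flag are assumptions made for the example and are not part of "PRIDE"-DBEM.

```python
def next_phase_after_study(origin, bypass_phase_2=True):
    """Return the DBEM phase that follows Phase 1.

    origin is "ISEM" or "EEM" (hypothetical codes for the methodology
    that initiated the project). For EEM related projects, Phase 2 *may*
    be bypassed, so the choice is left to the caller through a flag.
    """
    origin = origin.upper()
    if origin == "ISEM":
        return 2
    if origin == "EEM":
        return 3 if bypass_phase_2 else 2
    raise ValueError(f"unknown project origin: {origin}")
```

    For example, next_phase_after_study("ISEM") yields 2, while next_phase_after_study("EEM") yields 3 unless the caller elects to perform Phase 2.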

    PHASE 2 - "Application Logical Data Base Design"

    Under the supervision of Data Engineering, this phase is performed by Systems Engineering for the application under development.

    During this phase, the data used in the system is organized into logical files for use in the application's sub-systems. The application logical data base model is developed to express the objects included in the application, along with their relationships, e.g., one-to-one, one-to-many, many-to-many.

    The formal deliverable resulting from this phase is the "Application Logical Data Base Design Manual" which includes an ALDBM diagram or matrix to express the relationships of the design. A technical review of the design is conducted prior to proceeding to the next phase.
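
    As a rough illustration of how objects and their relationships might be recorded and rendered as a matrix, consider the following sketch. The object names (Customer, Order, Product), the cardinality codes, and the matrix layout are all assumptions chosen for the example; the actual ALDBM conventions are defined by the methodology's own forms.

```python
from dataclasses import dataclass

# Hypothetical codes for one-to-one, one-to-many, many-to-many.
CARDINALITIES = {"1:1", "1:M", "M:M"}


@dataclass(frozen=True)
class Relationship:
    parent: str
    child: str
    cardinality: str  # must be one of CARDINALITIES

    def __post_init__(self):
        if self.cardinality not in CARDINALITIES:
            raise ValueError(f"unknown cardinality: {self.cardinality}")


def relationship_matrix(objects, relationships):
    """Build an object-by-object matrix; "" means no relationship recorded."""
    matrix = {a: {b: "" for b in objects} for a in objects}
    for rel in relationships:
        matrix[rel.parent][rel.child] = rel.cardinality
    return matrix


# Illustrative application model (made-up objects).
objects = ["Customer", "Order", "Product"]
relationships = [
    Relationship("Customer", "Order", "1:M"),
    Relationship("Order", "Product", "M:M"),
]
matrix = relationship_matrix(objects, relationships)
```

    Reading matrix["Customer"]["Order"] gives "1:M", while cells with no recorded relationship remain empty.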

    PHASE 3 - "Enterprise Logical Data Base Design"

    This phase is performed by Data Engineering, which is fundamentally concerned with the design and control of the logical data base models.

    During this phase, Data Engineering merges the application logical data base design into the enterprise design. Common application objects are consolidated into a single global object with the data needed to support all applications. Relationships between objects on the enterprise level are also reconciled.
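
    The consolidation step can be pictured as a union of data elements per object. The sketch below assumes that objects sharing a name represent the same business object, which is a simplification; in practice, deciding whether two application objects are truly common presumably requires Data Engineering's judgment. All object and data element names are invented for the example.

```python
def merge_into_enterprise(enterprise, application):
    """Merge application objects into the enterprise model.

    Both arguments map an object name to the set of data elements it
    carries. Objects with the same name are consolidated into a single
    global object holding the union of their data elements. A new dict
    is returned; the inputs are left unchanged.
    """
    merged = {name: set(elements) for name, elements in enterprise.items()}
    for name, elements in application.items():
        merged.setdefault(name, set()).update(elements)
    return merged


# Illustrative models (made-up objects and data elements).
enterprise = {"Customer": {"customer_number", "name"}}
order_entry = {
    "Customer": {"customer_number", "credit_limit"},
    "Order": {"order_number", "order_date"},
}
eldbm = merge_into_enterprise(enterprise, order_entry)
```

    After the merge, the single Customer object carries the data elements needed by every application, and Order is added as a new enterprise object.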

    The formal deliverable resulting from this phase is the "Enterprise Logical Data Base Design Manual" which is comparable to the manual produced in Phase 2 except it represents the total enterprise view of the data base. An ELDBM matrix is used to describe the extensive relationships within the model.

    A technical review of the design is conducted prior to proceeding to the next phase.

    PHASE 4 - "Enterprise Physical Data Base Design"

    This phase represents the point where Data Base Administration (DBA) and Data Communications Administration (DCA) formally enter the methodology. Although they are consulted in Phase 1, Phases 4 and 5 represent their primary area of responsibility.

    During this phase, the DBA and DCA develop an approach to physically implement the corporate data base. Various file management tools and techniques are evaluated at this time. To develop an effective design, they must review the requirements of all applications affected. Data volume, required processing speed, hit ratio, volatility, performance and security are issues considered at this time. This becomes the basis for developing the enterprise physical data base model.

    The formal deliverable resulting from this phase is the "Enterprise Physical Data Base Design Manual" containing matrices that map the logical to the physical models. It also contains data definition language statements to implement a DBMS if one is selected. Any additional graphics produced during this phase follow the conventions as established by the file management tool or technique selected.
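
    To make the idea of a logical-to-physical map and its resulting data definition language more concrete, here is a minimal sketch. It assumes a relational DBMS was selected and uses Python's built-in sqlite3 module purely to check that the generated statement is valid; the map structure, table name and column names are hypothetical.

```python
import sqlite3

# Hypothetical map: logical object -> (physical table,
#                                      {logical data element: column definition})
LOGICAL_TO_PHYSICAL = {
    "Customer": ("CUSTOMER", {
        "customer_number": "CUST_NO INTEGER PRIMARY KEY",
        "name": "CUST_NAME TEXT NOT NULL",
    }),
}


def ddl_for(logical_object):
    """Generate a CREATE TABLE statement for one logical object."""
    table, columns = LOGICAL_TO_PHYSICAL[logical_object]
    body = ", ".join(columns.values())
    return f"CREATE TABLE {table} ({body})"


ddl = ddl_for("Customer")

# Execute the statement against an in-memory database to confirm it parses.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
conn.close()
```

    A non-relational file management technique would need a different rendering of the same map; only the mapping idea carries over.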

    A technical review of the design is conducted prior to proceeding to the next phase.

    PHASE 5 - "Application Physical Data Base Design"

    This phase is comparable to the previous phase except it is concerned with only a single application and not the entire corporate data base. The application physical data base represents the subset of the EPDBM used to satisfy the data requirements of a single application.
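
    The subset relationship can be sketched as a simple selection from the enterprise model. The dictionary representation, the function name, and the file and field names below are assumptions for illustration only.

```python
def application_subset(epdbm, required_files):
    """Select the physical files one application needs from the EPDBM.

    epdbm maps a physical file name to its layout (here, a list of field
    names). Requesting a file that is not in the enterprise model is
    treated as an error, since the APDBM must be a subset of the EPDBM.
    """
    missing = set(required_files) - set(epdbm)
    if missing:
        raise KeyError(f"not in enterprise physical model: {sorted(missing)}")
    return {name: epdbm[name] for name in required_files}


# Illustrative enterprise physical model (made-up files and fields).
epdbm = {
    "CUSTOMER": ["CUST_NO", "CUST_NAME"],
    "ORDER": ["ORDER_NO", "ORDER_DATE"],
    "PRODUCT": ["PROD_NO", "PROD_DESC"],
}
apdbm = application_subset(epdbm, ["CUSTOMER", "ORDER"])
```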

    The formal deliverable resulting from this phase is the "Application Physical Data Base Design Manual." Like the design manual resulting from the previous phase, it contains data definition language statements to implement a DBMS if one is used. Any additional graphics produced are native to the file management tool or technique selected. The file layouts produced from this phase are then delivered to Software Engineering in ISEM Phase 4-II, "Software Engineering."

    A technical review of the design is conducted prior to proceeding to the next phase.

    PHASE 6 - "DBEM Evaluation"

    This phase is used to conduct an operational review of the data base models affected by the project. Based on the earlier design manuals, the data base is reviewed according to specifications. Also, actual project time is reviewed to determine how well estimates and schedules were met and to determine the total cost of the project.
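
    The time and cost portion of this review amounts to simple arithmetic, sketched below. The hours and the hourly rate are made-up figures, and the function is a hypothetical helper, not a "PRIDE" tool; a real evaluation would draw on the project's actual time reporting.

```python
def evaluate_project(estimated_hours, actual_hours, hourly_rate):
    """Compare actual time to estimates and compute total project cost.

    Both dicts map a phase number to hours. A positive variance means
    the phase took longer than estimated.
    """
    variance = {phase: actual_hours[phase] - estimated_hours[phase]
                for phase in estimated_hours}
    total_cost = sum(actual_hours.values()) * hourly_rate
    return variance, total_cost


# Illustrative figures only.
estimated = {1: 40, 2: 80}
actual = {1: 50, 2: 70}
variance, total_cost = evaluate_project(estimated, actual, hourly_rate=100)
```

    With these figures, Phase 1 ran 10 hours over its estimate, Phase 2 came in 10 hours under, and the total cost is 120 hours at the assumed rate.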

    The formal deliverable resulting from this phase is a "Project Evaluation Report" containing the findings from the phase.

    DBEM METHODOLOGY STRUCTURE  

    EEM/ISEM RELATIONSHIPS

    The "PRIDE"-Enterprise Engineering Methodology (EEM) precedes both the Information Systems Engineering Methodology (ISEM) and the Data Base Engineering Methodology (DBEM). It is used to formulate an Enterprise Information Strategy that includes development projects for ISEM and DBEM to implement.

    Because of the close relationship between systems and data, DBEM has a close working relationship with ISEM. In Phase 2 of ISEM, sub-systems and logical files are defined. This will then normally initiate a DBEM project to support the data requirements. In Phase 2 of DBEM the Application Logical Data Base Model (ALDBM) is prepared by Systems Engineers under the supervision of Data Engineering. Data Engineering will then incorporate the ALDBM into the Enterprise Logical Data Base Model (ELDBM). Data Base Administration will then design/update the Enterprise Physical Data Base Model (EPDBM) and Application Physical Data Base Model (APDBM). The result of the APDBM is the physical file structures for programming. In other words, the DBA must deliver the file layouts to the Software Engineer in Phase 4-II of ISEM.

    ISEM | EVENT | GOES TO DBEM
    PHASE 2 | Systems Engineering defines the Application Logical Files (Objects). These files ultimately represent the primary storage files. | PHASE 2
    PHASE 3 | Systems Engineering defines the need for Input/Output related files. This includes computer input transaction files and output data files. This may also include manual files to store input/output documents. | PHASE 5
    PHASE 4-II | Software Engineering defines the need for computer working files to pass data between programs and modules. | PHASE 5
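
    The mapping above lends itself to a simple lookup. The sketch below encodes the three rows as given; the dictionary and function names are assumptions for the example.

```python
# ISEM phase -> DBEM phase that receives the resulting file requirements.
ISEM_TO_DBEM = {
    "2": 2,     # application logical files (objects)
    "3": 5,     # input/output related files, including manual files
    "4-II": 5,  # computer working files passed between programs and modules
}


def dbem_phase_for(isem_phase):
    """Return the DBEM phase that handles files defined in an ISEM phase."""
    try:
        return ISEM_TO_DBEM[isem_phase]
    except KeyError:
        raise ValueError(f"no DBEM hand-off defined for ISEM phase {isem_phase}")
```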

    Data base design is separated from system design so that a neutral third party can accommodate the data requirements for the entire enterprise, not just a specific application.  

    WHO SHOULD PERFORM DBEM?

    There are significant differences between Data Engineering and Data Base Administration (DBA). The Data Engineer tends to be more people and business oriented than the DBA, who is more concerned with the effective implementation of technology.

    The Data Engineer is the chief data architect and Systems Engineering liaison. Essentially, they must be proficient in interpersonal relations/communications, possess good analytical/organizational skills, and be knowledgeable about the business of the enterprise. They must also be able to interface with Data Base Administration and Data Communications Administration (DCA).

    The DBA and DCA are more in tune with computer hardware/software than with the business. They typically lack the business and people skills possessed by Data Engineering. This is not to demean their role, only to clarify it. Although Phases 4 and 5 are the principal phases where the DBA and DCA work, this does not limit their involvement in the methodology. In fact, they actively participate in the early phases of DBEM in a consulting capacity to Data Engineering. During these phases, the DBA and DCA analyze the viability of designs prepared by Data Engineering and participate in project estimating/scheduling activities. This participation in the early phases promotes cooperation and communications between the two functions.

    OTHER FUNCTIONS PARTICIPATING IN DBEM

    As in any other work effort, Project Managers are required to plan, estimate, schedule, and control a DBEM project. Further, Quality Assurance personnel review and verify deliverables resulting from the various phases of the methodology.

    Data Processing Operations personnel are interviewed and consulted during the course of a DBEM project; their participation is vital. End-users are also consulted in terms of physical file requirements (particularly manual files).

    Systems Engineering and Software Engineering also play important roles in DBEM. As chief architect of the information system, Systems Engineering is responsible for defining the data elements needed to support information requirements and the objects (logical files) used in the system. Consequently, Systems Engineering must perform Phase 2 of DBEM to assure that object views and relationships are properly defined. Software Engineering is actually the beneficiary of DBEM, receiving the necessary file layouts for use in programming. However, in the absence of a robust Data Resource Management organization, a Software Engineer may have to fulfill the Data Base Administration function.

    When and where these other functions are involved during DBEM is defined by the methodology. Consult the DBEM Functional Matrices for a summary description.  

    BENEFIT$

    The benefits of data base engineering are numerous:

    • It assures that Data Resource Management personnel are "doing the right things," performing data base design that will accurately satisfy the needs of systems applications.

    • It improves control over the data base design process by simplifying the work effort.

    • It demystifies the data base design process.

    • It improves staff performance through a standard and consistent approach.

    • It improves communications between systems development and data base personnel, as well as within the data base staff.

    • It improves the consistency and quality of data bases.

    • It produces data bases that behave reliably and are easier to maintain and modify.

    • It promotes more efficient use and control of physical file management tools and techniques.

    • It improves control over project planning and execution.

    • It improves corporate control over information resources.

    • It provides a competitive edge through accurate and timely information under changing conditions.

   


Copyright © 1971-2008 by M. Bryce & Associates
Palm Harbor, Florida, USA
All rights reserved.