PRIDE ® -DBEM
Data Base Engineering Methodology
PHASE 4 - ENTERPRISE PHYSICAL DATA BASE DESIGN

ACTIVITY A   ACTIVITY B   ACTIVITY C   ACTIVITY D   ACTIVITY E  
FUNCTIONAL MATRIX   CHECKLIST   FORMS

TRANSLATE THIS PAGE TO... Chinese (simple)   Japanese       Dutch   French     German     Italian    
Free Translation courtesy of ALS      Chinese (traditional)   Korean       Portuguese       Russian       Spanish         

CONTENTS

"Whereas the logical Data Base will remain relatively static, the physical data base changes dynamically."
- Bryce's Law

This section contains the following:


 
    BUSINESS PURPOSE

    The purpose of this phase is to design or modify the Enterprise Physical Data Base Model (EPDBM). Several events occur during this phase:

    • A method of implementation is selected to physically store data to satisfy the logical data base models.

    • Enterprise Physical files are defined, complete with records and data elements.

    • Relationships between physical files and records are established.

    • Relationships between the EPDBM and ELDBM are established.

    • A schematic of the Enterprise Physical Data Base Model is prepared and reviewed for accuracy.
     

    METHODOLOGY NAVIGATION

    A Phase 4 is initiated following completion of a normal Phase 3 (ISEM related). There is a one-to-one relationship between Phase 4 and the Enterprise Physical Data Base Model as represented by an FD resource in the IRM. As such, the Project Manager may bind the FD number to the project/phase key. For example:

    PD-00123 - Project FD-00002 - Enterprise Physical Data Base Model (EPDBM) 4 - Phase 4 identifier

    The formal deliverable resulting from Phase 4 is an "EPDBM Design Manual" consisting of a description of the Enterprise Physical Data Base Model. This is reviewed with Data Resource Management who must decide whether to:

    • Approve the project for continuation.

    • Request revisions to the phase findings.

    • Discontinue the project.

    Following Phase 4, the DBEM project will proceed to Phase 5, "Application Physical Data Base Design," where physical file structures will be created to implement the logical data base design for a single application.  

    GENERAL DISCUSSION

    Phase 4 represents a major transitional stage in the methodology: going from logical data base design to physical. Whereas Data Engineering had primary responsibility for executing the early phases of DBEM, responsibility now shifts to Data Base Administration and Data Communications Administration who are responsible for the design and development of physical hardware/software related to the collection, storage and retrieval of data.

    The Enterprise Physical Data Base Model (EPDBM) is a global view of all of the files used to store data, this includes manual files, computer related files, and any other storage media. As such, there is only one EPDBM associated with a single enterprise.

    Whereas the ELDBM is a relatively static view of all corporate data, the EPDBM will be more volatile. It reflects the technology used for implementation at any given time. To create the EPDBM, Objects of the ELDBM may be regrouped or split in different ways, according to performance, security and administrative considerations. In other words physical files and records will not necessary be designed in the same manner as their logical counterparts.

    The model consists of multi-level aggregations of data within one or more structures (Network, Hierarchical, Relational, flat files), which will depend on the type of technology and particular product(s) being used. This narrative refers to these aggregates as data bases, files, records and data elements, but relates them to the terminology used in a particular product. Some products refer to this level of modeling as the "Schema" or "Logical Data Bases" and reserves the term Physical Model to descriptions at a lower level of abstraction, sometimes involving even hardware descriptions. DBEM considers this level as the Physical Model and reserves the term Logical for the independent levels of abstraction.

    The EPDBM provides data base designers with the freedom to choose the best technology for each new application, either a DBMS, disk, tape or even manual files, keeping it independent from the logical models. Logical files and records in the ELDBM are ultimately defined to reflect the rules of the business, independent of physical considerations. A straightforward implementation with the same logical and physical structures could possibly produce poor performance or place unreasonable physical constraints on the data base. Even more important, the physical implementation may be changed in response to performance tuning, new technology, or even varying usage loads without impacting the logical models known to Users and Systems Engineering. In particular, distributed data bases and data communications procedures are implemented and documented at this level only, and not in the logical models.

    The Enterprise Physical Data Base Model must satisfy the following requirements:

    1. Be comprehensive enough to represent all data stored in the enterprise's data base, regardless if physical files are centralized or distributed.

    2. Be updated constantly to reflect any changes made to existing files as a result of new developments, modifications, improvements or even performance related reorganizations.

    3. Contain an exact image of the physical implementation, including access paths or any other information necessary for the particular DBMS or file management technique being used.

    4. Uses the structure and terminology of the specific product used to implement each part of the EPDBM.

    The number of aggregation levels, their names and contents vary widely with each file management technique, which makes it impossible to refer to all of them in this narrative. When describing the procedures to convert the ELDBM to either flat files or each of the major DBMS types (Hierarchical, Network and Relational) standard terminology is used within each one. It is quite possible to have several DBMS packages as well as any other types of files within the EPDBM. Each product's terminology and constructs is used in the parts of the EPDBM it implements.  

    LOGICAL VERSUS PHYSICAL RESOURCE STRUCTURES

    The differences between logical and physical data structures can be considerable. Aside from using the same type of IRM resources to model them in the IRM (FD, RD, DD resources), there is little else that the two models share in common.

    The logical model is governed based on some very specific rules. For example, "primary basic groupings," the three types of "views," fact/event relationships, use of indicative data, etc. These rules were created to develop simple data abstractions about the various objects used in the operation of the business. Ultimately, they promote data independence from the physical environment.

    The physical model is totally different. It was created to model the physical environment, and, as such, is dependent on it. This is why it will evolve as technology evolves. Unlike the logical model where the "basic grouping" requires indicative data, the physical model can use whatever sort/access key is convenient for the storage and processing of data. Further, records can be merged and separated as required. This reflects the fact that the physical model is dependent on the physical file management tools and techniques in use.  

    FORMS OF PHYSICAL FILES

    The media used to implement physical files can take many forms: disk, tape, diskettes, cards, manual files, even memory. Defining a file in memory is somewhat controversial, yet important. If programs are going to write working files to memory as part of their normal processing, understanding the layout of the file is just as important as if it were written to disk. Data Base Administration needs to know the capabilities of physical files and processing requirements in order to advise DP Operations for hardware capacity planning.

    Computer files may use a variety of file management techniques: Sequential, Index Sequential, Random, or DBMS specific. The decision to use a particular file management technique should be based on what is effective and economical, not on what is technically fashionable. For example, not everything requires a DBMS. There is little point to using a DBMS if it is going to be used as nothing more than an elegant access method.

    When selecting a suitable file management technique, Data Base Administration must consider the following:

    • PERFORMANCE - anticipated volume of transactions versus speed. Typically, the greater the volume of transactions, the slower the computer processing speed. The fewer the transactions, the greater the responsiveness. Systems Engineering should be able to estimate the required transaction volumes, processing speed, file volatility and hit ratio. This will become the basis for establishing benchmark tests.

    • SECURITY & SHARING - this will influence the file access method to be selected, distribution, and use in multiple applications.

    • BACKUP/RECOVERY REQUIREMENTS - will dictate file retention and other maintenance considerations.

    This highlights the fact that Data Base Administration requires information about the various applications in order to arrive at a suitable physical file design. This includes data and processing specifications as recorded in the IRM by Systems Engineering. Data Communications will also use these specifications to devise a data transmission/conversion strategy.  

    SELECTING A DATA BASE MANAGEMENT SYSTEM (DBMS)

    When should you consider a DBMS? Any time when data is going to be shared between applications on the same computer. As with any file management technique, selection should be based on the criteria mentioned above. Because there are significant differences between various file management techniques, it is unlikely that a single DBMS will be able to serve all of the needs of a company. In fact, DBMS packages will proliferate in companies as data bases become more distributed.

    Currently, there are essentially three different types of DBMS products commercially available:

    • Hierarchical - Featuring a parent/child organization structure (a tree structure).

    • Network - Featuring user defined chains and pointers to relate the different sections in a data base.

    • Relational - Featuring user defined tables based on keys.

      NOTE: As of this writing, Object-Oriented DBMS packages are starting to emerge. However, the technology is still in its infancy and, as such, has not been included in this narrative.

    Although the intent of the three types of DBMS' is to promote the sharing of data and simplify data access, their implementations are significantly different. To illustrate:  

    DBMS PROS AND CONS

    Feature
    HIERARCHICAL
    NETWORK
    RELATIONAL
    ACCESS PATH
    Tree
    Processing.
    User defined
    chains; can be
    extensive.
    Based on
    selected
    keys.
    REFERENTIAL
    INTEGRITY
    Excellent.
    Good if chains
    are well
    maintained.
    Good;
    shows
    promise.
    DATA
    SHARING
    Hindered due
    to complicated
    global DB.
    Multi data
    bases normally
    occur.
    Good.
    Maintaining
    global DB
    is easier
    than
    hierarchical.
    Excellent.
    Encourages
    sharing.
    PROCESSING
    CONSIDERATIONS
    Excellent
    for heavy
    transaction
    volume.
    Excellent
    for heavy
    transaction
    volume.
    Slower for
    heavy trans
    volume.
    Consumes
    resources.
    DEVELOPMENT
    CONSIDERATIONS
    Harder to
    develop.
    Mostly supports
    procedural
    languages.
    Harder to
    develop.
    Mostly supports
    procedural
    languages.
    Easier to
    develop.
    Supports both
    procedural &
    non-proc lang.
    MISCELLANEOUS
    No standard.
    Predominantly
    IBM's IMS.
    CODASYL
    standard.
    Many packages.
    SQL Standard-
    ANSI.
    Trend is to
    relational.

    Obviously there are trade-offs between the various approaches. One is not necessarily better than the other. Bottom-line: Data Base Administration must resolve how well the DBMS fulfills the data and processing requirements of the various applications.

    In order to represent the different structures found in the major DBMS products within a standard framework, it is necessary to draw a correspondence among the various levels of aggregation they support. The table below shows such a correspondence and the IRM resources used to represent them:

    IRM
    Resources
    Flat
    File
    Hierarchical
    (IMS)
    Network
    (CODASYL)
    Relational
    (DB2)
    DD
    Field
    Field
    Field
    Attribute
    RD
    Record
    Segment
    Record
    Table
    FD
    File
    Physical DB
    Area
    Tablespace
    FD
     
    Logical DB
    Schema
    Database

    While the first two levels (field and record) are quite similar for most DBMS packages, the clustering provided by an IMS Physical Data Base, a CODASYL Area or a DB2 Tablespace are quite different. However, these and higher level aggregations are still represented as FD resources.  

    ENTERPRISE PHYSICAL DATA BASE MODEL (EPDBM) BASIC CONSTRUCTS

    ------------- |FD-(EPDBM) | |-----------| |ENTERPRISE | | PHYSICAL | | DB MODEL | ------------- | ----------------------------------- V V ------------- ------------- |FD-(FILE) | |FD-(FILE) | |-----------| |-----------| | CUSTOMER |<--------------------| ORDER | | FILE | | FILE | ------------- ------------- | | | ---------------------- V V V ------------ ------------ ------------ | RD-(REC) | | RD-(REC) | | RD-(REC) | |----------| |----------| |----------| | CUSTOMER |<--------| POINTER | | ORDER | | MAIN REC | | | | DETAILS | ------------ ------------ ------------ KEY-ZIP/POSTAL CODE KEY-ORDER NO. KEY-ORDER NO. DD-CUSTOMER NO. DD-CUSTOMER NO. DD-DATE DD-NAME DD-PRODUCT NO DD-TYPE ADDRESS DD-QTY. (ORD) DD-CITY DD-STATE DD-COUNTRY

    NOTES: The pointers represent relationships between the various resources as physically recorded in the IRM. RD-to-RD and FD-to-FD relationships depend on the selected file management technique.

    • FD represents File Description
    • RD represents Record Description
    • KEY represents the "Sort/Access Key" data elements.
    • DD represents Data Definitions
     

    MANUAL FILES

    Manual files, which are used to store documents, follow a slightly different relationship:

    ------------- |FD-(FILE) | |-----------| | MANUAL | | FILE | | | ------------- | ------------- | ID/OD | |-----------| | INPUT/ | | OUTPUT | | DOCUMENTS | ------------- | ------------- |RD-(RECORD)| |-----------| | RECORDS | | | | | ------------- | ------------- |DD-(DATA) | |-----------| | DATA | | ELEMENTS | | | -------------  

    ELDBM TO EPDBM RELATIONSHIPS

    Data Base Administration has the responsibility of mapping the relationships between the ELDBM and the EPDBM. As mentioned, there is not always a one-to-one relationship between logical and physical structures. One logical file may be implemented by multiple physical files. The point is, the relationships between logical and physical must be mapped in the IRM to demonstrate that the physical implementation satisfies the logical.

    ------------- | RD-(VIEW) | ENTERPRISE LOGICAL |-----------| DATA BASE MODEL | EMPLOYEE | (ELDBM) | VIEW | ------------- | -------------------------- | | V V ------------- ------------- |RD-(RECORD)| |RD-(RECORD)| ENTERPRISE PHYSICAL |-----------| |-----------| DATA BASE MODEL | EMPLOYEE |<-----------| EMPLOYEE | (EPDBM) | TABLE | | RECORD | ------------- -------------  

    EPDBM TO APDBM

    The EPDBM represents the global view of data storage requirements. During Phase 5, Data Base Administration must consider physical requirements for individual systems. This is where the Application Physical Data Base Model (APDBM) is defined. Ultimately, the APDBM represents subsets of the EPDBM. Relationships between the EPDBM and the APDBM can be expressed using RD-to-RD and FD-to-FD relationships.

    PAYROLL SYSTEM EPDBM BENEFITS SYSTEM APDBM APDBM ------------- ------------- ------------- |FD-(FILE) | |FD-(FILE) | |FD-(FILE) | |-----------| |-----------| |-----------| | EMPLOYEE | | EMPLOYEE | | EMPLOYEE | | FILE | |MASTER FILE| | FILE | ------------- ------------- ------------- | | | | | | V V V ------------- ------------- ------------- | RD-(REC) | | RD-(REC) | | RD-(REC) | |-----------| |-----------| |-----------| | EMPLOYEE |<-----------| EMPLOYEE |----------->| EMPLOYEE | | RECORD | | MAIN REC | | TABLE | ------------- ------------- -------------  

    RECORD DESCRIPTION (RD)

    As this discussion indicates, the Record Description (RD) resource in the IRM is used for a variety of purposes. In a logical context, the RD represents a "View"; in a physical context, the RD represents a "Record." The RD is also used to represent other physical structures, such as: screen panels, print maps, input transactions, output data, messages, tables, etc. All of these structures exhibit the same characteristics; all require some form of unique key, and all contain data elements.

    The File Description (FD) is used to represent logical files (Objects), physical files, and data bases. Files consist of records, thereby an FD/RD relationship is required. A data base represents a global view of data, therefore an FD/FD relationship is used to map all of the files within a data base.

    The RD is also the primary resource used to map the relationships between all four data base models. Although the File Description (FD) provides for FD-to-FD relationships, it is the RD that is used to connect the models.

    The following diagram illustrates all of the BASIC relationships needed to map the four data base models in the IRM. Additional pointers are available for different purposes, particularly for complicated physical file structures.  

    IRM RESOURCE RELATIONSHIPS - MAPPING THE FOUR DATA BASE MODELS

    NOTES:
    ALDBM - Application Logical Data Base ModelFD - File Description
    ELDBM - Enterprise Logical Data Base ModelRD - Record Description
    EPDBM - Enterprise Physical Data Base Model 
    APDBM - Application Physical Data Base Model 

    This diagram illustrates all of the BASIC relationships needed to map the four data base models. Additional pointers are available for different purposes, particularly for complicated physical file structures.  

    EPDBM REPRESENTATION

    Graphical techniques to represent physical file management techniques may vary from vendor to vendor. Data Base Administration should use whatever technique is available. To represent FD/RD/DD structures in the IRM, a variety of matrices are available:

    • EPDBM Files Matrix - expresses relationships between enterprise physical files using FD-to-FD Relationships.

    • EPDBM Records Matrix - expresses relationships between enterprise physical records using RD-to-RD Relationships.

    • Additional Matrices - To substantiate that the EPDBM correctly implements the ELDBM and various ALDB Models, additional matrices are used to show:

      • ELDBM/EPDBM Files Matrix
      • ELDBM/EPDBM Records Matrix
      • ALDBM/EPDBM Files Matrix
      • ALDBM/EPDBM Records Matrix

    Aside from the matrices, the other major deliverable resulting from this phase are file layouts which detail the physical data structures.  

    DESCRIPTION OF PHASE ACTIVITIES

    Activity A - Develop Hardware/Software Approach

    Data Base Administration (DBA) and Data Communications Administration (DCA) reviews the current ELDBM and ALDBM designs as created in the previous phases. From this, they determine the most suitable method of implementation for physically collecting, storing, and retrieving data. This includes both hardware and software.

    Activity B - Review Approach

    DBA/DCA review the Data Base Approach with management to obtain approval. Hardware/Software acquisitions are decided here.

    Activity C - Define Enterprise Physical DB Model

    Data Base Administration designs/modifies/documents the Physical Data Base either according to the specifications of the previous activity or existing operational statements. Data Communications Administration designs/ modifies network software as required.

    Activity D - Create D.D.L. Statements

    Data Base Administration creates the necessary Data Definition Statements (SCHEMA, DBD, CREATE, etc) to implement the design of the previous activity. Data Communications Administration creates the corresponding network statements.

    Also, relationships between ELDBM models and the EPDBM model are established. This is based on RD-to-RD relationships.

    DBA/DCA then assembles the deliverables produced in the phase and packages them into a single phase review document which is evaluated by Quality Assurance prior to conducting the formal phase review with management.

    Activity E - Phase 4 Review

    DBA/DCA conducts a formal review of the Phase 4 deliverables with Data Resource Management, Data Engineering, and Project Management. The review is used to evaluate the work performed thus far and revise as required. At this time, management will review the formal Phase 4 "EPDBM Design Manual" consisting of:

    • Phase Cover Page
    • EPDBM Narrative
    • EPDBM File Matrix
    • EPDBM Records Matrix
    • Data Resource Layouts
    • ELDBM/EPDBM File Matrix
    • ELDBM/EPDBM Records Matrix
    • ALDBM/EPDBM File Matrix
    • ALDBM/EPDBM Records Matrix
    • Phase Review Checklist

    Based on this report and subsequent review meeting, management may elect to revise parts of the report, discontinue the project, or approve it for continuation.

   


Copyright © 1971-2009 by M. Bryce & Associates
Palm Harbor, Florida, USA
All rights reserved.