Conceptual Database Design|Information Systems
The three main phases of database design: conceptual, logical,
and physical design.
• How to decompose the scope of the design into specific views of
the enterprise.
• How to use ER modeling to build a local conceptual data model
based on the information given in a view of the enterprise.
• How to validate the resultant conceptual data model to ensure
that it is a true and accurate representation of a view of the
enterprise.
• How to document the process of conceptual database design.
• End-users play an integral role throughout the process of
conceptual database design.
In Chapter 10Chapter 10 we described the main stages of the database system
development lifecycle, one of which is database designdatabase design. This stage
starts only after a complete analysis of the enterprise’s requirements
has been undertaken.
In this chapter, and Chapters 17Chapters 17–1919, we describe a methodology for
455
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid27876
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37780
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41343
the database design stage of the database system development
lifecycle for relational databases. The methodology is presented as a
step-by-step guide to the three main phases of database design,
namely: conceptual, logical, and physical design (see Figure 10.1Figure 10.1).
The main aim of each phase is as follows:
• Conceptual database designConceptual database design—to build the conceptual
representation of the database, which includes identification of
the important entities, relationships, and attributes.
• Logical database designLogical database design—to translate the conceptual
representation to the logical structure of the database, which
includes designing the relations.
• Physical database designPhysical database design—to decide how the logical structure
is to be physically implemented (as base relations) in the target
DBMS.
Structure of this Chapter
In Section 16.1Section 16.1 we define what a database design methodology is
and review the three phases of database design. In Section 16.2Section 16.2 we
provide an overview of the methodology and briefly describe the
main activities associated with each design phase. In Section 16.3Section 16.3
we focus on the methodology for conceptual database design and
present a detailed description of the steps required to build a
conceptual data model. We use the ER modeling technique described
in Chapters 12Chapters 12 and 1313 to create the conceptual data model. The
conceptual data model described in this chapter is the starting point
for the next phase of database design described in the following
455
456
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28003
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36569
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36642
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36811
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31274
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_022.xhtml#eid32613
chapter.
In Chapter 17Chapter 17 we focus on the methodology for logical database
design for the relational model and present a detailed description of
the steps required to convert a conceptual data model into a logical
data model. This chapter also includes an optional step that describes
how to merge two or more logical data models into a single logical
data model, for those using the view integration approach (see
Section 10.5Section 10.5) to manage the design of a database with multiple user
views. The logical data model described in Chapter 17Chapter 17 is the starting
point for the final phase of database design described in the following
two chapters.
In Chapters 18Chapters 18 and 1919 we complete the database design
methodology by presenting a detailed description of the steps
associated with the production of the physical database design for
relational DBMSs. This part of the methodology illustrates that the
development of the logical data model alone is insufficient to
guarantee the optimum implementation of a database system. For
example, we may have to consider modifying the logical model to
achieve acceptable levels of performance.
Appendix DAppendix D presents a summary of the database design
methodology for those readers who are already familiar with
database design and simply require an overview of the main steps.
Throughout the methodology the terms “entity” and “relationship”
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37780
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28150
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37780
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39671
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41343
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_054.xhtml#eid85549
are used in place of “entity type” and “relationship type” where the
meaning is obvious; “type” is generally added only to avoid
ambiguity. In this chapter we mostly use examples from the
StaffClient user views of the DreamHome case study documented in
Section 11.4Section 11.4 and Appendix AAppendix A.
16.1 Introduction to the Database Design Methodology
Before presenting the methodology, we discuss what a design
methodology represents and describe the three phases of database
design. Finally, we present guidelines for achieving success in
database design.
16.1.1 What Is a Design Methodology?
Design methodology
A structured approach that uses procedures, techniques, tools, and
documentation aids to support and facilitate the process of design.
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_020.xhtml#eid30021
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_051.xhtml#eid84853
A design methodology consists of phases each containing a number of
steps that guide the designer in the techniques appropriate at each
stage of the project. A design methodology also helps the designer to
plan, manage, control, and evaluate database development projects.
Furthermore, it is a structured approach for analyzing and modeling a
set of requirements for a database in a standardized and organized
manner.
16.1.2 Conceptual, Logical, and Physical Database Design
In presenting this database design methodology, the design process is
divided into three main phases: conceptual, logical, and physical
database design.
Conceptual database design
The process of constructing a model of the data used in an enterprise,
independent of all physical considerations.
The conceptual database design phase begins with the creation of a
conceptual data model of the enterprise that is entirely independent
of implementation details such as the target DBMS, application
programs, programming languages, hardware platform, performance
issues, or any other physical considerations.
456
457
Logical database design
The process of constructing a model of the data used in an enterprise
based on a specific data model, but independent of a particular DBMS
and other physical considerations.
The logical database design phase maps the conceptual data model on
to a logical model, which is influenced by the data model for the target
database (for example, the relational model). The logical data model is
a source of information for the physical design phase, providing the
physical database designer with a vehicle for making trade-offs that
are very important to the design of an efficient database.
Physical database design
The process of producing a description of the implementation of the
database on secondary storage; it describes the base relations, file
organizations, and indexes used to achieve efficient access to the
data, and any associated integrity constraints and security measures.
The physical database design phase allows the designer to make
decisions on how the database is to be implemented. Therefore,
physical design is tailored to a specific DBMS. There is feedback
between physical and logical design, because decisions taken during
physical design for improving performance may affect the logical data
model.
16.1.3 Critical Success Factors in Database Design
The following guidelines are often critical to the success of database
design:
• Work interactively with the users as much as possible.
• Follow a structured methodology throughout the data modeling
process.
• Employ a data-driven approach.
• Incorporate structural and integrity considerations into the data
models.
• Combine conceptualization, normalization, and transaction
validation techniques into the data modeling methodology.
• Use diagrams to represent as much of the data models as possible.
• Use a Database Design Language (DBDL) to represent additional
data semantics that cannot easily be represented in a diagram.
• Build a data dictionary to supplement the data model diagrams
and the DBDL.
• Be willing to repeat steps.
These factors are built into the methodology we present for database
design.
16.2 Overview of the Database Design Methodology
In this section, we present an overview of the database design
457
458
methodology. The steps in the methodology are as follows.
Conceptual database design
Step 1 Build conceptual data model
Step 1.1 Identify entity types
Step 1.2 Identify relationship types
Step 1.3 Identify and associate attributes with entity or
relationship types
Step 1.4 Determine attribute domains
Step 1.5 lDetermine candidate, primary, and alternate key
attributes
Step 1.6 Consider use of enhanced modeling concepts (optional
step)
Step 1.7 Check model for redundancy
Step 1.8 Validate conceptual data model against user
transactions
Step 1.9 Review conceptual data model with user
Logical database design for the relational model
Step 2 Build logical data model
Step 2.1 Derive relations for logical data model
Step 2.2 Validate relations using normalization
Step 2.3 Validate relations against user transactions
Step 2.4 Check integrity constraints
Step 2.5 Review logical data model with user
Step 2.6 Merge logical data models into global model (optional
step)
Step 2.7 Check for future growth
Physical database design for relational databases
Step 3 Translate logical data model for target DBMS
Step 3.1 Design base relations
Step 3.2 Design representation of derived data
Step 3.3 Design general constraints
Step 4 Design file organizations and indexes
Step 4.1 Analyze transactions
Step 4.2 Choose file organizations
Step 4.3 Choose indexes
Step 4.4 Estimate disk space requirements
Step 5 Design user views
Step 6 Design security mechanisms
Step 7 Consider the introduction of controlled redundancy
Step 8 Monitor and tune the operational system
This methodology can be used to design relatively simple to highly
complex database systems. Just as the database design stage of the
database systems development lifecycle (see Section 10.6Section 10.6) has three
phases—conceptual, logical, and physical design—so too has the
methodology. Step 1Step 1 creates a conceptual database design, Step 2Step 2
creates a logical database design, and Steps 3Steps 3 to 88 create a physical
database design. Depending on the complexity of the database system
being built, some of the steps may be omitted. For example, Step 2.6Step 2.6
of the methodology is not required for database systems with a single
user view or database systems with multiple user views being
managed using the centralized approach (see Section 10.5Section 10.5). For this
reason, we refer to the creation of a single conceptual data model only
in Step 1Step 1 and single logical data model only in Step 2Step 2. However, if the
database designer is using the view integration approach (see SectionSection
10.510.5) to manage user views for a database system, then Steps 1Steps 1 and 22
458
459
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28243
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39813
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41917
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28150
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28150
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
may be repeated as necessary to create the required number of
models, which are then merged in Step 2.6Step 2.6.
In Chapter 10Chapter 10, we introduced the term “local conceptual data model”
or “local logical data model” to refer to the modeling of one or more,
but not all, user views of a database system and the term “global
logical data model” to refer to the modeling of all user views of a
database system. However, the methodology is presented using the
more general terms “conceptual data model” and “logical data model”
with the exception of the optional Step 2.6Step 2.6, which necessitates the use
of the terms local logical data model and global logical data model, as
it is this step that describes the tasks necessary to merge separate local
logical data models to produce a global logical data model.
An important aspect of any design methodology is to ensure that the
models produced are repeatedly validated so that they continue to be
an accurate representation of the part of the enterprise being
modeled. In this methodology the data models are validated in various
ways such as by using normalization (Step 2.2Step 2.2), by ensuring the
critical transactions are supported (Steps 1.8Steps 1.8 and 2.32.3), and by
involving the users as much as possible (Steps 1.9Steps 1.9 and 2.52.5).
The logical model created at the end of Step 2Step 2 is then used as the
source of information for physical database design described in StepsSteps
33 to 88. Again, depending on the complexity of the database systems
being designed and/or the functionality of the target DBMS, some of
the steps of physical database design may be omitted. For example,
Step 4.2Step 4.2 may not be applicable for certain PC-based DBMSs. The steps
of physical database design are described in detail in Chapters 18Chapters 18
and 1919.
459
460
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid27876
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38385
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37552
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38428
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37612
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38593
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39813
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41917
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid40488
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39671
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41343
Database design is an iterative process that has a starting point and an
almost endless procession of refinements. Although the steps of the
methodology are presented here as a procedural process, it must be
emphasized that this does not imply that it should be performed in
this manner. It is likely that knowledge gained in one step may alter
decisions made in a previous step. Similarly, it may be useful to look
briefly at a later step to help with an earlier step. Therefore, the
methodology should act as a framework to help guide the designer
through database design effectively.
To illustrate the database design methodology we use the DreamHome
case study. The DreamHome database has several user views
(Director, Manager, Supervisor, Assistant, and Client) that are
managed using a combination of the centralized and view integration
approaches (see Section 11.4Section 11.4). Applying the centralized approach
resulted in the identification of two collections of user views called
StaffClient user views and Branch user views. The user views
represented by each collection are as follows:
• StaffClient user viewsStaffClient user views—representing Supervisor, Assistant, and
Client user views;
• Branch user viewsBranch user views—representing Director and Manager user
views.
In this chapter, which describes Step 1Step 1 of the methodology, we use the
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_020.xhtml#eid30021
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
StaffClient user views to illustrate the building of a conceptual data
model, and then in the following chapter, which describes Step 2Step 2, we
describe how this model is translated into a logical data model. As the
StaffClient user views represent only a subset of all the user views of
the DreamHome database, it is more correct to refer to the data
models as local data models. However, as stated earlier when we
described the methodology and the worked examples, for simplicity
we use the terms conceptual data model and logical data model until
the optional Step 2.6Step 2.6, which describes the integration of the local
logical data models for the StaffClient user views and the Branch user
views.
16.3 Conceptual Database Design Methodology
This section provides a step-by-step guide for conceptual database
design.
Step 1: Build Conceptual Data Model
Objective
To build a conceptual data model of the data requirements of the
enterprise.
The first step in conceptual database design is to build one (or more)
conceptual data models of the data requirements of the enterprise. A
conceptual data model comprises:
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
• entity types;
• relationship types;
• attributes and attribute domains;
• primary keys and alternate keys;
• integrity constraints.
The conceptual data model is supported by documentation, including
ER diagrams and a data dictionary, which is produced throughout the
development of the model. We detail the types of supporting
documentation that may be produced as we go through the various
steps. The tasks involved in Step 1Step 1 are:
Step 1.1 Identify entity types
Step 1.2 Identify relationship types
Step 1.3 Identify and associate attributes with entity or relationship
types
Step 1.4 Determine attribute domains
Step 1.5 Determine candidate, primary, and alternate key attributes
Step 1.6 Consider use of enhanced modeling concepts (optional step)
Step 1.7 Check model for redundancy
Step 1.8 Validate conceptual data model against user transactions
Step 1.9 Review conceptual data model with user
Step 1.1: Identify entity types
Objective
To identify the required entity types.
The first step in building a conceptual data model is to determine and
460
461
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
define the main objects that the users are interested in. These objects
are the entity types for the model (see Section 12.1Section 12.1). One method of
identifying entities is to examine the users’ requirements
specification. From this specification, we identify nouns or noun
phrases that are mentioned (for example, staff number, staff name,
property number, property address, rent, number of rooms). We also
look for major objects, such as people, places, or concepts of interest,
excluding those nouns that are merely qualities of other objects. For
example, we could group staff number and staff name with an object
or entity called Staff and group property number, property address,
rent, and number of rooms with an entity called PropertyForRent.
An alternative way of identifying entities is to look for objects that
have an existence in their own right. For example, Staff is an entity
because staff exist whether or not we know their names, positions,
and dates of birth. If possible, the users should assist with this activity.
It is sometimes difficult to identify entities because of the way they are
presented in the users’ requirements specification. Users often talk in
terms of examples or analogies. Instead of talking about staff in
general, users may mention people’s names. In some cases, users talk
in terms of job roles, particularly when people or organizations are
involved. These roles may be job titles or responsibilities, such as
Director, Manager, Supervisor, or Assistant.
To confuse matters further, users frequently use synonyms and
homonyms. Two words are synonyms when they have the same
meaning, for example, “branch” and “office.” Homonyms occur when
the same word can have different meanings depending on the context.
For example, the word “program” has several alternative meanings
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31333
such as a course of study, a series of events, a plan of work, and an
item on the television.
It is not always obvious whether a particular object is an entity, a
relationship, or an attribute. For example, how would we classify
marriage? In fact, depending on the actual requirements, we could
classify marriage as any or all of these. Design is subjective, and
different designers may produce different, but equally valid,
interpretations. The activity therefore relies, to a certain extent, on
judgement and experience. Database designers must take a very
selective view of the world and categorize the things that they observe
within the context of the enterprise. Thus, there may be no unique set
of entity types deducible from a given requirements specification.
However, successive iterations of the design process should lead to the
choice of entities that are at least adequate for the system required.
For the StaffClient user views of DreamHome, we identify the
following entities:
Staff PropertyForRent
PrivateOwner BusinessOwner
Client Preference
461
462
Lease
Document entity types
As entity types are identified, assign them names that are meaningful
and obvious to the user. Record the names and descriptions of entities
in a data dictionary. If possible, document the expected number of
occurrences of each entity. If an entity is known by different names,
the names are referred to as synonyms or aliases, which are also
recorded in the data dictionary. Figure 16.1Figure 16.1 shows an extract from
the data dictionary that documents the entities for the StaffClient user
views of DreamHome.
Step 1.2: Identify relationship types
Objective
To identify the important relationships that exist between the entity
types.
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36951
Figure 16.1 Extract from the data dictionary for the StaffClient user views of DreamHome showing a
description of entities.
Having identified the entities, the next step is to identify all the
relationships that exist between these entities (see Section 12.2Section 12.2).
When we identify entities, one method
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31377
is to look for nouns in the users’ requirements specification. Again, we
can use the grammar of the requirements specification to identify
relationships. Typically, relationships are indicated by verbs or verbal
expressions. For example:
• Staff Manages PropertyForRent
• PrivateOwner Owns PropertyForRent
• PropertyForRent AssociatedWith Lease
The fact that the requirements specification records these
relationships suggests that they are important to the enterprise and
should be included in the model.
We are interested only in required relationships between entities. In
the previous examples, we identified the Staff Manages
PropertyForRent and the PrivateOwner Owns PropertyForRent
relationships. We may also be inclined to include a relationship
between Staff and PrivateOwner (for example, Staff Assists
PrivateOwner). However, although this is a possible relationship, from
the requirements specification, it is not a relationship that we are
interested in modeling.
In most instances, the relationships are binary; in other words, the
relationships exist between exactly two entity types. However, we
should be careful to look out for complex relationships that may
involve more than two entity types (see Section 12.2.1Section 12.2.1) and
recursive relationships that involve only one entity type (see SectionSection
12.2.212.2.2).
462
463
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31466
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31543
Great care must be taken to ensure that all the relationships that are
either explicit or implicit in the users’ requirements specification are
detected. In principle, it should be possible to check each pair of entity
types for a potential relationship between them, but this would be a
daunting task for a large system comprising hundreds of entity types.
On the other hand, it is unwise not to perform some such check, and
the responsibility is often left to the analyst/designer. However,
missing relationships should become apparent when we validate the
model against the transactions that are to be supported (Step 1.8Step 1.8).
Use Entity–Relationship (ER) diagrams
It is often easier to visualize a complex system rather than decipher
long textual descriptions of a users’ requirements specification. We
use ER diagrams to represent entities and how they relate to one
another more easily. Throughout the database design phase, we
recommend that ER diagrams be used whenever necessary to help
build up a picture of the part of the enterprise that we are modeling.
In this book, we use UML, but other notations perform a similar
function (see Appendix CAppendix C).
Determine the multiplicity constraints of relationship types
Having identified the relationships to model, we next determine the
multiplicity of each relationship (see Section 12.6Section 12.6). If specific values
for the multiplicity are known, or even upper or lower limits,
document these values as well.
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37552
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_053.xhtml#eid85491
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31881
Multiplicity constraints are used to check and maintain data quality.
These constraints are assertions about entity occurrences that can be
applied when the database is updated to determine whether the
updates violate the stated rules of the enterprise. A model that
includes multiplicity constraints more explicitly represents the
semantics of the relationships and results in a better representation of
the data requirements of the enterprise.
Figure 16.2 First-cut ER diagram showing entity and relationship types for the StaffClient user views of
DreamHome.
Check for fan and chasm traps
Having identified the necessary relationships, check that each
relationship in the ER model is a true representation of the “real
world,” and that fan or chasm traps have not been created
inadvertently (see Section 12.7Section 12.7).
463
464
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid32262
Figure 16.2Figure 16.2 shows the first-cut ER diagram for the StaffClient user
views of the DreamHome case study.
Document relationship types
As relationship types are identified, assign them names that are
meaningful and obvious to the user. Also record relationship
descriptions and the multiplicity constraints in the data dictionary.
Figure 16.3Figure 16.3 shows an extract from the data dictionary that
documents the relationships for the StaffClient user views of
DreamHome.
Step 1.3: Identify and associate attributes with entity or relationship types
Objective
To associate attributes with appropriate entity or relationship types.
The next step in the methodology is to identify the types of facts about
the entities and relationships that we have chosen to be represented
in the database. In a
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37007
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37043
similar way to identifying entities, we look for nouns or noun phrases
in the users’ requirements specification. The attributes can be
identified where the noun or noun phrase is a property, quality,
identifier, or characteristic of one of these entities or relationships
(see Section 12.3Section 12.3).
Figure 16.3 Extract from the data dictionary for the StaffClient user views of DreamHome, showing a
description of relationships.
By far the easiest thing to do when we have identified an entity (x) or
a relationship (y) in the requirements specification is to ask “What
information are we required to hold on x or y?” The answer to this
question should be described in the specification. However, in some
cases it may be necessary to ask the users to clarify the requirements.
Unfortunately, they may give answers to this question that also
contain other concepts, so the users’ responses must be carefully
considered.
Simple/composite attributes
It is important to note whether an attribute is simple or composite (see
Section 12.3.1Section 12.3.1). Composite attributes are made up of simple
464
465
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31591
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31634
attributes. For example, the address attribute can be simple and hold
all the details of an address as a single value, such as “115 Dumbarton
Road, Glasgow, G11 6YG.” However, the address attribute may also
represent a composite attribute, made up of simple attributes that
hold the address details as separate values in the attributes street
(“115 Dumbarton Road”), city (“Glasgow”), and postcode (“G11 6YG”).
The option to represent address details as a simple or composite
attribute is determined by the users’ requirements. If the user does
not need to access the separate components of an address, we
represent the address attribute as a simple attribute. On the other
hand, if the user does need to access the individual components of an
address, we represent the address attribute as being composite, made
up of the required simple attributes.
In this step, it is important that we identify all simple attributes to be
represented in the conceptual data model including those attributes
that make up a composite attribute.
Single/multi-valued attributes
In addition to being simple or composite, an attribute can also be
single-valued or multi-valued (see Section 12.3.2Section 12.3.2). Most attributes
encountered will be single-valued, but occasionally a multi-valued
attribute may be encountered; that is, an attribute that holds multiple
values for a single entity occurrence. For example, we may identify
the attribute telNo (the telephone number) of the Client entity as a
multivalued attribute.
On the other hand, client telephone numbers may have been
identified as a separate entity from Client. This is an alternative, and
465
466
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31664
equally valid, way to model this. As you will see in Step 2.1Step 2.1, multi-
valued attributes are mapped to relations anyway, so both approaches
produce the same end result.
Derived attributes
Attributes whose values are based on the values of other attributes
are known as derived attributes (see Section 12.3.3Section 12.3.3). Examples of
derived attributes include:
• the age of a member of staff;
• the number of properties that a member of staff manages;
• the rental deposit (calculated as twice the monthly rent).
Often, these attributes are not represented in the conceptual data
model. However, sometimes the value of the attribute or attributes on
which the derived attribute is based may be deleted or modified. In
this case, the derived attribute must be shown in the data model to
avoid this potential loss of information. However, if a derived
attribute is shown in the model, we must indicate that it is derived.
The representation of derived attributes will be considered during
physical database design. Depending on how an attribute is used, new
values for a derived attribute may be calculated each time it is
accessed or when the value(s) it is derived from changes. However,
this issue is not the concern of conceptual database design, and is
discussed in more detail in Step 3.2Step 3.2 in Chapter 18Chapter 18.
Potential problems
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37897
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31687
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39916
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39671
When identifying the entities, relationships, and attributes for the
user views, it is not uncommon for it to become apparent that one or
more entities, relationships, or attributes have been omitted from the
original selection. In this case, return to the previous steps, document
the new entities, relationships, or attributes, and reexamine any
associated relationships.
As there are generally many more attributes than entities and
relationships, it may be useful to first produce a list of all attributes
given in the users’ requirements specification. As an attribute is
associated with a particular entity or relationship, remove the
attribute from the list. In this way, we ensure that an attribute is
associated with only one entity or relationship type and, when the list
is empty, that all attributes are associated with some entity or
relationship type.
We must also be aware of cases where attributes appear to be
associated with more than one entity or relationship type, as this can
indicate the following:
(1) We have identified several entities that can be represented as a
single entity. For example, we may have identified entities
Assistant and Supervisor both with the attributes staffNo (the staff
number), name, sex, and DOB (date of birth), which can be
represented as a single entity called Staff with the attributes
staffNo (the staff number), name, sex, DOB, and position (with values
Assistant or Supervisor). On the other hand, it may be that these
entities share many attributes but there are also attributes or
relationships that are unique to each entity. In this case, we must
decide whether we want to generalize the entities into a single
entity such as Staff, or leave them as specialized entities
representing distinct staff
roles. The consideration of whether to specialize or generalize
entities was discussed in Chapter 13Chapter 13 and is addressed in more
detail in Step 1.6Step 1.6.
(2) We have identified a relationship between entity types. In this
case, we must associate the attribute with only one entity, the
parent entity, and ensure that the relationship was previously
identified in Step 1.2Step 1.2. If this is not the case, the documentation
should be updated with details of the newly identified
relationship. For example, we may have identified the entities
Staff and PropertyForRent with the following attributes:
Staff staffNo, name, position, sex, DOB
PropertyForRent propertyNo, street, city, postcode,
type, rooms, rent, managerName
The presence of the managerName attribute in PropertyForRent is
intended to represent the relationship Staff Manages
PropertyForRent. In this case, the managerName attribute should be
omitted from PropertyForRent and the relationship Manages should
be added to the model.
DreamHome attributes for entities
466
467
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_022.xhtml#eid32613
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37399
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36940
For the StaffClient user views of DreamHome, we identify and
associate attributes with entities as follows:
Staff staffNo, name (composite: fName, IName),
position, sex, DOB
PropertyForRent propertyNo, address (composite: street,
city, postcode), type, rooms, rent
PrivateOwner ownerNo, name (composite: fName, IName),
address, telNo
BusinessOwner ownerNo, bName, bType, address, telNo,
contactName
Client clientNo, name (composite: fName,
IName), telNo, eMail
Preference prefType, maxRent
Lease leaseNo, paymentMethod, deposit (derived
as PropertyForRent.rent*2), depositPaid,
rentStart, rentFinish, duration (derived
as rentFinish – rentStart)
DreamHome attributes for relationships
Some attributes should not be associated with entities, but instead
should be associated with relationships. For the StaffClient user views
of DreamHome, we identify and associate attributes with
relationships, as follows:
Views viewDate, comment
Document attributes
As attributes are identified, assign them names that are meaningful to
the user. Record the following information for each attribute:
• attribute name and description;
• data type and length;
• any aliases that the attribute is known by;
• whether the attribute is composite and, if so, the simple attributes
that make up the composite attribute;
• whether the attribute is multi-valued;
• whether the attribute is derived and, if so, how it is to be
computed;
• any default value for the attribute.
467
468
Figure 16.4 Extract from the data dictionary for the StaffClient user views of DreamHome showing a
description of attributes.
Figure 16.4Figure 16.4 shows an extract from the data dictionary that
documents the attributes for the StaffClient user views of
DreamHome.
Step 1.4: Determine attribute domains
Objective
To determine domains for the attributes in the conceptual data
model.
The objective of this step is to determine domains for all the attributes
in the model (see Section 12.3Section 12.3). A domaindomain is a pool of values from
which one or more attributes draw their values. For example, we may
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37276
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31591
define:
• the attribute domain of valid staff numbers (staffNo) as being a
five-character variable-length string, with the first two characters
as letters and the next one to three characters as digits in the
range 1–999;
• the possible values for the sex attribute of the Staff entity as being
either “M” or “F.” The domain of this attribute is a single character
string consisting of the values “M” or “F.”
A fully developed data model specifies the domains for each attribute
and includes:
• allowable set of values for the attribute;
• sizes and formats of the attribute.
Further information can be specified for a domain, such as the
allowable operations on an attribute, and which attributes can be
compared with other attributes or used in combination with other
attributes. However, implementing these characteristics of attribute
domains in a DBMS is still the subject of research.
Document attribute domains
As attribute domains are identified, record their names and
characteristics in the data dictionary. Update the data dictionary
entries for attributes to record their domain in place of the data type
and length information.
STEP 1.5: Determine candidate, primary, and alternate key attributes
Objective
To identify the candidate key(s) for each entity type and, if there is
more than one candidate key, to choose one to be the primary key and
the others as alternate keys.
This step is concerned with identifying the candidate key(s) for an
entity and then selecting one to be the primary key (see SectionSection
12.3.412.3.4). A candidate keycandidate key is a minimal set of attributes of an entity
that uniquely identifies each occurrence of that entity. We may
identify more than one candidate key, in which case we must choose
one to be the primary keyprimary key; the remaining candidate keys are called
alternate keysalternate keys.
468
469
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31717
People’s names generally do not make good candidate keys. For
example, we may think that a suitable candidate key for the Staff
entity would be the composite attribute name, the member of staff’s
name. However, it is possible for two people with the same name to
join DreamHome, which would clearly invalidate the choice of name as
a candidate key. We could make a similar argument for the names of
property owners. In such cases, rather than coming up with
combinations of attributes that may provide uniqueness, it may be
better to use an existing attribute that would always ensure
uniqueness, such as the staffNo attribute for the Staff entity and the
ownerNo attribute for the PrivateOwner entity, or define a new attribute
that would provide uniqueness.
When choosing a primary key from among the candidate keys, use the
following guidelines to help make the selection:
• the candidate key with the minimal set of attributes;
• the candidate key that is least likely to have its values changed;
• the candidate key with fewest characters (for those with textual
attribute(s));
• the candidate key with smallest maximum value (for those with
numerical attribute(s));
• the candidate key that is easiest to use from the users’ point of
view.
In the process of identifying primary keys, note whether an entity is
strong or weak. If we are able to assign a primary key to an entity, the
entity is referred to as being strong. On the other hand, if we are
unable to identify a primary key for an entity, the entity is referred to
as being weak (see Section 12.4Section 12.4). The primary key of a weak entity
can be identified only when we map the weak entity and its
relationship with its owner entity to a relation through the placement
of a foreign key in that relation. The process of mapping entities and
their relationships to relations is described in Step 2.1Step 2.1, and therefore
the identification of primary keys for weak entities cannot take place
until that step.
DreamHome primary keys
The primary keys for the StaffClient user views of DreamHome are
shown in Figure 16.5Figure 16.5. Note that the Preference entity is a weak entity
and, as identified previously, the Views relationship has two attributes,
viewDate and comment. 469
470
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31795
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37897
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37389
Figure 16.5 ER diagram for the StaffClient user views of DreamHome with primary keys added.
Document primary and alternate keys
Record the identification of primary and any alternate keys in the data
dictionary.
STEP 1.6: Consider use of enhanced modeling concepts (optional step)
Objective
To consider the use of enhanced modeling concepts, such as
specialization/generalization, aggregation, and composition.
In this step, we have the option to continue the development of the ER
model using the enhanced modeling concepts discussed in ChapterChapter
1313, namely specialization/generalization, aggregation, and
composition. If we select the specialization approach, we attempt to
highlight differences between entities by defining one or more
subclassessubclasses of a superclasssuperclass entity. If we select the generalization
approach, we attempt to identify common features between entities to
define a generalizing superclass entity. We may use aggregation to
represent a ‘has-a’ or ‘is-part-of’ relationship between entity types,
where one represents the ‘whole’ and the other the ‘part’. We may use
composition (a special type of aggregation) to represent an association
between entity types where there is a strong ownership and
coincidental lifetime between the ‘whole’ and the ‘part’.
For the StaffClient user views of DreamHome, we choose to generalize
the two entities PrivateOwner and BusinessOwner to create a superclass
Owner that contains the common attributes ownerNo, address, and telNo.
The relationship that the Owner superclass has with its subclasses is
mandatory and disjoint, denoted as {Mandatory, Or}; each member of
the Owner superclass must be a member of one of the subclasses, but
cannot belong to both.
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_022.xhtml#eid32613
Figure 16.6 Revised ER diagram for the StaffClient user views of DreamHome with specialization/generalization
added.
In addition, we identify one specialization subclass of Staff, namely
Supervisor, specifically to model the Supervises relationship. The
relationship that the Staff superclass has with the Supervisor subclass
is optional: a member of the Staff superclass does not necessarily
have to be a member of the Supervisor subclass. To keep the design
simple, we decide not to use aggregation or composition. The revised
ER diagram for the StaffClient user views of DreamHome is shown in
Figure 16.6Figure 16.6.
470
471
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37429
There are no strict guidelines on when to develop the ER model using
enhanced modeling concepts, as the choice is often subjective and
dependent on the particular characteristics of the situation that is
being modeled. As a useful “rule of thumb” when considering the use
of these concepts, always attempt to represent the important entities
and their relationships as clearly as possible in the ER diagram.
Therefore, the use of enhanced modeling concepts should be guided
by the readability of the ER diagram and the clarity by which it models
the important entities and relationships.
These concepts are associated with enhanced ER modeling. However,
as this step is optional, we simply use the term “ER diagram” when
referring to the diagrammatic representation of data models
throughout the methodology.
STEP 1.7: Check model for redundancy
Objective
To check for the presence of any redundancy in the model.
In this step, we examine the conceptual data model with the specific
objective of identifying whether there is any redundancy present and
removing any that does exist. The three activities in this step are:
471
472
(1) re-examine one-to-one (1:1) relationships;
(2) remove redundant relationships;
(3) consider time dimension.
(1) Re-examine one-to-one (1:1) relationships
In the identification of entities, we may have identified two entities
that represent the same object in the enterprise. For example, we may
have identified the two entities Client and Renter that are actually the
same; in other words, Client is a synonym for Renter. In this case, the
two entities should be merged together. If the primary keys are
different, choose one of them to be the primary key and leave the
other as an alternate key.
(2) Remove redundant relationships
Figure 16.7 Remove the redundant relationship called Rents.
A relationship is redundant if the same information can be obtained
via other relationships. We are trying to develop a minimal data
model and, as redundant relationships are unnecessary, they should
be removed. It is relatively easy to identify whether there is more than
one path between two entities. However, this does not necessarily
imply that one of the relationships is redundant, as they may
represent different associations between the entities. For example,
consider the relationships between the PropertyForRent, Lease, and
Client entities shown in Figure 16.7Figure 16.7. There are two ways to find out
which clients rent which properties. There is the direct route using the
Rents relationship between the Client and PropertyForRent entities, and
there is the indirect route, using the Holds and AssociatedWith
relationships via the Lease entity. Before we can assess whether both
routes are required, we need
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37489
to establish the purpose of each relationship. The Rents relationship
indicates which client rents which property. On the other hand, the
Holds relationship indicates which client holds which lease, and the
AssociatedWith relationship indicates which properties are associated
with which leases. Although it is true that there is a relationship
between clients and the properties they rent, this is not a direct
relationship and the association is more accurately represented
through a lease. The Rents relationship is therefore redundant and
does not convey any additional information about the relationship
between PropertyForRent and Client that cannot more correctly be
found through the Lease entity. To ensure that we create a minimal
model, the redundant Rents relationship must be removed.
(3) Consider time dimension
The time dimension of relationships is important when assessing
redundancy. For example, consider the situation in which we wish to
model the relationships between the entities Man, Woman, and Child, as
illustrated in Figure 16.8Figure 16.8. Clearly, there are two paths between Man
and Child: one via the direct relationship FatherOf and the other via
the relationships MarriedTo and MotherOf. Consequently, we may think
that the relationship FatherOf is unnecessary. However, this would be
incorrect for two reasons:
(1) The father may have children from a previous marriage, and we
are modeling only the father’s current marriage through a 1:1
relationship.
(2) The father and mother may not be married, or the father may be
472
473
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37563
married to someone other than the mother (or the mother may be
married to someone who is not the father).
In either case, the required relationship could not be modeled without
the FatherOf relationship. The message is that it is important to
examine the meaning of each relationship between entities when
assessing redundancy. At the end of this step, we have simplified the
local conceptual data model by removing any inherent redundancy.
STEP 1.8: Validate conceptual data model against user transactions
Objective
To ensure that the conceptual data model supports the required
transactions.
Figure 16.8 Example of a nonredundant relationship FatherOf.
We now have a conceptual data model that represents the data
requirements of the enterprise. The objective of this step is to check
the model to ensure that the model supports the required
transactions. Using the model, we attempt to perform the operations
manually. If we can resolve all transactions in this way, we have
checked that the conceptual data model supports the required
transactions. However, if we are unable to perform a transaction
manually, there must be a problem with the data model, which must
be resolved. In this case, it is likely that we have omitted an entity, a
relationship, or an attribute from the data model.
We examine two possible approaches to ensuring that the conceptual
data model supports the required transactions:
(1) describing the transactions;
(2) using transaction pathways.
Describing the transaction
Using the first approach, we check that all the information (entities,
relationships, and their attributes) required by each transaction is
provided by the model, by documenting a description of each
473
474
transaction’s requirements. We illustrate this approach for an
example DreamHome transaction listed in Appendix AAppendix A from the
StaffClient user views:
Transaction (d): List the details of properties managed byTransaction (d): List the details of properties managed by
a named member of staff at the brancha named member of staff at the branch
The details of properties are held in the PropertyForRent entity and
the details of staff who manage properties are held in the Staff
entity. In this case, we can use the Staff Manages PropertyForRent
relationship to produce the required list.
Using transaction pathways
The second approach to validating the data model against the
required transactions involves diagrammatically representing the
pathway taken by each transaction directly on the ER diagram. An
example of this approach for the query transactions for the StaffClient
user views listed in Appendix AAppendix A is shown in Figure 16.9Figure 16.9. Clearly, the
more transactions that exist, the more complex this diagram would
become, so for readability we may need several such diagrams to
cover all the transactions.
This approach allows the designer to visualize areas of the model that
are not required by transactions and those areas that are critical to
transactions. We are therefore in a position to directly review the
support provided by the data model for the transactions required. If
there are areas of the model that do not appear to be used by any
transactions, we may question the purpose of representing this
information in the data model. On the other hand, if there are areas of
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_051.xhtml#eid84853
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_051.xhtml#eid84853
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37607
the model that are inadequate in providing the correct pathway for a
transaction, we may need to investigate the possibility that critical
entities, relationships, or attributes have been missed.
It may look like a lot of hard work to check every transaction that the
model has to support in this way, and it certainly can be. As a result, it
may be tempting to omit this step. However, it is very important that
these checks are performed now
rather than later, when it is much more difficult and expensive to
resolve any errors in the data model.
Figure 16.9 Using pathways to check whether the conceptual data model supports the user transactions.
STEP 1.9: Review conceptual data model with user
Objective
To review the conceptual data model with the users to ensure that
they consider the model to be a “true” representation of the data
requirements of the enterprise.
474
475
Before completing Step 1Step 1, we review the conceptual data model with
the user. The conceptual data model includes the ER diagram and the
supporting documentation that describes the data model. If any
anomalies are present in the data model, we must make the
appropriate changes, which may require repeating the previous
step(s). We repeat this process until the user is prepared to “sign off”
the model as being a “true” representation of the part of the
enterprise that we are modeling.
The steps in this methodology are summarized in Appendix DAppendix D. The
next chapter describes the steps of the logical database design
methodology.
Chapter Summary
• A design methodologydesign methodology is a structured approach that uses
procedures, techniques, tools, and documentation aids to support
and facilitate the process of design.
• Database design includes three main phases: conceptualconceptual,
logicallogical, and physicalphysical database design.
• Conceptual database designConceptual database design is the process of constructing a
model of the data used in an enterprise, independent of all
physical considerations.
• Conceptual database design begins with the creation of a
conceptual data modelconceptual data model of the enterprise, which is entirely
independent of implementation details such as the target DBMS,
application programs, programming languages, hardware
platform, performance issues, or any other physical
475
476
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_054.xhtml#eid85549
considerations.
• LogicalLogical database design database design is the process of constructing a model
of the data used in an enterprise based on a specific data model
(such as the relational model), but independent of a particular
DBMS and other physical considerations. Logical database design
translates the conceptual data model into a logicallogical datadata model model
of the enterprise.
• Physical database designPhysical database design is the process of producing a
description of the implementation of the database on secondary
storage; it describes the base relations, file organizations, and
indexes used to achieve efficient access to the data, and any
associated integrity constraints and security measures.
• The physical database design phase allows the designer to make
decisions on how the database is to be implemented. Therefore,
physical designphysical design is tailored to a specific DBMS. There is feedback
between physical and conceptual/logical design, because decisions
taken during physical design to improve performance may affect
the structure of the conceptual/logical data model.
• There are several critical factors for the success of the database
design stage, including, for example, working interactively with
users and being willing to repeat steps.
• The main objective of Step 1Step 1 of the methodology is to build a
conceptual data model of the data requirements of the enterprise.
A conceptual data model comprises: entity types, relationship
types, attributes, attribute domains, primary keys, and alternate
keys.
• A conceptual data model is supported by documentation, such as
ER diagrams and a data dictionary, which is produced throughout
the development of the model.
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
• The conceptual data model is validated to ensure it supports the
required transactions. Two possible approaches to ensure that the
conceptual data model supports the required transactions are: (1)
checking that all the information (entities, relationships, and their
attributes) required by each transaction is provided by the model
by documenting a description of each transaction’s requirements;
(2) diagrammatically representing the pathway taken by each
transaction directly on the ER diagram.
Review Questions
16.1 Describe the purpose of a design methodology.
16.2 Describe the main phases involved in database design.
16.3 Identify important factors in the success of database design.
16.4 Discuss the important role played by users in the process of
database design.
16.5 Describe the main objective of conceptual database design.
16.6 Identify the main steps associated with conceptual database
design.