Conceptual Database Design|Information Systems

Conceptual Database Design|Information Systems

The three main phases of database design: conceptual, logical,

and physical design.

• How to decompose the scope of the design into specific views of

the enterprise.

• How to use ER modeling to build a local conceptual data model

based on the information given in a view of the enterprise.

• How to validate the resultant conceptual data model to ensure

that it is a true and accurate representation of a view of the

enterprise.

• How to document the process of conceptual database design.

• End-users play an integral role throughout the process of

conceptual database design.

In Chapter 10Chapter 10 we described the main stages of the database system

development lifecycle, one of which is database designdatabase design. This stage

starts only after a complete analysis of the enterprise’s requirements

has been undertaken.

In this chapter, and Chapters 17Chapters 17–1919, we describe a methodology for

455

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid27876
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37780
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41343
the database design stage of the database system development

lifecycle for relational databases. The methodology is presented as a

step-by-step guide to the three main phases of database design,

namely: conceptual, logical, and physical design (see Figure 10.1Figure 10.1).

The main aim of each phase is as follows:

• Conceptual database designConceptual database design—to build the conceptual

representation of the database, which includes identification of

the important entities, relationships, and attributes.

• Logical database designLogical database design—to translate the conceptual

representation to the logical structure of the database, which

includes designing the relations.

• Physical database designPhysical database design—to decide how the logical structure

is to be physically implemented (as base relations) in the target

DBMS.

Structure of this Chapter

In Section 16.1Section 16.1 we define what a database design methodology is

and review the three phases of database design. In Section 16.2Section 16.2 we

provide an overview of the methodology and briefly describe the

main activities associated with each design phase. In Section 16.3Section 16.3

we focus on the methodology for conceptual database design and

present a detailed description of the steps required to build a

conceptual data model. We use the ER modeling technique described

in Chapters 12Chapters 12 and 1313 to create the conceptual data model. The

conceptual data model described in this chapter is the starting point

for the next phase of database design described in the following

455

456

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28003
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36569
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36642
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36811
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31274
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_022.xhtml#eid32613
chapter.

In Chapter 17Chapter 17 we focus on the methodology for logical database

design for the relational model and present a detailed description of

the steps required to convert a conceptual data model into a logical

data model. This chapter also includes an optional step that describes

how to merge two or more logical data models into a single logical

data model, for those using the view integration approach (see

Section 10.5Section 10.5) to manage the design of a database with multiple user

views. The logical data model described in Chapter 17Chapter 17 is the starting

point for the final phase of database design described in the following

two chapters.

In Chapters 18Chapters 18 and 1919 we complete the database design

methodology by presenting a detailed description of the steps

associated with the production of the physical database design for

relational DBMSs. This part of the methodology illustrates that the

development of the logical data model alone is insufficient to

guarantee the optimum implementation of a database system. For

example, we may have to consider modifying the logical model to

achieve acceptable levels of performance.

Appendix DAppendix D presents a summary of the database design

methodology for those readers who are already familiar with

database design and simply require an overview of the main steps.

Throughout the methodology the terms “entity” and “relationship”

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37780
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28150
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37780
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39671
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41343
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_054.xhtml#eid85549
are used in place of “entity type” and “relationship type” where the

meaning is obvious; “type” is generally added only to avoid

ambiguity. In this chapter we mostly use examples from the

StaffClient user views of the DreamHome case study documented in

Section 11.4Section 11.4 and Appendix AAppendix A.

16.1 Introduction to the Database Design Methodology

Before presenting the methodology, we discuss what a design

methodology represents and describe the three phases of database

design. Finally, we present guidelines for achieving success in

database design.

16.1.1 What Is a Design Methodology?

Design methodology

A structured approach that uses procedures, techniques, tools, and

documentation aids to support and facilitate the process of design.

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_020.xhtml#eid30021
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_051.xhtml#eid84853
A design methodology consists of phases each containing a number of

steps that guide the designer in the techniques appropriate at each

stage of the project. A design methodology also helps the designer to

plan, manage, control, and evaluate database development projects.

Furthermore, it is a structured approach for analyzing and modeling a

set of requirements for a database in a standardized and organized

manner.

16.1.2 Conceptual, Logical, and Physical Database Design

In presenting this database design methodology, the design process is

divided into three main phases: conceptual, logical, and physical

database design.

Conceptual database design

The process of constructing a model of the data used in an enterprise,

independent of all physical considerations.

The conceptual database design phase begins with the creation of a

conceptual data model of the enterprise that is entirely independent

of implementation details such as the target DBMS, application

programs, programming languages, hardware platform, performance

issues, or any other physical considerations.

456

457

Logical database design

The process of constructing a model of the data used in an enterprise

based on a specific data model, but independent of a particular DBMS

and other physical considerations.

The logical database design phase maps the conceptual data model on

to a logical model, which is influenced by the data model for the target

database (for example, the relational model). The logical data model is

a source of information for the physical design phase, providing the

physical database designer with a vehicle for making trade-offs that

are very important to the design of an efficient database.

Physical database design

The process of producing a description of the implementation of the

database on secondary storage; it describes the base relations, file

organizations, and indexes used to achieve efficient access to the

data, and any associated integrity constraints and security measures.

The physical database design phase allows the designer to make

decisions on how the database is to be implemented. Therefore,

physical design is tailored to a specific DBMS. There is feedback

between physical and logical design, because decisions taken during

physical design for improving performance may affect the logical data

model.

16.1.3 Critical Success Factors in Database Design

The following guidelines are often critical to the success of database

design:

• Work interactively with the users as much as possible.

• Follow a structured methodology throughout the data modeling

process.

• Employ a data-driven approach.

• Incorporate structural and integrity considerations into the data

models.

• Combine conceptualization, normalization, and transaction

validation techniques into the data modeling methodology.

• Use diagrams to represent as much of the data models as possible.

• Use a Database Design Language (DBDL) to represent additional

data semantics that cannot easily be represented in a diagram.

• Build a data dictionary to supplement the data model diagrams

and the DBDL.

• Be willing to repeat steps.

These factors are built into the methodology we present for database

design.

16.2 Overview of the Database Design Methodology

In this section, we present an overview of the database design

457

458

methodology. The steps in the methodology are as follows.

Conceptual database design

Step 1 Build conceptual data model

Step 1.1 Identify entity types

Step 1.2 Identify relationship types

Step 1.3 Identify and associate attributes with entity or

relationship types

Step 1.4 Determine attribute domains

Step 1.5 lDetermine candidate, primary, and alternate key

attributes

Step 1.6 Consider use of enhanced modeling concepts (optional

step)

Step 1.7 Check model for redundancy

Step 1.8 Validate conceptual data model against user

transactions

Step 1.9 Review conceptual data model with user

Logical database design for the relational model

Step 2 Build logical data model

Step 2.1 Derive relations for logical data model

Step 2.2 Validate relations using normalization

Step 2.3 Validate relations against user transactions

Step 2.4 Check integrity constraints

Step 2.5 Review logical data model with user

Step 2.6 Merge logical data models into global model (optional

step)

Step 2.7 Check for future growth

Physical database design for relational databases

Step 3 Translate logical data model for target DBMS

Step 3.1 Design base relations

Step 3.2 Design representation of derived data

Step 3.3 Design general constraints

Step 4 Design file organizations and indexes

Step 4.1 Analyze transactions

Step 4.2 Choose file organizations

Step 4.3 Choose indexes

Step 4.4 Estimate disk space requirements

Step 5 Design user views

Step 6 Design security mechanisms

Step 7 Consider the introduction of controlled redundancy

Step 8 Monitor and tune the operational system

This methodology can be used to design relatively simple to highly

complex database systems. Just as the database design stage of the

database systems development lifecycle (see Section 10.6Section 10.6) has three

phases—conceptual, logical, and physical design—so too has the

methodology. Step 1Step 1 creates a conceptual database design, Step 2Step 2

creates a logical database design, and Steps 3Steps 3 to 88 create a physical

database design. Depending on the complexity of the database system

being built, some of the steps may be omitted. For example, Step 2.6Step 2.6

of the methodology is not required for database systems with a single

user view or database systems with multiple user views being

managed using the centralized approach (see Section 10.5Section 10.5). For this

reason, we refer to the creation of a single conceptual data model only

in Step 1Step 1 and single logical data model only in Step 2Step 2. However, if the

database designer is using the view integration approach (see SectionSection

10.510.5) to manage user views for a database system, then Steps 1Steps 1 and 22

458

459

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28243
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39813
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41917
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28150
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid28150
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
may be repeated as necessary to create the required number of

models, which are then merged in Step 2.6Step 2.6.

In Chapter 10Chapter 10, we introduced the term “local conceptual data model”

or “local logical data model” to refer to the modeling of one or more,

but not all, user views of a database system and the term “global

logical data model” to refer to the modeling of all user views of a

database system. However, the methodology is presented using the

more general terms “conceptual data model” and “logical data model”

with the exception of the optional Step 2.6Step 2.6, which necessitates the use

of the terms local logical data model and global logical data model, as

it is this step that describes the tasks necessary to merge separate local

logical data models to produce a global logical data model.

An important aspect of any design methodology is to ensure that the

models produced are repeatedly validated so that they continue to be

an accurate representation of the part of the enterprise being

modeled. In this methodology the data models are validated in various

ways such as by using normalization (Step 2.2Step 2.2), by ensuring the

critical transactions are supported (Steps 1.8Steps 1.8 and 2.32.3), and by

involving the users as much as possible (Steps 1.9Steps 1.9 and 2.52.5).

The logical model created at the end of Step 2Step 2 is then used as the

source of information for physical database design described in StepsSteps

33 to 88. Again, depending on the complexity of the database systems

being designed and/or the functionality of the target DBMS, some of

the steps of physical database design may be omitted. For example,

Step 4.2Step 4.2 may not be applicable for certain PC-based DBMSs. The steps

of physical database design are described in detail in Chapters 18Chapters 18

and 1919.

459

460

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_019.xhtml#eid27876
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38385
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37552
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38428
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37612
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38593
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39813
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41917
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid40488
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39671
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_029.xhtml#eid41343
Database design is an iterative process that has a starting point and an

almost endless procession of refinements. Although the steps of the

methodology are presented here as a procedural process, it must be

emphasized that this does not imply that it should be performed in

this manner. It is likely that knowledge gained in one step may alter

decisions made in a previous step. Similarly, it may be useful to look

briefly at a later step to help with an earlier step. Therefore, the

methodology should act as a framework to help guide the designer

through database design effectively.

To illustrate the database design methodology we use the DreamHome

case study. The DreamHome database has several user views

(Director, Manager, Supervisor, Assistant, and Client) that are

managed using a combination of the centralized and view integration

approaches (see Section 11.4Section 11.4). Applying the centralized approach

resulted in the identification of two collections of user views called

StaffClient user views and Branch user views. The user views

represented by each collection are as follows:

• StaffClient user viewsStaffClient user views—representing Supervisor, Assistant, and

Client user views;

• Branch user viewsBranch user views—representing Director and Manager user

views.

In this chapter, which describes Step 1Step 1 of the methodology, we use the

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_020.xhtml#eid30021
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
StaffClient user views to illustrate the building of a conceptual data

model, and then in the following chapter, which describes Step 2Step 2, we

describe how this model is translated into a logical data model. As the

StaffClient user views represent only a subset of all the user views of

the DreamHome database, it is more correct to refer to the data

models as local data models. However, as stated earlier when we

described the methodology and the worked examples, for simplicity

we use the terms conceptual data model and logical data model until

the optional Step 2.6Step 2.6, which describes the integration of the local

logical data models for the StaffClient user views and the Branch user

views.

16.3 Conceptual Database Design Methodology

This section provides a step-by-step guide for conceptual database

design.

Step 1: Build Conceptual Data Model

Objective

To build a conceptual data model of the data requirements of the

enterprise.

The first step in conceptual database design is to build one (or more)

conceptual data models of the data requirements of the enterprise. A

conceptual data model comprises:

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37834
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid38623
• entity types;

• relationship types;

• attributes and attribute domains;

• primary keys and alternate keys;

• integrity constraints.

The conceptual data model is supported by documentation, including

ER diagrams and a data dictionary, which is produced throughout the

development of the model. We detail the types of supporting

documentation that may be produced as we go through the various

steps. The tasks involved in Step 1Step 1 are:

Step 1.1 Identify entity types

Step 1.2 Identify relationship types

Step 1.3 Identify and associate attributes with entity or relationship

types

Step 1.4 Determine attribute domains

Step 1.5 Determine candidate, primary, and alternate key attributes

Step 1.6 Consider use of enhanced modeling concepts (optional step)

Step 1.7 Check model for redundancy

Step 1.8 Validate conceptual data model against user transactions

Step 1.9 Review conceptual data model with user

Step 1.1: Identify entity types

Objective

To identify the required entity types.

The first step in building a conceptual data model is to determine and

460

461

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
define the main objects that the users are interested in. These objects

are the entity types for the model (see Section 12.1Section 12.1). One method of

identifying entities is to examine the users’ requirements

specification. From this specification, we identify nouns or noun

phrases that are mentioned (for example, staff number, staff name,

property number, property address, rent, number of rooms). We also

look for major objects, such as people, places, or concepts of interest,

excluding those nouns that are merely qualities of other objects. For

example, we could group staff number and staff name with an object

or entity called Staff and group property number, property address,

rent, and number of rooms with an entity called PropertyForRent.

An alternative way of identifying entities is to look for objects that

have an existence in their own right. For example, Staff is an entity

because staff exist whether or not we know their names, positions,

and dates of birth. If possible, the users should assist with this activity.

It is sometimes difficult to identify entities because of the way they are

presented in the users’ requirements specification. Users often talk in

terms of examples or analogies. Instead of talking about staff in

general, users may mention people’s names. In some cases, users talk

in terms of job roles, particularly when people or organizations are

involved. These roles may be job titles or responsibilities, such as

Director, Manager, Supervisor, or Assistant.

To confuse matters further, users frequently use synonyms and

homonyms. Two words are synonyms when they have the same

meaning, for example, “branch” and “office.” Homonyms occur when

the same word can have different meanings depending on the context.

For example, the word “program” has several alternative meanings

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31333
such as a course of study, a series of events, a plan of work, and an

item on the television.

It is not always obvious whether a particular object is an entity, a

relationship, or an attribute. For example, how would we classify

marriage? In fact, depending on the actual requirements, we could

classify marriage as any or all of these. Design is subjective, and

different designers may produce different, but equally valid,

interpretations. The activity therefore relies, to a certain extent, on

judgement and experience. Database designers must take a very

selective view of the world and categorize the things that they observe

within the context of the enterprise. Thus, there may be no unique set

of entity types deducible from a given requirements specification.

However, successive iterations of the design process should lead to the

choice of entities that are at least adequate for the system required.

For the StaffClient user views of DreamHome, we identify the

following entities:

Staff PropertyForRent

PrivateOwner BusinessOwner

Client Preference

461

462

Lease

Document entity types

As entity types are identified, assign them names that are meaningful

and obvious to the user. Record the names and descriptions of entities

in a data dictionary. If possible, document the expected number of

occurrences of each entity. If an entity is known by different names,

the names are referred to as synonyms or aliases, which are also

recorded in the data dictionary. Figure 16.1Figure 16.1 shows an extract from

the data dictionary that documents the entities for the StaffClient user

views of DreamHome.

Step 1.2: Identify relationship types

Objective

To identify the important relationships that exist between the entity

types.

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36951
Figure 16.1 Extract from the data dictionary for the StaffClient user views of DreamHome showing a

description of entities.

Having identified the entities, the next step is to identify all the

relationships that exist between these entities (see Section 12.2Section 12.2).

When we identify entities, one method

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31377
is to look for nouns in the users’ requirements specification. Again, we

can use the grammar of the requirements specification to identify

relationships. Typically, relationships are indicated by verbs or verbal

expressions. For example:

• Staff Manages PropertyForRent

• PrivateOwner Owns PropertyForRent

• PropertyForRent AssociatedWith Lease

The fact that the requirements specification records these

relationships suggests that they are important to the enterprise and

should be included in the model.

We are interested only in required relationships between entities. In

the previous examples, we identified the Staff Manages

PropertyForRent and the PrivateOwner Owns PropertyForRent

relationships. We may also be inclined to include a relationship

between Staff and PrivateOwner (for example, Staff Assists

PrivateOwner). However, although this is a possible relationship, from

the requirements specification, it is not a relationship that we are

interested in modeling.

In most instances, the relationships are binary; in other words, the

relationships exist between exactly two entity types. However, we

should be careful to look out for complex relationships that may

involve more than two entity types (see Section 12.2.1Section 12.2.1) and

recursive relationships that involve only one entity type (see SectionSection

12.2.212.2.2).

462

463

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31466
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31543
Great care must be taken to ensure that all the relationships that are

either explicit or implicit in the users’ requirements specification are

detected. In principle, it should be possible to check each pair of entity

types for a potential relationship between them, but this would be a

daunting task for a large system comprising hundreds of entity types.

On the other hand, it is unwise not to perform some such check, and

the responsibility is often left to the analyst/designer. However,

missing relationships should become apparent when we validate the

model against the transactions that are to be supported (Step 1.8Step 1.8).

Use Entity–Relationship (ER) diagrams

It is often easier to visualize a complex system rather than decipher

long textual descriptions of a users’ requirements specification. We

use ER diagrams to represent entities and how they relate to one

another more easily. Throughout the database design phase, we

recommend that ER diagrams be used whenever necessary to help

build up a picture of the part of the enterprise that we are modeling.

In this book, we use UML, but other notations perform a similar

function (see Appendix CAppendix C).

Determine the multiplicity constraints of relationship types

Having identified the relationships to model, we next determine the

multiplicity of each relationship (see Section 12.6Section 12.6). If specific values

for the multiplicity are known, or even upper or lower limits,

document these values as well.

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37552
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_053.xhtml#eid85491
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31881
Multiplicity constraints are used to check and maintain data quality.

These constraints are assertions about entity occurrences that can be

applied when the database is updated to determine whether the

updates violate the stated rules of the enterprise. A model that

includes multiplicity constraints more explicitly represents the

semantics of the relationships and results in a better representation of

the data requirements of the enterprise.

Figure 16.2 First-cut ER diagram showing entity and relationship types for the StaffClient user views of

DreamHome.

Check for fan and chasm traps

Having identified the necessary relationships, check that each

relationship in the ER model is a true representation of the “real

world,” and that fan or chasm traps have not been created

inadvertently (see Section 12.7Section 12.7).

463

464

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid32262
Figure 16.2Figure 16.2 shows the first-cut ER diagram for the StaffClient user

views of the DreamHome case study.

Document relationship types

As relationship types are identified, assign them names that are

meaningful and obvious to the user. Also record relationship

descriptions and the multiplicity constraints in the data dictionary.

Figure 16.3Figure 16.3 shows an extract from the data dictionary that

documents the relationships for the StaffClient user views of

DreamHome.

Step 1.3: Identify and associate attributes with entity or relationship types

Objective

To associate attributes with appropriate entity or relationship types.

The next step in the methodology is to identify the types of facts about

the entities and relationships that we have chosen to be represented

in the database. In a

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37007
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37043
similar way to identifying entities, we look for nouns or noun phrases

in the users’ requirements specification. The attributes can be

identified where the noun or noun phrase is a property, quality,

identifier, or characteristic of one of these entities or relationships

(see Section 12.3Section 12.3).

Figure 16.3 Extract from the data dictionary for the StaffClient user views of DreamHome, showing a

description of relationships.

By far the easiest thing to do when we have identified an entity (x) or

a relationship (y) in the requirements specification is to ask “What

information are we required to hold on x or y?” The answer to this

question should be described in the specification. However, in some

cases it may be necessary to ask the users to clarify the requirements.

Unfortunately, they may give answers to this question that also

contain other concepts, so the users’ responses must be carefully

considered.

Simple/composite attributes

It is important to note whether an attribute is simple or composite (see

Section 12.3.1Section 12.3.1). Composite attributes are made up of simple

464

465

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31591
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31634
attributes. For example, the address attribute can be simple and hold

all the details of an address as a single value, such as “115 Dumbarton

Road, Glasgow, G11 6YG.” However, the address attribute may also

represent a composite attribute, made up of simple attributes that

hold the address details as separate values in the attributes street

(“115 Dumbarton Road”), city (“Glasgow”), and postcode (“G11 6YG”).

The option to represent address details as a simple or composite

attribute is determined by the users’ requirements. If the user does

not need to access the separate components of an address, we

represent the address attribute as a simple attribute. On the other

hand, if the user does need to access the individual components of an

address, we represent the address attribute as being composite, made

up of the required simple attributes.

In this step, it is important that we identify all simple attributes to be

represented in the conceptual data model including those attributes

that make up a composite attribute.

Single/multi-valued attributes

In addition to being simple or composite, an attribute can also be

single-valued or multi-valued (see Section 12.3.2Section 12.3.2). Most attributes

encountered will be single-valued, but occasionally a multi-valued

attribute may be encountered; that is, an attribute that holds multiple

values for a single entity occurrence. For example, we may identify

the attribute telNo (the telephone number) of the Client entity as a

multivalued attribute.

On the other hand, client telephone numbers may have been

identified as a separate entity from Client. This is an alternative, and

465

466

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31664
equally valid, way to model this. As you will see in Step 2.1Step 2.1, multi-

valued attributes are mapped to relations anyway, so both approaches

produce the same end result.

Derived attributes

Attributes whose values are based on the values of other attributes

are known as derived attributes (see Section 12.3.3Section 12.3.3). Examples of

derived attributes include:

• the age of a member of staff;

• the number of properties that a member of staff manages;

• the rental deposit (calculated as twice the monthly rent).

Often, these attributes are not represented in the conceptual data

model. However, sometimes the value of the attribute or attributes on

which the derived attribute is based may be deleted or modified. In

this case, the derived attribute must be shown in the data model to

avoid this potential loss of information. However, if a derived

attribute is shown in the model, we must indicate that it is derived.

The representation of derived attributes will be considered during

physical database design. Depending on how an attribute is used, new

values for a derived attribute may be calculated each time it is

accessed or when the value(s) it is derived from changes. However,

this issue is not the concern of conceptual database design, and is

discussed in more detail in Step 3.2Step 3.2 in Chapter 18Chapter 18.

Potential problems

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37897
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31687
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39916
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_028.xhtml#eid39671
When identifying the entities, relationships, and attributes for the

user views, it is not uncommon for it to become apparent that one or

more entities, relationships, or attributes have been omitted from the

original selection. In this case, return to the previous steps, document

the new entities, relationships, or attributes, and reexamine any

associated relationships.

As there are generally many more attributes than entities and

relationships, it may be useful to first produce a list of all attributes

given in the users’ requirements specification. As an attribute is

associated with a particular entity or relationship, remove the

attribute from the list. In this way, we ensure that an attribute is

associated with only one entity or relationship type and, when the list

is empty, that all attributes are associated with some entity or

relationship type.

We must also be aware of cases where attributes appear to be

associated with more than one entity or relationship type, as this can

indicate the following:

(1) We have identified several entities that can be represented as a

single entity. For example, we may have identified entities

Assistant and Supervisor both with the attributes staffNo (the staff

number), name, sex, and DOB (date of birth), which can be

represented as a single entity called Staff with the attributes

staffNo (the staff number), name, sex, DOB, and position (with values

Assistant or Supervisor). On the other hand, it may be that these

entities share many attributes but there are also attributes or

relationships that are unique to each entity. In this case, we must

decide whether we want to generalize the entities into a single

entity such as Staff, or leave them as specialized entities

representing distinct staff

roles. The consideration of whether to specialize or generalize

entities was discussed in Chapter 13Chapter 13 and is addressed in more

detail in Step 1.6Step 1.6.

(2) We have identified a relationship between entity types. In this

case, we must associate the attribute with only one entity, the

parent entity, and ensure that the relationship was previously

identified in Step 1.2Step 1.2. If this is not the case, the documentation

should be updated with details of the newly identified

relationship. For example, we may have identified the entities

Staff and PropertyForRent with the following attributes:

Staff staffNo, name, position, sex, DOB

PropertyForRent propertyNo, street, city, postcode,

type, rooms, rent, managerName

The presence of the managerName attribute in PropertyForRent is

intended to represent the relationship Staff Manages

PropertyForRent. In this case, the managerName attribute should be

omitted from PropertyForRent and the relationship Manages should

be added to the model.

DreamHome attributes for entities

466

467

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_022.xhtml#eid32613
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37399
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36940
For the StaffClient user views of DreamHome, we identify and

associate attributes with entities as follows:

Staff staffNo, name (composite: fName, IName),

position, sex, DOB

PropertyForRent propertyNo, address (composite: street,

city, postcode), type, rooms, rent

PrivateOwner ownerNo, name (composite: fName, IName),

address, telNo

BusinessOwner ownerNo, bName, bType, address, telNo,

contactName

Client clientNo, name (composite: fName,

IName), telNo, eMail

Preference prefType, maxRent

Lease leaseNo, paymentMethod, deposit (derived

as PropertyForRent.rent*2), depositPaid,

rentStart, rentFinish, duration (derived

as rentFinish – rentStart)

DreamHome attributes for relationships

Some attributes should not be associated with entities, but instead

should be associated with relationships. For the StaffClient user views

of DreamHome, we identify and associate attributes with

relationships, as follows:

Views viewDate, comment

Document attributes

As attributes are identified, assign them names that are meaningful to

the user. Record the following information for each attribute:

• attribute name and description;

• data type and length;

• any aliases that the attribute is known by;

• whether the attribute is composite and, if so, the simple attributes

that make up the composite attribute;

• whether the attribute is multi-valued;

• whether the attribute is derived and, if so, how it is to be

computed;

• any default value for the attribute.

467

468

Figure 16.4 Extract from the data dictionary for the StaffClient user views of DreamHome showing a

description of attributes.

Figure 16.4Figure 16.4 shows an extract from the data dictionary that

documents the attributes for the StaffClient user views of

DreamHome.

Step 1.4: Determine attribute domains

Objective

To determine domains for the attributes in the conceptual data

model.

The objective of this step is to determine domains for all the attributes

in the model (see Section 12.3Section 12.3). A domaindomain is a pool of values from

which one or more attributes draw their values. For example, we may

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37276
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31591
define:

• the attribute domain of valid staff numbers (staffNo) as being a

five-character variable-length string, with the first two characters

as letters and the next one to three characters as digits in the

range 1–999;

• the possible values for the sex attribute of the Staff entity as being

either “M” or “F.” The domain of this attribute is a single character

string consisting of the values “M” or “F.”

A fully developed data model specifies the domains for each attribute

and includes:

• allowable set of values for the attribute;

• sizes and formats of the attribute.

Further information can be specified for a domain, such as the

allowable operations on an attribute, and which attributes can be

compared with other attributes or used in combination with other

attributes. However, implementing these characteristics of attribute

domains in a DBMS is still the subject of research.

Document attribute domains

As attribute domains are identified, record their names and

characteristics in the data dictionary. Update the data dictionary

entries for attributes to record their domain in place of the data type

and length information.

STEP 1.5: Determine candidate, primary, and alternate key attributes

Objective

To identify the candidate key(s) for each entity type and, if there is

more than one candidate key, to choose one to be the primary key and

the others as alternate keys.

This step is concerned with identifying the candidate key(s) for an

entity and then selecting one to be the primary key (see SectionSection

12.3.412.3.4). A candidate keycandidate key is a minimal set of attributes of an entity

that uniquely identifies each occurrence of that entity. We may

identify more than one candidate key, in which case we must choose

one to be the primary keyprimary key; the remaining candidate keys are called

alternate keysalternate keys.

468

469

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31717
People’s names generally do not make good candidate keys. For

example, we may think that a suitable candidate key for the Staff

entity would be the composite attribute name, the member of staff’s

name. However, it is possible for two people with the same name to

join DreamHome, which would clearly invalidate the choice of name as

a candidate key. We could make a similar argument for the names of

property owners. In such cases, rather than coming up with

combinations of attributes that may provide uniqueness, it may be

better to use an existing attribute that would always ensure

uniqueness, such as the staffNo attribute for the Staff entity and the

ownerNo attribute for the PrivateOwner entity, or define a new attribute

that would provide uniqueness.

When choosing a primary key from among the candidate keys, use the

following guidelines to help make the selection:

• the candidate key with the minimal set of attributes;

• the candidate key that is least likely to have its values changed;

• the candidate key with fewest characters (for those with textual

attribute(s));

• the candidate key with smallest maximum value (for those with

numerical attribute(s));

• the candidate key that is easiest to use from the users’ point of

view.

In the process of identifying primary keys, note whether an entity is

strong or weak. If we are able to assign a primary key to an entity, the

entity is referred to as being strong. On the other hand, if we are

unable to identify a primary key for an entity, the entity is referred to

as being weak (see Section 12.4Section 12.4). The primary key of a weak entity

can be identified only when we map the weak entity and its

relationship with its owner entity to a relation through the placement

of a foreign key in that relation. The process of mapping entities and

their relationships to relations is described in Step 2.1Step 2.1, and therefore

the identification of primary keys for weak entities cannot take place

until that step.

DreamHome primary keys

The primary keys for the StaffClient user views of DreamHome are

shown in Figure 16.5Figure 16.5. Note that the Preference entity is a weak entity

and, as identified previously, the Views relationship has two attributes,

viewDate and comment. 469

470

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_021.xhtml#eid31795
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_027.xhtml#eid37897
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37389
Figure 16.5 ER diagram for the StaffClient user views of DreamHome with primary keys added.

Document primary and alternate keys

Record the identification of primary and any alternate keys in the data

dictionary.

STEP 1.6: Consider use of enhanced modeling concepts (optional step)

Objective

To consider the use of enhanced modeling concepts, such as

specialization/generalization, aggregation, and composition.

In this step, we have the option to continue the development of the ER

model using the enhanced modeling concepts discussed in ChapterChapter

1313, namely specialization/generalization, aggregation, and

composition. If we select the specialization approach, we attempt to

highlight differences between entities by defining one or more

subclassessubclasses of a superclasssuperclass entity. If we select the generalization

approach, we attempt to identify common features between entities to

define a generalizing superclass entity. We may use aggregation to

represent a ‘has-a’ or ‘is-part-of’ relationship between entity types,

where one represents the ‘whole’ and the other the ‘part’. We may use

composition (a special type of aggregation) to represent an association

between entity types where there is a strong ownership and

coincidental lifetime between the ‘whole’ and the ‘part’.

For the StaffClient user views of DreamHome, we choose to generalize

the two entities PrivateOwner and BusinessOwner to create a superclass

Owner that contains the common attributes ownerNo, address, and telNo.

The relationship that the Owner superclass has with its subclasses is

mandatory and disjoint, denoted as {Mandatory, Or}; each member of

the Owner superclass must be a member of one of the subclasses, but

cannot belong to both.

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_022.xhtml#eid32613
Figure 16.6 Revised ER diagram for the StaffClient user views of DreamHome with specialization/generalization

added.

In addition, we identify one specialization subclass of Staff, namely

Supervisor, specifically to model the Supervises relationship. The

relationship that the Staff superclass has with the Supervisor subclass

is optional: a member of the Staff superclass does not necessarily

have to be a member of the Supervisor subclass. To keep the design

simple, we decide not to use aggregation or composition. The revised

ER diagram for the StaffClient user views of DreamHome is shown in

Figure 16.6Figure 16.6.

470

471

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37429
There are no strict guidelines on when to develop the ER model using

enhanced modeling concepts, as the choice is often subjective and

dependent on the particular characteristics of the situation that is

being modeled. As a useful “rule of thumb” when considering the use

of these concepts, always attempt to represent the important entities

and their relationships as clearly as possible in the ER diagram.

Therefore, the use of enhanced modeling concepts should be guided

by the readability of the ER diagram and the clarity by which it models

the important entities and relationships.

These concepts are associated with enhanced ER modeling. However,

as this step is optional, we simply use the term “ER diagram” when

referring to the diagrammatic representation of data models

throughout the methodology.

STEP 1.7: Check model for redundancy

Objective

To check for the presence of any redundancy in the model.

In this step, we examine the conceptual data model with the specific

objective of identifying whether there is any redundancy present and

removing any that does exist. The three activities in this step are:

471

472

(1) re-examine one-to-one (1:1) relationships;

(2) remove redundant relationships;

(3) consider time dimension.

(1) Re-examine one-to-one (1:1) relationships

In the identification of entities, we may have identified two entities

that represent the same object in the enterprise. For example, we may

have identified the two entities Client and Renter that are actually the

same; in other words, Client is a synonym for Renter. In this case, the

two entities should be merged together. If the primary keys are

different, choose one of them to be the primary key and leave the

other as an alternate key.

(2) Remove redundant relationships

Figure 16.7 Remove the redundant relationship called Rents.

A relationship is redundant if the same information can be obtained

via other relationships. We are trying to develop a minimal data

model and, as redundant relationships are unnecessary, they should

be removed. It is relatively easy to identify whether there is more than

one path between two entities. However, this does not necessarily

imply that one of the relationships is redundant, as they may

represent different associations between the entities. For example,

consider the relationships between the PropertyForRent, Lease, and

Client entities shown in Figure 16.7Figure 16.7. There are two ways to find out

which clients rent which properties. There is the direct route using the

Rents relationship between the Client and PropertyForRent entities, and

there is the indirect route, using the Holds and AssociatedWith

relationships via the Lease entity. Before we can assess whether both

routes are required, we need

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37489
to establish the purpose of each relationship. The Rents relationship

indicates which client rents which property. On the other hand, the

Holds relationship indicates which client holds which lease, and the

AssociatedWith relationship indicates which properties are associated

with which leases. Although it is true that there is a relationship

between clients and the properties they rent, this is not a direct

relationship and the association is more accurately represented

through a lease. The Rents relationship is therefore redundant and

does not convey any additional information about the relationship

between PropertyForRent and Client that cannot more correctly be

found through the Lease entity. To ensure that we create a minimal

model, the redundant Rents relationship must be removed.

(3) Consider time dimension

The time dimension of relationships is important when assessing

redundancy. For example, consider the situation in which we wish to

model the relationships between the entities Man, Woman, and Child, as

illustrated in Figure 16.8Figure 16.8. Clearly, there are two paths between Man

and Child: one via the direct relationship FatherOf and the other via

the relationships MarriedTo and MotherOf. Consequently, we may think

that the relationship FatherOf is unnecessary. However, this would be

incorrect for two reasons:

(1) The father may have children from a previous marriage, and we

are modeling only the father’s current marriage through a 1:1

relationship.

(2) The father and mother may not be married, or the father may be

472

473

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37563
married to someone other than the mother (or the mother may be

married to someone who is not the father).

In either case, the required relationship could not be modeled without

the FatherOf relationship. The message is that it is important to

examine the meaning of each relationship between entities when

assessing redundancy. At the end of this step, we have simplified the

local conceptual data model by removing any inherent redundancy.

STEP 1.8: Validate conceptual data model against user transactions

Objective

To ensure that the conceptual data model supports the required

transactions.

Figure 16.8 Example of a nonredundant relationship FatherOf.

We now have a conceptual data model that represents the data

requirements of the enterprise. The objective of this step is to check

the model to ensure that the model supports the required

transactions. Using the model, we attempt to perform the operations

manually. If we can resolve all transactions in this way, we have

checked that the conceptual data model supports the required

transactions. However, if we are unable to perform a transaction

manually, there must be a problem with the data model, which must

be resolved. In this case, it is likely that we have omitted an entity, a

relationship, or an attribute from the data model.

We examine two possible approaches to ensuring that the conceptual

data model supports the required transactions:

(1) describing the transactions;

(2) using transaction pathways.

Describing the transaction

Using the first approach, we check that all the information (entities,

relationships, and their attributes) required by each transaction is

provided by the model, by documenting a description of each

473

474

transaction’s requirements. We illustrate this approach for an

example DreamHome transaction listed in Appendix AAppendix A from the

StaffClient user views:

Transaction (d): List the details of properties managed byTransaction (d): List the details of properties managed by

a named member of staff at the brancha named member of staff at the branch

The details of properties are held in the PropertyForRent entity and

the details of staff who manage properties are held in the Staff

entity. In this case, we can use the Staff Manages PropertyForRent

relationship to produce the required list.

Using transaction pathways

The second approach to validating the data model against the

required transactions involves diagrammatically representing the

pathway taken by each transaction directly on the ER diagram. An

example of this approach for the query transactions for the StaffClient

user views listed in Appendix AAppendix A is shown in Figure 16.9Figure 16.9. Clearly, the

more transactions that exist, the more complex this diagram would

become, so for readability we may need several such diagrams to

cover all the transactions.

This approach allows the designer to visualize areas of the model that

are not required by transactions and those areas that are critical to

transactions. We are therefore in a position to directly review the

support provided by the data model for the transactions required. If

there are areas of the model that do not appear to be used by any

transactions, we may question the purpose of representing this

information in the data model. On the other hand, if there are areas of

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_051.xhtml#eid84853
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_051.xhtml#eid84853
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid37607
the model that are inadequate in providing the correct pathway for a

transaction, we may need to investigate the possibility that critical

entities, relationships, or attributes have been missed.

It may look like a lot of hard work to check every transaction that the

model has to support in this way, and it certainly can be. As a result, it

may be tempting to omit this step. However, it is very important that

these checks are performed now

rather than later, when it is much more difficult and expensive to

resolve any errors in the data model.

Figure 16.9 Using pathways to check whether the conceptual data model supports the user transactions.

STEP 1.9: Review conceptual data model with user

Objective

To review the conceptual data model with the users to ensure that

they consider the model to be a “true” representation of the data

requirements of the enterprise.

474

475

Before completing Step 1Step 1, we review the conceptual data model with

the user. The conceptual data model includes the ER diagram and the

supporting documentation that describes the data model. If any

anomalies are present in the data model, we must make the

appropriate changes, which may require repeating the previous

step(s). We repeat this process until the user is prepared to “sign off”

the model as being a “true” representation of the part of the

enterprise that we are modeling.

The steps in this methodology are summarized in Appendix DAppendix D. The

next chapter describes the steps of the logical database design

methodology.

Chapter Summary

• A design methodologydesign methodology is a structured approach that uses

procedures, techniques, tools, and documentation aids to support

and facilitate the process of design.

• Database design includes three main phases: conceptualconceptual,

logicallogical, and physicalphysical database design.

• Conceptual database designConceptual database design is the process of constructing a

model of the data used in an enterprise, independent of all

physical considerations.

• Conceptual database design begins with the creation of a

conceptual data modelconceptual data model of the enterprise, which is entirely

independent of implementation details such as the target DBMS,

application programs, programming languages, hardware

platform, performance issues, or any other physical

475

476

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_054.xhtml#eid85549
considerations.

• LogicalLogical database design database design is the process of constructing a model

of the data used in an enterprise based on a specific data model

(such as the relational model), but independent of a particular

DBMS and other physical considerations. Logical database design

translates the conceptual data model into a logicallogical datadata model model

of the enterprise.

• Physical database designPhysical database design is the process of producing a

description of the implementation of the database on secondary

storage; it describes the base relations, file organizations, and

indexes used to achieve efficient access to the data, and any

associated integrity constraints and security measures.

• The physical database design phase allows the designer to make

decisions on how the database is to be implemented. Therefore,

physical designphysical design is tailored to a specific DBMS. There is feedback

between physical and conceptual/logical design, because decisions

taken during physical design to improve performance may affect

the structure of the conceptual/logical data model.

• There are several critical factors for the success of the database

design stage, including, for example, working interactively with

users and being willing to repeat steps.

• The main objective of Step 1Step 1 of the methodology is to build a

conceptual data model of the data requirements of the enterprise.

A conceptual data model comprises: entity types, relationship

types, attributes, attribute domains, primary keys, and alternate

keys.

• A conceptual data model is supported by documentation, such as

ER diagrams and a data dictionary, which is produced throughout

the development of the model.

epub://k27oy2ap24fbg1fg0erl.1.vbk/OPS/loc_026.xhtml#eid36817
• The conceptual data model is validated to ensure it supports the

required transactions. Two possible approaches to ensure that the

conceptual data model supports the required transactions are: (1)

checking that all the information (entities, relationships, and their

attributes) required by each transaction is provided by the model

by documenting a description of each transaction’s requirements;

(2) diagrammatically representing the pathway taken by each

transaction directly on the ER diagram.

Review Questions

16.1 Describe the purpose of a design methodology.

16.2 Describe the main phases involved in database design.

16.3 Identify important factors in the success of database design.

16.4 Discuss the important role played by users in the process of

database design.

16.5 Describe the main objective of conceptual database design.

16.6 Identify the main steps associated with conceptual database

design.

Order from us and get better grades. We are the service you have been looking for.