UNIT-I
Definition
of Data Base Management system:
v A Database is a collection of related data organized in a way that data can be easily accessed, managed and updated. Any piece of information can be a data,
For example:
MySql, Oracle, Sybase, Microsoft Access and IBM DB2 etc.
Components of Database System:

Functions of DBMS:
v A Database is a collection of related data organized in a way that data can be easily accessed, managed and updated. Any piece of information can be a data,
For example:
MySql, Oracle, Sybase, Microsoft Access and IBM DB2 etc.
Components of Database System:

Functions of DBMS:
Ø Concurrency Control
Ø Provides Recovery services
Ø Provides Utility services
Ø Provides a clear and logical view of the process that manipulates data
Advantages of DBMS:
v The data is stored at a central location and is shared among multiple users. Thus, the main advantage of DBMS is centralized data management.
CControlled data redundancy:
v During database design, various files are integrated and each logical data item is stored at central location.
EEnforcing data integrity:
v In database approach, enforcing data integrity is much easier. Various integrity constraints are identified bydatabase designer during database design.
DData sharing:
v The data stored in the database can be shared among multiple users or application programs.
EEase of application development:
v The application programmer needs to develop the application programs according to the users’ needs.
DData security:
v Since the data is stored centrally, enforcing security constraints is much easier.
BBackup and recovery:
v The DBMS provides backup and recovery subsystem that is responsible for recovery from hardware and software failures.
Disadvantages of DBMS:
v Complexity
v Costly
v Large in size
DBMS Architecture:
v The DBMS architecture describes how data in the database is viewed by the users.
v This architecture is proposed by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) and hence, is also known as ANSI/SPARC architecture.
v The three levels are discussed here.
Internal level:
Conceptual level:
v This level of abstraction deals with the logical structure of the entire database and thus, is also known as logical level.
External level:
v It is the highest level of abstraction that deals with the user’s view of the database and thus, is also known as view level.
Data Independence:
Logical data independence:
v It is the ability to change the conceptual schema without affecting the external schemas or application programs.
Physical data independence:
v It is the ability to change the internal schema without affecting the conceptual or external schema.
Database System Applications:
Database systems are widely used in different areas because of their numerous advantages. Some of the most common database applications are listed here.
Airlines and railways:
v Airlines and railways use online databases for reservation, and for displaying the schedule information.
Banking:
v Banks use databases for customer inquiry, accounts, loans, and other transactions.
Education:
v Schools and colleges use databases for course registration, result, and other information.
Telecommunications:
v Telecommunication departments use databases to store information about the communication network, telephone numbers, record of calls, for generating monthly bills, etc.
Credit card transactions:
v Databases are used for keeping track of purchases on credit cards in order to generate monthly statements.
E-commerce:
v Integration of heterogeneous information sources (for example, catalogs) for business activity such as online shopping, booking of holiday package, consulting a doctor, etc.
Health care information systems and electronic patient record:
v Databases are used for maintaining the patient health care details.
Digital libraries and digital publishing:
v Databases are used for management and delivery of large bodies of textual and multimedia data.
Finance:
v Databases are used for storing information such as sales, purchases of stocks and bonds or data useful for online trading.
Sales:
v Databases are used to store product, customer and transaction details.
Human resources:
v Organizations use databases for storing information about their employees, salaries, benefits, taxes, and for generating salary checks.
Database administrator (DBA):
v DBA is a person who has central control over both data and application programs.
v The responsibilities of DBA vary depending upon the job description and corporate and organization policies.
Schema definition and modification:
v It is the responsibility of the DBA to create the database schema by executing a set of data definition statements in DDL.
New software installation:
v It is the responsibility of the DBA to install new DBMS software, application software, and other related software. After installation, the DBA must test the new software.
Security enforcement and administration:
v DBA is responsible for establishing and monitoring the security of the database system. It involves adding and removing users, auditing, and checking for security problems.
Data analysis:
v DBA is responsible for analyzing the data stored in the database, and studying its performance and efficiency in order to effectively use indexes, parallel query execution, etc.
Physical organization modification:
v The DBA is responsible for carrying out the modifications in the physical organization of the database for better performance.
Structure of Relational databases:
v A relational database consist of a collection of table each table has multiple column and each column has a unique name.
v A row in a table represents a relationship among a set of values.
Basic Structure:
v A table consist of rows and columns in relational databases the column are defined as attributes.
v For each attribute there is a set of permitted values called the domain of that attribute.
v A row or tuple in a table defines the completes information of the entity.
Example:
Student table
s.no sname group age
101 aa bcom 20
Keys:
v A key is value defined by user used to identify an entity uniquely among others entities.
v Relational database supports primary key foreign key candidate key super key
Types of entities:
They are two types of entities
Strong entity set
Weak entity set
Strong entity set:
v A entity set which is having a primary key is called strong entity set.
Weak entity set:
v An entity set which does not name an primary key value is called weak entity set.
Query languages:
v A query language is a language in which a user requests information form the data base.
v Query languages can be categorized either procedural or non procedural.
UNIT-II
DEFINTION OF SQL:
v
Structure Query Language(SQL) is a programming
language used for storing and managing data in RDBMS
v
. SQL was the first commercial language
introduced for E.F Codd's Relational model. Today almost all RDBMS(MySql,
Oracle, Infomix, Sybase, MS Access) uses SQL as the standard database language.
SQL is used to perform all type of data operations in
RDBMS:
DDL : Data Definition Language:
v
All DDL commands are auto-committed. That means
it saves all the changes permanently in the database.
Command Description
Create
to create new table or database
alter for
alteration
truncate delete data from table
drop
to drop a table
rename to
rename a table
DML : Data Manipulation Language:
v
DML commands are not auto-committed. It means
changes are not permanent to database, they can be rolled back.
Command Description
Insert to insert a new row
Update to update existing row
Delete to delete a row
merge merging
two rows or two tables
TCL : Transaction Control Language:
v
These commands are to keep a check on other
commands and their affect on the database.
v
These commands can annul changes made by other
commands by rolling back to original state. It can also make changes permanent.
Command Description
Commit
to permanently save
Rollback to undo change
savepoint to
save temporarily
DCL : Data Control Language:
v
Data control language provides command to grant
and take back authority.
Command
Description
grant grant permission of right
revoke take back permission.
DQL : Data Query Language:
Command
Description
Select
retrieve records from one or more table
SELECT Query:
v
Select query is used to retrieve data from a
tables. It is the most used SQL query. We can retrieve complete tables, or
partial by mentioning conditions using WHERE clause.
Syntax of SELECT Query:
Ø
SELECT column-name1, column-name2, column-name3,
column-nameN from table-name;
Example for SELECT Query
Conside the following Student table
,
S_id S_Name age address
101 Adam
15 Noida
102 Alex 18 Delhi
SQL Constraints:
v
SQl Constraints are rules used to limit the type
of data that can go into a table, to maintain the accuracy and integrity of the
data inside table.
Constraints can be divided into following two types,
Ø
Column level constraints : limits only column
data.
Ø
Table level constraints : limits whole table
data
Constraints are used to make sure that the integrity of data
is maintained in the database. Following are the most used constraints that can
be applied to a table.
Ø
NOT NULL
Ø
UNIQUE
Ø
PRIMARY KEY
Ø
FOREIGN KEY
Ø
CHECK
Ø
DEFAULT
SQL Functions:
v
SQL provides many built-in functions to perform
operations on data. These functions are useful while performing mathematical
calculations, string concatenations, sub-strings etc.
SQL functions are divided into two categories,
Ø
Aggregate Functions
Ø
Scalar Functions
Aggregate Functions:
v
These functions return a single value after
calculating from a group of values. Following are some frequently used
Aggregate functions.
AVG():
v
Average returns average value after calculating
from values in a numeric column.
Its general Syntax is,
Ø
SELECT AVG(column name) from table name
COUNT():
v
Count returns the number of rows present in the
table either based on some condition or without condition.
Its general Syntax is
v
SELECT COUNT(column_name) from table-name
FIRST()
v
First function returns first value of a selected
column
Syntax for FIRST function is
v
SELECT FIRST(column_name) from table-name
LAST()
v
LAST return the return last value from selected
column
Syntax of LAST function is:
v
SELECT LAST(column_name) from table-name
MAX():
v
MAX function returns maximum value from selected
column of the table.
Syntax of MAX function is:
v
SELECT MAX(column name) from table-name
Scalar Functions:
v
Scalar functions return a single value from an
input value. Following are soe frequently used Scalar Functions.
UCASE():
v
UCASE function is used to convert value of
string column to Uppercase character.
Syntax of UCASE:
v
SELECT UCASE(column_name) from table-name
LCASE():
v
LCASE function is used to convert value of
string column to Lowecase character.
Syntax for LCASE is,
v
SELECT LCASE(column_name) from table-name
ROUND():
v
ROUND function is used to round a numeric field
to number of nearest integer. It is used on Decimal point values.
Syntax of Round function is:
Ø
SELECT ROUND(column_name, decimals) from
table-name
Join in SQL:
v
SQL Join is used to fetch data from two or more
tables, which is joined to appear as single set of data.
v
SQL Join is used for combining column from two
or more tables by using values common to both tables. Join Keyword is used in
SQL queries for joining two or more tables.
v
. Minimum required condition for joining table,
is (n-1) where n, is number of tables. A table can also join to itself known
as, Self Join.
Types of Join:
The following are the types of JOIN that we can use in SQL.
v
Inner
v
Outer
v
Left
v
Right
Set Operation in SQL:
v
SQL supports few Set operations to be performed
on table data. These are used to get meaningful results from data, under
different special conditions.
Union:
v
However it will eliminate duplicate rows from
its result set. In case of union, number of columns and data type must be same
in both the tables.
Intersect:
v
Intersect operation is used to combine two
SELECT statements, but it only retuns the records which are common from both
SELECT statements.
Minus
v
Minus operation combines result of two Select
statements and return only those result which belongs to first set of result.
MySQL does not support INTERSECT operator.
v
Keys:
v
A key
consists one or more attributes that helps us to determine other attributes .
v
The role of key is based on a concept known as
determination.
Types of keys:
Ø
Super key
Ø
Candidate key
Ø
Primary key
Ø
Secondary key
Ø
Foreign key
Data types in sql:
Sql supports rich set of data types to store its value sql
supports the following data types.
Types:
Char ,name,
date, varchar2, long, raw, long raw
UNIT-III
Entity Relationship Model:
v
the relationship between you and the phone numbers is 'has a phone
number'. Diagrams created to design these entities and relationships are called
entity–relationship diagrams or ER diagrams.
v
there are three levels of ER models that may be developed.
v
Developing an enterprise-wide conceptual ER model is useful to
support documenting the data
architecture for an organization.
v
The purpose of the conceptual ER model is then to establish
structural metadata commonality for the master
data entities between the set of logical ER models.
v The logical ER model contains more detail than the conceptual ER
model. In addition to master data entities, operational and transactional data
entities are now defined.
v . The details of each data entity are developed and the entity
relationships between these data entities are established.
v The physical model is normally forward engineered to instantiate
the structural metadata into a database management system.
v Relational database objects such as tables, database such as unique key indexes, and database
constraints such as a foreign
key constraint or a commonality constraint..
E-R Diagram:
v ER-Diagram is a visual representation of data that describes how
data is related to each other
Symbols and Notations:
Components of E-R Diagram:
Entity:
v
An Entity can be any object, place, person or class. In E-R
Diagram, an entity is represented using rectangles.
Example:
Attribute
v
An Attribute describes a property or characteristics of an entity.
For example, Name, Age, Address etc can be attributes of a Student. An
attribute is represented using eclipse.
Key Attribute
v
Key attribute represents the main characteristic of an Entity. It
is used to represent Primary key. Ellipse with underlying lines represent Key
Attribute.
Composite Attribute
v
An attribute can also have their own attributes. These attributes
are known as Composite attribute.
Relationship
v
A Relationship describes relations between entities. Relationship
is represented using diamonds.
Ø
There are three types of relationship that exist between Entities.
v
Binary Relationship
v
Recursive Relationship
v
Ternary Relationship
Binary Relationship:
Binary Relationship means relation between two Entities. This is
further divided into three types.
v
One to One : This type of relationship is rarely seen in real world.
v
One to Many : It reflects business rule that one entity is associated with
many number of same entity.
v
Many to Many : The below diagram represents that many students can enroll for
more than one courses
Recursive Relationship:- When an Entity is related with itself it is known as Recursive
Relationship
Ternary Relationship:- Relationship of degree three is called Ternary relationship.
Weak Entity Types:- Weak entity types have no keys. Displayed
by double-rectangular nodes.
Ø Weak entity types typically have partial key for distinguishing their instances.
Strong entity types:- Regular entity types
with keys are sometimes called strong entity types.
Generalization:
v Generalization is a bottom-up approach in which two lower level
entities combine to form a higher level entity.
v In generalization, the higher level entity can also combine with
other lower level entity to make further higher level entity.
Specialization:
v Specialization is opposite to Generalization. It is a top-down
approach in which one higher level entity can be broken down into two lower
level entity.
In specialization, some higher level entities
may not have lower-level entity sets at all.
Aggregation:
v Aggregation is a process when relation
between two entity is treated as a single entity. Here the relation between
Center and Course, is acting as an Entity in relation with Visitor.
Normalization of Database:-
v
Normalization is a systematic approach of decomposing tables to
eliminate data redundancy and undesirable characteristics like Insertion,
Update and Deletion Anamolies.
v
It is a two step process that puts data into tabular form by
removing duplicated data from the
relation tables.
Normalization is used for mainly two purpose.:
v
Eliminating redundant(useless) data.
v
Ensuring data dependencies make sense i.e data is logically
stored.
Updation Anamoly :
v To update address of a
student who occurs twice or more than twice in a table, we will have to update
S_Address column in all the rows, else data will become inconsistent.
Insertion Anamoly :
v Suppose for a new admission,
we have a Student id(S_id), name and address of a student but if student has
not opted for any subjects yet then we have to insert NULL there, leading to
Insertion Anamoly.
Deletion Anamoly :
v If (S_id) 401 has only one
subject and temporarily he drops it, when we delete that row, entire student
record will be deleted along with it.
v
S_id S_Name S_Address
Subject_opted
401 Adam
Noida Bio
402 Alex Panipat Maths
403 Stuart
Jammu Maths
404 Adam Noida
Physics
Normalization Rule:-
v Normalization rule are
divided into following normal form.
v
First Normal Form
v
Second Normal Form
v
Third Normal Form
v
BCNF
First Normal Form (1NF):
v
A row of data cannot contain repeating group of data i.e each
column must have a unique value. Each row of data must have a unique identifier
Primary key.
Example:
S_id S_Name
401 Adam
402 Alex
Second Normal Form (2NF):
v
A table to be normalized to Second Normal Form should meet all the
needs of First Normal Form and there must not be any partial dependency of any
column on primary key.
v
It means that for a table that has concatenated primary key, each
column in the table that is not part of the primary key must depend upon the
entire concatenated key for its existence.
v
If any column depends oly on one part of the concatenated key,
then the table fails Second normal form.
Example:
Customer Table:
customer_id Customer_Name Order_id Order_name
Sale_detail
101 Adam 10 order1 sale1
101 Adam 11 order2 sale2
Third Normal Form (3NF)::
v
Third Normal form applies that every non-prime attribute of table
must be dependent on primary key. The transitive functional dependency should
be removed from the table.
Example:
Student_Detail Table :
Student_id Student_name DOB Street city State Zip
v
The advantage of removing transtive dependency is,
Ø
Amount of data duplication is reduced.
Ø
Data integrity achieved.
Boyce and Codd Normal Form (BCNF):
v
Boyce and Codd Normal Form is a higher version of the Third Normal
form.
v
This form deals with certain type of anamoly that is not handled
by 3NF. A 3NF table which does not have multiple overlapping candidate keys is
said to be in BCNF.
UNIT-IV
QUERY PROCESSING:
v Query processing refers to the range of activites
involved in extracting data from the database.
v The activites include translation of queries in high
level data bases languages in to expressions that can be used at the physical
level of the file system.
The following are the steps involved in processing a query.
MEASURE OF THE QUERY COST:
v The cost of query evaluation can be measured in terms
of a number of different resources including disk accesses in cpu time to
execute a query
Ø The number of seek operations performed
Ø The number of blocks read
Ø The number of blocks written.
SELECTION OPERATION:
FILE SCAN:
v Search algorithms that locate and retrieve records
that a fulfill a selection condition.
v Algorithm (A1)(linear search) scan each file block and
test all records to see whether they satisfy the selection condition.
v Linear search can be applied regardless of
Ø Selection condition or
Ø Ordering of the records in the file or
Ø Availability of indices.
SORTING:
v We may build an index on the relation and then use the
index to read and relation in sorted order may lead to one disk block accesses
each tuple.
v For all relations that fit in the memory techniques
like quick sort can be used for all relations
that do not fit the memory external sorting is good choice.
External sorting:
v Let M denotes the memory size (in pages)
v Create sorted runs let I be o initially repeatedly do
the following till end of the relation.
Ø Read M blocks of relation in the memory.
Ø Sort in the memory blocks.
Ø Write sorted data to run Ri to increment i
JOIN OPERATION:
v Several different algorithms to implement joins
Ø Nested loop join
Ø Block nested loop join
Ø Indexed nested loop join
Ø Merge join
Ø Hash join
Nested loop join:
v In the worst case if there is enough memory only to
hold one block to each relation the estimated cost is (Nr*Bs+Br).
Example of nested loop joins:
Ø Assuming worst case memory availability cost estimate
is with depositor as outer relation
5000*400+100=2000,100
block transfers.
Merge join:
v Sort both relations
on their join attribute.
v Merge the sorted relations to join them
§ Join step in similar to the merge stage of the sort
merge algorthim.
a1
a2 a1 a3
|
Hash join:
v Applicable for equi joins and natural joins
v A hash function h is used to partition tuples of both relations
§ R0,r1,…………………..rn denotes partition of the r tuples.
§ Ro,r1,……………………rn denotes partition of the s tuples.
Query optimization:
v Query optimization s the process of selecting the
most efficient query evaluation plan from among the many strategies.
v Usually possible for processing a given query
specially if the query is complex
v One aspect of optimization occurs at the relational algebra
level where the system attempts to find an expression that is equivalent to the
given expression.
Estimating statistics of expression results:
v The cost of an operation depends on the size and other
statistics of its inputs in order to estimate the cost of the query.
v Some statistics about data base relations are stored
in system catalog the system catalog contains the following information.
Optimization of generalization:
v The refinement from an initial entity set into
successive levels of entity sub groupings represents a top down design process
in which distinctions are made explicit.
v This type of process is called specialization the
design process may also proceed in a
bottom up manner in which multiple entity sets.
v Are grouped into a higher level entity set on the
basis of common features this common can be expressed by generalization.
Example of query generalization:
Ø The number of tuples in the relation (nr)
Ø The number of blocks containing tuples of relation
(br)
Ø The size of a tuple of relation r in bytes (lr)
Ø Statistics about indices such as heights of b+ tree
indices and number of lray pages in the indices.
UNIT-V
TRANSACTION:
v A transaction is a unit program
execution that accesses and possibly updates various data item
v A transaction must see consistent databases.
v During transaction execution the data base may be inconsistent, when the
transaction is committed the data base must be consistent.
v Two main issue to deal with
§
Failures of various kinds such as hardware
failures and system crashes.
§ Concurrent execution of multiple transactions.
ACID PROPERTIES OF TRANSACTION:
ATOMICITY:
v Either all operations of the transaction are properly reflected in the
data base or none
CONSISTENCY:
v Execution of the transaction in isolation preservers the consisitency of
data base.
ISOLATION:
v Although multiple transaction may
execute concurrently each transaction must be unware of other concurrent executing transactions.
Example:
After Ti transaction is finished after tj
transaction started.
DURABLITY:
v After a transaction completes
successfully the changes it has made to the database persist even if they are
system failures.
Example of transaction:
§ Read(a)
§
A:=a-50
§
Write(a)
§
Read(b)
§
B:=b+50
§ Write(b)
TRANSACTION STATE:
ACTIVE:
v The initial state the transaction stays in the state while it is
executing.
PARTIALLY COMMITED:
v After the final statement has been executed.
FAILED:
v After the discovery that normal execution can no longer proceed.
ABORTED:
v After the transaction has rolled
back and the database restored to its state prior to the start of the
transaction.
Two options after it has aborted
Restart
the transaction-only if no internal logical error
Kill the
transaction
COMMITED:
v After successful transaction.
CONCURRENT EEXECUTIONS:
v Multiple transactions are allowed to run concurrently in the system
Advantages:
Increased processor and disk
utilization:
v Leading to better transaction through put one
transaction can be using the cpu while another is reading from writing on the
disk.
Reduced average response time:
v For transactions short transactions need not wait behind long ones.
SERIALIZABILTY:
v Basic assumption-each transaction preservers database
consistency.
v Thus serial execution of a set transactions preservers
database consistency.
v A possibly concurrent schedule is serializable if it
is equivalent to a serial schedule different forms of schedule equivalence give
rise to notations
Ø Conflict serializablity
Ø View serializablity
Conflict serializabilty:
v If a schedule s
can be transformed in to schedule s by a series of swaps in non conflicting
instructions we say that s and s’ are conflict equivalent.
v We say that schedule s in conflict seriazble if it is
conflict equivalent to a serial schedule.
View serializability:
v Let s and s’ be
two schedules with the same set of the transactions s and s’ are view
equivalent