UNIT-I

Definition of Data Base Management  system:

v  A Database is a collection of related data organized in a way that data can be easily accessed, managed and updated. Any piece of information can be a data,

For example:

MySql, Oracle, Sybase, Microsoft Access and IBM DB2 etc.

Components of Database System:


 Functions of DBMS:

Ø  Provides data Independence
Ø  Concurrency Control
Ø  Provides Recovery services
Ø  Provides Utility services

Ø  Provides a clear and logical view of the process that manipulates data

     Advantages of DBMS:

v  The data is stored at a central location and is shared among multiple users. Thus, the main advantage of DBMS is centralized data management.

CControlled data redundancy: 

v  During database design, various files are integrated and each logical data item is stored at central location.

EEnforcing data integrity:

v  In database approach, enforcing data integrity is much easier. Various integrity constraints are identified bydatabase designer during database design.

DData sharing:

v  The data stored in the database can be shared among multiple users or application programs.

EEase of application development:

v  The application programmer needs to develop the application programs according to the users’ needs.

DData security:

v  Since the data is stored centrally, enforcing security constraints is much easier.


BBackup and recovery:

v  The DBMS provides backup and recovery subsystem that is responsible for recovery from hardware and software failures.

Disadvantages of DBMS:

v  Complexity
v  Costly
v  Large in size

DBMS Architecture:

v  The DBMS architecture describes how data in the database is viewed by the users.

v  This architecture is proposed by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) and hence, is also known as ANSI/SPARC architecture.

v  The three levels are discussed here.
 Internal level:
v  It is the lowest level of data abstraction that deals with the physical representation of the database on the computer and thus, is also known as physical level.

Conceptual level:

v  This level of abstraction deals with the logical structure of the entire database and thus, is also known as logical level.

External level:

v  It is the highest level of abstraction that deals with the user’s view of the database and thus, is also known as view level.

Data Independence:

Logical data independence:

v  It is the ability to change the conceptual schema without affecting the external schemas or application programs.

Physical data independence:

v  It is the ability to change the internal schema without affecting the conceptual or external schema.

Database System Applications:


Database systems are widely used in different areas because of their numerous advantages. Some of the most common database applications are listed here.

Airlines and railways:

v  Airlines and railways use online databases for reservation, and for displaying the schedule information.

Banking:

v  Banks use databases for customer inquiry, accounts, loans, and other transactions.

Education: 

v  Schools and colleges use databases for course registration, result, and other information.


Telecommunications:

v  Telecommunication departments use databases to store information about the communication network, telephone numbers, record of calls, for generating monthly bills, etc.

Credit card transactions:

v  Databases are used for keeping track of purchases on credit cards in order to generate monthly statements.

 E-commerce:

v  Integration of heterogeneous information sources (for example, catalogs) for business activity such as online shopping, booking of holiday package, consulting a doctor, etc.

Health care information systems and electronic patient record: 

v  Databases are used for maintaining the patient health care details.

Digital libraries and digital publishing:

v  Databases are used for management and delivery of large bodies of textual and multimedia data.

Finance: 

v  Databases are used for storing information such as sales, purchases of stocks and bonds or data useful for online trading.

Sales:

v  Databases are used to store product, customer and transaction details.

Human resources:

v  Organizations use databases for storing information about their employees, salaries, benefits, taxes, and for generating salary checks.

Database administrator (DBA):


v  DBA is a person who has central control over both data and application programs.

v  The responsibilities of DBA vary depending upon the job description and corporate and organization policies.

Schema definition and modification: 

v  It is the responsibility of the DBA to create the database schema by executing a set of data definition statements in DDL.

New software installation: 

v  It is the responsibility of the DBA to install new DBMS software, application software, and other related software. After installation, the DBA must test the new software.

Security enforcement and administration: 

v  DBA is responsible for establishing and monitoring the security of the database system. It involves adding and removing users, auditing, and checking for security problems.

Data analysis:

v  DBA is responsible for analyzing the data stored in the database, and studying its performance and efficiency in order to effectively use indexes, parallel query execution, etc.

Physical organization modification: 

v  The DBA is responsible for carrying out the modifications in the physical organization of the database for better performance.

Structure of Relational databases:

v  A relational database consist of a collection of table each table has multiple column and each column has a unique name.

v  A row in a table represents a relationship among a set of values.

Basic Structure:

v  A table consist of rows and columns in relational databases the column are defined as attributes.

v  For each attribute there is a set of permitted values called the domain of that attribute.

v  A row or tuple in a table defines the completes information of the entity.

Example:

            Student table

s.no                sname                        group                         age

101                 aa                              bcom                          20      


Keys:

v  A key is value defined by user used to identify an entity uniquely among others entities.

v  Relational database supports primary key foreign key candidate key super key

Types of entities:

They are two types of entities
                        Strong entity set
                        Weak entity set

Strong entity set:

v  A entity set which is having a primary key is called strong entity set.

Weak entity set:

v  An entity set which does not name an primary key value is called weak entity set.

Query languages:

v  A query language is a language in which a user requests information form the data base.
v  Query languages can be categorized either procedural or non procedural.







UNIT-II




DEFINTION OF SQL:

v  Structure Query Language(SQL) is a programming language used for storing and managing data in RDBMS

v  . SQL was the first commercial language introduced for E.F Codd's Relational model. Today almost all RDBMS(MySql, Oracle, Infomix, Sybase, MS Access) uses SQL as the standard database language.

SQL is used to perform all type of data operations in RDBMS:

DDL : Data Definition Language:

v  All DDL commands are auto-committed. That means it saves all the changes permanently in the database.

Command                                                                                       Description

Create                                                                                  to create new table or database
alter                                                                                                       for alteration
truncate                                                                                     delete data from table
drop                                                                                                      to drop a table
rename                                                                                                to rename a table

DML : Data Manipulation Language:

v  DML commands are not auto-committed. It means changes are not permanent to database, they can be rolled back.
Command                                                                                           Description
Insert                                                                                           to insert a new row
Update                                                                                       to update existing row
Delete                                                                                           to delete a row
merge                                                                                   merging two rows or two tables


TCL : Transaction Control Language:

v  These commands are to keep a check on other commands and their affect on the database.

v  These commands can annul changes made by other commands by rolling back to original state. It can also make changes permanent.

Command                                                                                                Description

Commit                                                                                                 to permanently save
Rollback                                                                                                   to undo change
savepoint                                                                                            to save temporarily

DCL : Data Control Language:

v  Data control language provides command to grant and take back authority.

Command                                                                                           Description

grant                                                                                          grant permission of right
revoke                                                                                          take back permission.

DQL : Data Query Language:

Command                                                                                            Description

Select                                                                    retrieve records from one or more table

SELECT Query:

v  Select query is used to retrieve data from a tables. It is the most used SQL query. We can retrieve complete tables, or partial by mentioning conditions using WHERE clause.

Syntax of SELECT Query:

Ø  SELECT column-name1, column-name2, column-name3, column-nameN from table-name;

Example for SELECT Query

Conside the following Student table
,
S_id                       S_Name                               age                                        address
101                            Adam                                 15                                          Noida
102                            Alex                                  18                                             Delhi

SQL Constraints:

v  SQl Constraints are rules used to limit the type of data that can go into a table, to maintain the accuracy and integrity of the data inside table.

Constraints can be divided into following two types,

Ø  Column level constraints : limits only column data.

Ø  Table level constraints : limits whole table data

Constraints are used to make sure that the integrity of data is maintained in the database. Following are the most used constraints that can be applied to a table.

Ø  NOT NULL
Ø  UNIQUE
Ø  PRIMARY KEY
Ø  FOREIGN KEY
Ø  CHECK
Ø  DEFAULT

SQL Functions:

v  SQL provides many built-in functions to perform operations on data. These functions are useful while performing mathematical calculations, string concatenations, sub-strings etc.

SQL functions are divided into two categories,
Ø  Aggregate Functions
Ø  Scalar Functions

Aggregate Functions:

v  These functions return a single value after calculating from a group of values. Following are some frequently used Aggregate functions.

 AVG():

v  Average returns average value after calculating from values in a numeric column.
Its general Syntax is,

Ø  SELECT AVG(column name) from table name


COUNT():

v  Count returns the number of rows present in the table either based on some condition or without condition.

Its general Syntax is

v  SELECT COUNT(column_name) from table-name

FIRST()

v  First function returns first value of a selected column

Syntax for FIRST function is

v  SELECT FIRST(column_name) from table-name

LAST()

v  LAST return the return last value from selected column

Syntax of LAST function is:

v  SELECT LAST(column_name) from table-name

MAX():

v  MAX function returns maximum value from selected column of the table.

Syntax of MAX function is:

v  SELECT MAX(column name) from table-name

Scalar Functions:

v  Scalar functions return a single value from an input value. Following are soe frequently used Scalar Functions.

UCASE():

v  UCASE function is used to convert value of string column to Uppercase character.

Syntax of UCASE:

v  SELECT UCASE(column_name) from table-name

LCASE():

v  LCASE function is used to convert value of string column to Lowecase character.
Syntax for LCASE is,

v  SELECT LCASE(column_name) from table-name

ROUND():

v  ROUND function is used to round a numeric field to number of nearest integer. It is used on Decimal point values.

Syntax of Round function is:

Ø  SELECT ROUND(column_name, decimals) from table-name

Join in SQL:

v  SQL Join is used to fetch data from two or more tables, which is joined to appear as single set of data.

v  SQL Join is used for combining column from two or more tables by using values common to both tables. Join Keyword is used in SQL queries for joining two or more tables.

v  . Minimum required condition for joining table, is (n-1) where n, is number of tables. A table can also join to itself known as, Self Join.

Types of Join:

The following are the types of JOIN that we can use in SQL.

v  Inner
v  Outer
v  Left
v  Right

Set Operation in SQL:

v  SQL supports few Set operations to be performed on table data. These are used to get meaningful results from data, under different special conditions.

Union:

v  However it will eliminate duplicate rows from its result set. In case of union, number of columns and data type must be same in both the tables.

Intersect:

v  Intersect operation is used to combine two SELECT statements, but it only retuns the records which are common from both SELECT statements.


Minus

v  Minus operation combines result of two Select statements and return only those result which belongs to first set of result. MySQL does not support INTERSECT operator.
v
Keys:

v  A  key consists one or more attributes that helps us to determine other attributes .
v  The role of key is based on a concept known as determination.

Types of keys:
Ø  Super key
Ø  Candidate key
Ø  Primary key
Ø  Secondary key
Ø  Foreign key

Data types in sql:

Sql supports rich set of data types to store its value sql supports the following data types.

Types:

                Char ,name, date, varchar2, long, raw, long raw






UNIT-III

Entity Relationship Model:
v  the relationship between you and the phone numbers is 'has a phone number'. Diagrams created to design these entities and relationships are called entity–relationship diagrams or ER diagrams.
v  there are three levels of ER models that may be developed.
v  Developing an enterprise-wide conceptual ER model is useful to support documenting the data architecture for an organization.
v  The purpose of the conceptual ER model is then to establish structural metadata commonality for the master data entities between the set of logical ER models.
v  The logical ER model contains more detail than the conceptual ER model. In addition to master data entities, operational and transactional data entities are now defined.

v  . The details of each data entity are developed and the entity relationships between these data entities are established.


v  The physical model is normally forward engineered to instantiate the structural metadata into a database management system.
v  Relational database objects such as tables, database such as unique key indexes, and database constraints such as a foreign key constraint or a commonality constraint..

E-R Diagram:
v  ER-Diagram is a visual representation of data that describes how data is related to each other




Symbols and Notations:

Components of E-R Diagram:
Entity:
v  An Entity can be any object, place, person or class. In E-R Diagram, an entity is represented using rectangles.

Example:


Attribute

v  An Attribute describes a property or characteristics of an entity. For example, Name, Age, Address etc can be attributes of a Student. An attribute is represented using eclipse.

Key Attribute

v  Key attribute represents the main characteristic of an Entity. It is used to represent Primary key. Ellipse with underlying lines represent Key Attribute.

Composite Attribute

v  An attribute can also have their own attributes. These attributes are known as Composite attribute.

Relationship

v  A Relationship describes relations between entities. Relationship is represented using diamonds.

Ø  There are three types of relationship that exist between Entities.

v  Binary Relationship
v  Recursive Relationship
v  Ternary Relationship


Binary Relationship:

Binary Relationship means relation between two Entities. This is further divided into three types.

v  One to One : This type of relationship is rarely seen in real world.

v  One to Many : It reflects business rule that one entity is associated with many number of same entity.

v  Many to Many : The below diagram represents that many students can enroll for more than one courses


Recursive Relationship:- When an Entity is related with itself it is known as Recursive Relationship

Ternary Relationship:- Relationship of degree three is called Ternary relationship.

Weak Entity Types:- Weak entity types have no keys. Displayed by double-rectangular nodes.
Ø  Weak entity types typically have partial key for distinguishing their instances.

Strong entity types:- Regular entity types with keys are sometimes called strong entity types.

Generalization:
v Generalization is a bottom-up approach in which two lower level entities combine to form a higher level entity.


v In generalization, the higher level entity can also combine with other lower level entity to make further higher level entity.


Specialization:

v Specialization is opposite to Generalization. It is a top-down approach in which one higher level entity can be broken down into two lower level entity.

In specialization, some higher level entities may not have lower-level entity sets at all.


Aggregation:

v  Aggregation is a process when relation between two entity is treated as a single entity. Here the relation between Center and Course, is acting as an Entity in relation with Visitor.


Normalization of Database:-

v  Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anamolies.

v  It is a two step process that puts data into tabular form by removing duplicated data  from the relation tables.

Normalization is used for mainly two purpose.:

v  Eliminating redundant(useless) data.

v  Ensuring data dependencies make sense i.e data is logically stored.

Updation Anamoly :
v  To update address of a student who occurs twice or more than twice in a table, we will have to update S_Address column in all the rows, else data will become inconsistent.

Insertion Anamoly :

v  Suppose for a new admission, we have a Student id(S_id), name and address of a student but if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anamoly.

Deletion Anamoly :

v  If (S_id) 401 has only one subject and temporarily he drops it, when we delete that row, entire student record will be deleted along with it.

v
S_id                 S_Name                       S_Address       Subject_opted
401                   Adam                           Noida                          Bio
402                   Alex                            Panipat                        Maths
403                   Stuart                           Jammu                        Maths
404                   Adam                           Noida                           Physics


Normalization Rule:-

v  Normalization rule are divided into following normal form.

v  First Normal Form
v  Second Normal Form
v  Third Normal Form
v  BCNF


First Normal Form (1NF):

v  A row of data cannot contain repeating group of data i.e each column must have a unique value. Each row of data must have a unique identifier Primary key.

Example:

S_id                       S_Name
401                         Adam
402                         Alex

Second Normal Form (2NF):

v  A table to be normalized to Second Normal Form should meet all the needs of First Normal Form and there must not be any partial dependency of any column on primary key.

v  It means that for a table that has concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence.

v  If any column depends oly on one part of the concatenated key, then the table fails Second normal form.

Example:

Customer Table:

customer_id           Customer_Name           Order_id           Order_name                 Sale_detail
101                                     Adam                   10                           order1                          sale1
101                                     Adam                    11                           order2                          sale2

Third Normal Form (3NF)::

v  Third Normal form applies that every non-prime attribute of table must be dependent on primary key. The transitive functional dependency should be removed from the table.


Example:

Student_Detail Table :

Student_id             Student_name                         DOB     Street               city                   State   Zip

v  The advantage of removing transtive dependency is,

Ø  Amount of data duplication is reduced.
Ø  Data integrity achieved.

Boyce and Codd Normal Form (BCNF):

v  Boyce and Codd Normal Form is a higher version of the Third Normal form.

v  This form deals with certain type of anamoly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.


UNIT-IV

QUERY PROCESSING:

v  Query processing refers to the range of activites involved in extracting data from the database.

v  The activites include translation of queries in high level data bases languages in to expressions that can be used at the physical level of the file system.

  The following are the steps involved in processing a query.


MEASURE OF THE QUERY COST:

v  The cost of query evaluation can be measured in terms of a number of different resources including disk accesses in cpu time to execute a query

Ø  The number of seek operations performed
Ø  The number of blocks read
Ø  The number of blocks written.

SELECTION OPERATION:

FILE SCAN:
v  Search algorithms that locate and retrieve records that a fulfill  a selection condition.

v  Algorithm (A1)(linear search) scan each file block and test all records to see whether they satisfy the selection condition.

v  Linear search can be applied regardless of

Ø  Selection condition or
Ø  Ordering of the records in the file or
Ø  Availability of indices.
SORTING:

v  We may build an index on the relation and then use the index to read and relation in sorted order may lead to one disk block accesses each tuple.

v  For all relations that fit in the memory techniques like quick sort can be used  for all relations that do not fit the memory external sorting is good choice.

External sorting:

v  Let M denotes the memory size (in pages)

v  Create sorted runs let I be o initially repeatedly do the following till end of the relation.

Ø  Read M blocks of relation in the memory.
Ø  Sort in the memory blocks.
Ø  Write sorted data to run Ri to increment i






JOIN OPERATION:

v  Several different algorithms to implement joins

Ø  Nested loop join
Ø  Block nested loop join
Ø  Indexed nested loop join
Ø  Merge join
Ø  Hash join

Nested loop join:

v  In the worst case if there is enough memory only to hold one block to each relation the estimated cost is (Nr*Bs+Br).

Example of nested loop joins:

Ø  Assuming worst case memory availability cost estimate is with depositor as outer relation
5000*400+100=2000,100 block transfers.

Merge join:

v  Sort both relations  on their join attribute.

v  Merge the sorted relations to join them

§  Join step in similar to the merge stage of the sort merge algorthim.

A       
3
B
1
D
8
F
7
C
13
    a1  a2                     a1      a3
A
A
B
D
C
f

Hash join:

v  Applicable for equi joins and natural joins

v  A hash function h is used  to partition tuples of both relations

§  R0,r1,…………………..rn denotes partition of the r tuples.
§  Ro,r1,……………………rn denotes  partition of the s tuples.



Query optimization:

v  Query  optimization s the process of selecting the most efficient query evaluation plan from among the many strategies.

v  Usually possible for processing a given query specially if the query is complex

v  One aspect of optimization occurs at the relational algebra level where the system attempts to find an expression that is equivalent to the given expression.

Estimating statistics of expression results:

v  The cost of an operation depends on the size and other statistics of its inputs in order to estimate the cost of the query.

v  Some statistics about data base relations are stored in system catalog the system catalog contains the following information.

Optimization of generalization:

v  The refinement from an initial entity set into successive levels of entity sub groupings represents a top down design process in which distinctions are made explicit.

v  This type of process is called specialization the design process may also proceed in  a bottom up manner in which multiple entity sets.

v  Are grouped into a higher level entity set on the basis of common features this common can be expressed by generalization.

Example of query generalization:

Ø  The number of tuples in the relation  (nr)

Ø  The number of blocks containing tuples of relation (br)

Ø  The size of a tuple of relation r in bytes (lr)

Ø  Statistics about indices such as heights of b+ tree indices and number of lray pages in the indices.




          UNIT-V

TRANSACTION:

v A transaction is  a unit program execution that accesses and possibly updates various data item

v A transaction must see consistent databases.

v During transaction execution the data base may be inconsistent, when the transaction is committed the data base must be consistent.

v Two main issue to deal with

§  Failures of various kinds such as hardware failures and system crashes.

§  Concurrent execution of multiple transactions.

ACID PROPERTIES OF TRANSACTION:

ATOMICITY:

v Either all operations of the transaction are properly reflected in the data base or none

CONSISTENCY:

v Execution of the transaction in isolation preservers the consisitency of data base.

ISOLATION:

v Although multiple transaction  may execute concurrently each transaction must be unware of other  concurrent executing  transactions.

Example:

           After  Ti transaction is finished after tj transaction started.

DURABLITY:

v After a  transaction completes successfully the changes it has made to the database persist even if they are system failures.

Example of transaction:

§  Read(a)
§  A:=a-50
§  Write(a)
§  Read(b)
§  B:=b+50
§  Write(b)


TRANSACTION STATE:

ACTIVE:

v The initial state the transaction stays in the state while it is executing.

PARTIALLY COMMITED:

v After the final statement has been executed.

FAILED:

v After the discovery that normal execution can no longer proceed.

ABORTED:

v After  the transaction has rolled back and the database restored to its state prior to the start of the transaction.

Two options after it has aborted

                           Restart the transaction-only if no internal logical error
                           Kill the transaction

COMMITED:  

v After successful transaction.





CONCURRENT EEXECUTIONS:

v  Multiple transactions are allowed  to run concurrently in the system

Advantages:

Increased processor and disk utilization:

v  Leading to better transaction through put one transaction can be using the cpu while another is reading from writing on the disk.

Reduced average response time:

v  For transactions short transactions  need not wait behind long ones.

SERIALIZABILTY:

v  Basic assumption-each transaction preservers database consistency.

v  Thus serial execution of a set transactions preservers database consistency.

v  A possibly concurrent schedule is serializable if it is equivalent to a serial schedule different forms of schedule equivalence give rise to notations

Ø  Conflict serializablity
Ø  View serializablity

Conflict serializabilty:

v  If a  schedule s can be transformed in to schedule s by a series of swaps in non conflicting instructions we say that s and s’ are conflict equivalent.



v  We say that schedule s in conflict seriazble if it is conflict equivalent to a serial schedule.

View serializability:

v  Let s and s’  be two schedules with the same set of the transactions s and s’ are view equivalent