Computer Science and Engineering - Tutorials, Notes, MCQs, Questions and Answers: September 2013

Tuesday, September 24, 2013

Distributed Databases - Important considerations

Important considerations in a distributed database compared with a centralized database

Compared with a centralized database system, a distributed database system must address the following additional concerns.

Data allocation - What data should be stored, where should it be stored, and how should it be stored?
Data fragmentation - How should the data be organized, i.e., how should a table be divided into pieces?
Distributed queries and transactions - We need a way to run queries and to handle transactions that span multiple distributed sites (here, a site means a server).

1. Data allocation

Data allocation deals with setting up servers and deciding which data is maintained at which location. A data allocation strategy should keep the following in mind:

The data should be available at or near the site where it is needed most.
Storing data at a site should increase the availability and reliability of that data.
The strategy chosen for data allocation should improve performance. That is, it should avoid drawbacks of the central server approach, such as the central server becoming a bottleneck or the data being of limited use to distant sites.
The strategy should reduce the cost of storing and manipulating the data.
Network traffic should be much reduced. In particular, the network should never be used unnecessarily when the data is available nearby.

As a whole, data allocation deals with where to keep the fragmented or replicated data for ease of access.
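To make the allocation criteria concrete, here is a minimal sketch of one possible strategy: place each fragment at the site that accesses it most often, then count how many accesses still have to cross the network. The fragment names, site names, access counts, and the helper functions `allocate` and `remote_accesses` are all made up for illustration; a real allocation decision would also weigh storage cost, replication, and update traffic.

```python
# Hypothetical access counts: how often each site reads a given fragment.
access_counts = {
    "student_mumbai": {"Mumbai": 900, "Chennai": 40, "Delhi": 60},
    "student_chennai": {"Mumbai": 30, "Chennai": 700, "Delhi": 20},
}

def allocate(access_counts):
    """Place each fragment at the site that accesses it most often."""
    return {
        fragment: max(counts, key=counts.get)
        for fragment, counts in access_counts.items()
    }

def remote_accesses(access_counts, placement):
    """Count the accesses that must cross the network under a placement."""
    return sum(
        n
        for fragment, counts in access_counts.items()
        for site, n in counts.items()
        if site != placement[fragment]
    )

# Allocation driven by access frequency.
placement = allocate(access_counts)

# Baseline for comparison: everything kept on one central server (assumed Delhi).
central = {fragment: "Delhi" for fragment in access_counts}

print(placement)
print("remote accesses, distributed:", remote_accesses(access_counts, placement))
print("remote accesses, central:", remote_accesses(access_counts, central))
```

With these made-up numbers, keeping each fragment near its heaviest user leaves far fewer remote accesses than the central placement, which is the traffic reduction the list above asks for.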

2. Data fragmentation

Data fragmentation is about how to break a table into fragments and how many fragments to create. A table can be fragmented based on a) which applications access the data frequently, b) which conditions are frequently used to access the data, and c) what is the simplest way to maintain the table schema at each location. Questions (a) and (b) refer to the attributes, and their values, that are frequently used to access a table. For example, in the query "SELECT * FROM student WHERE campus='Mumbai'", campus='Mumbai' is an attribute name and value combination.

Fragmentation is of two major types:

Horizontal fragmentation

Primary Horizontal fragmentation
Derived Horizontal fragmentation

Vertical fragmentation
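The fragmentation types listed above can be sketched on a small in-memory table. This is only an illustration: the rows, the column names other than campus, the enrollment table, and the helper functions are assumptions, not part of any particular DBMS. Primary horizontal fragmentation splits rows by a predicate on the table itself, derived horizontal fragmentation splits a related table according to the fragments of the table it refers to, and vertical fragmentation splits columns while repeating the key.

```python
# A tiny, made-up student table; every column except campus is an assumption.
student = [
    {"id": 1, "name": "Asha", "campus": "Mumbai", "gpa": 8.1},
    {"id": 2, "name": "Ravi", "campus": "Chennai", "gpa": 7.4},
    {"id": 3, "name": "Meena", "campus": "Mumbai", "gpa": 9.0},
]
# A made-up related table that refers to student by id.
enrollment = [
    {"student_id": 1, "course": "DB"},
    {"student_id": 2, "course": "OS"},
]

def horizontal_fragment(rows, predicate):
    """Primary horizontal fragmentation: keep whole rows matching a predicate."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, columns, key="id"):
    """Vertical fragmentation: keep a subset of columns plus the key."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]

# Primary horizontal fragments, using the campus='Mumbai' condition.
mumbai = horizontal_fragment(student, lambda r: r["campus"] == "Mumbai")
others = horizontal_fragment(student, lambda r: r["campus"] != "Mumbai")

# Derived horizontal fragment: enrollment rows follow their student's fragment.
mumbai_ids = {r["id"] for r in mumbai}
enrollment_mumbai = [e for e in enrollment if e["student_id"] in mumbai_ids]

# Vertical fragments: both carry the key so the table can be rebuilt.
personal = vertical_fragment(student, ["name", "campus"])
academic = vertical_fragment(student, ["gpa"])

# Horizontal fragments recombine with a union, vertical ones with a join on the key.
rebuilt_h = sorted(mumbai + others, key=lambda r: r["id"])
academic_by_id = {r["id"]: r for r in academic}
rebuilt_v = [{**p, **academic_by_id[p["id"]]} for p in personal]

print(rebuilt_h == student, rebuilt_v == student)
```

The last two steps check that nothing is lost: in this sketch, both kinds of fragments can be put back together into the original table.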

3. Distributed queries and Transactions

When data is fragmented or replicated and distributed over many sites in the network, retrieving it involves the following:

Identifying the location of the requested data,
A protocol to fetch the data, and
A way to combine the data, if it is spread over multiple sites.

Hence, a distributed database system must be able to handle data over the network. Compared with a conventional centralized database, it needs a special way to handle queries and transactions. That is, the system must understand the query and its components and must be able to locate the data over the network.
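The three retrieval steps (locate, fetch, combine) can be sketched as follows. This is a toy model under stated assumptions: the catalog, the site names site_A and site_B, and the functions `locate`, `fetch`, and `query_students` are hypothetical, and the "remote" sites are just dictionaries in one process rather than real servers.

```python
# Hypothetical catalog: the campus value that defines each fragment,
# mapped to the site that stores that fragment.
catalog = {
    "Mumbai": "site_A",
    "Chennai": "site_B",
}

# Stand-in for remote storage; in a real system each site is a separate server.
site_data = {
    "site_A": [{"id": 1, "campus": "Mumbai"}, {"id": 3, "campus": "Mumbai"}],
    "site_B": [{"id": 2, "campus": "Chennai"}],
}

def locate(campus=None):
    """Step 1: find the sites that may hold matching rows."""
    if campus is None:
        return sorted(set(catalog.values()))
    return [catalog[campus]] if campus in catalog else []

def fetch(site, campus=None):
    """Step 2: ask one site for its matching rows (a local lookup here)."""
    return [r for r in site_data[site] if campus is None or r["campus"] == campus]

def query_students(campus=None):
    """Step 3: combine the partial results from every relevant site."""
    rows = []
    for site in locate(campus):
        rows.extend(fetch(site, campus))
    return sorted(rows, key=lambda r: r["id"])

print(locate("Mumbai"))            # only one site needs to be contacted
print(query_students("Chennai"))   # rows from a single fragment
print(query_students())            # rows merged from all fragments
```

Note how a query with a condition on campus touches only one site, while a query without it has to contact every site and merge the answers.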

Further discussion of these considerations will follow soon.

*********


Distributed Database - Why?

Why Distributed Database?

We are interested in distributed databases for various reasons. Some of them are:

Data are more readily available to end users, i.e., they are easily accessible. This availability makes the overall system reliable.
A distributed database improves the performance of the overall system, because servers are located near the places where the data is needed most.
It supports organizational growth, because extending a distributed database does not require stopping all ongoing services. Only a new distributed server may need to be set up to handle the new data.
Adding a server, modifying existing modules, and similar changes are easy to handle.
Distributed data handling increases parallelism. That is, many queries can be handled simultaneously across multiple distributed servers, unlike in the central server approach.

Let us consider the scenario of XYZ bank, which is headquartered in New Delhi. Assume that the bank maintains its server at its head office. All bank transactions made at any branch of XYZ bank must then reach the central server to access the data. For example, consider a customer who is trying to withdraw money from his account through an ATM located in Chennai. His withdrawal request must be sent to the central server and processed there before the money is disbursed at the ATM. The following image shows the central server approach for the database of any organization. The requests initiated are shown as thick lines.

The following image shows the distributed server approach for the scenario given above. Now assume that XYZ bank has established several servers distributed throughout the country, say 10 different servers. Any request generated at an ATM in any part of the country is forwarded to the server for that part of the country. If, for any reason, the requested data is not available at the local server, that server finds the actual location of the requested data, forwards the request to the server holding it, and routes the answer back to the initiator.

The image depicts a set of DSs (Distributed Servers), a set of nodes (not all are labeled), and a set of links that show the requests sent from nodes to the DSs. A dashed line shows a request whose data is local to some other DS: the DS that receives the request forwards it to the DS where the intended data is available. Here, the main advantage is that consumption of network bandwidth is controlled, i.e., network traffic is reduced. Availability of the data and the servers also increases, as they are close to the users and easily accessible.
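The forwarding behaviour described for the XYZ bank scenario can be sketched with a small class. Everything here is illustrative: the `DistributedServer` class, the shared directory that maps accounts to their owning server, the account ids, and the balances are assumptions. A real banking system would add transactions, security, and failure handling.

```python
class DistributedServer:
    """A toy distributed server (DS) that holds some accounts locally."""

    def __init__(self, name, accounts, directory):
        self.name = name
        self.accounts = accounts      # account id -> balance held at this DS
        self.directory = directory    # shared map: account id -> owning DS

    def withdraw(self, account_id, amount, hops=None):
        """Serve the request locally if possible, otherwise forward it."""
        hops = (hops or []) + [self.name]
        if account_id in self.accounts:
            if self.accounts[account_id] < amount:
                return {"ok": False, "path": hops}
            self.accounts[account_id] -= amount
            return {"ok": True, "balance": self.accounts[account_id], "path": hops}
        # Not local: look up the owning DS and forward the request to it.
        owner = self.directory.get(account_id)
        if owner is None or owner is self:
            return {"ok": False, "path": hops}
        return owner.withdraw(account_id, amount, hops)

# Two hypothetical servers that share one directory of account locations.
directory = {}
chennai = DistributedServer("DS-Chennai", {"C100": 5000}, directory)
delhi = DistributedServer("DS-Delhi", {"D200": 8000}, directory)
directory.update({"C100": chennai, "D200": delhi})

local = chennai.withdraw("C100", 1000)     # served by the nearby DS
forwarded = chennai.withdraw("D200", 500)  # forwarded, like the dashed line
print(local["path"], forwarded["path"])
```

The `path` entry records which servers handled each request, so the local withdrawal stays on one DS while the forwarded one passes through two.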

Saturday, September 21, 2013

Distributed Database - Introduction

What is Distributed Database? / Types of Distributed Databases / Homogeneous and Heterogeneous Distributed Databases

What is Distributed Database?

A distributed database is a database that is distributed over some form of network in order to reduce the cost or difficulty of accessing data and to increase the efficiency of the whole system.

Types of Distributed Databases

1. Homogeneous Distributed Database

Identical software is used at all the sites
- Here, a site refers to a server that is part of the distributed database system
- Software means the OS, the DBMS software, and even the structure of the database used
- In some cases, identical hardware is also used

All sites are well known to one another
- This is because they are similar in terms of the DBMS software and hardware used

Partial control over the data of other sites
- Because the structure of the databases, the software, and the hardware used at other sites are known, partial control over their data is possible

Looks like a single central database system

2. Heterogeneous Distributed Database

Different sites use different database software
The structure of the databases residing at different sites may differ (because of data partitions)
Co-operation between sites is limited. That is, it is not easy to alter the structure of the database or any other software used
