Showing posts with label Query Processing. Show all posts
Showing posts with label Query Processing. Show all posts

Saturday, November 19, 2016

Parallel Query Processing - Examples

Parallel query execution - example SQL queries / Inter-query, Intra-query parallelism examples / Inter-operation and Intra-operation parallelism examples / Independent and Pipelined parallelism examples



Parallel Query Processing









Go back to PARALLEL DATABASES home page





Monday, October 6, 2014

Database Query Processing


Database Query Processing / Overview / Techniques / Evaluation of different operations

Database Query Processing

Different measures for calculating query cost

Measures of Query Cost / Different measures used in calculating the query cost / What are the different measures that are to be considered which calculating the query cost? / Query cost evaluation measures

Measures of Query Cost


In DBMS, the cost involved in executing a query can be measured by considering the number of different resources that are listed below;
  • The number of disk accesses / the number of disk block transfers / the size of the table
  • Time taken by CPU for executing the query

The time taken by CPU is negligible in most systems when compared with the number of disk accesses.

If we consider the number of block transfers as the main component in calculating the cost of a query, it would include more sub-components. Those are;
Rotational latency – time taken to bring and spin the required data under the read-write head of the disk.
Seek time – time taken to position the read-write head over the required track or cylinder.
Sequential I/O – reading data that are stored in contiguous blocks of the disk
Random I/O – reading data that are stored in different blocks that are not contiguous.

That is, the blocks might be stored in different tracks, or different cylinders, etc.
Whether read/write? – read takes less time, write takes more.

From these sub-components, we would list the components of a more accurate measure as follows;
  • The number of seek operations performed
  • The number of block read
  • The number of blocks written
To get the final result, these numbers to be multiplied by the average time required to complete the task. Hence, it can be written as follows;

Query cost = (number of seek operations X average seek time) +

(number of blocks read X average transfer time for reading a block) +

(number of blocks written X average transfer time for writing a block)

Note: here, CPU cost and few other costs like cost of writing the final result are omitted.




Thursday, September 25, 2014

Query Processing in DBMS

Query Processing in DBMS / Steps involved in Query Processing in DBMS / How is a query gets processed in a Database Management System? / Query Processing overview / Database Query Processing

Query Processing

Query Processing would mean the entire process or activity which involves query translation into low level instructions, query optimization to save resources, cost estimation or evaluation of query, and extraction of data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which would minimize the cost considerably, especially time.
Cost Factors: Disk accesses [which typically consumes time], read/write operations [which typically needs resources such as memory/RAM].
The major steps involved in query processing are depicted in the figure below;
Figure 1 - Steps in Database Query Processing
Let us discuss the whole process with an example. Let us consider the following two relations as the example tables for our discussion;

Employee(Eno, Ename, Phone)
Proj_Assigned(Eno, Proj_No, Role, DOP)
where,
Eno is Employee number,
Ename is Employee name,
Proj_No is Project Number in which an employee is assigned,
Role is the role of an employee in a project,
DOP is duration of the project in months.
With this information, let us write a query to find the list of all employees who are working in a project which is more than 10 months old.
SELECT Ename
FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
Input:
A query written in SQL is given as input to the query processor. For our case, let us consider the SQL query written above.
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the query, the user’s privileges to execute the query, the table names and attribute names, etc. The correct table names, attribute names and the privilege of the users can be taken from the system catalog (data dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from high level language SQL to low level instruction in Relational Algebra.
For example, our SQL query can be converted into a Relational Algebra equivalent as follows;
πEnameDOP>10 Λ Employee.Eno=Proj_Assigned.Eno(Employee X Prof_Assigned))
Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The statistical data are information about the size of the table, the length of records, the indexes created on the table, etc. Optimizer also checks for the conditions and conditional attributes which are parts of the query.
Step 4: Execution Plan
A query can be expressed in many ways. The query processor module, at this stage, using the information collected in step 3 to find different relational algebra expressions that are equivalent and return the result of the one which we have written already.
For our example, the query written in Relational algebra can also be written as the one given below;
πEname(Employee Eno  DOP>10 (Prof_Assigned)))
So far, we have got two execution plans. Only condition is that both plans should give the same result.
Step 5: Evaluation
Though we got many execution plans constructed through statistical data, though they return same result (obvious), they differ in terms of Time consumption to execute the query, or the Space required executing the query. Hence, it is mandatory choose one plan which obviously consumes less cost.
At this stage, we choose one execution plan of the several we have developed. This Execution plan accesses data from the database to give the final result.
In our example, the second plan may be good. In the first plan, we join two relations (costly operation) then apply the condition (conditions are considered as filters) on the joined relation. This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the Employee table. This join may need to compare less number of records. Hence, the second plan is the best (with the information known, not always).
Output:
The final result is shown to the user.

The overall information discussed above are depicted in Figure 2 below;
Figure 2 - Query Processing [Note: in Step 4, NJ means Natural Join]



 

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents

data recovery