The growth in grid databases, coupled with the utility of parallel query processing, presents an important opportunity to understand and utilize high–performance parallel database processing within a major database management system (DBMS). This important new book provides readers with a fundamental understanding of parallelism in data–intensive applications, and demonstrates how to develop faster capabilities to support them. It presents a balanced treatment of the theoretical and practical aspects of high–performance databases to demonstrate how parallel queries are executed in a DBMS, including concepts, algorithms, analytical models, and grid transactions.
High–Performance Parallel Database Processing and Grid Databases serves as a valuable resource for researchers working in parallel databases and for practitioners interested in building a high–performance database. It is also a much–needed, self–contained textbook for database courses at the advanced undergraduate and graduate levels.
Chapter 1: Introduction.
1.1 A Brief Overview: Parallel Databases and Grid Databases.
1.2 Parallel Query Processing: Motivations.
1.3 Parallel Query Processing: Objectives.
1.4 Forms of Parallelism.
1.5 Parallel Database Architectures.
1.6 Grid Database Architecture.
1.7 Structure of this Book.
1.9 Bibliographical Notes.
Chapter 2: Analytical Models.
2.1 Cost Models.
2.2 Cost Notations.
2.3 Skew Model.
2.4 Basic Operations in Parallel Databases.
2.6 Bibliographical Notes.
PART II: BASIC QUERY PARALLELISM.
Chapter 3: Parallel Search.
3.1 Search Queries.
3.2 Data Partitioning.
3.3 Search Algorithms.
3.5 Bibliographical Notes.
Chapter 4: Parallel Sort and Group–By.
4.1 Sorting, Duplicate Removal, and Aggregate Queries.
4.2 Serial External Sorting Method.
4.3 Algorithms for Parallel External Sort.
4.4 Parallel Algorithms for GroupBy Queries.
4.5 Cost Models for Parallel Sort.
4.6 Cost Models for Parallel GroupBy.
4.8 Bibliographical Notes.
4.9 Exercises.
Chapter 5: Parallel Join.
5.1 Join Operations.
5.2 Serial Join Algorithms.
5.3 Parallel Join Algorithms.
5.4 Cost Models.
5.5 Parallel Join Optimization.
5.7 Bibliographical Notes.
PART III: ADVANCED PARALLEL QUERY PROCESSING.
Chapter 6: Parallel GroupBy–Join.
6.1 GroupBy–Join Queries.
6.2 Parallel Algorithms for GroupBy–Before–Join Query Processing.
6.3 Parallel Algorithms for "GroupBy–After–Join" Query Processing.
6.4 Cost Model Notations.
6.5 Cost Model for "GroupBy–Before–Join" Query Processing.
6.6 Cost Model for "GroupBy–After–Join" Query Processing.
6.8 Bibliographical Notes.
Chapter 7: Parallel Indexing.
7.1 Parallel Indexing – An Internal Sight of Parallel Indexing Structures.
7.2 Parallel Indexing Structures.
7.3 Index Maintenance.
7.4 Index Storage Analysis.
7.5 Parallel Processing of Search Queries Using Index.
7.6 Parallel Index–Join Algorithms.
7.7 Comparative Analysis.
7.9 Bibliographical Notes.
Chapter 8: Parallel Universal Quantification – Collection Join Queries.
8.1 Universal Quantification and Collection Join.
8.2 Collection Types and Collection Join Queries.
8.3 Parallel Algorithms for Collection Join Queries.
8.4 Parallel Collection–Equi Join Algorithms.
8.5 Parallel Collection–Intersect Join Algorithms.
8.6 Parallel Sub–Collection Join Algorithms.
8.8 Bibliographical Notes.
Chapter 9: Parallel Query Scheduling and Optimization.
9.1 Query Execution Plan.
9.2 Sub–Queries Execution Scheduling Strategies.
9.3 Serial vs. Parallel Execution Scheduling.
9.4 Scheduling Rules.
9.5 Cluster Query Processing Model.
9.6 Dynamic Cluster Query Optimization.
9.7 Other Approaches of Dynamic Query Optimization.
9.9 Bibliographical Notes.
PART IV: GRID DATABASES.
Chapter 10: Transactions in Distributed and Grid Databases.
10.1 Grid Database Challenges.
10.2 Distributed Database Systems and Multidatabase Systems.
10.3 Basic Definitions on Transaction Management.
10.4 ACID Properties of Transactions.
10.5 Transaction Management in Various Database Systems.
10.6 Requirements in Grid Database Systems.
10.7 Concurrency Control Protocols.
10.8 Atomic Commit Protocols.
10.9 Replica Synchronization Protocols.
10.11 Bibliographical Notes.
Chapter 11: Grid Concurrency Control.
11.1 A Grid Database Environment.
11.2 An Example.
11.3 Grid Concurrency Control.
11.4 Correctness of GCC Protocol.
11.5 Features of GCC Protocol.
11.7 Bibliographical Notes.
Chapter 12: Grid Transaction Atomicity and Durability.
12.2 Grid Atomic Commit Protocol (Grid–ACP).
12.3 Handling Failure of Sites with Grid–ACP.
12.5 Bibliographical Notes.
Chapter 13: Replica Management in Grids.
13.2 Replica Architecture.
13.3 Grid Replica Access Protocol (GRAP).
13.4 Handling Multiple Partitioning.
13.6 Bibliographical Notes.
Chapter 14: Grid Atomic Commitment in Replicated Data.
14.2 Modified Grid Atomic Commitment Protocol.
14.3 Transaction Properties in Replicated Environment.
14.5 Bibliographical Notes.
PART V: OTHER DATA INTENSIVE APPLICATIONS.
Chapter 15: Parallel Online Analytic Processing (OLAP) and Business Intelligence.
15.1 Parallel Multidimensional Analysis.
15.2 Parallelization of ROLLUP Queries.
15.3 Parallelization of CUBE Queries.
15.4 Parallelization of Top–N and Ranking Queries.
15.5 Parallelization of CUME_DIST Queries.
15.6 Parallelization of NTILE and Histogram Queries.
15.7 Parallelization of Moving Average and Windowing Queries.
15.9 Bibliographical Notes.
Chapter 16: Parallel Data Mining – Association Rules and Sequential Patterns.
16.1 From Databases, Data Warehousing, to Data Mining: A Journey.
16.2 Data Mining: A Brief Overview.
16.3 Parallel Association Rules.
16.4 Parallel Sequential Patterns.
16.6 Bibliographical Notes.
Chapter 17: Parallel Clustering and Classification.
17.1 Clustering and Classification.
17.2 Parallel Clustering.
17.3 Parallel Classification.
17.5 Bibliographical Notes.
Clement H. C. Leung, PhD, is Foundation Chair in Computer Science at Victoria University, Australia. Dr. Leung previously held the Established Chair in Computer Science at the University of London.
Wenny Rahayu, PhD, is Associate Professor at La Trobe University, Australia, and actively works in the areas of database design and implementation, covering object–relational databases and Web databases.
Sushant Goel, PhD, is a software consultant and holds a PhD in computer systems engineering from RMIT University, Australia. His research interests are in grid transaction management and software development processes, such as agile computing.