Distributed Database
A distributed database is a collection of databases spread across multiple physical locations that together function as a single logical database. Each site can operate independently while participating in the unified system by communicating with the other sites over a network.
Key Concepts
- Data Distribution: Data is distributed across multiple sites based on factors like performance, reliability, and locality.
 - Transparency: Users interact with the distributed database as if it were a single database, regardless of the underlying distribution.
 - Replication: Data is duplicated across multiple sites to improve fault tolerance and availability.
 - Partitioning: Data is divided into subsets, each stored at a specific location.
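
To make partitioning and replication concrete, here is a minimal sketch of how a key might be assigned to a primary site and then copied to a neighbouring site. The site names, the replication factor, and the hash-modulo placement rule are all assumptions chosen for illustration; real systems use a variety of placement schemes.

```python
import hashlib

SITES = ["site-a", "site-b", "site-c"]  # hypothetical site names
REPLICATION_FACTOR = 2  # assumed: each key is stored at two sites


def stable_hash(key: str) -> int:
    """Hash that is stable across runs (unlike built-in hash() for str)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


def sites_for(key: str) -> list[str]:
    """Return the primary site followed by its replica sites for a key."""
    primary = stable_hash(key) % len(SITES)
    return [SITES[(primary + i) % len(SITES)] for i in range(REPLICATION_FACTOR)]


placement = {key: sites_for(key) for key in ["emp:1", "emp:2", "dept:sales"]}
for key, sites in placement.items():
    print(key, "->", sites)
```

Partitioning decides the primary site; replication adds the extra copies. A caller that only uses `sites_for` never needs to know the placement rule, which is the idea behind transparency.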
 
Characteristics
Distributed databases are defined by the following characteristics:
- Distributed Data Storage: Data is stored on multiple nodes or sites.
- Autonomy: Each node can function independently and manage its local database.
- Transparency:
  - Location Transparency: Users do not need to know where data is physically stored.
  - Replication Transparency: Users are unaware of data being replicated across sites.
  - Fragmentation Transparency: Users do not need to know how data is partitioned.
- Scalability: The system can grow by adding more nodes.
- Fault Tolerance: Replication and redundancy provide resilience to failures.
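
Location transparency can be illustrated with a small facade that keeps a catalog of where each row lives, so callers read and write by key only. This is a sketch under stated assumptions: the `Node` and `DistributedTable` classes, the node names, and the "emptiest node" placement rule are hypothetical and not taken from any particular DBMS.

```python
from typing import Optional


class Node:
    """One autonomous site that manages its own local store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: dict = {}


class DistributedTable:
    """Facade that hides which node holds each row (location transparency)."""

    def __init__(self, nodes: list) -> None:
        self.nodes = nodes
        self.catalog: dict = {}  # key -> Node that holds the row

    def put(self, key: str, row: dict) -> None:
        # Reuse the known location; otherwise place the row on the emptiest node.
        node = self.catalog.get(key) or min(self.nodes, key=lambda n: len(n.rows))
        node.rows[key] = row
        self.catalog[key] = node

    def get(self, key: str) -> Optional[dict]:
        node = self.catalog.get(key)
        return node.rows.get(key) if node else None


table = DistributedTable([Node("node-1"), Node("node-2")])
table.put("emp:1", {"name": "Kim"})
table.put("emp:2", {"name": "Lee"})
print(table.get("emp:2"))
```

The caller never names a node. Moving a row to another site would only require updating the catalog, not the calling code.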
 
Types of Distributed Databases
Distributed databases can be classified based on their architecture:
- Homogeneous Distributed Database:
  - All nodes use the same database management system (DBMS).
  - Example: A PostgreSQL cluster.
- Heterogeneous Distributed Database:
  - Nodes may use different DBMSs but are integrated into a single system.
  - Example: A system integrating MySQL and Oracle databases.
- Federated Database:
  - Autonomous databases are integrated through a middleware layer.
  - Example: A research database integrating multiple institutional datasets.
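
The middleware idea behind heterogeneous and federated systems can be sketched with adapters that expose one common interface over different kinds of stores. Everything here is an assumption for illustration: the `Source` interface, the adapter classes, and the sample records are hypothetical, and SQLite merely stands in for a relational member database.

```python
import sqlite3
from typing import Protocol


class Source(Protocol):
    """Common interface the middleware expects from every member database."""

    def fetch_all(self) -> list: ...


class SqlSource:
    """Adapter over a relational member (SQLite stands in here)."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn, self.table = conn, table

    def fetch_all(self) -> list:
        # The table name is trusted in this sketch; never interpolate user input.
        cur = self.conn.execute(f"SELECT id, title FROM {self.table}")
        return [{"id": row[0], "title": row[1]} for row in cur]


class DictSource:
    """Adapter over a non-relational member store."""

    def __init__(self, records: dict) -> None:
        self.records = records

    def fetch_all(self) -> list:
        return [{"id": k, "title": v} for k, v in self.records.items()]


def federated_scan(sources: list) -> list:
    """Middleware entry point: query every member and merge the results."""
    merged = []
    for source in sources:
        merged.extend(source.fetch_all())
    return sorted(merged, key=lambda row: row["id"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO datasets VALUES (1, 'genome survey')")
results = federated_scan([SqlSource(conn, "datasets"), DictSource({2: "climate logs"})])
print(results)
```

Each member keeps its own storage format and remains usable on its own; only the adapter knows how to translate it into the shared row shape.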
 
 
Advantages
- Improved Performance: Data is stored closer to where it is needed, reducing access time.
 - Fault Tolerance: Because data is replicated, the system can keep serving requests when individual nodes fail.
 - Scalability: The system can handle growing amounts of data by adding more nodes.
 - Resource Sharing: Enables sharing of hardware, software, and data resources.
 
Limitations
- Complexity: Managing a distributed database is more complex than a centralized one.
 - Consistency: Keeping replicas and fragments consistent across nodes is difficult, especially when updates arrive concurrently at different sites.
 - Communication Overhead: Data synchronization and query execution across nodes incur network overhead.
 - Latency: Network delays can affect query response times.
 
Example: Distributed Query in a Distributed Database
Consider a distributed database with two nodes:
- Node 1 stores employee data.
 - Node 2 stores department data.
 
Query: Retrieve the names of employees in the "Sales" department.
Steps
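| Step | Action | Performed On |
|---|---|---|
| 1 | Parse the query: SELECT employees.name FROM employees JOIN departments ON employees.dept_id = departments.dept_id WHERE departments.name = 'Sales' | Query Coordinator |
| 2 | Decompose the query into sub-queries: one that finds the dept_id of the "Sales" department and one that retrieves the names of employees with that dept_id. | Query Coordinator |
| 3 | Execute the sub-queries on their respective nodes: the department sub-query on Node 2 and the employee sub-query on Node 1. | Node 2, Node 1 |
| 4 | Combine results and return final output. | Query Coordinator |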
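
The steps in the table can be sketched with two in-memory SQLite databases standing in for Node 1 and Node 2. This is one plausible decomposition, not the only one: the sample rows, the `employees_in` helper, and the choice to ship the matching dept_id values from Node 2 to Node 1 are assumptions made for illustration.

```python
import sqlite3

# Node 1 holds employees; Node 2 holds departments (separate databases).
node1 = sqlite3.connect(":memory:")
node1.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
node1.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 10), ("Bob", 20), ("Carol", 10)],
)

node2 = sqlite3.connect(":memory:")
node2.execute("CREATE TABLE departments (dept_id INTEGER, name TEXT)")
node2.executemany(
    "INSERT INTO departments VALUES (?, ?)", [(10, "Sales"), (20, "Support")]
)


def employees_in(department: str) -> list:
    """Coordinator: run one sub-query per node, then combine the results."""
    # Sub-query on Node 2: which dept_id values match the department name?
    dept_ids = [
        row[0]
        for row in node2.execute(
            "SELECT dept_id FROM departments WHERE name = ?", (department,)
        )
    ]
    if not dept_ids:
        return []
    # Sub-query on Node 1: employees whose dept_id is in that set.
    placeholders = ", ".join("?" for _ in dept_ids)
    rows = node1.execute(
        f"SELECT name FROM employees WHERE dept_id IN ({placeholders}) ORDER BY name",
        dept_ids,
    )
    return [row[0] for row in rows]


print(employees_in("Sales"))  # prints ['Alice', 'Carol']
```

Only the small list of matching dept_id values crosses the network between nodes, rather than an entire table, which is the kind of choice a distributed query optimizer tries to make.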
Data Distribution Techniques
Distributed databases use the following techniques to distribute data:
- Replication:
  - Duplicates data across multiple sites.
  - Improves fault tolerance and read performance but requires synchronization between the copies.
- Fragmentation:
  - Divides data into fragments stored at different sites.
  - Types:
    - Horizontal Fragmentation: Splits a table by rows; each fragment holds a subset of the rows.
    - Vertical Fragmentation: Splits a table by columns; each fragment holds a subset of the columns, usually along with the key so that rows can be reassembled.
    - Hybrid Fragmentation: Combines horizontal and vertical fragmentation.
- Hybrid Distribution:
  - Combines replication and fragmentation to optimize performance and fault tolerance.
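
Horizontal and vertical fragmentation can be sketched on a small in-memory table. The sample rows, column names, and helper functions below are hypothetical; the point is only that horizontal fragments keep whole rows, vertical fragments keep whole columns plus the key, and the original table can be rebuilt from either.

```python
employees = [
    {"id": 1, "name": "Alice", "region": "EU", "salary": 5200},
    {"id": 2, "name": "Bob", "region": "US", "salary": 4800},
    {"id": 3, "name": "Carol", "region": "EU", "salary": 6100},
]


def horizontal_fragments(rows, column):
    """Group whole rows by the value of one column (one fragment per value)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, columns, key="id"):
    """Keep only some columns, always carrying the key so rows can be rejoined."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def rejoin(left, right, key="id"):
    """Rebuild full rows from two vertical fragments by matching on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


by_region = horizontal_fragments(employees, "region")   # e.g. one site per region
public = vertical_fragment(employees, ["name", "region"])
private = vertical_fragment(employees, ["salary"])       # could live on a stricter site

print(sorted(by_region))
print(rejoin(public, private) == employees)
```

A hybrid scheme would apply both steps, for example splitting by region first and then separating the salary column within each regional fragment.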
 
 
Applications
Distributed databases are widely used in:
- Global Enterprises: Managing geographically dispersed data.
 - Cloud Databases: Powering cloud database services such as Google Spanner and Amazon Aurora.
 - IoT Systems: Managing data from distributed devices.
 - Big Data Analytics: Processing large-scale distributed datasets.
 
Challenges
Distributed databases face several challenges:
- Data Consistency: Ensuring consistency across replicas while maintaining performance.
 - Network Partitioning: Handling situations where communication between nodes is disrupted.
 - Query Optimization: Efficiently executing queries across distributed nodes.
 - Security: Securing data transmission and storage across multiple locations.
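
One common way to trade off consistency against availability when some nodes are unreachable is quorum replication, a technique the list above does not name and that is shown here only as an illustration. With N replicas, a write must reach W of them and a read must consult R of them; choosing W + R > N guarantees that every read overlaps the latest successful write. The `Replica` class, the version numbers, and the `reachable` flag are assumptions of this sketch.

```python
class Replica:
    def __init__(self) -> None:
        self.version = 0
        self.value = None
        self.reachable = True  # set to False to simulate a network partition


def quorum_write(replicas, value, version, w):
    """Apply the write to every reachable replica; succeed only if >= w acked.

    Simplification: a failed write is not rolled back on the replicas it reached.
    """
    acks = 0
    for replica in replicas:
        if replica.reachable:
            replica.value, replica.version = value, version
            acks += 1
    return acks >= w


def quorum_read(replicas, r):
    """Read from r reachable replicas and return the highest-versioned value."""
    reachable = [rep for rep in replicas if rep.reachable]
    if len(reachable) < r:
        raise RuntimeError("not enough replicas reachable for a quorum read")
    newest = max(reachable[:r], key=lambda rep: rep.version)
    return newest.value


replicas = [Replica() for _ in range(3)]         # N = 3, with W = 2 and R = 2
quorum_write(replicas, "a", version=1, w=2)
replicas[0].reachable = False                    # replica 0 is cut off
ok = quorum_write(replicas, "b", version=2, w=2)
replicas[0].reachable, replicas[2].reachable = True, False
latest = quorum_read(replicas, r=2)              # replica 0 is stale, replica 1 is not
print(ok, latest)  # prints True b
```

The read still returns the newest value even though one of the two replicas it consulted missed the second write, because the read set and the write set share at least one replica.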