Distributed Database

A distributed database is a collection of databases spread across multiple physical locations that function as a single logical database. Each site can operate independently while participating in the unified system by communicating over a network.

Key Concepts

Data Distribution: Data is distributed across multiple sites based on factors like performance, reliability, and locality.
Transparency: Users interact with the distributed database as if it were a single database, regardless of the underlying distribution.
Replication: Data is duplicated across multiple sites to improve fault tolerance and availability.
Partitioning: Data is divided into subsets, each stored at a specific location.
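As a rough illustration of how partitioning and replication can work together, the sketch below hashes a record key to pick a primary site and then places copies on the sites that follow it. The site names, the `place` function, and the "next sites in the list" replica rule are assumptions made for this example, not the scheme of any particular DBMS.

```python
import hashlib

SITES = ["seoul", "frankfurt", "virginia"]  # hypothetical site names


def stable_hash(key: str) -> int:
    """Hash that is stable across runs (the built-in hash() of str is not)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


def place(key: str, sites: list[str], replicas: int = 2) -> list[str]:
    """Return the sites holding `key`: the primary first, then its replicas."""
    if not 1 <= replicas <= len(sites):
        raise ValueError("replicas must be between 1 and the number of sites")
    start = stable_hash(key) % len(sites)          # partitioning: pick a primary
    return [sites[(start + i) % len(sites)]        # replication: next sites in order
            for i in range(replicas)]


if __name__ == "__main__":
    for key in ("emp:1", "emp:2", "emp:3"):
        print(key, "->", place(key, SITES))
```

Simple modulo placement like this moves most keys when a site is added or removed; production systems often use consistent hashing or range partitioning instead.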

Characteristics

Distributed databases are defined by the following characteristics:

Distributed Data Storage: Data is stored on multiple nodes or sites.
Autonomy: Each node can function independently and manage its local database.
Transparency:
- Location Transparency: Users do not need to know where data is physically stored.
- Replication Transparency: Users are unaware of data being replicated across sites.
- Fragmentation Transparency: Users do not need to know how data is partitioned.
Scalability: The system can grow by adding more nodes.
Fault Tolerance: Replication and redundancy provide resilience to failures.
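Location and replication transparency can be sketched as a thin facade over a catalog that records which nodes hold each table. The `Node` and `DistributedDB` classes below are hypothetical and exist only for illustration; the point is that callers name a table and never a node.

```python
class Node:
    """One autonomous site with its own local tables."""

    def __init__(self, name):
        self.name = name
        self.tables = {}  # table name -> {primary key -> row}

    def put(self, table, key, row):
        self.tables.setdefault(table, {})[key] = row

    def get(self, table, key):
        return self.tables.get(table, {}).get(key)


class DistributedDB:
    """Facade: callers name a table, never a node."""

    def __init__(self, catalog):
        self.catalog = catalog  # table name -> list of Nodes holding a copy

    def put(self, table, key, row):
        for node in self.catalog[table]:  # write to every replica
            node.put(table, key, row)

    def get(self, table, key):
        for node in self.catalog[table]:  # first replica that has the row wins
            row = node.get(table, key)
            if row is not None:
                return row
        return None


if __name__ == "__main__":
    n1, n2 = Node("node1"), Node("node2")
    db = DistributedDB({"employees": [n1, n2], "departments": [n2]})
    db.put("employees", 1, {"name": "Kim"})
    print(db.get("employees", 1))  # the caller never mentions node1 or node2
```

The sketch writes to all replicas synchronously and ignores failures, which a real system would have to handle.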

Types of Distributed Databases

Distributed databases can be classified based on their architecture:

Homogeneous Distributed Database:
- All nodes use the same database management system (DBMS).
- Example: A cluster in which every node runs PostgreSQL.
Heterogeneous Distributed Database:
- Nodes may use different DBMSs but are integrated into a single system.
- Example: A system integrating MySQL and Oracle databases.
Federated Database:
- Autonomous databases are integrated through a middleware layer.
- Example: A research database integrating multiple institutional datasets.

Advantages

Improved Performance: Data is stored closer to where it is needed, reducing access time.
Fault Tolerance: Data replication helps keep the system available when individual nodes fail, because other sites hold copies of the same data.
Scalability: The system can handle growing amounts of data by adding more nodes.
Resource Sharing: Enables sharing of hardware, software, and data resources.

Limitations

Complexity: Managing a distributed database is more complex than a centralized one.
Consistency: Keeping replicas and fragments consistent across nodes requires coordination protocols (for example, two-phase commit or consensus), which add latency and complexity.
Communication Overhead: Data synchronization and query execution across nodes incur network overhead.
Latency: Network delays can affect query response times.

Example: Distributed Query in a Distributed Database

Consider a distributed database with two nodes:

Node 1 stores employee data.
Node 2 stores department data.

Query: Retrieve the names of employees in the "Sales" department.

Steps

Step	Action	Performed On
1	Parse the query: SELECT employees.name FROM employees JOIN departments ON employees.dept_id = departments.dept_id WHERE departments.name = 'Sales'	Query Coordinator
2	Decompose the query into sub-queries. Query 1: retrieve department IDs for "Sales" from Node 2. Query 2: retrieve employee names for the matching department IDs from Node 1.	Query Coordinator
3	Execute the sub-queries on their respective nodes. Node 2 returns the department IDs for "Sales"; Node 1 returns the employee names for the matching department IDs.	Node 1, Node 2
4	Combine the results and return the final output.	Query Coordinator
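The steps above can be sketched with two independent in-memory SQLite databases standing in for Node 1 and Node 2. The sample rows and the `employees_in_department` coordinator function are assumptions made for this example; a real coordinator would also parse and optimize the original SQL rather than hard-code the decomposition.

```python
import sqlite3

# Two independent in-memory databases stand in for the two nodes.
node1 = sqlite3.connect(":memory:")  # Node 1: employee data
node2 = sqlite3.connect(":memory:")  # Node 2: department data

node1.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
node1.executemany("INSERT INTO employees VALUES (?, ?)",
                  [("Alice", 1), ("Bob", 2), ("Carol", 1)])  # sample data

node2.execute("CREATE TABLE departments (dept_id INTEGER, name TEXT)")
node2.executemany("INSERT INTO departments VALUES (?, ?)",
                  [(1, "Sales"), (2, "Engineering")])  # sample data


def employees_in_department(dept_name):
    """Play the query coordinator: decompose, execute per node, combine."""
    # Sub-query 1 runs on Node 2: find the matching department IDs.
    dept_ids = [row[0] for row in node2.execute(
        "SELECT dept_id FROM departments WHERE name = ?", (dept_name,))]
    if not dept_ids:
        return []

    # Sub-query 2 runs on Node 1: only the IDs cross the "network".
    placeholders = ",".join("?" * len(dept_ids))
    rows = node1.execute(
        f"SELECT name FROM employees WHERE dept_id IN ({placeholders}) "
        "ORDER BY name", dept_ids)

    # Combine: here the second result set is already the final answer.
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(employees_in_department("Sales"))  # ['Alice', 'Carol']
```

Shipping only the department IDs to Node 1, rather than copying a whole table to the coordinator, is one way to limit the communication overhead mentioned under Limitations.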

Data Distribution Techniques

Distributed databases use the following techniques to distribute data:

Replication:
- Duplicates data across multiple sites.
- Improves fault tolerance and read performance but requires synchronization.
Fragmentation:
- Divides data into fragments, stored at different sites.
- Types:
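  - Horizontal Fragmentation: Divides a table by rows, so each fragment holds a subset of the rows with all columns.
  - Vertical Fragmentation: Divides a table by columns, so each fragment holds a subset of the columns (plus the key) for all rows.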
  - Hybrid Fragmentation: Combines horizontal and vertical fragmentation.
Hybrid Distribution:
- Combines replication and fragmentation to optimize performance and fault tolerance.
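Horizontal and vertical fragmentation can be illustrated on a small in-memory table. The sample rows and the helper names below are hypothetical; the sketch also shows why each vertical fragment keeps the key column, since that is what allows the original rows to be reconstructed.

```python
rows = [  # sample table, invented for this example
    {"id": 1, "name": "Alice", "region": "EU", "salary": 5000},
    {"id": 2, "name": "Bob", "region": "US", "salary": 6000},
    {"id": 3, "name": "Carol", "region": "EU", "salary": 5500},
]


def horizontal_fragments(rows, column):
    """Split by rows: one fragment per distinct value of `column`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragments(rows, key, column_groups):
    """Split by columns: every fragment keeps `key` so rows can be rejoined."""
    return [[{c: row[c] for c in [key, *cols]} for row in rows]
            for cols in column_groups]


def rejoin_vertical(fragments, key):
    """Reconstruct full rows by merging fragment rows that share a key."""
    merged = {}
    for fragment in fragments:
        for part in fragment:
            merged.setdefault(part[key], {}).update(part)
    return list(merged.values())


if __name__ == "__main__":
    by_region = horizontal_fragments(rows, "region")
    print(sorted(by_region))  # ['EU', 'US']

    parts = vertical_fragments(rows, "id", [["name"], ["region", "salary"]])
    print(rejoin_vertical(parts, "id") == rows)  # True
```

Hybrid fragmentation would apply one split to the output of the other, for example fragmenting by region first and then splitting each regional fragment by columns.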

Applications

Distributed databases are widely used in:

Global Enterprises: Managing geographically dispersed data.
Cloud Databases: Underpinning managed cloud database services such as Google Spanner and Amazon Aurora.
IoT Systems: Managing data from distributed devices.
Big Data Analytics: Processing large-scale distributed datasets.

Challenges

Distributed databases face several challenges:

Data Consistency: Ensuring consistency across replicas while maintaining performance.
Network Partitioning: Handling situations where communication between nodes is disrupted.
Query Optimization: Efficiently executing queries across distributed nodes.
Security: Securing data transmission and storage across multiple locations.
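One common way to trade consistency against availability is quorum replication: with N replicas, a write must reach W of them and a read must consult R of them, and choosing W + R > N guarantees that every read set overlaps the latest write set. The sketch below is a simplified, single-process model under stated assumptions: the `Replica` and `QuorumStore` classes are hypothetical, a single coordinator assigns version numbers, and an `up` flag simulates a node that is unreachable because of a failure or network partition.

```python
class Replica:
    def __init__(self):
        self.version = 0   # version of the last write this replica saw
        self.value = None
        self.up = True     # False simulates a failed or partitioned node


class QuorumStore:
    def __init__(self, n=3, w=2, r=2):
        if w + r <= n:
            raise ValueError("need w + r > n so read and write sets overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.next_version = 1  # assumption: one coordinator orders all writes

    def _reachable(self):
        return [rep for rep in self.replicas if rep.up]

    def write(self, value):
        live = self._reachable()
        if len(live) < self.w:
            raise RuntimeError("write quorum unavailable")
        for rep in live:  # at least w replicas receive the new version
            rep.version, rep.value = self.next_version, value
        self.next_version += 1

    def read(self):
        live = self._reachable()
        if len(live) < self.r:
            raise RuntimeError("read quorum unavailable")
        # Consult r replicas and trust the one with the highest version.
        newest = max(live[: self.r], key=lambda rep: rep.version)
        return newest.value


if __name__ == "__main__":
    store = QuorumStore(n=3, w=2, r=2)
    store.replicas[2].up = False   # replica 2 misses the write
    store.write("v1")
    store.replicas[2].up = True
    store.replicas[0].up = False   # a different replica is now unreachable
    print(store.read())            # v1
```

When too few replicas are reachable the store refuses the operation instead of returning possibly stale data, which is the consistency-over-availability side of the trade-off.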
