A '''distributed database''' is a collection of databases stored at multiple physical locations that functions as a single logical database. Each site can operate independently while participating in the unified system by communicating with the other sites over a network.
==Key Concepts==
*'''Data Distribution:''' Data is distributed across multiple sites based on factors like performance, reliability, and locality.
*'''Transparency:''' Users interact with the distributed database as if it were a single database, regardless of the underlying distribution.
*'''Replication:''' Data is duplicated across multiple sites to improve fault tolerance and availability.
*'''Partitioning:''' Data is divided into subsets, each stored at a specific location.
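The following is a minimal sketch of how partitioning and replication could be combined to decide where a record is stored. The site names, the <code>REPLICAS</code> count, and the <code>sites_for</code> helper are hypothetical, and hash-based placement is only one possible choice.

```python
import hashlib

# Hypothetical site names; a real deployment would use actual node addresses.
SITES = ["site-a", "site-b", "site-c"]
REPLICAS = 2  # assumed number of copies kept of each record


def sites_for(key: str) -> list[str]:
    """Return the sites storing `key`: a hash-chosen primary plus its successors."""
    # A stable hash (unlike the built-in hash()) gives the same placement on every run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(SITES)
    # Partitioning picks the primary; replication adds the next sites in the list.
    return [SITES[(primary + i) % len(SITES)] for i in range(REPLICAS)]


if __name__ == "__main__":
    for key in ("employee:1", "employee:2", "department:10"):
        print(key, "->", sites_for(key))
```

Each key maps to the same two distinct sites on every call, so any node can compute where a record lives without consulting a central directory.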
==Characteristics==
Distributed databases are defined by the following characteristics:
*'''Distributed Data Storage:''' Data is stored on multiple nodes or sites.
*'''Autonomy:''' Each node can function independently and manage its local database.
*'''Transparency:'''
**'''Location Transparency:''' Users do not need to know where data is physically stored.
**'''Replication Transparency:''' Users are unaware of data being replicated across sites.
**'''Fragmentation Transparency:''' Users do not need to know how data is partitioned.
*'''Scalability:''' The system can grow by adding more nodes.
*'''Fault Tolerance:''' Replication and redundancy provide resilience to failures.
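As an illustration of location and replication transparency, the sketch below hides the nodes behind a single interface. The <code>Node</code> and <code>TransparentStore</code> classes are hypothetical, and the sketch assumes full replication (every reachable node receives every write).

```python
class Node:
    """One autonomous site holding its own local key-value data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.up = True  # set to False to simulate a node failure

    def read(self, key: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data[key]


class TransparentStore:
    """Facade that hides where a key is stored and how many copies exist."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes

    def put(self, key: str, value: str) -> None:
        # Assumed policy: copy the value to every node that is currently up.
        for node in self.nodes:
            if node.up:
                node.data[key] = value

    def get(self, key: str) -> str:
        # Try the replicas in turn; the caller never learns which one answered.
        for node in self.nodes:
            try:
                return node.read(key)
            except ConnectionError:
                continue
        raise ConnectionError("no replica reachable")


nodes = [Node("node-1"), Node("node-2")]
store = TransparentStore(nodes)
store.put("user:1", "Alice")
nodes[0].up = False          # simulate a failure of the first node
print(store.get("user:1"))   # still answered, by the surviving replica
```

The caller uses only <code>put</code> and <code>get</code>; which node holds the data, and the fact that one node failed, stay hidden.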
==Types of Distributed Databases==
Distributed databases can be classified based on their architecture:
#'''Homogeneous Distributed Database:'''
#*All nodes use the same database management system (DBMS).
#*Example: A cluster in which every node runs PostgreSQL.
#'''Heterogeneous Distributed Database:'''
#*Nodes may use different DBMSs but are integrated into a single system.
#*Example: A system integrating MySQL and Oracle databases.
#'''Federated Database:'''
#*Autonomous databases are integrated through a middleware layer.
#*Example: A research database integrating multiple institutional datasets.
==Advantages==
*'''Improved Performance:''' Data is stored closer to where it is needed, reducing access time.
*'''Fault Tolerance:''' Replicated data remains available from other nodes when a node fails.
*'''Scalability:''' The system can handle growing amounts of data by adding more nodes.
*'''Resource Sharing:''' Enables sharing of hardware, software, and data resources.
==Limitations==
*'''Complexity:''' Managing a distributed database is more complex than a centralized one.
*'''Consistency:''' Keeping replicas and fragments consistent across nodes requires coordination between sites, which adds cost and complexity.
*'''Communication Overhead:''' Data synchronization and query execution across nodes incur network overhead.
*'''Latency:''' Network delays can affect query response times.
==Example: Distributed Query in a Distributed Database==
Consider a distributed database with two nodes:
*Node 1 stores employee data.
*Node 2 stores department data.
Query: Retrieve the names of employees in the "Sales" department.
===Steps===
{| class="wikitable"
!Step!!Action!!Performed On
|-
|1||Parse query: <code>SELECT employees.name FROM employees JOIN departments ON employees.dept_id = departments.dept_id WHERE departments.name = 'Sales'</code>||Query Coordinator
|-
|2||Decompose query into sub-queries:
*Query 1: Retrieve department IDs for "Sales" from Node 2.
*Query 2: Retrieve employee names for the matching department IDs from Node 1.
|Query Coordinator
|-
|3||Execute the sub-queries on their respective nodes, in order, because Query 2 needs the result of Query 1:
*Node 2 returns the department IDs for "Sales".
*Node 1 returns the employee names for the matching department IDs.
|Node 1, Node 2
|-
|4||Combine results and return final output.||Query Coordinator
|}
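The steps in the table can be sketched in code. In this sketch, two in-memory SQLite databases stand in for Node 1 and Node 2, and the sample rows and the <code>employees_in_department</code> function are assumptions made for illustration; a real coordinator would talk to remote nodes over a network.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the two nodes (illustrative assumption).
node1 = sqlite3.connect(":memory:")  # Node 1: employee data
node2 = sqlite3.connect(":memory:")  # Node 2: department data

node1.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
node1.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 1), ("Bob", 2), ("Carol", 1)],  # hypothetical sample rows
)
node2.execute("CREATE TABLE departments (dept_id INTEGER, name TEXT)")
node2.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Sales"), (2, "Engineering")],  # hypothetical sample rows
)


def employees_in_department(dept_name: str) -> list[str]:
    """Coordinator: run one sub-query per node and combine the results."""
    # Sub-query 1 (Node 2): department IDs for the requested department.
    dept_ids = [
        row[0]
        for row in node2.execute(
            "SELECT dept_id FROM departments WHERE name = ?", (dept_name,)
        )
    ]
    if not dept_ids:
        return []
    # Sub-query 2 (Node 1): employee names for the matching department IDs.
    placeholders = ", ".join("?" for _ in dept_ids)
    rows = node1.execute(
        f"SELECT name FROM employees WHERE dept_id IN ({placeholders}) ORDER BY name",
        dept_ids,
    )
    return [row[0] for row in rows]


print(employees_in_department("Sales"))  # ['Alice', 'Carol']
```

Only the matching department IDs travel between the nodes, not whole tables, which keeps the data shipped over the network small.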
==Data Distribution Techniques==
Distributed databases use the following techniques to distribute data:
*'''Replication:'''
**Duplicates data across multiple sites.
**Improves fault tolerance and read performance but requires synchronization.
*'''Fragmentation:'''
**Divides data into fragments, stored at different sites.
**Types:
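***'''Horizontal Fragmentation:''' Divides a table by rows; each fragment holds a subset of the rows with all columns.
***'''Vertical Fragmentation:''' Divides a table by columns; each fragment holds a subset of the columns, usually together with the primary key so that rows can be reconstructed.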
***'''Hybrid Fragmentation:''' Combines horizontal and vertical fragmentation.
*'''Hybrid Distribution:'''
**Combines replication and fragmentation to optimize performance and fault tolerance.
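The fragmentation types above can be illustrated with a small sketch. The sample table and the helper functions <code>horizontal_fragments</code>, <code>vertical_fragment</code>, and <code>rejoin</code> are hypothetical and operate on in-memory rows rather than on a real DBMS.

```python
# Hypothetical employee table as a list of row dictionaries.
employees = [
    {"id": 1, "name": "Alice", "region": "EU", "salary": 5000},
    {"id": 2, "name": "Bob", "region": "US", "salary": 6000},
    {"id": 3, "name": "Carol", "region": "EU", "salary": 5500},
]


def horizontal_fragments(rows, column):
    """Group whole rows by the value of `column` (one fragment per value)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, columns, key="id"):
    """Keep only `columns` plus the key, so fragments can be rejoined later."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def rejoin(left, right, key="id"):
    """Reconstruct full rows from two vertical fragments by matching on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


by_region = horizontal_fragments(employees, "region")   # e.g. one fragment per site
public = vertical_fragment(employees, ["name", "region"])
private = vertical_fragment(employees, ["salary"])

print(sorted(by_region))                      # ['EU', 'US']
print(rejoin(public, private) == employees)   # True
```

Keeping the key in every vertical fragment is what makes the original table recoverable; horizontal fragments are recombined simply by taking the union of their rows.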
==Applications==
Distributed databases are widely used in:
*'''Global Enterprises:''' Managing geographically dispersed data.
*'''Cloud Databases:''' Supporting distributed cloud-based platforms like Google Spanner and Amazon Aurora.
*'''IoT Systems:''' Managing data from distributed devices.
*'''Big Data Analytics:''' Processing large-scale distributed datasets.
==Challenges==
Distributed databases face several challenges:
*'''Data Consistency:''' Ensuring consistency across replicas while maintaining performance.
*'''Network Partitioning:''' Handling situations where communication between nodes is disrupted.
*'''Query Optimization:''' Efficiently executing queries across distributed nodes.
*'''Security:''' Securing data transmission and storage across multiple locations.
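The data consistency challenge can be made concrete with a small sketch. It assumes asynchronous primary/replica replication, which is only one possible strategy, and the <code>AsyncReplicatedStore</code> class is hypothetical.

```python
class AsyncReplicatedStore:
    """Primary applies writes at once; the replica sees them only after sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet shipped to the replica

    def write(self, key, value):
        # Fast path: acknowledge after updating the primary only.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # May return stale data if sync() has not run since the last write.
        return self.replica.get(key)

    def sync(self):
        # Ship queued writes to the replica in their original order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = AsyncReplicatedStore()
store.write("stock", 10)
store.sync()
store.write("stock", 9)
print(store.read_replica("stock"))  # 10 (stale: the second write is not replicated yet)
store.sync()
print(store.read_replica("stock"))  # 9
```

Writes are fast because they do not wait for the replica, but a read served by the replica can lag behind the primary. Waiting for the replica on every write would remove the stale read at the cost of higher write latency.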
==See Also==
*[[Distributed Systems]]
*[[Query Optimization]]
*[[Database Replication]]
*[[Sharding]]
*[[CAP Theorem]]
*[[Distributed Query Processing]]
*[[Cloud Databases]]
[[Category:Database]]
[[Category:Distributed Computing]]