'''Propensity Score Matching (PSM)''' is a statistical technique used in observational studies to reduce selection bias when estimating the causal effect of a treatment or intervention. It pairs treated and untreated units that have similar propensity scores, where a unit's propensity score is its probability of receiving the treatment given its observed covariates.
==Key Concepts==
*'''Propensity Score:''' The probability of a unit receiving the treatment, given its covariates.
*'''Matching:''' Pairing units from the treatment and control groups with similar propensity scores.
*'''Balancing Covariates:''' Checking that, after matching, the treatment and control groups have similar distributions of the observed covariates, so that the groups are comparable.
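The concepts above can be written compactly in potential-outcomes notation. The symbols are introduced here for illustration and are not used elsewhere in this article: <math>T</math> is the treatment indicator, <math>X</math> the observed covariates, and <math>Y(1)</math>, <math>Y(0)</math> the outcomes a unit would have with and without treatment. The propensity score is

<math display="block">e(x) = P(T = 1 \mid X = x)</math>

and a common target of matching is the average treatment effect on the treated:

<math display="block">\tau_{\mathrm{ATT}} = E\left[\, Y(1) - Y(0) \mid T = 1 \,\right]</math>

The propensity score is a balancing score, meaning <math>X \perp T \mid e(X)</math>. If in addition treatment assignment is unconfounded given the observed covariates, <math>(Y(0), Y(1)) \perp T \mid X</math>, and every unit has a chance of either assignment, <math>0 < e(X) < 1</math>, then comparing treated and untreated units with the same propensity score yields an unbiased estimate of the treatment effect at that score.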
==Steps in Propensity Score Matching==
#'''Estimate Propensity Scores:''' Use logistic regression or another model to estimate propensity scores for each unit based on covariates.
#'''Match Units:''' Pair treated units with untreated units that have similar propensity scores using methods like nearest neighbor or caliper matching.
#'''Assess Balance:''' Check whether covariates are balanced between the matched treatment and control groups.
#'''Estimate Treatment Effect:''' Compare outcomes between the matched groups to estimate the causal effect of the treatment.
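The balance check in step 3 is often done with the standardized mean difference (SMD) of each covariate. The following is a minimal sketch, assuming NumPy is available. The function name <code>standardized_mean_difference</code> and the age values are hypothetical and chosen only for illustration:<syntaxhighlight lang="python">
import numpy as np

def standardized_mean_difference(x_treated, x_control):
    """Difference in group means divided by the pooled standard deviation."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    # Pooled SD: square root of the average of the two sample variances
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

# Hypothetical ages in the matched treated and control groups
age_treated = [25, 35, 45]
age_control = [30, 30, 50]

smd_age = standardized_mean_difference(age_treated, age_control)
print(f"SMD for age: {smd_age:.3f}")
</syntaxhighlight>
A commonly cited rule of thumb treats an absolute SMD below about 0.1 as adequate balance, although the threshold is a convention rather than a formal test.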
==Matching Methods==
Several methods are used for matching units based on propensity scores:
*'''Nearest Neighbor Matching:''' Matches each treated unit with the closest untreated unit based on propensity score.
*'''Caliper Matching:''' Matches units only if their propensity scores are within a predefined threshold.
*'''Radius Matching:''' Matches treated units with all untreated units within a specified range of propensity scores.
*'''Kernel Matching:''' Compares each treated unit with a weighted average of untreated units, giving larger weights to untreated units whose propensity scores are closer to that of the treated unit.
*'''Stratification Matching:''' Divides units into strata based on propensity scores and compares treated and untreated units within each stratum.
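As an illustration of how a caliper changes nearest neighbor matching, the sketch below matches each treated unit to its closest untreated unit and discards the match when the score gap exceeds the caliper. It assumes NumPy and matching with replacement. The function name <code>caliper_match</code>, the caliper of 0.05, and the scores are hypothetical:<syntaxhighlight lang="python">
import numpy as np

def caliper_match(ps_treated, ps_control, caliper=0.05):
    """Return, for each treated unit, the index of its nearest control,
    or -1 when no control lies within the caliper (with replacement)."""
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    # Pairwise absolute score differences: rows = treated, columns = controls
    diffs = np.abs(ps_treated[:, None] - ps_control[None, :])
    nearest = diffs.argmin(axis=1)
    within = diffs[np.arange(len(ps_treated)), nearest] <= caliper
    return np.where(within, nearest, -1)

# Hypothetical propensity scores
ps_treated = [0.62, 0.80, 0.33]
ps_control = [0.60, 0.30, 0.45]

matches = caliper_match(ps_treated, ps_control, caliper=0.05)
print(matches)  # second treated unit has no control within the caliper
</syntaxhighlight>
Treated units marked <code>-1</code> are left unmatched, which is the source of the sample-size reduction that caliper matching trades for closer matches.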
==Example of PSM in Python==
Using the scikit-learn library to estimate propensity scores and match units:<syntaxhighlight lang="python">
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Toy dataset for illustration only; five rows are far too few for a real analysis
data = pd.DataFrame({
    'age': [25, 30, 45, 50, 35],
    'income': [30000, 40000, 50000, 60000, 45000],
    'treatment': [1, 0, 1, 0, 1],
    'outcome': [1, 0, 1, 0, 1]
})
covariates = ['age', 'income']

# Estimate propensity scores: P(treatment = 1 | covariates)
model = LogisticRegression(max_iter=1000)
model.fit(data[covariates], data['treatment'])
data['propensity_score'] = model.predict_proba(data[covariates])[:, 1]

# Split into treated and untreated units
treated = data[data['treatment'] == 1]
untreated = data[data['treatment'] == 0]

# 1-nearest-neighbor matching on the propensity score, with replacement:
# the same untreated unit may be matched to more than one treated unit
matcher = NearestNeighbors(n_neighbors=1)
matcher.fit(untreated[['propensity_score']])
distances, indices = matcher.kneighbors(treated[['propensity_score']])

# Create matched dataset: row i is the control matched to the i-th treated unit
matched = untreated.iloc[indices.flatten()].reset_index(drop=True)
matched['matched_to'] = treated.index.values  # original index of the treated unit
matched['distance'] = distances.flatten()     # propensity score gap within the pair
print(matched)
</syntaxhighlight>
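The example above stops at the matched dataset. A minimal sketch of the final step, estimating the treatment effect, follows. It assumes one-to-one matched pairs and uses the mean within-pair outcome difference as an estimate of the average treatment effect on the treated (ATT). The column names and outcome values are hypothetical, and no standard error is computed:<syntaxhighlight lang="python">
import pandas as pd

# Hypothetical matched pairs: each row holds the outcome of a treated unit
# and the outcome of the untreated unit it was matched to
pairs = pd.DataFrame({
    'outcome_treated': [1, 1, 1, 0],
    'outcome_control': [0, 1, 0, 0],
})

# ATT estimate: average within-pair difference in outcomes
att = (pairs['outcome_treated'] - pairs['outcome_control']).mean()
print(f"Estimated ATT: {att:.2f}")
</syntaxhighlight>
In practice the estimate should be reported with an uncertainty measure that accounts for the matching procedure, which this sketch omits.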
==Applications of PSM==
Propensity score matching is widely used in fields such as:
*'''Healthcare:''' Evaluating the effectiveness of treatments or medical interventions.
*'''Economics:''' Analyzing policy impacts using observational data.
*'''Education:''' Assessing the impact of programs on student performance.
*'''Marketing:''' Measuring the effects of campaigns or promotions.
==Advantages==
*'''Reduces Selection Bias:''' Balances observed covariates between treatment and control groups.
*'''Improves Causal Inference:''' Provides a framework for estimating treatment effects in observational studies.
*'''Simple and Intuitive:''' Easy to implement and interpret.
==Limitations==
*'''Unmeasured Confounding:''' PSM cannot account for unobserved variables that influence treatment assignment.
*'''Data Loss:''' Matching may exclude units that cannot be paired, reducing the sample size.
*'''Model Dependency:''' Results depend on the correctness of the model used to estimate propensity scores.
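The data-loss limitation can be gauged before matching by checking how much the propensity score distributions of the two groups overlap, often called common support. The sketch below uses a simple minimum-maximum rule, which is one of several conventions and is an assumption here rather than part of PSM itself. The function name <code>common_support_mask</code> and the scores are hypothetical:<syntaxhighlight lang="python">
import numpy as np

def common_support_mask(ps, treatment):
    """Flag units whose propensity score lies in the range covered by both groups."""
    ps = np.asarray(ps, dtype=float)
    is_treated = np.asarray(treatment).astype(bool)
    # Overlap region: from the larger of the two minima to the smaller of the two maxima
    low = max(ps[is_treated].min(), ps[~is_treated].min())
    high = min(ps[is_treated].max(), ps[~is_treated].max())
    return (ps >= low) & (ps <= high)

# Hypothetical propensity scores and treatment indicators
ps = [0.10, 0.40, 0.55, 0.70, 0.95, 0.30]
treatment = [0, 0, 0, 1, 1, 1]

mask = common_support_mask(ps, treatment)
print(f"Kept {mask.sum()} of {mask.size} units")
</syntaxhighlight>
Units outside the overlap region have no comparable counterpart in the other group, so an effect estimated after dropping them applies only to the retained population.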
==Related Concepts and See Also==
*[[Causal Inference]]
*[[Matching Methods]]
*[[Treatment Effect]]
*[[Selection Bias]]
*[[Logistic Regression]]
*[[Covariate Balancing]]
*[[Observational Studies]]
[[분류:Data Science]]