Propensity Score Matching
'''Propensity Score Matching (PSM)''' is a statistical technique used in observational studies to reduce selection bias when estimating the causal effect of a treatment or intervention. It involves pairing treated and untreated units with similar propensity scores, which represent the probability of receiving the treatment based on observed covariates.

==Key Concepts==
*'''Propensity Score:''' The probability of a unit receiving the treatment, given its covariates.
*'''Matching:''' Pairing units from the treatment and control groups with similar propensity scores.
*'''Balancing Covariates:''' Ensures that the treatment and control groups are comparable in terms of covariates.

==Steps in Propensity Score Matching==
#'''Estimate Propensity Scores:''' Use logistic regression or another model to estimate propensity scores for each unit based on covariates.
#'''Match Units:''' Pair treated units with untreated units that have similar propensity scores using methods like nearest neighbor or caliper matching.
#'''Assess Balance:''' Check whether covariates are balanced between the matched treatment and control groups.
#'''Estimate Treatment Effect:''' Compare outcomes between the matched groups to estimate the causal effect of the treatment.

==Matching Methods==
Several methods are used for matching units based on propensity scores:
*'''Nearest Neighbor Matching:''' Matches each treated unit with the closest untreated unit based on propensity score.
*'''Caliper Matching:''' Matches units only if their propensity scores are within a predefined threshold.
*'''Radius Matching:''' Matches treated units with all untreated units within a specified range of propensity scores.
*'''Kernel Matching:''' Uses weighted averages of untreated units within a certain range of propensity scores.
*'''Stratification Matching:''' Divides units into strata based on propensity scores and compares treated and untreated units within each stratum.
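
As a rough illustration of the caliper matching described above, the following sketch pairs each treated unit with its nearest untreated unit only when the two scores differ by no more than the caliper. The function name <code>caliper_match</code>, the caliper value of 0.05, and the scores themselves are assumptions made for this example; greedy matching without replacement is one possible design choice among several.
<syntaxhighlight lang="python">
import numpy as np

def caliper_match(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement, within a caliper.

    Returns (treated_index, control_index) pairs. Treated units with no
    available control inside the caliper are left unmatched.
    """
    treated_scores = np.asarray(treated_scores, dtype=float)
    control_scores = np.asarray(control_scores, dtype=float)
    available = np.ones(len(control_scores), dtype=bool)
    pairs = []
    for i, score in enumerate(treated_scores):
        if not available.any():
            break  # every control unit has already been used
        dist = np.abs(control_scores - score)
        dist[~available] = np.inf  # exclude controls that are already matched
        j = int(np.argmin(dist))
        if dist[j] <= caliper:
            pairs.append((i, j))
            available[j] = False
    return pairs

# Hypothetical propensity scores, for illustration only
treated_ps = [0.62, 0.35, 0.90]
control_ps = [0.60, 0.33, 0.58, 0.10]

pairs = caliper_match(treated_ps, control_ps, caliper=0.05)
print(pairs)
</syntaxhighlight>
In this sketch the third treated unit (score 0.90) has no untreated unit within the caliper, so it is dropped from the matched sample. A tighter caliper tends to improve the similarity of matched pairs at the cost of discarding more units.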

==Example of PSM in Python==
Using <code>scikit-learn</code> (with <code>pandas</code>) to estimate propensity scores and match units on a small illustrative dataset:<syntaxhighlight lang="python">
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Example dataset (toy values, for illustration only)
data = pd.DataFrame({
    'age': [25, 30, 45, 50, 35],
    'income': [30000, 40000, 50000, 60000, 45000],
    'treatment': [1, 0, 1, 0, 1],
    'outcome': [1, 0, 1, 0, 1]
})

# Estimate propensity scores: P(treatment = 1 | age, income)
covariates = ['age', 'income']
model = LogisticRegression()
model.fit(data[covariates], data['treatment'])
data['propensity_score'] = model.predict_proba(data[covariates])[:, 1]

# Split into treated and untreated units
treated = data[data['treatment'] == 1]
untreated = data[data['treatment'] == 0]

# For each treated unit, find the untreated unit with the closest score.
# This is nearest neighbor matching with replacement: one untreated unit
# may be matched to several treated units.
matcher = NearestNeighbors(n_neighbors=1)
matcher.fit(untreated[['propensity_score']])
distances, indices = matcher.kneighbors(treated[['propensity_score']])

# Create matched dataset: one row per treated unit, holding its matched control
matched = untreated.iloc[indices.flatten()].reset_index(drop=True)
matched['matched_to'] = treated.index.values  # index of the treated unit
print(matched)
</syntaxhighlight>

==Applications of PSM==
Propensity score matching is widely used in fields such as:
*'''Healthcare:''' Evaluating the effectiveness of treatments or medical interventions.
*'''Economics:''' Analyzing policy impacts using observational data.
*'''Education:''' Assessing the impact of programs on student performance.
*'''Marketing:''' Measuring the effects of campaigns or promotions.

==Advantages==
*'''Reduces Selection Bias:''' Balances observed covariates between treatment and control groups.
*'''Improves Causal Inference:''' Provides a framework for estimating treatment effects in observational studies.
*'''Simple and Intuitive:''' Easy to implement and interpret.

==Limitations==
*'''Unmeasured Confounding:''' PSM cannot account for unobserved variables that influence treatment assignment.
*'''Data Loss:''' Matching may exclude units that cannot be paired, reducing the sample size.
*'''Model Dependency:''' Results depend on the correctness of the model used to estimate propensity scores.

==Related Concepts and See Also==
*[[Causal Inference]]
*[[Matching Methods]]
*[[Treatment Effect]]
*[[Selection Bias]]
*[[Logistic Regression]]
*[[Covariate Balancing]]
*[[Observational Studies]]

[[분류:Data Science]]