Support Vector Machine
'''Support Vector Machine (SVM)''' is a powerful supervised machine learning algorithm used for both classification and regression tasks, though it is primarily used for classification. SVM works by finding the optimal boundary, or hyperplane, that best separates the data points of different classes. SVM is effective in high-dimensional spaces and is especially suitable for binary classification problems.

==How It Works==
SVM aims to maximize the margin between data points of different classes, where the margin is defined as the distance between the closest data points (support vectors) of each class and the hyperplane. By maximizing this margin, SVM achieves a decision boundary that is robust and generalizes well to new data.

If the data is not linearly separable, SVM can use a '''kernel trick''' to transform the data into a higher-dimensional space where a separating hyperplane can be found. Common kernels include:
*'''Linear Kernel''': Suitable for linearly separable data.
*'''Polynomial Kernel''': Used for non-linear relationships; the degree of the polynomial can be adjusted.
*'''Radial Basis Function (RBF) Kernel''': A popular choice for highly non-linear data.
*'''Sigmoid Kernel''': Sometimes used as an alternative to RBF, though less common.

==Applications of SVM==
SVM is widely applied across industries, especially in applications requiring high accuracy and interpretability. Common use cases include:
*'''Image Classification''': Object and facial recognition, where SVM effectively separates complex visual features.
*'''Text Classification''': Spam filtering and sentiment analysis, where SVM performs well with high-dimensional data such as text features.
*'''Medical Diagnosis''': Disease prediction and classification of medical data, leveraging SVM’s robustness with noisy and complex datasets.
*'''Bioinformatics''': Gene expression data classification, where the large number of features and small sample sizes benefit from SVM’s margin-maximizing approach.

==Key Parameters in SVM==
Several important parameters in SVM influence its performance:
*'''C (Regularization Parameter)''': Controls the trade-off between maximizing the margin and minimizing classification error. A smaller C allows for a larger margin at the cost of more misclassified points, while a larger C aims for accurate classification at the risk of a smaller margin.
*'''Gamma (γ)''': Specific to RBF and polynomial kernels, gamma defines the influence of individual data points. A higher gamma focuses more on close points, potentially capturing complex patterns but risking overfitting.
*'''Kernel Choice''': Choosing an appropriate kernel is essential, as it directly affects the separation boundary in non-linear problems.

==Advantages and Disadvantages of SVM==
'''Advantages:'''
*'''Effective in High-Dimensional Spaces''': SVM performs well with high-dimensional data, especially when the number of features exceeds the number of samples.
*'''Robust with Clear Margins''': Provides strong predictive power with a distinct margin, making it effective for binary classification.

'''Disadvantages:'''
*'''Computationally Intensive''': SVM can be slow to train on large datasets, especially with complex kernels.
*'''Less Effective on Noisy Data''': Sensitive to overlapping classes or mislabeled data, which can reduce its classification accuracy.

==Evaluation Metrics for SVM==
To evaluate SVM model performance, several metrics are commonly used:
*'''[[Accuracy]]''': The proportion of correct predictions over total predictions.
*'''[[Precision]]''': The ratio of true positives to all predicted positives, important in scenarios with imbalanced data.
*'''[[Recall]]''': The ratio of true positives to all actual positives, critical in applications where missing positive cases has high costs.
*'''[[F1 Score]]''': The harmonic mean of precision and recall, providing a balanced metric for imbalanced classes.
*'''[[ROC Curve|ROC]]-[[AUC]]''': The area under the ROC curve, which measures the model’s ability to distinguish between classes across thresholds.

==See Also==
*[[Logistic Regression]]
*[[Decision Tree]]
*[[K-Nearest Neighbor]]
*[[Random Forest]]
*[[Naive Bayes]]

[[Category:Data Science]]
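The ideas above can be sketched in code. The following is a minimal pure-Python illustration (not a production implementation): a linear soft-margin SVM trained by per-sample sub-gradient descent on the primal hinge-loss objective, followed by accuracy/precision/recall/F1 computed exactly as defined in the metrics section. The toy dataset, learning rate, and epoch count are all assumptions chosen for illustration; real SVM libraries solve the same objective with specialized optimizers (e.g. quadratic programming or SMO) and support the kernels listed above.

```python
# Linear soft-margin SVM: minimize (1/2)||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))
# Illustrative per-sample sub-gradient descent; labels must be +1 / -1.

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin >= 1:
                # Outside the margin: only the regularizer ||w||^2 pulls on w.
                w = [wj - lr * wj for wj in w]
            else:
                # Margin violation: the hinge-loss term also contributes,
                # scaled by C (larger C punishes misclassification harder).
                w = [wj - lr * (wj - C * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * C * yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy linearly separable data: class +1 upper-right, class -1 lower-left.
X = [(2.0, 2.5), (3.0, 3.0), (2.5, 3.5), (-2.0, -2.5), (-3.0, -3.0), (-2.5, -1.5)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y, C=1.0)
preds = [predict(w, b, xi) for xi in X]

# Evaluation metrics, following the definitions in the section above.
accuracy = sum(1 for p, t in zip(preds, y) if p == t) / len(y)
tp = sum(1 for p, t in zip(preds, y) if p == t == 1)
fp = sum(1 for p, t in zip(preds, y) if p == 1 and t == -1)
fn = sum(1 for p, t in zip(preds, y) if p == -1 and t == 1)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Note how C appears only in the margin-violation branch: shrinking C lets the regularizer dominate (wider margin, more tolerated misclassifications), while growing C forces the fit toward classifying every training point correctly, mirroring the trade-off described under Key Parameters.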