Data Science Cheat Sheet

== Models ==

* '''Support Vector Machine (SVM)''': A supervised model that finds the optimal hyperplane for class separation, widely used in high-dimensional tasks like text classification (e.g., spam detection).
** '''''Advantage''''': Effective in high-dimensional spaces and, with a suitable kernel and regularization, resistant to overfitting.
** '''''Disadvantage''''': Training is computationally intensive on large datasets, and results are sensitive to the choice of kernel and hyperparameters.
* '''k-Nearest Neighbors (kNN)''': A non-parametric method that classifies a point by majority vote among its k nearest training examples, often applied in recommendation systems and image recognition.
** '''''Advantage''''': Simple and intuitive, with no explicit training phase (the model only stores the training data), making it easy to implement.
** '''''Disadvantage''''': Computationally expensive at prediction time, especially with large datasets, and sensitive to irrelevant features.
* '''Decision Tree''': A model that splits data into branches based on feature values, useful for interpretable applications like customer segmentation and medical diagnosis.
** '''''Advantage''''': Highly interpretable and handles both numerical and categorical data well.
** '''''Disadvantage''''': Prone to overfitting, especially with deep trees, and can be sensitive to small data changes.
* '''Linear Regression''': A statistical technique that predicts a continuous outcome based on linear relationships, commonly used in financial forecasting and trend analysis.
** '''''Advantage''''': Simple and interpretable, with fast training for large datasets.
** '''''Disadvantage''''': Assumes a linear relationship between features and outcome, so it fits complex, non-linear data poorly.
* '''Logistic Regression''': A classification model estimating the probability of a binary outcome, widely used in credit scoring and binary medical diagnostics.
** '''''Advantage''''': Interpretable, with a clear probabilistic output, and efficient to train for binary classification.
** '''''Disadvantage''''': Limited to linear decision boundaries, making it ineffective for complex relationships unless the features are transformed first.
* '''Naive Bayes''': A probabilistic classifier assuming feature independence, effective in text classification tasks like spam filtering due to its speed and simplicity.
** '''''Advantage''''': Fast and efficient, even on large datasets, and works best when the independence assumption approximately holds.
** '''''Disadvantage''''': Assumes feature independence, which may reduce accuracy if dependencies exist between features.
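
Two of the models above are small enough to sketch in plain Python. The block below is only an illustration, not a production implementation: the function names <code>knn_predict</code> and <code>fit_simple_linear_regression</code> are hypothetical, Euclidean distance and a default of <code>k=3</code> are assumptions for the kNN part, and the regression part covers only the single-feature case of ordinary least squares.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper: assumes numeric feature tuples and Euclidean distance.
    """
    # kNN has no fitting step; every prediction scans the whole training set,
    # which is why prediction cost grows with the amount of stored data.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept).

    Hypothetical helper: assumes at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Toy data invented for this sketch: two well-separated clusters.
    points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (1.2, 1.4)))

    # Points that lie exactly on the line y = 2x + 1.
    print(fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7]))
```

The kNN function makes the trade-off listed above concrete: there is nothing to train, but each call sorts all stored points by distance. The regression function shows why training is fast: the fit reduces to a few sums over the data.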