'''Logistic Regression''' is a statistical and machine learning algorithm used for binary classification tasks, where the output variable is categorical and typically represents two classes (e.g., yes/no, spam/not spam, fraud/not fraud). Despite its name, Logistic Regression is a classification algorithm, not a regression algorithm, because it predicts class probabilities rather than continuous values.
== How It Works ==

Logistic Regression models the probability of a binary outcome using a logistic function, also known as the sigmoid function. The sigmoid function maps any real-valued input into the range between 0 and 1, which is interpreted as the probability of belonging to a particular class. The model predicts the probability that the input belongs to the positive class (1) and classifies it by applying a threshold, often 0.5 (see the sketch after the symbol definitions below).

The logistic function is represented by:

P(y=1 | X) = 1 / (1 + e^-(b0 + b1X1 + b2X2 + ... + bnXn))

where:
* '''P(y=1 | X)''' is the probability of the output being 1 given the input features.
* '''X1, X2, ..., Xn''' are the input features.
* '''b0''' is the intercept, and '''b1, b2, ..., bn''' are the coefficients of the features.
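As a minimal sketch of the formula above, the following Python snippet computes P(y=1 | X) for one example and applies the 0.5 threshold; the intercept, coefficients, and input values are made up purely for illustration.

<syntaxhighlight lang="python">
import math

def predict_proba(x, b0, b):
    """Compute P(y=1 | X) = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))  # linear combination of the features
    return 1.0 / (1.0 + math.exp(-z))              # sigmoid squashes z into (0, 1)

# Hypothetical intercept, coefficients, and input, for illustration only
b0, b = -1.5, [0.8, 2.0]
x = [1.2, 0.4]

p = predict_proba(x, b0, b)
label = 1 if p >= 0.5 else 0  # classify with the common 0.5 threshold
print(f"P(y=1 | X) = {p:.3f} -> class {label}")
</syntaxhighlight>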
== Types of Logistic Regression ==

* '''Binary Logistic Regression''': Used for binary classification with two possible outcomes (e.g., yes/no).
* '''Multinomial Logistic Regression''': Used when the outcome variable has more than two categories without any ordering (e.g., classifying types of animals); see the sketch after this list.
* '''Ordinal Logistic Regression''': Used when the outcome variable has ordered categories (e.g., ranking levels from low to high).
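A minimal sketch of the binary and multinomial cases, assuming scikit-learn is available; the same estimator fits a binary model for two classes and a multinomial one for more. (Ordinal logistic regression is not in scikit-learn; statsmodels' OrderedModel is one option.)

<syntaxhighlight lang="python">
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris has three unordered classes, so this is a multinomial problem
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # handles more than two classes automatically
clf.fit(X_train, y_train)

print(clf.predict(X_test[:3]))        # predicted class labels
print(clf.predict_proba(X_test[:3]))  # one probability per class; each row sums to 1
</syntaxhighlight>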
== Applications of Logistic Regression ==

Logistic Regression is widely used across industries due to its simplicity, interpretability, and effectiveness in binary classification tasks:
* '''Healthcare''': Predicting disease outcomes, risk assessments, and patient survival chances.
* '''Finance''': Credit scoring, fraud detection, and risk analysis.
* '''Marketing''': Customer churn prediction, targeting potential buyers, and lead qualification.
* '''Social Sciences''': Survey analysis, where responses fall into categories like agree/disagree or support/oppose.
== Key Metrics for Evaluating Logistic Regression ==

To assess the performance of a Logistic Regression model, common metrics include:
* '''Accuracy''': The proportion of correct predictions.
* '''Precision''': The ratio of true positive predictions to all positive predictions.
* '''Recall''': The ratio of true positive predictions to all actual positives.
* '''F1 Score''': The harmonic mean of precision and recall, useful when dealing with imbalanced data.
* '''AUC-ROC''': Measures the model's ability to distinguish between classes; a higher Area Under the Curve (AUC) indicates better performance. A sketch computing all five metrics follows this list.
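A minimal sketch computing these metrics with scikit-learn, assuming it is available; the labels and scores are made up purely for illustration.

<syntaxhighlight lang="python">
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical ground truth, thresholded predictions, and predicted probabilities
y_true  = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred  = [0, 1, 0, 0, 1, 1, 1, 1]
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # P(y=1) per example

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_score))  # uses scores, not hard labels
</syntaxhighlight>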
== Assumptions of Logistic Regression ==

Logistic Regression relies on several assumptions for accurate results:
# '''Linearity of Independent Variables and Log-Odds''': Assumes a linear relationship between the log-odds of the outcome and the independent variables.
# '''Independence of Observations''': Observations should be independent of each other to avoid biased results.
# '''No Multicollinearity''': Independent variables should not be highly correlated with each other; this can be checked using the Variance Inflation Factor (VIF), as in the sketch after this list.
# '''Sufficient Sample Size''': Logistic Regression requires a large enough sample size, especially for categorical variables, to make accurate predictions.
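A minimal sketch of the VIF check mentioned above, assuming statsmodels and NumPy are available; the data is synthetic, with one feature made deliberately collinear.

<syntaxhighlight lang="python">
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic design matrix; feature 2 is made nearly collinear with feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)

for i in range(X.shape[1]):
    print(f"feature {i}: VIF = {variance_inflation_factor(X, i):.1f}")
# Rules of thumb vary, but VIF values well above roughly 5-10 flag multicollinearity
</syntaxhighlight>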
== Handling Limitations ==

Logistic Regression may not perform well if the relationship between the features and the log-odds is highly non-linear. In such cases, transformations, polynomial features, or a more complex model such as Decision Trees or Neural Networks can be considered; a brief sketch of the polynomial-features approach follows.
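A minimal sketch of the polynomial-features approach, assuming scikit-learn is available; make_moons generates a deliberately non-linear class boundary.

<syntaxhighlight lang="python">
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

plain = LogisticRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=3),
                     LogisticRegression(max_iter=1000)).fit(X, y)

print("plain accuracy:", plain.score(X, y))  # a linear boundary underfits the moons
print("poly accuracy :", poly.score(X, y))   # polynomial terms capture the curvature
</syntaxhighlight>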
== See Also ==

* [[Linear Regression]]
* [[Support Vector Machine]]
* [[K-Nearest Neighbor]]
* [[Decision Tree]]
* [[Naive Bayes]]