Supervised Learning : Regression 

Syllabus: Bias, Variance, Generalization, Underfitting, Overfitting, Linear regression, Lasso regression, Ridge regression, Gradient descent algorithm. Evaluation Metrics: MAE, RMSE, R2.

Bias:

Bias is a systematic error in a machine learning model. It is the difference between the average prediction of the model and the true value. Bias can be introduced into a model in a variety of ways, such as:

  • Underfitting: This occurs when the model is too simple and cannot capture the complexity of the data.
  • Overfitting: This occurs when the model is too complex and learns the training data too well, but is unable to generalize to new data.
  • Selection bias: This occurs when the training data is not representative of the real-world data that the model will be used on.
  • Confirmation bias: This occurs when the model is designed to confirm existing beliefs, rather than learn from the data.

High bias and low bias are two different types of bias.

High bias: A model with high bias is unable to capture the underlying patterns in the data. It is too simple and makes too many assumptions. As a result, the model will make inaccurate predictions, even on the training data.

Here are some examples of high-bias models:

  • Linear regression
  • Logistic regression
  • Naive Bayes

Low bias: A model with low bias is able to capture the underlying patterns in the data. It is complex enough to learn the training data well. However, if the model is too complex, it may overfit the training data and be unable to generalize to new data.

Here are some examples of low-bias models:

  • Decision trees
  • Support vector machines
  • Random forests

Ways to reduce high bias:

There are a number of ways to reduce high bias in machine learning models. Some of the most common methods include:

  • Use a more complex model: A more complex model will be able to learn more complex patterns in the data, which can help to reduce bias.
  • Increase the number of features: Adding more features to the model can also help to reduce bias, as it gives the model more information to learn from. 
  • Use a larger training dataset: A larger training dataset will give the model more examples to learn from, which can help to reduce bias. 
  • Use regularization techniques: Regularization techniques can help to prevent overfitting by penalizing the model for learning too complex of a relationship between the features and the target variable. 
  • Use ensemble methods: Ensemble methods combine the predictions of multiple models to produce a more accurate prediction. This can help to reduce bias by averaging out the biases of the individual models.


Variance:

Variance in machine learning is how much a model's predictions change when trained on different subsets of the training data. It's a measure of how much the model relies on the training data to make predictions.

A model with high variance learns the training data too well, including the noise in the data. This means that the model won't be able to generalize well to new data that it hasn't seen before.

A model with low variance doesn't learn the training data as well, but it's less likely to overfit. This means that the model will be able to generalize better to new data.

To reduce variance, you can use a simpler model or more training data.



Underfitting and Overfitting:

Underfitting and overfitting are two common problems that can occur in machine learning. Both problems happen when a model is not able to generalize well to new data, but they are caused by different things.

Underfitting occurs when a model is too simple and cannot learn the underlying patterns in the training data. This can be caused by using a model with too few parameters, not enough training data, or features that are not representative of the underlying problem. An underfitted model will perform poorly on both the training and test data.

Overfitting occurs when a model learns the training data too well, including the noise in the data. This can be caused by using a model with too many parameters, too little training data, or features that are not relevant to the underlying problem. An overfitted model will perform well on the training data but poorly on the test data.
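To make this concrete, here is a rough sketch (using scikit-learn and synthetic data of my own choosing, not something from these notes) that fits polynomials of three different degrees to noisy data. A degree-1 model underfits the curved relationship, while a degree-15 model overfits the noise:

# Sketch: underfitting vs. overfitting as model complexity grows
# (assumes numpy and scikit-learn are installed; the data is synthetic)
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=60)  # noisy curve

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")

The underfit model (degree 1) typically shows high error on both the training and test sets, while the overly complex model (degree 15) shows very low training error but noticeably higher test error, which is the overfitting signature described above.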


How to avoid underfitting:

  • Use a model that is appropriate for the complexity of the data. A more complex model can learn more complex patterns in the data, but it is also more likely to overfit.
  • Use a dataset that is large enough and representative of the real-world data that the model will be used on. A small or unrepresentative dataset can lead to underfitting.
  • Reduce the amount of regularization. Regularization penalizes the model for being too complex, so a penalty that is too strong can prevent the model from fitting even the training data.

How to avoid overfitting:

  • Use a simpler model. A simpler model is less likely to overfit, but it may be less accurate.
  • Use more training data. A larger training dataset makes it harder for the model to memorize the noise in the data.
  • Use feature selection to remove irrelevant features. Irrelevant features can lead to overfitting.
  • Use regularization techniques. Regularization techniques add a penalty to the model for being too complex.

Examples of underfitting and overfitting:

  • Underfitting: A model for predicting house prices might not be able to take into account factors such as the size of the house, the location of the house, and the condition of the house. This could be because the model is too simple, or because the training data does not include all of these factors.
  • Overfitting: A model for predicting house prices might learn the noise in the training data, such as the names of the sellers or the dates on which the houses were sold. This could be because the model is too complex, or because the training data is too small.

Regression:

Regression is a statistical method that allows us to model the relationship between a dependent variable and one or more independent variables. The dependent variable is the variable that we are trying to predict, while the independent variables are the variables that we believe influence the dependent variable.

Regression models are typically fit to data using a least squares approach. This means that the model is chosen so that the sum of the squared errors between the actual values of the dependent variable and the predicted values of the dependent variable is minimized.

Once a regression model has been fit to the data, it can be used to make predictions about the dependent variable for new values of the independent variables. For example, if we have a regression model that predicts house prices based on square footage, we can use the model to predict the price of a new house if we know its square footage.

Regression models are used in a wide variety of fields, including finance, marketing, and manufacturing. For example, regression models can be used to:

  • Predict future sales based on historical sales data and other factors, such as advertising spending and economic conditions.
  • Identify the most important factors that influence customer satisfaction.
  • Determine the optimal manufacturing process to produce a product with a specific quality level.

Linear regression:

Linear regression is a statistical method that models the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables, and it is typically fit to data using a least squares approach. Once the model is fit, it can be used to make predictions about the dependent variable for new values of the independent variables.

Linear regression is a powerful tool that is used in a wide variety of fields, including finance, marketing, and manufacturing. It is a relatively simple method to understand and implement, and it can be very effective for making predictions.

Here is an example of a simple linear regression model:

y = ax + b

where:

  • y is the dependent variable
  • x is the independent variable
  • a is the slope of the regression line
  • b is the y-intercept of the regression line

The slope of the regression line tells us how much the dependent variable changes for a one-unit change in the independent variable. The y-intercept of the regression line tells us the value of the dependent variable when the independent variable is equal to zero.
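As a small illustration (a sketch using scikit-learn and made-up house-size data, not figures from these notes), the fitted coefficient plays the role of a and the intercept plays the role of b in the equation above:

# Sketch: fitting y = ax + b by least squares (assumes scikit-learn; the data is made up)
import numpy as np
from sklearn.linear_model import LinearRegression

square_feet = np.array([[800], [1000], [1200], [1500], [1800]])   # independent variable x
price = np.array([160000, 195000, 232000, 295000, 355000])        # dependent variable y

model = LinearRegression().fit(square_feet, price)
print("slope a:", model.coef_[0])        # change in price per extra square foot
print("intercept b:", model.intercept_)  # predicted price when square footage is zero
print("predicted price for 1300 sq ft:", model.predict([[1300]])[0])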

Linear regression is a valuable tool for understanding and predicting the relationships between variables. It is used in a wide variety of fields and can be applied to a wide range of problems.


Logistic Regression:

Logistic regression is a statistical model that models the probability of a binary outcome, such as yes or no, based on prior observations of a data set. It is a supervised learning algorithm, which means that it learns from a set of labeled data, where the output variable is the binary variable that we are trying to predict.

Logistic regression models are trained using a maximum likelihood approach, which means that the model parameters are chosen to maximize the probability of the observed data. Once trained, logistic regression models can be used to predict the probability of the output variable for new values of the input variables. 

Here is an example of a simple logistic regression model:

P(z) = 1 / (1 + e^(-z)), where z = ax + b

Here, P(z) is the predicted probability of the positive outcome, and z is a linear combination of the independent variable, just as in the linear regression equation above. The slope a controls how quickly the predicted probability changes as the independent variable changes (strictly speaking, it is the change in the log-odds for a one-unit change in the independent variable), and the intercept b determines the predicted probability when the independent variable is equal to zero.
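A minimal sketch (assuming scikit-learn and a made-up pass/fail dataset) of how a fitted logistic regression model turns an input into a probability through the sigmoid above:

# Sketch: logistic regression for a binary outcome (assumes scikit-learn; the data is made up)
import numpy as np
from sklearn.linear_model import LogisticRegression

hours_studied = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # independent variable
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])                         # binary outcome (0 = fail, 1 = pass)

model = LogisticRegression().fit(hours_studied, passed)

# predict_proba applies P(z) = 1 / (1 + e^(-z)) with z = a*x + b learned from the data
print("P(pass | 4.5 hours):", model.predict_proba([[4.5]])[0, 1])
print("predicted class:", model.predict([[4.5]])[0])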

Here are some examples of how logistic regression is used in the real world:

  • Finance: Logistic regression models can be used to predict the probability of a loan defaulting, the probability of a customer churning, or the probability of a stock price going up or down.
  • Marketing: Logistic regression models can be used to predict the probability of a customer clicking on an ad, the probability of a customer making a purchase, or the probability of a customer responding to a marketing campaign.
  • Healthcare: Logistic regression models can be used to predict the probability of a patient having a disease, the probability of a patient responding to a treatment, or the probability of a patient being readmitted to the hospital.

Lasso Regression:

Lasso regression, short for "Least Absolute Shrinkage and Selection Operator" regression, is a linear regression technique used for variable selection and regularization.

Lasso regression is used in statistics and machine learning to build models when you have many input features. It simplifies the model by automatically selecting the most important features: it adds a penalty term that encourages the model to shrink some feature coefficients all the way to zero, effectively removing those features from the model. This can lead to more accurate and more interpretable models. In short, lasso regression acts like a feature selector that helps you focus on what matters most for your prediction while ignoring the rest.


The lasso cost function adds an L1 penalty on the weights to the ordinary least squares cost:

J(w) = (1 / (2m)) * Σ_{i=1..m} (hw(x^(i)) - y^(i))^2 + λ * Σ_{j=1..n} |Wj|

Here's what the terms in this equation represent:

  • ‘m’ is the number of training examples.
  • ‘n’ is the number of features.
  • ‘hw(x^(i))’ is the predicted value for the ith training example using the linear regression model with weights w.
  • ‘y^(i)’ is the actual target value for the ith training example.
  • ‘Wj’ represents the weight or coefficient associated with the jth feature.
  • λ (lambda) is the regularization parameter, which controls the strength of regularization. A higher value of lambda results in stronger regularization.
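As a brief sketch of this behaviour (using scikit-learn's Lasso on synthetic data; note that scikit-learn calls the regularization parameter alpha rather than λ), increasing the penalty drives more of the weights exactly to zero:

# Sketch: lasso's feature-selection effect (assumes scikit-learn; the data is synthetic)
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))                     # 5 features, but only the first 2 matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

for alpha in (0.01, 0.1, 1.0):                    # alpha plays the role of lambda
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}:", np.round(coefs, 2))  # weights of irrelevant features go to 0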

Ridge Regression: 



Ridge regression is a regularized version of linear regression that adds a penalty proportional to the sum of the squared weights (an L2 penalty) to the least squares cost. As with lasso, the regularization parameter λ controls how strongly the weights are penalized: a larger λ shrinks the weights more. Unlike lasso, ridge shrinks the weights toward zero but does not set them exactly to zero, so it keeps all of the features while reducing the model's sensitivity to noisy or highly correlated inputs.
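A matching sketch (again scikit-learn with synthetic data) shows the contrast with lasso: ridge shrinks all of the weights but generally leaves none of them exactly at zero:

# Sketch: ridge shrinks weights without zeroing them (assumes scikit-learn; the data is synthetic)
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

for alpha in (0.01, 1.0, 100.0):                  # larger alpha = stronger L2 penalty
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}:", np.round(coefs, 2))  # weights shrink toward zero, not exactly to zero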



Gradient descent algorithm:

Gradient descent is an iterative optimization algorithm commonly used to train machine learning models. It works by iteratively adjusting the parameters of the model in the direction of the negative gradient of the cost function until the minimum of the cost function is reached.

The cost function is a measure of how well the model is predicting the data. It is typically calculated by comparing the model's predictions to the actual values in the training set.

The gradient of the cost function is a vector that points in the direction of the steepest ascent of the function. By moving in the opposite direction of the gradient, gradient descent is able to find the minimum of the cost function.


Here is a simplified overview of the gradient descent algorithm:

  1. Initialize the model parameters.
  2. Calculate the cost function and its gradient.
  3. Update the model parameters in the direction of the negative gradient.
  4. Repeat steps 2 and 3 until the cost function converges or a maximum number of iterations is reached.
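Below is a minimal sketch of these four steps applied to simple linear regression with a mean squared error cost (plain NumPy; the learning rate, iteration count, and synthetic data are illustrative choices, not values from these notes):

# Sketch: batch gradient descent for y = a*x + b with a mean squared error cost (assumes numpy)
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=50)   # true slope 2.5, intercept 1.0

a, b = 0.0, 0.0                 # step 1: initialize the model parameters
learning_rate = 0.01

for step in range(2000):        # step 4: repeat until convergence or max iterations
    error = (a * x + b) - y
    cost = np.mean(error ** 2)          # step 2: cost function (mean squared error)
    grad_a = 2 * np.mean(error * x)     # step 2: gradient with respect to a
    grad_b = 2 * np.mean(error)         # step 2: gradient with respect to b
    a -= learning_rate * grad_a         # step 3: move in the negative gradient direction
    b -= learning_rate * grad_b

print(f"learned a={a:.2f}, b={b:.2f}, final cost={cost:.4f}")

With these settings the learned a and b end up close to the true values of 2.5 and 1.0; choosing a much larger learning rate would make the updates diverge rather than converge, which is the sensitivity to the learning rate mentioned below.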

Gradient descent is a powerful algorithm that can be used to train a wide variety of machine learning models, including linear regression, logistic regression, and neural networks.

Here are some of the advantages of using gradient descent in machine learning:

  • It is a simple and easy-to-understand algorithm.
  • It is very efficient and can be used to train large models.
  • It is very flexible and can be used to train a wide variety of machine learning models.

Here are some of the disadvantages of using gradient descent in machine learning:

  • It can only find local minima of the cost function.
  • It can be sensitive to the choice of learning rate.
  • It can converge slowly for some problems.

Evaluation Metrics: MAE, RMSE, R2

Evaluation metrics are essential tools for assessing the performance of machine learning models, particularly in regression tasks where you're predicting continuous values. Here are descriptions of three common evaluation metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared (R2).

Mean Absolute Error (MAE):

MAE measures the average absolute difference between the predicted values and the actual values:

MAE = (1/n) * Σ |yi - ŷi|

where:

  • n is the number of samples
  • ŷi is the predicted value for sample i
  • yi is the actual value for sample i

In other words, you take the differences between the predicted and actual values, convert them to absolute values (so negative errors become positive), and then average them. MAE tells you how far off, on average, your predictions are from the real values.


Root Mean Square Error (RMSE):

RMSE is similar to MAE, but it squares the differences between the predicted and actual values before averaging, which gives higher weight to larger errors:

RMSE = sqrt( (1/n) * Σ (yi - ŷi)^2 )

R-squared (R2):

R2 tells you how well your model fits the data. It is usually a score between 0 and 1, where 1 means the model's predictions match the data perfectly and 0 means the model does no better than simply predicting the average of the actual values (it can even be negative if the model does worse than that). R2 represents the proportion of the variation in the data that your model can explain, so higher R2 values indicate a better fit.
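As a short sketch (using scikit-learn's metric functions on made-up predictions), all three metrics can be computed in a few lines:

# Sketch: MAE, RMSE, and R2 for a regression model (assumes scikit-learn; the values are made up)
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_actual = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
y_predicted = np.array([2.8, 5.4, 7.0, 10.5, 11.6])

mae = mean_absolute_error(y_actual, y_predicted)
rmse = np.sqrt(mean_squared_error(y_actual, y_predicted))  # square root of the mean squared error
r2 = r2_score(y_actual, y_predicted)

print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}, R2 = {r2:.3f}")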






