Machine Learning Terminologies and Processes.






Introduction:    

Hi, and welcome to my blog. In this post we are going to discuss machine learning terminology and processes. We'll walk through the end-to-end machine learning modelling process. It begins with a business problem, which is then framed as a machine learning problem. We'll discuss how that happens, how the data goes through preprocessing and modelling, and finally how the model predicts the output.

    Before we proceed with the machine learning process, let's look at some commonly used machine learning terminology, starting with the basics: Training, Model and Prediction. Training is the process where we fit our model on historical data. The 'Model' learns the patterns in that data so that it can make predictions on unseen data on its own. Prediction is the output the trained model produces for a new input.

Typically, to train the model we split our data into two parts: a training set and a testing set. The training set is shown to the model so that it learns the relation between the input variables and the output. The model is then run on the test set, and we measure how well it performs by comparing the predicted values with the observed values. The closer the model's output is to the actual observations, the higher the accuracy of the model.
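
As a rough sketch of the train-then-test idea, the snippet below fits a very simple one-feature threshold classifier on a small training set and then measures its accuracy on a separate test set. The data is made up, and the `train_threshold`, `predict` and `accuracy` helpers are hypothetical names chosen for illustration, not functions from any library.

```python
# Hypothetical toy data: (hours_studied, passed_exam) pairs, made up for illustration.
train_set = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test_set = [(1.5, 0), (2.5, 0), (6.5, 1), (9.0, 1)]


def train_threshold(rows):
    """'Training': place a threshold halfway between the two class means."""
    class0 = [x for x, y in rows if y == 0]
    class1 = [x for x, y in rows if y == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    return (mean0 + mean1) / 2


def predict(threshold, x):
    """'Prediction': label 1 if the input is above the learned threshold."""
    return 1 if x > threshold else 0


def accuracy(threshold, rows):
    """Fraction of rows where the predicted value matches the observed value."""
    correct = sum(1 for x, y in rows if predict(threshold, x) == y)
    return correct / len(rows)


threshold = train_threshold(train_set)         # learn from the training set only
test_accuracy = accuracy(threshold, test_set)  # evaluate on unseen test data
print(f"threshold={threshold:.2f}, test accuracy={test_accuracy:.2f}")
```

The important point is that `test_set` is never used while learning the threshold, so the accuracy reflects how the model behaves on data it has not seen.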


Actual Working of End to End Machine Learning Process:

    The process starts with a business problem and ends with predictions. Ever wondered what other steps come in between the two? They are as follows:

1. Machine Learning Problem Framing:

    In this step we select a suitable machine learning algorithm based on the data and the required output, for example whether the problem is to classify the data or to form clusters in it. The choice is made from three main types of machine learning algorithm:

i) Supervised Machine Learning Algorithm (Output is already known): Classification and Regression Problems

ii) Unsupervised Machine Learning Algorithm (Output is not known in advance): Clustering, Association and Dimensionality Reduction Problems

iii) Reinforcement Learning (Algorithm chooses its actions and learns from the feedback it receives)

2. Dataset Collection / Data Integration:

    In this step the dataset is collected so that it can be provided to our algorithm. If there are several datasets, we need to integrate them into a single dataset so that it is more convenient to process and to build our model.
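
A minimal sketch of data integration, assuming two made-up datasets that share a `customer_id` key. The `integrate` helper is a hypothetical name, and in practice a data library would usually do this join for you.

```python
# Hypothetical datasets from two sources, keyed by customer_id (made-up values).
customers = [
    {"customer_id": 1, "city": "Pune"},
    {"customer_id": 2, "city": "Mumbai"},
    {"customer_id": 3, "city": "Delhi"},
]
orders = [
    {"customer_id": 1, "total_spend": 250.0},
    {"customer_id": 2, "total_spend": 90.5},
]


def integrate(left_rows, right_rows, key):
    """Join two lists of records on `key`, keeping only keys present in both."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})  # combine the columns of both records
    return merged


dataset = integrate(customers, orders, "customer_id")
for row in dataset:
    print(row)
```

Customer 3 has no matching order, so it is dropped here. Whether to drop such rows or keep them with missing values is a design choice that depends on the problem.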

3. Data Preparation:

    In this step we have the data, but it isn't yet ready to be used for training the model. It may contain null values or outliers, or it may be dirty, i.e. not well structured or improperly collected. This will affect our predictions and cause difficulties in training our model. So the data needs to go through some processing to remove outliers, misleading values and null values, and to put it into a well-structured format.
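
The cleaning described above might look like the sketch below. It assumes a made-up list of readings in which `None` marks a null value, and it uses the common 1.5 * IQR rule to flag outliers. That rule is one possible choice, not the only one.

```python
import statistics

# Hypothetical raw readings; None marks a missing value, 250.0 is an obvious outlier.
raw_values = [12.0, 14.5, None, 13.2, 250.0, 12.8, None, 15.1, 13.9]

# Step 1: remove null values.
non_null = [v for v in raw_values if v is not None]

# Step 2: remove outliers that fall outside 1.5 * IQR of the quartiles.
q1, _, q3 = statistics.quantiles(non_null, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [v for v in non_null if lower <= v <= upper]

print("kept:", cleaned)
print("dropped:", [v for v in non_null if v not in cleaned])
```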

Sometimes the algorithm gives biased output because we provide the data in the order in which it was collected. We may need to shuffle the data so that the model sees all varieties and periods of the data and gives unbiased output.

We also need to specify how much of the data we give to the algorithm for training and how much for testing. It's good practice to give a large portion of the data to training and a smaller portion to testing, so that our model is well versed with the data.
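
Shuffling and splitting can be sketched as follows. The 80/20 ratio, the fixed seed and the `shuffle_and_split` helper are assumptions made for illustration. The right ratio depends on how much data you have.

```python
import random


def shuffle_and_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of `rows`, then split it into training and testing parts."""
    shuffled = list(rows)                  # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset: 10 records that arrive in sorted order.
records = list(range(10))
train_rows, test_rows = shuffle_and_split(records)
print("train:", train_rows)
print("test: ", test_rows)
```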

4. Data Visualization and Analysis:

    It is good practice to visualize the data in order to understand the patterns in it, so that we can analyze it visually and take any necessary action before modelling.
E.g. scatter plot, histogram, etc.
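
To give a feel for what a histogram shows, here is a small text-only sketch that bins some made-up ages into groups of ten and prints one `#` per value. A plotting library would normally be used for this. The data and the bin width are assumptions.

```python
from collections import Counter

# Hypothetical ages, made up for illustration.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 45, 47, 52, 58, 63]
bin_width = 10

# Count how many values fall into each bin: 20-29, 30-39, and so on.
bin_counts = Counter((age // bin_width) * bin_width for age in ages)

for start in sorted(bin_counts):
    bar = "#" * bin_counts[start]
    print(f"{start}-{start + bin_width - 1} | {bar}")
```

Even this crude view makes it easy to see where most of the values sit and whether any bin looks suspiciously empty or crowded.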

5. Feature Engineering:

    It is the process of transforming the raw, original data into new, useful features. It is often the most critical and time-consuming step because it relies on trial and error. It is an important step because it answers questions like: which features should I use in order to get the desired output?
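
As an illustration, the sketch below derives three new features from made-up order records. The field names and the `engineer_features` helper are hypothetical. Which derived features actually help is something you find out by trial and error.

```python
from datetime import date

# Hypothetical raw records (made-up values).
raw_orders = [
    {"order_date": date(2023, 1, 7), "total_price": 120.0, "items": 4},
    {"order_date": date(2023, 1, 9), "total_price": 45.0, "items": 1},
]


def engineer_features(row):
    """Turn one raw record into features a model may find more useful."""
    return {
        "price_per_item": row["total_price"] / row["items"],
        "is_weekend": row["order_date"].weekday() >= 5,  # Saturday=5, Sunday=6
        "month": row["order_date"].month,
    }


features = [engineer_features(row) for row in raw_orders]
for feature_row in features:
    print(feature_row)
```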

6. Model Training:

    In this step we train our model. We usually train it multiple times with different settings, called hyperparameters (often loosely called parameters). This is known as parameter tuning, and it helps improve the accuracy of our model.

Parameter Tuning includes:

i) Loss Function.

ii) Regularization.

iii) Learning Parameters (e.g. learning rate)
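
To show how these three pieces fit together, here is a minimal sketch that fits y = w * x by gradient descent. It uses mean squared error as the loss function, an L2 penalty for regularization, and a learning rate as the learning parameter. The data, the candidate settings and the small validation set used to compare them are all assumptions made for illustration.

```python
# Hypothetical data following roughly y = 2x (made up for illustration).
train_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
val_data = [(5.0, 10.1), (6.0, 11.8)]


def mse(w, data):
    """Loss function: mean squared error of the predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(data, learning_rate, l2, steps=200):
    """Fit y = w * x with plain gradient descent, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        # Gradient of (MSE + l2 * w**2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data) + 2 * l2 * w
        w -= learning_rate * grad
    return w


# Parameter tuning: train once per setting and compare on the validation data.
results = {}
for learning_rate in (0.001, 0.01, 0.05):
    for l2 in (0.0, 0.1):
        w = train(train_data, learning_rate, l2)
        results[(learning_rate, l2)] = mse(w, val_data)

best_setting = min(results, key=results.get)
print("best (learning_rate, l2):", best_setting)
```

Each setting is compared on data that was not used for fitting, which is what makes the comparison meaningful.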

7. Model Evaluation:

Model evaluation is one of the important steps, because it checks the accuracy of the model we trained. It tells us how our model performs on the test data, which indicates how accurately it will predict the output when we give it unseen data. We should not aim only for the highest possible accuracy on the data we trained on, but for a model that performs consistently on unseen data.

E.g. confusion matrix, RMSE, etc.
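
Both metrics can be computed by hand, as in the sketch below. The observed and predicted values are made up. The confusion matrix applies to a classification problem and RMSE to a regression problem.

```python
import math
from collections import Counter

# Hypothetical classification results (1 = positive, 0 = negative), made up.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Confusion matrix as counts of (observed, predicted) pairs.
confusion = Counter(zip(observed, predicted))
tp, fn = confusion[(1, 1)], confusion[(1, 0)]
fp, tn = confusion[(0, 1)], confusion[(0, 0)]
classification_accuracy = (tp + tn) / len(observed)
print(f"TP={tp} FN={fn} FP={fp} TN={tn} accuracy={classification_accuracy:.2f}")

# Hypothetical regression results, made up.
actual = [3.0, 5.0, 2.5, 7.0]
forecast = [2.5, 5.0, 4.0, 8.0]

# RMSE: square root of the mean of the squared errors.
rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
print(f"RMSE={rmse:.3f}")
```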

8. Business Evaluation:

This is the final stage, where we hand over the model to the business to help it meet its business goals. Before this step, we need to evaluate the model on various unseen business problems. If the model performs well we can proceed. Otherwise we need to go back to the previous steps to extract the correct features, augment the data, clean the data or integrate the data properly.

End:

We can then apply the model to the organization's data, for example to help increase sales or to detect future anomalies, if any.

Thank you!



