Friday, February 28, 2020

10 Stages Of A Machine Learning Project In 2020 (And Where You Fit)

Introduction
The stages and workflows involved in Machine Learning projects evolve as the field and its technology develop.
The emergence of GPU-enabled mobile devices has introduced a new stage within the traditional ML project workflow, and the emergence of new stages has also created new roles and job titles.
This article aims to do the following:
1. Explain in detail each of the identified stages involved in a Machine Learning project.
2. Mention roles that are involved in each of the stages. I have also included LinkedIn links to Job searches for each role within the United States.
3. Inform you of end results delivered after each stage (these are referred to as deliverables).
Let’s begin.
1. Problem Definition

Photo by Austin Distel on Unsplash
Problem definition is the initial stage of a Computer Vision/ML project, and it focuses on gaining an understanding of the problem to be solved by applying ML.
It usually involves a problem descriptor who records, in a selected form, a scenario-based description of a first-hand encounter with the problem to be solved.
This stage also captures what an ideal solution to the problem would look like from the problem descriptor’s perspective.
A problem descriptor can be a client, customer, user, or colleague.
The deliverable for this stage is a document (Word or PDF) that includes, but is not limited to, the following:
1. Problem Statement
2. Ideal Problem Solution
3. Understanding and insight into the problem
4. Technical requirements
Associated roles: IT Business Analyst
2. Research

Photo by William Iven on Unsplash
This stage sets the foundation for later stages, along with the planning of implementation and development work carried out within subsequent stages.
An exploration into the form a solution will take is conducted, along with research into the available data structures, formats, and sources.
Combining an understanding of the problem with the proposed solution and the available data enables the selection of a suitable ML model to achieve the ideal result.
At this stage, it is helpful to research the hardware and software requirements for the algorithms and model implementation; this saves a lot of time in later stages.
The deliverable for this stage is a document (Word or PDF) with research into the following included:
1. Data Structure and Source
2. Solution form
3. Neural Network / Model Architecture
4. Algorithm Research
5. Hardware Requirements
6. Software Requirements
Associated roles: Machine Learning Researcher, Data Scientist, AI Researcher
3. Data Aggregation / Mining / Scraping

Photo by Sai Kiran Anagani on Unsplash
Data is the fuel for an ML/CV application. Data aggregation is a crucial step that sets a precedent for the effectiveness and performance of the trained model.
The output required by the agreed-upon solution defines what data is aggregated.
Data understanding is paramount and any sourced data should be examined and analyzed utilizing visualization tools or statistical methods.
Data examination promotes data integrity and credibility by ensuring the data sourced is the data expected.
Data analysis and exploration carried out on the data also ensure the following requirements are met:
The data gathered needs to be diverse enough to ensure that the model’s prediction capabilities accommodate a variety of possible scenarios.
The data gathered needs to aspire to be unbiased to ensure that the model can generalize appropriately during inference.
The data gathered needs to be abundant.
Tools for collecting data will vary. Data sources could come in the form of APIs, XML feeds, CSV, or Excel files. In some scenarios, data mining/scraping from online sources is required. Be sure to check third-party websites’ scraping/mining policies before conducting scrapes.
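As a rough illustration of pulling these sources together, the Python sketch below aggregates records from a hypothetical REST API alongside CSV and Excel exports using requests and pandas. The endpoint, file paths, and the assumption that the sources share compatible columns are purely illustrative.

```python
import pandas as pd
import requests

# Hypothetical endpoint used purely for illustration.
API_URL = "https://api.example.com/v1/records"

# Pull JSON records from a REST API and flatten them into a DataFrame.
response = requests.get(API_URL, params={"limit": 1000}, timeout=30)
response.raise_for_status()
api_df = pd.json_normalize(response.json())

# Load tabular data that arrives as CSV or Excel exports (paths are assumptions).
csv_df = pd.read_csv("exports/records.csv")
excel_df = pd.read_excel("exports/records.xlsx", sheet_name=0)

# Combine the sources and persist the raw aggregate for the next stage.
raw_df = pd.concat([api_df, csv_df, excel_df], ignore_index=True)
raw_df.to_csv("data/raw/aggregated_records.csv", index=False)
```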
The deliverable for this stage is a folder with raw source data along with annotation files within each subfolder.
Associated roles: Data Scientist, Data Analyst
4. Data Preparation / Preprocessing / Augmentation

Photo by Mika Baumeister on Unsplash
Preprocessing steps for data are based mainly on the model input requirements. Refer back to the research stage and recall input parameters and requirements that the selected model / neural network architecture requires.
The preprocessing step transforms the raw sourced data into a format that enables successful model training.
Data preprocessing could include, but is not limited to, the steps identified below:
Data Reformatting (resizing images, modification to color channels, noise reduction, image enhancement)
Data Cleaning
Data Normalisation
Data augmentation is a step carried out to improve the diversity of the sourced data. Augmentation of image data could take the following forms (a minimal sketch follows this list):
Rotation of an image by any arbitrary degrees
Scaling of an image to create zoomed-in/out effects
Cropping of an image
Flipping (horizontal or vertical) of an image
Mean Subtraction
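As a minimal sketch of such a pipeline, assuming PyTorch/torchvision is the framework in use, the transforms below cover resizing, rotation, scaling and cropping, flipping, and normalisation (mean subtraction). The specific parameter values and the ImageNet statistics are illustrative assumptions, not recommendations.

```python
from torchvision import transforms

# A minimal augmentation pipeline covering the transformations listed above.
train_transforms = transforms.Compose([
    transforms.Resize((256, 256)),                          # data reformatting: resizing
    transforms.RandomRotation(degrees=15),                  # rotation by an arbitrary angle
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),    # scaling + cropping
    transforms.RandomHorizontalFlip(),                      # horizontal flipping
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],        # mean subtraction /
                         std=[0.229, 0.224, 0.225]),        # normalisation
])
```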
The deliverable for this stage is a folder with subfolders labeled train, test, and validation along with annotation files within each subfolder.
Associated roles: Data Scientist
5. Model Implementation

Photo by Kevin Ku on Unsplash
Typically, model implementation is simplified by leveraging existing models that are available from a variety of online sources. Most ML/DL frameworks, such as PyTorch or TensorFlow, provide pre-trained models that can be leveraged to speed up the model implementation stage.
These pre-trained models have been trained on robust datasets and mimic the performance and structure of state-of-the-art neural network architectures.
You rarely have to implement a model from scratch. The following might be expected during the model implementation stage (a brief sketch follows this list):
Removal of the last layers within a neural network to repurpose the model for specific tasks. For example, removing the last layer of a ResNet neural network architecture enables the feature descriptor it produces to be utilized within an encoder-decoder neural network architecture
Fine-tuning pre-trained models
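A minimal PyTorch/torchvision sketch of both ideas follows; loading a ResNet-50 with ImageNet weights and the 10-class output size are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pre-trained on ImageNet (an illustrative choice of backbone).
backbone = models.resnet50(pretrained=True)

# Repurposing: drop the final classification layer so the network acts as a
# feature descriptor, e.g. as the encoder within an encoder-decoder architecture.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])

# Fine-tuning: replace the final layer with one sized for the new task
# (10 classes here is an arbitrary example) and train on the prepared data.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)
```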
The deliverable for this stage is a model that is ready to be trained.
Associated roles: Data Scientist, Machine Learning Engineer, Computer Vision Engineer, NLP Engineer, AI Engineer
6. Training

TensorBoard UI from https://github.com/tensorflow/tensorboard
The training data delivered from the previous data stages is utilized within the training stage. Model training involves passing the refined, aggregated training data through the implemented model to create a model that can perform its dedicated task well.
Training the implemented model involves iteratively passing mini-batches of the training data through the model for a specified number of epochs. During the early stages of training, model performance and accuracy can be very unimpressive. Still, as the model makes predictions, the predicted values are compared to the desired/target values, and backpropagation takes place within the neural network; the model begins to improve and gets better at the task it is designed and implemented to do.
Just before training can commence, we have to set hyperparameters and network parameters that will steer the effectiveness of our training stage on the model.
Hyperparameters: These are values that are defined before the training of the network begins; they are initialized to help steer the network to a positive training outcome. Their effect is on the machine / deep learning algorithm, but they are not affected by the algorithm. Their values do not change during training. Examples of hyperparameters are regularization values, learning rates, number of layers, etc.
Network parameters: These are components of our network that are not manually initialized. They are embedded network values that are manipulated by the network directly. An example of a network parameter is the weights internal to the network.
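To make the distinction concrete, here is a minimal PyTorch training loop; the stand-in model, the random data, and the hyperparameter values (learning rate, number of epochs, batch size) are assumptions purely for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters: fixed before training begins (these values are illustrative only).
learning_rate = 1e-3
num_epochs = 5
batch_size = 32

# Stand-in model and random data so the sketch is self-contained; in practice
# these come from the model implementation and data preparation stages.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
features = torch.randn(1024, 20)
targets = torch.randint(0, 2, (1024,))
train_loader = DataLoader(TensorDataset(features, targets),
                          batch_size=batch_size, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Iteratively pass mini-batches through the model for the specified number of
# epochs; backpropagation updates the network parameters (the weights).
for epoch in range(num_epochs):
    epoch_loss = 0.0
    for batch_features, batch_targets in train_loader:
        optimizer.zero_grad()
        predictions = model(batch_features)
        loss = criterion(predictions, batch_targets)
        loss.backward()            # backpropagation
        optimizer.step()           # weight update
        epoch_loss += loss.item()
    print(f"epoch {epoch + 1}: average training loss {epoch_loss / len(train_loader):.4f}")
```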
When conducting training, it is vital to ensure that metrics are recorded of each training process and at each epoch. The metrics that are generally collected are the following:
Training accuracy
Validation accuracy
Training Loss
Validation Loss
To collate and visualize training metrics, visualization tools such as Matplotlib and TensorBoard can be utilized.
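A minimal sketch of logging these metrics with PyTorch's TensorBoard writer is shown below; the log directory and the assumption that the metric values are computed inside the training loop are illustrative.

```python
from torch.utils.tensorboard import SummaryWriter

# The log directory is an arbitrary choice; view the results with
# `tensorboard --logdir runs`.
writer = SummaryWriter(log_dir="runs/experiment_1")

# Called once per epoch from the training loop; the metric values themselves
# are assumed to be computed elsewhere.
def log_epoch_metrics(epoch, train_loss, val_loss, train_acc, val_acc):
    writer.add_scalar("Loss/train", train_loss, epoch)
    writer.add_scalar("Loss/validation", val_loss, epoch)
    writer.add_scalar("Accuracy/train", train_acc, epoch)
    writer.add_scalar("Accuracy/validation", val_acc, epoch)
```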
By visualizing the training metrics, it is possible to identify some common ML model training pitfalls, such as underfitting and overfitting.
Underfitting: This occurs when a machine learning algorithm fails to learn the patterns in a dataset. Underfitting can be fixed by using a better algorithm or model that is more suited to the task. It can also be fixed by recognizing more features within the data and presenting them to the algorithm.
Overfitting: This problem involves the algorithm predicting new instances of patterns presented to it, based too closely on instances of patterns it observed during training. This can cause the machine-learning algorithm to not generalize accurately to unseen data. Overfitting can occur if the training data does not accurately represent the distribution of test data. Overfitting can be fixed by reducing the number of features in the training data and reducing the complexity of the network through various techniques.
The deliverable for this stage is a developed model and its training metrics.
Associated roles: Data Scientist, Machine Learning Engineer, Computer Vision Engineer, NLP Engineer, AI Engineer
7. Evaluation

Confusion matrix image from https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
At this stage, you should have a trained model and are ready to conduct evaluation techniques on its performance.
For evaluation, we utilize a partition of the refined data, usually referred to as the ‘test data’. The test data has not been seen by the model during training. It is also representative of the data expected to be encountered in practical scenarios.
Some examples of evaluation strategies that can be leveraged are as follows:
Confusion matrix (error matrix): Provides a visual illustration of the number of matches and mismatches between the ground-truth annotations and the classifier’s results. A confusion matrix is typically structured in tabular form, where the rows are filled with the observational results from the ground truth, and the columns are filled with inference results from the classifier.
Precision-Recall: These are performance metrics used to evaluate classification algorithms, visual search systems, and more. Using the example of evaluating a visual search system (finding similar images based on a query image), precision captures how many of the returned results are relevant, while recall captures how many of the relevant results in your dataset are returned. A minimal sketch of both strategies follows this list.
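Assuming scikit-learn is available, the sketch below computes a confusion matrix, precision, and recall for a small set of made-up binary labels; in practice, y_true would be the test-set ground truth and y_pred the trained model's predictions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Illustrative ground-truth annotations and classifier predictions for a
# binary task; these values are made up purely for the sketch.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Rows correspond to the ground truth, columns to the classifier's results.
print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
```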
The deliverable for this stage is a document containing the evaluation results; the outputs of the evaluation strategies are also included.
Associated roles: Data Scientist, Machine Learning Engineer, Computer Vision Engineer, NLP Engineer, AI Engineer
8. Parameter tuning and Inference

Photo by Bill Oxford on Unsplash
Parameter tuning is the process of model refinement that is conducted by making modifications to hyperparameter values. The purpose of parameter tuning is to increase the model performance, and this correlates to improvements in evaluation results.
Once hyperparameters are tuned and new values are selected, training and evaluation commence again.
The process of parameter tuning is carried out until a suitably performing model is produced.
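One common way to automate this tune-train-evaluate loop is a grid search over candidate hyperparameter values. The sketch below uses scikit-learn's GridSearchCV on a toy classifier purely as an illustration; for the deep learning models discussed in this article, the same idea is usually applied to values such as the learning rate, batch size, or number of layers, either manually or with a dedicated hyperparameter search library.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data and model purely for illustration of the tuning loop.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate hyperparameter values to sweep over (illustrative choices).
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}

# Train and evaluate each combination with cross-validation, keeping the best.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated score:", search.best_score_)
```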
Inference is a real-world test of our model. It involves utilizing real-world data that have been sourced from applicable environments. At this stage, we should be confident in our model performance.
The deliverable for this stage is a refined model.
Associated roles: Data Scientist, Machine Learning Engineer, Computer Vision Engineer, NLP Engineer, AI Engineer
9. Model Conversion to appropriate mobile format

Photo by Patrick Michalicka on Unsplash
Once we have our refined model, we are ready to place it on devices where it can be utilized.
Model conversion is a step that is required when developing models that are to be used within edge devices such as mobile phones or IoT devices.
Model conversion involves taking ML models trained in a GPU/CPU environment and converting them into an optimized, efficient version. The streamlined model is small enough to be stored on devices and sufficiently accurate to conduct suitable inference.
Examples of tools that enable model conversion to a mobile-optimized model are:
Core ML: This is a framework released by Apple to create iOS-only models. Core ML provides some models for common machine learning tasks such as recognition and detection. It’s an iOS-only alternative to TensorFlow Lite.
PyTorch Mobile: PyTorch is a popular machine learning framework and is used extensively in machine learning research. PyTorch Mobile can be compared to TensorFlow Lite, as it enables the conversion of PyTorch-trained models to a mobile-optimized version that can be leveraged on iOS and Android devices. However, PyTorch Mobile is still in its infancy and currently in experimental release status.
TensorFlow Lite: Takes existing TensorFlow models and converts them into an optimized, efficient version in the form of a .tflite file that is small enough to be stored on devices and sufficiently accurate to conduct suitable inference (a minimal conversion sketch follows this list).
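A minimal TensorFlow Lite conversion sketch is shown below; the SavedModel and output paths are illustrative assumptions, and the optimization flag is optional.

```python
import tensorflow as tf

# Convert a TensorFlow SavedModel into a .tflite file (paths are assumptions).
converter = tf.lite.TFLiteConverter.from_saved_model("exported_models/my_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# Write the streamlined model so it can be bundled with a mobile application.
with open("exported_models/my_model.tflite", "wb") as f:
    f.write(tflite_model)
```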
The deliverable for this stage is an ML model that has been optimized for on-device usage.
Associated roles: Data Scientist, Machine Learning Engineer, Computer Vision Engineer, NLP Engineer, AI Engineer
10. Model Deployment

Photo by SpaceX on Unsplash
Deploying our final trained model is the last of the identified stages. Integrating our model within a broader ecosystem of applications or tools, or simply building an interactive web interface around it, is an essential part of model deployment.
There is also a monitoring responsibility that should be undertaken to assess the performance of the model in a production environment. This ensures that the model is performing sufficiently well and is still fit for purpose.
Model retraining and updating are also processes within the model deployment stage. Model updating ensures the credibility and reliability of our model for the desired task.
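As a minimal sketch of wrapping the model in a web interface, assuming a scikit-learn-style model serialized with pickle and Flask as the web framework (both assumptions for illustration), a prediction endpoint might look like this:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the refined model produced in the earlier stages (path is an assumption).
with open("models/refined_model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # The request is assumed to carry a JSON body like {"features": [..numbers..]}.
    features = request.get_json()["features"]
    prediction = model.predict([features])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```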
The deliverables for this stage could be the following:
1. Model performance monitoring system
2. Web UI Interface to access model functionalities
3. Continuous integration pipelines that enable model redeployment
Associated roles: Data Engineer, Machine Learning Engineer, Computer Vision Engineer, NLP Engineer, AI Engineer