
There’s been a big increase in the use of artificial intelligence (AI) within digital health technologies. The intersection of medical technology and AI requires that products be evaluated in accordance with domestic and international regulations. These technologies include interacting hybrids of software and hardware, stand-alone software, and software as a medical device interfacing with medical administrative systems.

The FDA’s Digital Health Department recommends using digital health criteria and traditional software guidance to carry out specific design approaches as needed for each technology. Both the FDA and the Medicines and Healthcare products Regulatory Agency (MHRA) have expressed the need for machine-learning transparency and for further explanation of how to validate an algorithm that is continually learning, changing, and improving itself.

With an evolving regulatory environment and an ever-expanding digital health market, there’s a need for new approaches to AI software compliance.1 This doesn’t remove the need to comply with computer software regulations such as 21 CFR Part 11 and Annex 11 for audit trails, electronic records, and signatures.

The European community has taken the strongest stance on data protection reform. In 2016, the EU rolled out the General Data Protection Regulation (GDPR), which introduced new requirements for AI software and influenced the privacy and security regulations that followed in the United States. Companies were given two years to comply with the new data protection requirements.

GDPR mandates transparency in our machine-learning algorithms, along with enough insight into the logic and significance of a machine-learning model for the data subject to opt out of automated decision-making. However, the combination of inadequate transparency and increased technological complexity allows opportunities for error to emerge. This makes it difficult to explain the interrelationships in how these medical technologies operate, which in turn leads to compliance issues with regulated industries.2

Because GDPR requires an explanation of the logic behind machine-learning algorithms, the approach suggested in this article is a method for achieving transparency.

Technology overview: AI, machine learning, and deep learning3,4

Artificial intelligence

AI is the development of computer systems that think like humans and can perform tasks very similar to the way humans do. Essentially, AI software is any computer program that is “smart.” This type of software possesses the ability to solve certain problems more quickly and efficiently than a human, although it requires humans to program it. AI software can’t replace human intelligence but can augment it.

AI software is driven by algorithms programmed to achieve a specific purpose. Based on its inputs, an algorithm will produce an outcome that meets the user’s needs. Due to the increase in data that businesses generate, there’s a growing need to find meaning in those data so a business can profit from them and better meet customers’ needs. When programmed right, AI tools help achieve this goal.

AI algorithms are trained using various methods. The four common learning methods found in the industry are supervised, unsupervised, reinforcement, and semisupervised.

Machine learning

Machine learning is a subset of AI where computers can learn from data and make decisions without being explicitly programmed. Depending on the business problem and the desired outcome needed, the following methods can be used.

Supervised learning

In supervised learning, the machine is taught by being given examples of desired (or undesired) outcomes. The algorithm trains on labeled historic data, or “ground truth,” and learns general rules that map input to output or target. Supervised learning algorithms employ one or more of the following techniques to achieve the desired outcome: Bayesian statistics, decision trees, forecasting, random forests, and neural networks.
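To make the idea concrete, here is a minimal sketch of supervised learning: a perceptron trained on labeled examples of the logical AND function. The function names and the tiny data set are illustrative choices, not part of any particular product.

```python
# Minimal supervised-learning sketch: a perceptron learns a rule mapping
# inputs to a target from labeled "ground truth" examples of logical AND.
def train_perceptron(examples, epochs=20, lr=0.1):
    """examples: list of ((x1, x2), label) pairs with labels 0 or 1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            pred = 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
            err = target - pred            # feedback from the labeled data
            w[0] += lr * err * x1          # nudge weights toward the label
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(model, x):
    w, b = model
    return 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else 0

# Labeled training set ("ground truth") for AND
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train_perceptron(data)
```

After training, the learned rule generalizes the mapping implied by the labels: the model outputs 1 only when both inputs are 1.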

Unsupervised learning

In unsupervised learning, the machine studies unlabeled data to identify patterns. The machine determines correlations and relationships by parsing the available unlabeled data. The most common techniques are clustering (such as k-means) and nearest-neighbor analysis.
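As a small illustration, the following is a sketch of k-means clustering on unlabeled one-dimensional points. The data and starting centroids are made up for the example.

```python
# Minimal unsupervised-learning sketch: k-means on unlabeled 1-D points.
# No labels are given; the algorithm discovers the groupings itself.
def kmeans_1d(points, centroids, iterations=10):
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]      # two obvious groups
centroids, clusters = kmeans_1d(data, [0.0, 5.0])
```

The two centroids settle near 1.0 and 10.0, recovering the two groups without any labels.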

Reinforcement learning

In reinforcement learning, the machine is provided a set of allowed actions, rules, and potential end states. The algorithm learns through a feedback system. The algorithm takes actions and receives feedback about the appropriateness of its actions and, based on the feedback, modifies the strategy and takes further actions that would maximize the expected reward over a given amount of time. Examples are Markov decision process (MDP), learning automata, and artificial neural networks (ANN).
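The feedback loop described above can be sketched with tabular Q-learning, one of the simplest reinforcement-learning algorithms. The corridor environment, reward of 1, and hyperparameter values below are illustrative assumptions.

```python
# Minimal reinforcement-learning sketch: tabular Q-learning on a 5-state
# corridor. The agent starts at state 0; reaching state 4 yields reward 1.
# Through actions, feedback, and value updates, it learns that moving
# right maximizes the expected reward.
import random

random.seed(0)
n_states = 5
actions = [-1, +1]                            # allowed actions: left, right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2         # learning rate, discount, exploration

for _ in range(500):                          # training episodes
    s = 0
    while s != n_states - 1:
        if random.random() < epsilon:         # explore occasionally
            a = random.choice(actions)
        else:                                 # otherwise act greedily
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Feedback update: blend old estimate with reward plus
        # discounted estimate of the best future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        s = s2
```

After training, the learned values prefer moving right in every nonterminal state, which is the reward-maximizing strategy.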

Semisupervised

In semisupervised learning, the machine is given a small amount of labeled data along with a larger amount of unlabeled data. In this case, the provided inputs and outputs establish the general pattern the machine can extrapolate and apply to the remaining data. Using a semisupervised model is less expensive and less time-consuming because fewer data need to be labeled. Recent research has shown that using data-labeling methodologies in AI can reduce the time needed for this task.4
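One common semisupervised pattern is self-training: fit a model on the few labeled points, pseudo-label the unlabeled pool, and retrain on the expanded set. The nearest-centroid classifier and the data below are illustrative choices for the sketch.

```python
# Semisupervised sketch: self-training with a nearest-centroid classifier
# on 1-D data. Two labeled points establish the pattern; the model then
# pseudo-labels the larger unlabeled pool and retrains on everything.
def centroid_classifier(labeled):
    # Compute one centroid per class from whatever labels we have
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))

labeled = [(1.0, "low"), (9.0, "high")]            # small labeled set
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]         # larger unlabeled set

clf = centroid_classifier(labeled)
# Pseudo-label the unlabeled pool, then retrain on the expanded set
expanded = labeled + [(x, clf(x)) for x in unlabeled]
clf2 = centroid_classifier(expanded)
```

The retrained model’s centroids are informed by all eight points, even though only two were labeled by hand.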

Deep learning and natural language processing

Deep learning is part of a broader family of machine-learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semisupervised, or unsupervised. Deep learning is used where there are larger sets of data. Deep learning’s growing popularity is due to the dramatic increases in computer processing capabilities, the availability of massive amounts of data for training computer systems, and advances in machine-learning algorithms and research.

Deep-learning capabilities are designed to continually analyze data with a logic structure similar to how humans draw conclusions. Deep learning uses layered structures of algorithms called “artificial neural networks.” Unlike classical machine learning, deep learning has more than one hidden layer to accomplish a learning task. These hidden layers perform the feature extraction, whereas in classical machine learning a human performs the extraction. The models generated by deep learning are more complex and can ingest larger amounts of data than their predecessors, which makes this method a desirable option.
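The role of a hidden layer can be shown with a tiny feedforward network computing XOR, a function no single-layer model can represent. The weights here are set by hand purely for illustration; in practice they would be learned during training.

```python
# Sketch of a feedforward network with one hidden layer computing XOR.
# The hidden units extract intermediate features (OR and AND); the output
# layer combines them. Weights are hand-set for illustration only.
def step(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # Hidden layer performs the feature extraction
    h1 = step(x1 + x2 - 0.5)   # fires if at least one input is on (OR)
    h2 = step(x1 + x2 - 1.5)   # fires only if both inputs are on (AND)
    # Output layer combines the extracted features: OR but not AND
    return step(h1 - h2 - 0.5)
```

The hidden units do the work a human feature engineer would otherwise do: they turn raw inputs into intermediate features the output layer can combine linearly.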

Infrastructure hardware for deep learning

Training a neural network is an intensive task and often the hardest phase of deep learning. A graphics processing unit (GPU) is often the best solution for training a neural network (NN) and processing big data because a GPU can handle intensive computations and decrease computation time.

A GPU is a more specialized processing unit than a central processing unit (CPU). Although CPUs can execute a few complex computations efficiently, they max out with big data and more complex NNs. GPUs and CPUs are complementary: GPUs excel at handling many sets of simple instructions in parallel, whereas CPUs handle a small set of very complex instructions. Therefore, depending on the type of task, either a CPU or a GPU may be the better choice.

Qualification approach

Complex AI software has features that are harder to check and therefore require a verification and validation (V&V) approach different from traditional software.

As with conventional V&V methods, we want to ensure that the correct offering is being built correctly and that it will meet its intended use. But because AI software differs from traditional software in many ways—for example, it can learn new behaviors, act autonomously, or even adjust its performance—businesses also need a process for adjusting the V&V approach to mitigate potential risk and cost overruns, and to stay competitive in a digital market.

Additionally, there’s a need for continual maintenance and defect resolution for AI software. This prevents drift from its intended use, ensures that models comply with the established truth to which they are trained, and enables robustness. The goal of this new V&V method, which is specifically designed for AI software products, is to build trust with customers and provide accountability in the models.

Verification and validation: A new approach for AI

With verification activities, it’s imperative that the project team includes developers and V&V engineers with experience in testing neural networks. Furthermore, it’s industry practice and regulatory expectation that the experts developing, testing, and reviewing these models be trained in the product requirements and theory of operation.

In the healthcare field, AI software must be built right (verification) and must meet its intended use (validation), just like traditional software. Traditional steps are still necessary, such as documenting the design, testing, verifying requirements, and resolving test failures through proper test setups and methodologies. This includes performance challenges to ensure the algorithm functions properly for its intended use, which can be achieved through proper feasibility and robustness checks. Various agencies recommend using failure mode and effects analyses (FMEAs) for failure testing. FMEAs are critical in developing medical diagnostic software as well as software that provides recommendations to healthcare professionals.

Shift in software development life cycle for neural networks

An adjusted software development life cycle (SDLC) that controls the various stages of V&V can be followed for neural networks (NNs). To incorporate NNs into the SDLC, certain adjustments can be made within the phases of the V-model.

Within the life sciences industry, V&V means the following:

Software verification (left branch in the figure below) provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase.

Software validation (right branch in the figure below) is confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.

Figure 1 below gives an overview of the cycle, with a description of phase adjustments.

Figure 1: A software development life cycle (SDLC) for neural networks. This diagram shows the relationship between verification (left branch) and validation (right branch), and the life cycle phases described in the standards.5 Source: Maria DiBari and Alonso Diaz

  • Systems requirements must be enhanced to include the NN specification.
  • System architectural design must contain the NN architecture, including integration with other systems and interoperability.
  • Software requirements analysis must include the NN software requirements, including the type of NN, the learning algorithm, a description of the inputs and outputs, acceptable errors, and training set(s) for a pretrained NN.
  • Software architectural design should contain the NN software architecture, including the type of NN (feedforward, self-organizing map, etc.) and the learning algorithm (least mean squares (LMS), Levenberg-Marquardt, Newton’s method, etc.).
  • Software detailed design must include a description of the precise code constructs required to implement the NN.
  • Software coding must contain the NN code, developed to proper coding standards.
  • System qualification testing should verify that the system requirements are sufficient to ensure that, when implemented, the NN will interface properly with the system in production.
  • System integration testing should verify that the architectural design is detailed enough that, when implemented, the NN can interface with system hardware and software in various fidelity testbeds.
  • Software qualification testing should ensure that the requirements are sufficiently detailed to adequately and accurately describe the NN.
  • Software integration should verify that the NN interfaces with other software, including proper inputs and outputs for the NN.
  • Software unit testing must include both black-box and white-box testing for modularized NN code.

Figure 2: Definition of possible deliverables shown in figure 1 that complement the SDLC for a neural network.


The purpose of a comprehensive guideline is to prevent some common machine-learning issues, such as overfitting and underfitting. Overfitting occurs when a model has high variance and fits noise in the training data rather than the underlying signal. Underfitting occurs when variance is low, bias is high, and the model is too simplistic to capture the signal.

There are various ways to prevent overfitting. Cross-validation is one. Other methods include introducing more data, eliminating features, and regularization, which simplifies the model. In deep learning, early stopping, which halts the training process at its optimal iteration point, is used to prevent overfitting.
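Halting training at the optimal iteration point can be sketched as follows. The per-epoch validation losses here are a made-up illustrative sequence, not a real training run, and the `patience` parameter is an assumed convention.

```python
# Sketch of early stopping: halt training when validation loss stops
# improving for `patience` consecutive epochs, and report the epoch with
# the best loss so the model can be rolled back to that point.
def early_stop_epoch(val_losses, patience=2):
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break               # loss has risen long enough; stop
    return best_epoch               # the optimal iteration point

# Validation loss falls, then rises as the model starts to overfit
losses = [0.9, 0.6, 0.4, 0.35, 0.37, 0.41, 0.45]
```

For this curve the rule stops after two non-improving epochs and identifies epoch 3 (loss 0.35) as the point to keep, before overfitting sets in.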

A pathway to explainability: repeatability and provenance

A V&V approach isn’t the only method necessary for qualifying a machine-learning model. A robust model-management program should also be in place, which ensures provenance.

Where V&V checks your immediate qualifying scenario, the model management process ensures that models continue to function adequately and robustly. The model-management process is a maintenance activity that ensures the models continually uptake data, and that the desired outcomes aren’t changed from their intended use. This ensures that there are tracked versions of models that can give the desired feature outcome.

Key factors in model management include:

  • Training data accuracy
  • Code
  • Environment storage and tracking
  • File types used
  • Configurations of code and tests
  • Measurements of performance
  • Sources of client data
  • Basic knowledge of technology being used

Not managing or tracking these key attributes can result in lost work, especially during iterating. Having a well-documented model-management and model-training framework will help establish best practices.
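Several of these attributes can be tracked together in a simple provenance record. The field names, version tag, and configuration below are illustrative assumptions, not a standard schema; the point is that fingerprinting the training data, code version, and configuration makes a model version traceable.

```python
# Sketch of a model-provenance record: fingerprint the training data and
# configuration with content hashes, and record the code version, so a
# given model can be traced and reproduced later. Field names are
# illustrative, not a standard.
import hashlib
import json

def provenance_record(training_data, code_version, config):
    data_blob = json.dumps(training_data, sort_keys=True).encode()
    config_blob = json.dumps(config, sort_keys=True).encode()
    return {
        "data_sha256": hashlib.sha256(data_blob).hexdigest(),
        "config_sha256": hashlib.sha256(config_blob).hexdigest(),
        "code_version": code_version,
    }

record = provenance_record(
    training_data=[[0, 1], [1, 0]],
    code_version="v1.4.2",                 # hypothetical version tag
    config={"learning_rate": 0.01, "epochs": 50},
)
```

Because the hashes are computed from canonically serialized content, any change to the data or configuration produces a different fingerprint, which is exactly the drift this section warns against losing track of.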

Machine-learning guidelines must have enough detail to allow a competent analyst to reproduce the necessary conditions and obtain results within the proposed acceptance criteria. Someone else should be able to step in the next day, retrain all the models created so far, and get the same results. Processes must define exactly which parameters are needed to train a model, in a way that can be shared without revealing trade secrets. This is critical when dealing with GDPR and privacy and security requirements. It’s also an important aspect of machine-learning algorithms that require special attention or scrutiny by a regulatory agency.

Repeatability reduces variation in the iterative refinement of models. This becomes critical when troubleshooting weights and biases across NNs because you’re able to pinpoint the parameters that need to be adjusted. A well-understood training phase, along with data provenance, leads to a repeatable outcome.
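One concrete, low-cost contributor to repeatability is controlling random seeds. The sketch below is a minimal illustration, assuming a hypothetical weight-initialization step: fixing the seed makes two runs start from identical weights, so any difference in outcome can be attributed to a deliberate parameter change rather than chance.

```python
# Repeatable initialization sketch: a seeded random generator produces
# identical starting weights for identical seeds, which is a precondition
# for pinpointing which parameter change caused a change in outcome.
import random

def init_weights(n, seed):
    rng = random.Random(seed)            # isolated, seeded generator
    return [rng.uniform(-1, 1) for _ in range(n)]

run1 = init_weights(10, seed=42)
run2 = init_weights(10, seed=42)   # same seed: identical weights
run3 = init_weights(10, seed=7)    # different seed: different weights
```

Recording the seed alongside the other provenance attributes makes the training run itself part of the auditable record.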

Explainability is the idea that a model and its output can be clearly explained to others. There are many pathways to explainability of machine-learning algorithms. A common approach is to focus on repeatability and provenance. In this case, explainability requires tracking many different components when machine-learning algorithms or deep-learning NNs are being developed.

Under new regulations such as GDPR, it’s important to understand the data journey, from preprocessing data and selecting features, through training an NN, to final deployment. Additionally, proper model management is necessary to achieve a level of provenance, repeatability, and explainability we couldn’t previously reach.

Working toward explainability is essential and has many benefits. Some of the benefits of describing how a model arrives at a prediction include:

  • Ensuring that there’s no bias by accounting for our data
  • Accounting for how a model output is translated into action
  • Accounting for how we govern outcomes generated by our models over time
  • Increased sense of confidence after a model is deployed
  • Increased trust in our models by our customers

Conclusion

Transparency is challenging when developing machine-learning algorithms. It’s often referred to as the “black box” problem. Explaining the inner workings of these algorithms is often impossible, and sometimes companies consider the algorithms trade secrets. But, without the transparency that is now mandated, it will be hard for these innovations to be widely accepted.

Creating a framework around provenance and repeatability of machine-learning models is one of the ways to open the “black box,” qualify machine-learning algorithms, and comply with regulations while creating a pathway to explainability.

References

1. Federal Laboratory Consortium for Technology Transfer. “Artificial Intelligence and Machine Learning in Medical Devices.” FLC News, October 22, 2020.

2. Bhandari, Avantika. “The Impact of GDPR on Artificial Intelligence.” Montreal AI Ethics Institute, February 20, 2022.

3. Glowacki, Jonathan; Reichhoff, Martin. “Effective Model Validation Using Machine Learning.” Milliman White Paper, May 2017.

4. Nevala, Kimberly. “The Machine Learning Primer.” SAS Best Practices, 2017.

5. Mackall, Dale; Nelson, Stacy; Schumann, Johann. “Verification & Validation of Neural Networks for Aerospace Systems.” Dryden Flight Research Center/NASA Ames Research Center, June 12, 2002.