Explaining the Crosswalk Between Singapore’s AI Verify Testing Framework and The U.S. NIST AI Risk Management Framework
On October 13, 2023, Singapore’s Infocomm Media Development Authority (IMDA) and the U.S. National Institute of Standards and Technology (NIST) published a “Crosswalk” of IMDA’s AI Verify testing framework and NIST’s AI Risk Management Framework (AI RMF). Developed under the aegis of the Singapore–U.S. Partnership for Growth and Innovation, the Crosswalk is a mapping document that shows users how adopting one framework can help them meet the criteria of the other. Like the crosswalks NIST has produced for other leading AI frameworks (such as ISO/IEC FDIS 23894, the proposed EU AI Act, the OECD Recommendation on AI, Executive Order 13960, and the Blueprint for an AI Bill of Rights), this Crosswalk aims to harmonize “international AI governance frameworks to reduce industry’s cost to meet multiple requirements.”
The aim of this blog post is to provide further clarity on the Crosswalk and what it means for organizations developing and deploying AI systems. The blog post is structured into four parts.
- First, the blog post will briefly summarize AI Verify (a fuller summary of which can be found in FPF’s blog post here).
- Second, the blog post will summarize the NIST AI RMF, what it aims to achieve, and how it works.
- Third, the blog post will explain the “Crosswalk”, and how users can expect to use the document.
- Fourth, the blog post will conclude with observations on what the Crosswalk means for international cooperation in the area of AI governance.
AI Verify – Singapore’s AI governance testing framework and toolkit
AI Verify is an AI governance testing framework and toolkit launched by the IMDA and the Personal Data Protection Commission of Singapore (PDPC). First announced in May 2022, AI Verify enables organizations to conduct a voluntary self-assessment of their AI systems through a combination of technical tests and process-based checks. In turn, this allows companies who use AI Verify to objectively and verifiably demonstrate to stakeholders their responsible and trustworthy deployment of AI systems.
At the outset, there are several key characteristics of AI Verify that users should be mindful of.
- First, while AI Verify is presently in a “Minimum Viable Product” phase, it is designed to provide one-stop AI testing and will continue to evolve.
- Second, rather than attempting to define ethical or governance standards or thresholds (and providing AI systems with a “pass” or “fail” mark), AI Verify merely provides an objective measure to verify the performance of an AI system. By open-sourcing AI Verify, the IMDA and the PDPC hope that the global AI testing community can gravitate towards workable standards in various industries and use cases.
- Third, to promote user trust, AI Verify permits organizations to conduct self-testing on premises, and also allows for third-party testing. The AI Verify Foundation, which oversees the development and adoption of AI Verify, also welcomes users to contribute code and plug-ins to enhance the toolkit.
- Fourth, and crucially, AI Verify has presently been developed to support supervised learning AI models (e.g. binary classification, multiclass classification, regression models). It has not been designed for generative AI systems, though this is something that the AI Verify Foundation is working on separately through initiatives such as its Generative AI Evaluation Sandbox and the LLM Evaluation Catalogue.
AI Verify comprises two parts: (1) a Testing Framework, which references 11 internationally accepted AI ethics and governance principles grouped into 5 pillars; and (2) a Toolkit that organizations can use to execute technical tests and record process checks from the Testing Framework. The 5 pillars and 11 principles under the Testing Framework are:
- Transparency on the use of AI and AI systems
  - Principle 1 – Transparency: Providing appropriate information to individuals impacted by AI systems
- Understanding how an AI model reaches a decision
  - Principle 2 – Explainability: Understanding and interpreting the decisions and output of an AI system
  - Principle 3 – Repeatability/reproducibility: Ensuring consistency in AI output by being able to replicate an AI system, either internally or through a third party
- Ensuring safety and resilience of the AI system
  - Principle 4 – Safety: Ensuring safety by conducting impact/risk assessments, and ensuring that known risks have been identified / mitigated
  - Principle 5 – Security: Ensuring the cyber-security of AI systems
  - Principle 6 – Robustness: Ensuring that the AI system can still function despite unexpected input
- Ensuring fairness
  - Principle 7 – Fairness: Avoiding unintended bias, ensuring that the AI system makes the same decision even if a certain attribute is changed, and ensuring that the data used to train the model is representative
  - Principle 8 – Data governance: Ensuring the source and quality of data by adopting good data governance practices when training AI models
- Ensuring proper (human) management and oversight of the AI system
  - Principle 9 – Accountability: Ensuring proper management oversight during AI system development
  - Principle 10 – Human agency and oversight: Ensuring that the AI system is designed in a way that will not diminish the ability of humans to make decisions
  - Principle 11 – Inclusive growth, societal and environmental well-being: Ensuring beneficial outcomes for people and the planet.
As mentioned earlier, FPF’s previous blog post on AI Verify provides more detail on the objectives and mechanics of AI Verify’s Testing Framework and Toolkit. This summary merely sets the context for readers to better appreciate how the Crosswalk document should be understood.
AI Risk Management Framework – U.S. NIST’s industry-agnostic voluntary guidance on managing AI risks
The AI RMF was issued by NIST in January 2023. Currently in its first version, the goal of the AI RMF is “to offer a resource to organizations designing, developing, deploying or using AI systems to help manage the many risks of AI and promote trustworthy and responsible development and use of AI systems.”
The AI RMF underscores the perspective that responsible AI risk management tools can assist organizations in cultivating public trust in AI technologies. Intended to be sector-agnostic, the AI RMF is voluntary, flexible, structured (in that it provides taxonomies of risks), measurable and “rights-focused”. The AI RMF outlines mechanisms and processes for measuring and managing the risks of AI systems, and provides guidance on measuring accuracy.
The AI RMF itself is broken into two parts. The first part outlines various risks presented by AI. The second part provides a framework for considering and managing those risks, with a particular focus on stakeholders involved in the testing, evaluation, verification and validation processes throughout the lifecycle of an AI system.
The AI RMF outlines several AI-related risks
The AI RMF outlines the following risks presented by AI: (1) Harm to people – e.g. harm to an individual’s civil liberties, rights, physical or psychological safety or economic opportunity; (2) Harm to organizations – e.g. harm to an organization’s reputation and business operations; and (3) Harm to an ecosystem – e.g. harm to the global financial system or supply chain. It also notes that AI risk management presents unique challenges for organizations, including system transparency, lack of uniform methods or benchmarks, varying levels of risk tolerance and prioritization, and integration of risk management into organizational policies and procedures.
The AI RMF also provides a framework for considering and managing AI-related risks
The “core” of the AI RMF contains a framework for considering and managing these risks. It comprises four functions: “Govern”, “Map”, “Measure”, and “Manage.” These provide organizations and individuals with specific recommended actions and outcomes to manage AI risks.
- Governing – This relates to how AI is managed in an organization, such as creating a culture of risk management, outlining processes, documents and organizational schemes that anticipate, identify and manage AI risks, and providing a structure to align with overall organizational principles, policies and strategic priorities. Categories within the “Govern” prong include:
- Creating and effectively implementing transparent policies, processes and practices across the organization related to the mapping, measuring and managing of AI risks; and
- Maintaining policies and procedures to address AI risks and benefits arising from using third-party software and data.
- Mapping – This establishes the context in which an organization can identify and frame the risks of an AI system (such as who the users will be and what their expectations are). Once mapped, organizations should have sufficient contextual knowledge on the impact of the AI system to decide whether to design, develop or deploy that system. Outcomes from this function should form the basis for the measuring and managing functions. Specific categories within this function include:
- Assessing AI capabilities, targeted usage, goals and expected benefits and costs;
- Mapping risks and benefits for all components of the AI system, including third-party software and data; and
- Determining the impact of the system on individuals, groups, communities, organizations and society.
- Measuring – This function is about using information gathered from the mapping process as well as other tools and techniques to analyze and monitor AI risks. This function may be addressed by implementing software testing and performance assessment methodologies. The “Measure” prong also includes tracking metrics for trustworthy characteristics and impacts of the AI system. It should provide an organization’s management with a basis for making decisions when trade-offs arise. Specific categories include:
- Identifying and applying appropriate methodologies and metrics;
- Evaluating AI systems for trustworthiness;
- Maintaining mechanisms for tracking AI risks; and
- Gathering feedback about the efficacy of measurements being used.
- Managing – This function involves allocating resources, on a regular basis, to address the risks identified through the functions above. Organizations should use information generated from the Mapping and Measuring functions to manage and decrease the risk of AI system failures by identifying and controlling for risks early. Organizations can implement this function by regularly monitoring and prioritizing AI risks based on those assessments. Specific categories within this function include:
- Planning, preparing, implementing and documenting strategies to maximize AI benefits and minimize negative impacts, including input from relevant AI actors in the design of these strategies; and
- Ensuring that risks that arise are documented and monitored regularly.
The AI RMF also comes with an accompanying “playbook” that provides additional recommendations and actionable steps for organizations. Notably, NIST has already produced “crosswalks” to ISO/IEC standards, the proposed EU AI Act, and the US Executive Order on Trustworthy AI.
The Crosswalk is a mapping document that shows users how adopting one framework can help meet the criteria of the other
To observers familiar with AI governance documentation, it should be apparent that the two frameworks are complementary. For instance, the AI Verify Testing Framework contains processes that overlap with the AI RMF’s approach to managing AI risks. Both frameworks also adopt risk-based approaches and aim to strike a pragmatic balance between promoting innovation and managing risks.
Similar to the crosswalk initiatives that NIST has already undertaken with other frameworks, this Crosswalk is aimed at harmonizing international AI governance frameworks to reduce fragmentation, facilitate adoption, and reduce industry costs in meeting multiple requirements. Insiders have noted that when the AI Verify framework was released in 2022, NIST was in the midst of organizing public workgroups for the development of the AI RMF. From there, the IMDA and NIST began working together toward the common goal of jointly developing the Crosswalk to meet different industry requirements.
Understanding the methodology of the Crosswalk
Under the Crosswalk, AI Verify’s testable criteria and processes are mapped to the AI RMF’s categories within the Govern, Map, Measure and Manage functions. Specifically, the Crosswalk first lists the individual categories and subcategories under these four functions. Because the four core functions collectively address individual governance and trustworthiness characteristics (such as safety, accountability and transparency, explainability and fairness), the second column of the Crosswalk – which sets out the AI Verify Testing Framework – identifies the individual principle, testable criteria, and process and/or technical test that corresponds to the relevant core function under the AI RMF.
A point worth noting is that the mapping is not “one-to-one”; each NIST AI RMF category may have multiple equivalents. Thus, for instance, AI Verify’s Process 9.1.1 for Accountability (indicated in the Crosswalk as “Accountability 9.1.1”) appears under both “Govern 4” and “Govern 5” of the AI RMF. This reflects the different nature of the two documents: while the AI RMF is a risk management framework for the development and use of AI, AI Verify is a testing framework for assessing the performance of an AI system and the practices associated with its development and use. To achieve this mapping, the IMDA and NIST had to compare both frameworks at a granular level – down to individual elements within the AI Verify Testing Framework. This can be seen from the Annex below, which sets out the “crosswalked” elements for comparison and identifies the individual testable criteria and processes in the AI Verify Testing Framework.
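For readers who find it easier to think in code, the relationship described above is essentially a one-to-many mapping. The short Python sketch below is hypothetical: the only entries taken from the Crosswalk are the “Govern 4” / “Govern 5” example above, and the data structure and helper function are illustrative assumptions rather than part of either framework or its tooling.

```python
# Hypothetical sketch: the Crosswalk as a one-to-many mapping from
# NIST AI RMF categories to AI Verify Testing Framework elements.
# Only the "Govern 4"/"Govern 5" example is taken from the Crosswalk itself.

crosswalk: dict[str, list[str]] = {
    "Govern 4": ["Accountability 9.1.1"],
    "Govern 5": ["Accountability 9.1.1"],
    # ... further AI RMF categories and mapped AI Verify processes would follow
}

def aiverify_elements_for(rmf_category: str) -> list[str]:
    """Return the AI Verify processes mapped to a given AI RMF category."""
    return crosswalk.get(rmf_category, [])

print(aiverify_elements_for("Govern 4"))  # ['Accountability 9.1.1']
```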
Other aspects of understanding the Crosswalk document are set out below (in a Q&A format):
Have AI Verify’s technical tests been mapped to the AI RMF?
No. These have not been mapped. Technical tests on areas such as explainability and robustness are performed by the testing algorithms within the AI Verify Testing Toolkit. Given that the AI RMF does not come with a technical testing toolkit, there is no direct equivalent in the AI RMF to map these tests to.
However, technical tests provided by AI Verify may still be used to help organizations meet outcomes under the AI RMF. For instance, under AI RMF Measure 2.9, a suggested action is to “Explain systems using a variety of methods, e.g. visualizations, model extraction, feature importance, and others.” In this regard, explainability testing under AI Verify provides information on feature importance.
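For readers unfamiliar with this kind of output, the sketch below shows one common way of estimating feature importance (permutation importance), using scikit-learn on a toy classifier. It is purely illustrative: the dataset, model, and method are assumptions for this example, not AI Verify’s own implementation.

```python
# Illustrative sketch only: permutation importance as one way of estimating
# feature importance; not AI Verify's implementation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy dataset and model, used purely for demonstration
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# How much does shuffling each feature degrade held-out performance?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: t[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.4f}")
```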
To use another example, under AI RMF Measure 2.11, one of the suggested actions is to “Use context-specific fairness metrics to examine how system performance varies across groups, within groups, and/or for intersecting groups. Metrics may include statistical parity, error-rate equality, statistical parity difference, equal opportunity difference, average absolute odds difference, standardized mean difference, percentage point differences.” Accordingly, AI Verify’s fairness testing allows system deployers to test the fairness of their models according to these fairness metrics.
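To make two of these metrics concrete, the following minimal sketch computes statistical parity difference and equal opportunity difference with NumPy. The toy labels, predictions, and group membership are invented for illustration; this is not AI Verify’s fairness test implementation.

```python
# Minimal sketch of two group fairness metrics named in AI RMF Measure 2.11,
# computed on invented toy data for illustration only.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # ground-truth labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])   # model predictions
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])  # sensitive attribute

def statistical_parity_difference(y_pred, group, a="A", b="B"):
    """Difference in positive prediction rates between the two groups."""
    return y_pred[group == a].mean() - y_pred[group == b].mean()

def equal_opportunity_difference(y_true, y_pred, group, a="A", b="B"):
    """Difference in true positive rates between the two groups."""
    tpr_a = y_pred[(group == a) & (y_true == 1)].mean()
    tpr_b = y_pred[(group == b) & (y_true == 1)].mean()
    return tpr_a - tpr_b

print(statistical_parity_difference(y_pred, group))
print(equal_opportunity_difference(y_true, y_pred, group))
```

Values close to zero indicate similar positive prediction rates (statistical parity) or similar true positive rates (equal opportunity) across the two groups.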
Why does the Crosswalk sometimes map a specific testing process rather than an entire testable criterion?
Certain testable criteria under AI Verify contain only one testing process, whereas others contain multiple testing processes. In the latter case, where only one of these processes has been mapped to the AI RMF, the Crosswalk indicates and maps only that specific testing process.
Where the Crosswalk refers to an entire section of AI Verify’s Testing Framework, does this mean that only some of the testing processes in that section are relevant?
No. Where the Crosswalk refers to entire sections of AI Verify’s Testing Framework, all corresponding testing processes in those sections are relevant to achieving the corresponding outcome under the AI RMF.
Is the Crosswalk reflected in the AI Verify Testing Toolkit itself?
The Crosswalk is currently being incorporated into the latest version of the AI Verify testing tool.
The Crosswalk shows that practical international cooperation in AI governance and regulation is possible
The global picture on AI regulation and governance is shifting rapidly. Since the burst of activity around the development of AI ethical principles and frameworks in the late 2010s, the landscape has become increasingly complex.
It is now defined within the broad strokes of AI-specific regulation (in the form of legislation, such as the proposed EU AI Act, Canada’s AI and Data Act, or Brazil’s AI Bill); the negotiation of an international treaty on AI under the aegis of the Council of Europe; executive action putting the onus on government bodies when procuring AI systems (with President Biden’s Executive Order as the chief example); AI-specific governance frameworks offered as self-regulation; and guidance from regulators (such as Data Protection Authorities issuing guidance on how providers and deployers of AI systems can rely on personal data while respecting data protection laws). This varied landscape leaves little room for a coherent global approach to governing a quintessentially borderless technology.
In this context, the Crosswalk, as a government-to-government effort, shows that it is possible to find a common language between prima facie different self-regulatory AI governance frameworks, paving the way to interoperability, or the interchangeable use of frameworks across borders. Its practical relevance for organizations active in both the US and Singapore cannot be overstated.
The Crosswalk also provides a model for future crosswalks or similar mapping initiatives that will support a more coherent approach to AI governance across borders, potentially opening the path for more instances of meaningful and practical international cooperation in this space.