Choosing the Best AI Tools: How to Consider Ethics and Risks when Implementing AI Systems

By Andrew Gamino-Cheong - Last Updated: October 21st, 2024

Nearly every software platform claims it has incorporated GenAI features into its products or services, whether customers want it or not. World-class AI systems are just a prompt and API call away. Yet it has become increasingly difficult to distinguish the capabilities of these AI systems or understand the trustworthiness of AI vendors. 

Competition drives many organizations to experiment with AI capabilities, but in doing so they risk eroding trust with existing customers. This is especially true in regulated industries, where customer trust may be an organization’s key differentiator.

This raises significant questions about assessing an AI vendor’s trustworthiness and responsibility, including the potential risks and ethical aspects of any given solution. How can you assess not just the performance capabilities but also the ethical risks before you choose and implement an AI solution?

The Drawbacks of a Scoreboarding Approach to Assessing Software

Generally, vendors disclose statistics on their models’ performance across various automated benchmarks. However, this ‘scoreboarding’ approach fails to capture many non-quantitative aspects of a model’s design, as well as its potential impacts on an organization.

Relying on vendor-provided benchmarking data fails to reveal: 

  • Key model limitations

  • Ethical and legal risks based on data collection practices

  • Potential model drift over time (see the sketch after this list)
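Benchmark numbers are also only a snapshot; drift tends to show up only if you re-evaluate the model on your own data over time. One lightweight approach is to re-run a labeled evaluation set on a schedule and compare the result against the accuracy you recorded at procurement. The sketch below is illustrative only: `call_vendor_model` stands in for whatever inference API the vendor actually exposes, and the tolerance threshold is an assumption you would tune.

```python
# Minimal drift check: re-run your own labeled evaluation set on a schedule
# and flag when accuracy falls meaningfully below the baseline recorded at
# procurement time. `call_vendor_model` is a placeholder for the vendor's API.

from statistics import mean

def call_vendor_model(text: str) -> str:
    """Placeholder for the vendor's inference API."""
    raise NotImplementedError

def evaluate(eval_set: list[tuple[str, str]]) -> float:
    """Return accuracy of the vendor model on (input, expected_label) pairs."""
    scores = [1.0 if call_vendor_model(x) == y else 0.0 for x, y in eval_set]
    return mean(scores)

def check_for_drift(eval_set, baseline_accuracy: float, tolerance: float = 0.05) -> bool:
    """Flag drift if accuracy drops more than `tolerance` below the baseline."""
    current = evaluate(eval_set)
    drifted = current < baseline_accuracy - tolerance
    if drifted:
        print(f"Possible drift: accuracy {current:.2%} vs baseline {baseline_accuracy:.2%}")
    return drifted
```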

A Due Diligence Review: Deep Analysis for Better Software Selection

When it comes to analyzing risks and ethical concerns surrounding your software choice, a thorough due diligence review can unearth those issues.

A vendor due diligence review starts with asking key questions about the vendor’s AI system. An organization may gain valuable insights simply from reviewing publicly accessible information about the vendor’s system(s). However, an ideal review will include a deeper analysis that involves representatives with relevant experience in AI, cybersecurity, legal, and the applicable domain for the AI system. 

When conducting the review, consider the following aspects of the application and the vendor (a simple checklist sketch follows the list):

  • The vendor’s maturity, reputation, and trustworthiness

  • Datasets used to train the AI model (hint: bigger isn’t always better)

  • Limitations of the core AI model

  • Integration with existing systems
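To keep these reviews consistent across vendors, some teams capture the questions as a structured checklist that reviewers fill in as evidence comes back. A minimal sketch in Python follows; the field names are illustrative, not a standard schema.

```python
# A sketch of the review areas as a structured checklist, so findings can be
# recorded consistently across vendors. Field names are illustrative only.

from dataclasses import dataclass, field

@dataclass
class VendorReview:
    vendor_name: str
    maturity_and_reputation: dict = field(default_factory=dict)  # policies, certifications, references
    training_data: dict = field(default_factory=dict)            # datasheet received? known gaps?
    model_limitations: dict = field(default_factory=dict)        # model card received? evaluated tasks?
    system_integration: dict = field(default_factory=dict)       # security, monitoring, data access
    open_questions: list = field(default_factory=list)

review = VendorReview(vendor_name="Example Vendor")
review.training_data["datasheet_provided"] = False
review.open_questions.append("Request demographic breakdown of training data")
```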

Vendor Trustworthiness and Maturity 

Generally, more mature vendors will implement policies and processes to identify and mitigate their AI systems’ risks. As more organizations adopt AI risk standards (e.g., ISO 42001 or the NIST AI Risk Management Framework), internal organizational controls will likely require vendors to have processes for overseeing their AI systems.

These controls often include conducting regular model performance reviews, setting clear metrics for performance and robustness, and signing Service Level Agreements that address customer feedback and incident reporting. Organizations that do not have clear AI policies, lack AI expertise, or evade questions about their AI development process may prove to be riskier partners.

Beyond evaluating a vendor’s trustworthiness, each AI solution should be reviewed within the context of its intended use cases. When performing this solution-specific evaluation, there should be three separate areas of focus:

  • Training datasets

  • Specifics about the AI model 

  • Systems integrated with the AI model

Evaluating Datasets: The Root of AI Ethics and Risks

Some of the biggest limitations and risks stem from the original datasets used to train or fine-tune the system. Organizations can ask for a datasheet during vendor diligence. This document should describe how the dataset was collected, how it was cleaned or processed, summary statistics or demographic breakdowns, and evaluations of the data quality. Reviewing these datasheets should help organizations understand whether the model is an appropriate fit for their use case.

However, understand that larger datasets do not necessarily alleviate risks. Many AI tool creators have sought out the biggest datasets they could collect, but along the way have opened themselves up to significant risks, including copyright infringement, privacy law violations, and data sources that are ethically or legally problematic for those using the model. Instead, organizations should consider whether the dataset represents the population the system will serve, whether its collection poses any ethical or legal risks, and whether it contains the relevant information for any specialized uses.
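Parts of this review can be automated if the vendor is willing to share the datasheet’s demographic breakdown or a sample of the training data. The sketch below is a rough illustration, assuming a hypothetical CSV sample from the vendor with an `age_group` column and a known distribution for the population your system will serve.

```python
# Rough representativeness check, assuming the vendor shares a sample of the
# training data (hypothetical CSV with an `age_group` column) and you know the
# distribution of the population your deployed system will actually serve.

import pandas as pd

sample = pd.read_csv("vendor_training_sample.csv")   # hypothetical file from the vendor
training_dist = sample["age_group"].value_counts(normalize=True)

# Distribution of the population the deployed system will serve (illustrative numbers)
expected_dist = pd.Series({"18-29": 0.25, "30-49": 0.40, "50-64": 0.20, "65+": 0.15})

comparison = pd.DataFrame({"training": training_dist, "expected": expected_dist}).fillna(0.0)
comparison["gap"] = (comparison["training"] - comparison["expected"]).abs()
print(comparison.sort_values("gap", ascending=False))  # large gaps warrant follow-up questions
```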

Evaluating the Core AI Model: Training, Architecture, and More

Organizations should also request a model card to understand the core AI model and its relevant limitations. A model card should contain high-level information about the model’s architecture, how it was trained, and in-depth information about how it was evaluated and tested.

Information about the model should help organizations understand whether the model is explainable, what tasks the model was specifically evaluated for, and what environmental impact the model’s training had.

The best model cards also list the model’s specific risks and limitations, the mitigation measures that were implemented, and who is responsible for managing risks across downstream deployers.
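When the vendor builds on an openly published model, its model card can often be pulled and skimmed programmatically rather than by hand. A minimal sketch using the huggingface_hub library is shown below; it assumes the underlying model is hosted on the Hugging Face Hub, and "vendor-org/base-model" is a placeholder repository id.

```python
# Pull a published model card and surface the metadata most relevant to a
# due diligence review. Assumes the underlying model is hosted on the
# Hugging Face Hub; "vendor-org/base-model" is a placeholder repo id.

from huggingface_hub import ModelCard

card = ModelCard.load("vendor-org/base-model")
metadata = card.data.to_dict()

print("License:", metadata.get("license"))
print("Training datasets:", metadata.get("datasets"))
print("Reported evaluations:", metadata.get("model-index"))

# The free-text body usually holds the limitations and intended-use sections
if "limitations" not in card.text.lower():
    print("No limitations section found: follow up with the vendor.")
```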

While many organizations are hesitant to share trade secrets about model architecture, refusal to share the results of model benchmarks or other testing activities should be a massive red flag. 

Understandably, model cards and other model documentation can be extremely complex. Yet recent projects like the Stanford Foundation Model Transparency Index, or websites like aimodelratings.com, make it easier for non-technical stakeholders to understand the potential risks across models.

How the AI Model Interacts with Your Existing Systems

AI models are also embedded in existing software systems: additional infrastructure and software layers that complement the model itself. These layers may introduce their own risks and challenges, and understanding the overall system can help you decide whether a given tool is the right choice for your organization.

You’ll want to understand:  

  • What cybersecurity protections are in place 

  • How the system is monitored for performance

  • Whether it can directly integrate with existing data systems to either retrieve information or act on the user’s behalf

The vendor’s documentation should also describe how to interact with the system, what steps users can take to address system malfunctions, and how to report incidents or feedback about the system.
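On the deployer’s side, much of this reduces to instrumenting every call to the vendor’s system so that performance issues and malfunctions leave a trail that can be investigated and reported. The following sketch is one illustrative pattern, not a vendor-specific integration; `call_vendor_model` is a placeholder for the vendor’s actual API.

```python
# A minimal sketch of production monitoring around a vendor AI call: log every
# request's latency and outcome so malfunctions can be traced and reported.
# `call_vendor_model` is a placeholder for the vendor's API.

import logging
import time

logger = logging.getLogger("ai_vendor_monitoring")
logging.basicConfig(level=logging.INFO)

def call_vendor_model(prompt: str) -> str:
    """Placeholder for the vendor's inference API."""
    raise NotImplementedError

def monitored_call(prompt: str) -> str:
    start = time.perf_counter()
    try:
        response = call_vendor_model(prompt)
        logger.info("ok latency=%.3fs prompt_chars=%d", time.perf_counter() - start, len(prompt))
        return response
    except Exception:
        logger.exception("vendor call failed after %.3fs", time.perf_counter() - start)
        raise  # surface the incident to whatever alerting or reporting process you have
```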

So… Which AI Tool Is Best for You? 

Deciding which AI tools to use requires more than simply relying on performance benchmarks or product marketing materials. Organizations need to conduct appropriate due diligence on all the other aspects of the AI tools they procure. Having appropriate vendor review processes in place can ensure that organizations maximize the benefits of AI and minimize its risks.

Andrew Gamino-Cheong is the co-founder & CTO of Trustible, a Washington, DC-based AI governance software company focused on helping organizations manage and mitigate AI risk, build trust, and accelerate responsible AI development.