Section 1 - Risk identification and evaluation
In general, in the enterprise space, where customers can tailor and use AI systems in various contexts, the most effective approach is for developers to perform a comprehensive assessment of reasonably foreseeable potential adverse impacts from the use of the system.
Embedding trust into Salesforce’s models, products, and features requires close collaboration with our Technology and Product teams. At the heart of this effort is the Trusted AI Review process, led by Responsible AI and Tech (RAIT) product managers within Salesforce’s Office of Ethical and Humane Use (OEHU). This process enables potential risks to be identified early, mitigated effectively, and tracked transparently.
During the review, RAIT product managers work closely with product teams to understand the product’s use cases, tech stack, and intended audience. They conduct a risk assessment to identify and categorize potential risk scenarios under sociotechnical harm subtypes. For each risk vector, the inherent and residual risk are identified so as to have a complete and clear understanding of its impact.
Our AI Acceptable Use Policy (AI AUP) includes a detailed list of uses for which our customers are not allowed to use our generative AI tools, including:
- Automated decision-making processes with legal effects
- Individualized advice from licensed professionals
- Explicitly predicting an individual’s protected characteristics
- Engaging in coordinated inauthentic behavior for the purposes of manipulating public debate & political campaigns
- Child sexual exploitation and misuse
- Weapons development
- Adult content and dating apps
Additionally, our AI AUP specifies that our generative AI tools are not intended for high risk uses that could result in the death or serious bodily injury of any person or in other catastrophic damage, including through warfare or the operation of critical infrastructure.
We evaluate our AI models against trust and safety metrics (e.g., bias, privacy, robustness) to ensure that they perform at the highest level. When a model scores below a certain range on one of these metrics, we use adversarial testing to better understand how that manifests in practice. For example, knowing the exact toxicity score does not reveal what kind of toxic content a LLM might generate. However, once you identify toxicity as a potential issue, you can focus your adversarial testing specifically on that, rather than testing for all types of risks.
Our Responsible AI & Technology team implements red teaming practices to improve the safety of our AI products. The team has conducted several end-to-end adversarial tests to prevent bias, toxicity, and ensure alignment to ethical tech commitments for our own models and applications as well as those of our partners.
Our AI services are also part of our vulnerability and incident management processes under security. Salesforce's Product Security Strategy & Advisory Security team proactively identifies and evaluates AI risks throughout the lifecycle using design reviews and penetration testing across their environments and specifically for Agentforce. Their "ExploitAI" activities are crucial for uncovering and validating AI-specific vulnerabilities beyond traditional security flaws. This includes focusing on authorization, system interaction, manipulation of goals and instructions, hallucination exploitation, impact analysis, knowledge poisoning, memory and context issues, multi-agent exploitation, resource exhaustion, and supply chain attacks. Recognizing that AI systems exhibit behavioral changes rather than simple failures, their testing evaluates a spectrum of responses, explicitly addressing concerns like bias, "jailbreaks," and data access violations. This comprehensive approach combines standard security practices with AI-focused testing to ensure the confidentiality, integrity, and availability of our AI features.
There are two main ways to go about red teaming — manual and automated — both of which are employed at Salesforce.
Manual testing leverages human testers who think like adversaries, using their expertise to craft complex and sophisticated attack strategies that automated systems might overlook. Examples include hackathons (for example, Salesforce’s XGen Hackathon had teams compete to identify vulnerabilities in our next-generation text generation models); or bug bounties (these are excellent once a product is launched to catch new harms that weren’t discovered during pre-launch. We incentivize our employees to identify and report vulnerabilities through our Bug Bounty Program and host ethical bug bounties. To date, Salesforce has invested more than $23M in our bug bounty program.)
As of September 2024, our teams have conducted 19 internal and 2 external red teaming exercises across our suite of generative AI models and applications. Through pre-launch testing, we have reduced toxic, biased, and/or unsafe outputs by 35% in an AI marketing feature; added guardrails for AI Agents to prevent bias and increase transparency; and partnered with one of our systems integrator partners to decrease biased outputs in generated content from 69% to 4%.
Automated testing is used as an enhancement, not replacement, of human-driven testing and evaluation. This type of testing involves the use of scripts, algorithms, and software tools to simulate a vast number of attacks or adversarial scenarios in a short period, systematically exploring the risk surface of the system. One approach we’ve been taking to automate some of our tests is called “fuzzing,” where we generate randomized test cases based on successful human attacks from manual testing (confirmed by to have been successful either in our manual testing, or through other publicly known attacks), deliver these test cases to the target model and collects outputs, and then assess whether each test case passed or failed.
Another way we test our products is by performing precautionary “prompt injection attacks”, by crafting prompts specifically designed to make an AI model ignore previously established instructions or boundaries. Anticipating actual cybersecurity threats like these is essential to refining the model to resist actual attacks.
Employee Trust Testing: To surface more subtle forms of bias that users experience, we tapped employees from across Salesforce’s global workforce to evaluate the trustworthiness of prompt responses. Participants engaged with the AI in simulated real-world scenarios, creating detailed personas to represent varied user experiences and using those personas to explore interactions that probed for biases and inconsistencies.
Salesforce has developed the world's first LLM benchmark for CRM to assess the efficacy of generative AI models for business applications. This benchmark evaluates LLMs for sales and service use cases across accuracy, cost, speed, and trust and safety based on real CRM data and expert evaluations. What sets this benchmark apart is the human evaluations by both Salesforce employees and actual external customers and the fact that it is based on real-world datasets from both Salesforce and customer operations.
Trust and safety (T&S) benchmarking is a critical aspect of our evaluation process. By evaluating our AI models against T&S metrics (e.g., bias, privacy, robustness), we can ensure that they perform at the highest level. When a model scores below a certain range on one of these metrics, we use adversarial testing to better understand how that manifests in practice.
We also use the quantitative T&S metrics in this paper “TrustLLM: Trustworthiness in Large Language Models”.
In terms of caveats, we would like to note that all quantitative metrics are limited in communicating the nature of a particular risk. For example, a specific toxicity score cannot communicate how often a model or system might generate toxic content or the severity of that toxicity. Additionally, some models may have been trained on publicly available evaluation datasets and, therefore, may score especially well on a benchmark (i.e., overfitting). Adversarial testing can provide qualitative data to help organizations understand exactly what kind of toxicity is generated and how easily it is generated (e.g., with expected use or only via significant attacks).
Regarding external independent expertise:
- We have engaged experts to perform penetration tests (through our Security Team’s Bug Bounty program). We also recently chose to outsource testing of two of our Einstein for Developers (E4D) products and our research multimodal model, PixelPlayground.
- The Ethical Use Advisory Council is our overarching body that guides the Office of Ethical and Humane Use in its product and policy recommendations to leadership. This Advisory Council was established in 2018 and is composed of external experts from academia and civil society along with internal VP+ level executives and frontline employees.
- Through our Responsible Disclosure Policy, we encourage responsible reporting of any vulnerabilities that may be found in our site or applications. Salesforce remains committed to working with security researchers to verify and address any reported potential vulnerabilities.
Regarding mechanisms to receive reports of risks, incidents or vulnerabilities by third parties:
- Incidents and vulnerabilities can be discovered and reported by our customers with the help of the Einstein Trust Layer, which includes audit trails for use in third-party reporting. Customers get a full audit trail of their generative AI transactions (prompts, responses, trust signals, user feedback) in their Data Cloud, fully under the customers' control, for them to use in their monitoring.
- We also have feedback mechanisms within the AI features so customers can indicate whether the generated output was inaccurate, inappropriate, etc.
- Customers, partners, and the general public can submit reports concerning incidents and vulnerabilities through our security email alias, as well as through the bug bounty program.
Salesforce employees participate in several working groups of the NIST Artificial Intelligence Safety Institute Consortium (AISIC). The Consortium brings together more than 280 organizations to develop science-based and empirically backed guidelines and standards for AI measurement and policy.
A full list of Salesforce’s compliance certifications and attestations can be found here.
We published Sustainable AI Policy Principles, which recommend the consideration of environmental impact when determining the risk of AI systems and the establishment of energy efficiency standards for high-risk systems. We also co-developed the AI Energy Score project which introduces such a standard.
At Salesforce, we are committed to knowledge sharing across industry, government, and civil society to advance trusted AI in society, and regularly share our principles, practices, and learnings.
In 2024, we published 20+ blog posts dedicated to the ethical and humane use of AI ranging from the top risks and related guidelines for generative AI to how we’ve built trust into our AI.
To help our customers dive deeper, we’ve also created resources and guides like the National Institute of Standards and Technology (NIST) AI Risk Management Framework quick-start guide and our Human at the Helm action pack. We also published the world’s first LLM Benchmark for CRM, which includes trust and safety metrics for each model.
Members of our Office of Ethical and Humane Use have held active positions on various AI councils around the globe, including the U.S. National AI Advisory Committee; the U.S. Chamber of Commerce Commission on Artificial Intelligence Competitiveness, Inclusion, and Innovation; Singapore’s Ethical AI Advisory Council; the Freedom Online Coalition’s Advisory Network; the Washington State Artificial Intelligence Task Force; the Oregon State Taskforce on Artificial Intelligence; and the U.S. National Institute of Standards and Technology (NIST). We are also an active member of the UNESCO Global Business Council for the Ethics of AI and Singapore’s AI Verify AI Foundation.
We are proud to have endorsed several global voluntary commitments. These include pledges to conduct internal and external testing before product release, sharing information on risks and vulnerabilities with each other and government entities, and conducting research to address societal risks. To that end, we have signed onto the following voluntary commitments and industry pledges:
- The EU AI Pact;
- Canada’s Voluntary Code of Conduct on the Responsible Development and Management of Advanced Generative AI Systems;
- The Seoul AI Business Pledge;
- The Trento AI Declaration.
Salesforce is also an active member of several industry alliances, including the Data & Trust Alliance, Data Provenance Initiative, AI Alliance, WEF AI Governance Alliance, Sustainable AI Coalition, and the WEF Global Future Council on Data Equity. By engaging in these partnerships, we collaborate with other technology leaders to develop standards, frameworks, and practices that promote responsible AI use across sectors, addressing data integrity, bias mitigation, and sustainable AI deployment.
We are grateful to the G7 and the OECD for their work on the reporting framework.
Salesforce is the #1 AI CRM, helping companies connect with their customers in a whole new way. We pioneered cloud-based CRM in 1999, and today we are leading the shift to trusted, agentic AI. With Agentforce, Salesforce enables organizations to deploy autonomous AI agents that act on unified, real-time data across their systems, helping every employee deliver more personalized, efficient, and secure customer experiences. Our trusted platform powers AI, data, and CRM applications across sales, service, marketing, commerce, and IT, so every team can work smarter and drive meaningful business outcomes.
Salesforce’s commitment and innovative leadership on responsible and trustworthy AI is inspired by the high standards required for successful outcomes by our enterprise customers around the world. Our enterprise customers require AI that performs at the highest levels of trust and safety, and that addresses their priorities – accuracy, robustness, auditability, privacy, security, and toxicity and bias mitigation.


























