This is part two in our ongoing series on well-documented AI failures. Focused on building trust in GenAI, Tumeryk provides an AI Trust Score™ that lets businesses with consumer-facing interactions measure, monitor, and mitigate bias and hallucinations in generative AI. Today we cover a well-publicized GenAI failure in the legal industry.
In May 2023, the news broke that attorneys Steven Schwartz and Peter LoDuca of the firm Levidow, Levidow & Oberman faced sanctions in a New York federal court for submitting fake citations generated by OpenAI’s ChatGPT in legal research for a personal injury case they were handling.
After conducting legal precedent research with OpenAI’s ChatGPT, the two attorneys presented a series of cases that were entirely made up by the generative AI tool. The cases ChatGPT supplied were Varghese v. China Southern Airlines, Martinez v. Delta Air Lines, Shaboon v. EgyptAir, Petersen v. Iran Air, Miller v. United Airlines, and Estate of Durden v. KLM Royal Dutch Airlines, all bogus. None of these cases existed; the opinions, quotes, and internal citations ChatGPT provided were fabricated.
Stan Seibert, then senior director of community innovation at data science platform Anaconda, said, “One of the worst AI blunders we saw this year was the case where lawyers used ChatGPT to create legal briefs without checking any of its work.” Mr. Seibert continued, “It is a pattern we will see repeated as users unfamiliar with AI assume it is more accurate than it really is. Even skilled professionals can be duped by AI ‘hallucinations’ if they don’t critically evaluate the output of ChatGPT and other generative AI tools.” Experiences like this undermine the commercial viability of generative AI use cases and erode the trust businesses place in automated, customer-facing implementations. They also reinforce the need for a scoring solution that can quickly identify bias and hallucination in GenAI, minimizing potential damage and mitigating further harm.
Tumeryk offers an AI Trust Score™, modeled after the FICO® Credit Score, for enterprise AI application developers. This tool helps identify underperforming AI models and establishes automated guardrails to assess, monitor, and mitigate them before they affect consumers or the public. The Tumeryk AI Trust Score™ is the ultimate in AI Risk Assessment, Mitigation, and Management.