Thursday thoughts: How (not) to use artificial intelligence at Tax Court
The US Tax Court struck a petitioner’s pretrial memorandum from the record when the judge discovered it included citations to nonexistent court cases, apparently the product of artificial intelligence hallucinations. Despite the infraction, the court spared the petitioner and counsel from monetary sanctions, instead taking the opportunity to discuss the effects of generative AI on legal and judicial proceedings.
In the order, the judge stated that the Court could not find three of the four cases cited in the memorandum. For those three, the case names were real, but the cases did not appear at the citations given and were unrelated to the petitioner’s issue; the citations themselves pointed to different cases that were equally unrelated.
The judge found the memorandum violated the Court’s Rule 33. Under the US Tax Court Rules of Practice and Procedure, counsel must sign every pleading, and the effect of that signature is the following:
Counsel or a party signing a pleading certifies that the signer has read the pleading; that, to the best of the signer’s knowledge, information, and belief formed after reasonable inquiry, it is well grounded in fact and is warranted by existing law or by a nonfrivolous argument for extending, modifying, or reversing existing law or for establishing new law; and that it is not presented for any improper purpose, such as to harass, cause unnecessary delay, or needlessly increase the cost of litigation. (Rule 33, emphasis added.)
“In short,” the judge summarized, “it is the attorney who signs a pleading who is responsible for its content.”
In a hearing to address the erroneous citations, counsel admitted to the Court that she had recently joined a new firm and relied on a paralegal to draft the memorandum. Neither counsel nor the Court could confirm whether AI generated the memorandum; however, according to the judge, it “has the hallmarks of a document prepared with the assistance of a large language model.”
The judge then took advantage of the opportunity to discuss the use of AI, despite admitting “it is unclear whether, or to what extent, some form of AI may have been used to assist in preparing petitioner’s Pretrial Memorandum.”
Generative AI and the risk of hallucinations
Generative AI, which could have produced the memorandum at issue, is an extension of machine learning (ML). ML applies algorithms (a “model”) to relatively large sets of data to learn patterns and then classify new data based on those patterns. The classic example involves inputting thousands of images of dogs to train a model and then asking it whether a new image contains a dog.
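To make that concrete, here is a minimal sketch of the classification workflow, using scikit-learn with random synthetic data standing in for real dog photos; the library, the toy data, and the labels are purely illustrative assumptions, not anything drawn from the case.

```python
# A toy sketch of the "train, then classify" workflow described above.
# Synthetic random pixels stand in for real dog photos.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=0)

# Pretend each "image" is a flattened 8x8 grayscale picture (64 pixels).
# Label 1 = dog, 0 = not a dog. The data is random noise purely so the
# example stays self-contained.
X_train = rng.random((1000, 64))
y_train = rng.integers(0, 2, size=1000)

# Training: the algorithm learns patterns that separate the two classes.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Classifying new data: the model predicts whether a new image contains a dog.
new_image = rng.random((1, 64))
print("Dog" if model.predict(new_image)[0] == 1 else "Not a dog")
```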
Where ML stops at classifying data, generative AI goes a step further by producing new content. Back to the dog example: a generative AI model can take what it has learned and produce a unique image of a dog.
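As a toy illustration of that generative step (and emphatically not how a large language model actually works), the sketch below builds a tiny word-level Markov chain: it learns which words follow which in a short training text, then produces new, often nonsensical, sentences from those patterns. The training sentences are made up for the example.

```python
# A toy generative model: learn word-to-word transitions from a tiny corpus,
# then generate "new" text by sampling from those learned patterns.
import random
from collections import defaultdict

corpus = (
    "the dog chased the ball . the dog caught the ball . "
    "the cat watched the dog . the cat chased the ball ."
).split()

# Learn which words tend to follow each word in the training text.
transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

# Generate new text by repeatedly sampling a plausible next word.
random.seed(1)
word = "the"
output = [word]
for _ in range(12):
    word = random.choice(transitions[word])
    output.append(word)
print(" ".join(output))
```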
But complications can arise. In images of living creatures, for example, models will often include an unrealistic number of arms, legs, fingers, eyes, or other appendages. Newer models have fixed many of these issues, but an image of a person with six fingers on each hand or two knees in one leg was a tell-tale sign of an AI-generated image.
We call these unrealistic features in AI-generated content hallucinations. The term gives us an anthropomorphic metaphor for what happens when a model produces an erroneous output.
Models that produce written output also generate hallucinations, in the form of nonsensical or false statements such as the incorrect court case citations in the memorandum. Generative AI models produce output that appears accurate, but an informed reader can quickly spot the hallucinations.
It is nearly impossible to predict when and how often a model will hallucinate. And attempts to prevent hallucinations, such as through prompt engineering, cannot guarantee complete accuracy. So although AI models can speed up the process of producing technical written content, someone with subject matter expertise should review the output before sharing it.
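As one hypothetical safeguard (not a tool referenced in the order), a reviewer could check every citation in a model-drafted document against a trusted case-law source and flag anything that cannot be verified. The citation strings below are fabricated purely for illustration, and the verified set stands in for a real legal research database.

```python
# A hypothetical review step: before filing, compare each citation the model
# produced against a trusted source and flag anything that cannot be verified.
# `verified_citations` stands in for a real case-law database lookup.
verified_citations = {
    "Example Petitioner v. Commissioner, 100 T.C. 1 (1993)",  # illustrative entry only
}

draft_citations = [
    "Example Petitioner v. Commissioner, 100 T.C. 1 (1993)",
    "Madeup v. Commissioner, 999 T.C. 999 (2030)",  # a plausible-looking fake
]

for citation in draft_citations:
    if citation in verified_citations:
        print(f"OK:      {citation}")
    else:
        print(f"REVIEW:  {citation} (could not verify; check before signing)")
```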
Using AI in tax practice
The issue in this case was not the use of AI to generate the memorandum; rather, it was counsel’s failure to review it before signing and submitting it to the Court. The paralegal who drafted the memorandum could just as easily have made up the cases and citations herself.
Discussing the use of AI, the judge wrote, “The Court is not in the business of dictating to attorneys the extent to which they can or should rely on advancing technology to assist them in representing their clients.” In fact, the Court makes the case that AI can help those who lack the financial resources to retain legal counsel avail themselves of the judicial system on near-equal footing.
For tax professionals as well, AI provides a potentially useful tool for reviewing, summarizing, and producing content. But we must remain vigilant in our use of it. Circular 230 §10.22(a) requires that practitioners “must exercise due diligence… In determining the correctness of oral or written representations made by the practitioner to the Department of the Treasury; and… In determining the correctness of oral or written representations made by the practitioner to clients with reference to any matter administered by the Internal Revenue Service.” This requirement predates the widespread use of AI and applies to all of our work.
Now your thoughts…
How are you using AI in your practice? What safeguards do you have in place around using AI? If you haven’t started using it, why not?
To read the Court’s order, see “Order that petitioner’s Pretrial Memorandum is deemed stricken.” Gary Thomas v. Commissioner, Docket No. 10795–22 (USTC).
Hat tip to David Fazio, EA, USTCP for sharing the case and order with the InCite community. If you aren’t a member of InCite yet, I can’t recommend it enough!