TEI Roundtable No. 53: Using AI for Tax Law Research
How tax professionals can thoughtfully leverage—and embrace—the acceleration of artificial intelligence


As artificial intelligence continues its scorching rise, in-house tax professionals are turning to it more frequently to gain workflow efficiencies. One particular area in which AI can do just that is tax law research—if done with human oversight. Using AI for research requires careful attention to detail and knowledge about how AI tools such as large language models work, so Tax Executive convened a panel of experts to chat through advantages, drawbacks, use cases, and where tax and AI are headed together. They are Will Matthews, director of product management for tax research at Bloomberg Tax; Michael Bernard, vice president and chief tax officer at Vertex; and Senay Redda, vice president of solution consulting at Thomson Reuters. Tax Executive’s senior managing editor, Sam Hoffmeister, moderated the discussion.

Sam Hoffmeister: Thank you all for joining. First question: How often are you seeing tax professionals and tax teams using AI in their workflows, particularly for tax law research?

Will Matthews: In the last few years, since AI-enabled chatbots first came on the scene—we all know which ones I’m talking about—people have been looking for ways to work that into one task or another. I think a lot of that started and people got most comfortable early on with getting, you know, drafting help or help cleaning up communications that they were putting together. And that’s become more frequent. More junior practitioners that we talk to seem to be more comfortable with having an AI tool take a pass at what they’ve already decided is what they want to send and [then] cleaning it up for tone or formatting otherwise. And we’ve seen people get more and more comfortable with using it for pure research. Those are other workflows either before or after the research process is done. But within research, we’ve seen steadily increasing adoption. I think everybody got all the same trainings—or has over the last few years—about hallucinations, and about how these models work, and about what you have to watch out for when you’re trying to get facts and really specific intelligence from these models. On my side, as we’ve done work to bring transparency to the way that we’re enabling tax research so that users really know where answers came from and how it relates to more authoritative sources than what you might think of the chatbot as being, I think we’ve seen pretty steady adoption, and it’s becoming more of a routine part of the research process.

Michael Bernard: I can pick up a little bit just on what Will said there, too. I think what we’re seeing is more in professional settings with professional service providers. They’re using the research components of it a lot more in a daily sense. The reason is there’s a lot of subject matter experts in those areas—maybe they understand leasing taxation or health care taxation—and they actually are producing, say, libraries that eventually can be used in the AI process. So, from that standpoint, I think it’s much more well adopted than maybe in the corporate setting. What we’ve seen more in the corporate setting is exactly what Will mentioned. We’ve seen it with text, chatbots. We’ve seen it with product classifications. I think other places where they’ve readily adopted it is with tax notices. If you get a notice, that notice comes in, you can respond to the auditor in an appropriate manner, and then that AI tool can start looking for the files that it will need in order to produce the information that auditor actually wanted. So, I think that’s where we see it—more in the organizational areas or operational areas for corporate departments, and a little bit more on the research side in the professional services setting.

Senay Redda: Just to add on to that, we’re serving a lot of different needs in the tax, legal, and risk space. We see more general usage of things like generative AI in pockets, but we also have AI assistant products like CoCounsel and Checkpoint Edge with CoCounsel. For those products, many of the queries are focused on automating activities like reviewing contracts, redlining documents, and delivering straightforward answers to customers’ tax research questions.

Hoffmeister: What types of tax law research queries or tasks are particularly well suited—or not—for AI tools such as LLMs [large language models] or tax-specific products?

Bernard: I’d say just a couple things. One is, because you have to spend time teaching the AI tool how to actually think about things and what processes it’s going to go through to give you a really good answer, where we’ve seen a lot of our customer base use these tools is based upon value and volume. For example, if you have a unique situation or a one-off tax research item, that’s not always the best place for AI. But let’s say you’re in an indirect space and you have to support multiple businesses, everything from selling food products to leasing to durable goods. One of the things that happens a lot of times with indirect tax groups is they get a lot of queries from salespeople, or from the procurement group: “I’m buying this product” or “I’m selling this product,” and rather than just relying on [a] general clause within the contract in terms of how taxes are applied, “Can I get the rate? Is it exempt? Is it exempt for a particular purpose? Do we have an exemption certificate online?” So, a lot of times what we’re finding is that that kind of volume is particularly well suited for corporate departments and other service providers to take all that information and work through those answers with AI, particularly given the number of inquiries they might get.

Redda: I think AI is particularly good—and better than a traditional search engine—at pulling in more synthesized answers. So, if you can get pretty specific with some of your questions, you will get a much more synthesized answer than you would in a search engine. For instance, you can ask a question: “If you spun off a C corp, how does it become an S corp?” And you’ll get a step-by-step answer in a very different way than you would if you put that into Google. So, the more specific you can be with your question and tailor or tie it to a fact base that it’s been trained on, the better answer you’ll get.

Matthews: I think the last point you mentioned, Senay, about the fact base that it’s trained on, this is the whole ball game for all these tools. Whether you’re working with a custom-trained model or you’re doing retrieval-augmented generation, the quality of the response you’re getting is going to depend entirely on the quality of the underlying material. We focused on, in our case, building tools that make use of the BNA [Bureau of National Affairs] portfolios and other analysis that we have that we know we can stand behind. And then we make sure as we’re delivering responses from those tools that all of those responses are well grounded in that source material. And it’s exactly right that if you’re asking for a synthesis across materials that relate to a specific fact pattern, they tend to be pretty good at those kinds of use cases. On the other side, you asked about use cases that they’re not as well suited for. I think we see persistent challenges in Excel or calculation tools—math and math-like outputs. I think these LLMs and AI tools in general are getting better, but they still tend to trip over things that require that kind of calculation.
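Matthews’ point about grounding is the essence of retrieval-augmented generation: fetch the most relevant passages from a vetted corpus first, then constrain the model’s answer to those cited sources. A minimal Python sketch of the pattern follows; the corpus, the keyword-overlap scoring, and the citation labels here are illustrative assumptions, not any vendor’s actual implementation.

```python
# Minimal sketch of the retrieval step in retrieval-augmented generation (RAG):
# score a small vetted corpus against the query, then hand only the top-scoring
# passages (with citations) to the language model. Corpus and scoring are
# illustrative, not any vendor's implementation.

def tokenize(text):
    """Crude bag-of-words tokenizer: lowercase, strip trailing punctuation."""
    return {w.strip(".,?").lower() for w in text.split()}

def retrieve(query, corpus, k=2):
    """Rank corpus passages by keyword overlap with the query; keep the top k."""
    q = tokenize(query)
    scored = sorted(corpus,
                    key=lambda doc: len(q & tokenize(doc["text"])),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    """Ground the model: instruct it to answer only from the cited passages."""
    sources = "\n".join(f"[{p['cite']}] {p['text']}" for p in passages)
    return (f"Answer using ONLY the sources below, citing each fact.\n"
            f"Sources:\n{sources}\n\nQuestion: {query}")

corpus = [
    {"cite": "IRC 1361", "text": "An S corporation may not have more than 100 shareholders."},
    {"cite": "IRC 1362", "text": "An election to be an S corporation is made by filing Form 2553."},
    {"cite": "IRC 179",  "text": "A taxpayer may elect to expense certain depreciable property."},
]

hits = retrieve("How does a corporation make an S corporation election?", corpus)
print([h["cite"] for h in hits])  # → ['IRC 1362', 'IRC 1361']
```

A production system would swap the keyword overlap for embedding-based similarity search, but the grounding contract is the same: the model only sees, and only cites, passages the retriever pulled from authoritative material.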

Hoffmeister: Could you please dive a little bit deeper into those advantages as well as the drawbacks for using AI tools for research?

Redda: One challenge in leveraging publicly available LLMs is confidentiality. If you’re prompting with or uploading proprietary data, you might be hesitant to do so because it can become data used to develop and improve the LLM. That differs from some proprietary tools, which have explicit terms and conditions around how that data is going to be used. Another challenge is hallucinations. One of our solution consultants was talking to a customer at a recent conference, and the customer could tell one of the speakers had used an LLM to generate some of the content, because some of it wasn’t accurate. Generative AI is designed with a bent to give an answer and, in doing so, can hallucinate. So, those are some challenges. There are, however, many gen AI products, and the research products Thomson Reuters provides are trained on our materials and cite them in results, so users can validate outputs in a way they couldn’t with a public large language model.

Bernard: If I could add just a little bit there, too. Obviously, the advantage is it can accelerate your research. I do agree with Senay that hallucinations do occur. Sometimes I think that gives us a learning opportunity to ask, “Was the question asked in a particular way that produced that, or was the database it was pointed at just not the right database for that kind of question?” So, there are things around that as well. Both Will and Senay mentioned something about adjacencies; I think there’s a lot to learn from adjacencies, because there is this idea that maybe you have a question that you want answered, but also there’s some reasoning beyond that. We talked about that earlier. I really find that even though the answer that comes back may not answer the question that you posed, it can produce some pretty good authorities, whether in the code or the regs or the cases, that really help you move along your learning in that particular area. Those are some of the benefits that we see as well.

Matthews: We’ve certainly heard from users that even if they aren’t experts on a particular question—so they’re starting from the position of having to take an AI-generated answer with a grain of salt, because it’s tough to depend on it for any kind of factual feedback—it’s still useful to validate that a query you put in made sense. If, using an AI tool, you get documents that are more targeted to the question you’re asking than you would have gotten otherwise, that still speeds up the research process. I don’t think we can ever get away—at least in the foreseeable future—from the reality, though, that users really need to understand these systems’ limits on factual recall. Senay mentioned that the specifics in a lot of the law that people are looking up don’t lend themselves cleanly to the approach of finding the most likely next word in a sequence, which is the core of how a lot of these systems work. You need to really know the fact that you’re relying on versus where it’s just giving you the authority that you can look at. But if it speeds up the process of priming you for the documents that you’re diving into, that is still a win.

Redda: To touch on one other item, most large language models have a training cutoff date, so they may not have the most recent information. The models are getting much better at this over time and minimizing the time lag, but that in itself brings its own challenges. If everything is real time, the model is going to be using real-time sources of data, which raises issues of trustworthiness and vetting—both harder when the data is recent. Those are some of the trade-offs.

Hoffmeister: What is the process of checking against the AI for accuracy, especially when it comes to perhaps more interpretive areas of tax law?

Matthews: This doesn’t replace the need to really be able to understand the documents that you’re looking at. “Trust but verify” is the name of the game here. Especially if you’ve got a tool that’s showing its work, you still need to look at the underlying documents to make sure they’re saying what the AI tool’s interpretation says they’re saying, and to make sure they apply to the circumstances you’re trying to apply them to. The reality is, when we’re talking to users, nobody’s comfortable—and nobody’s going to be comfortable in the foreseeable future—telling a stakeholder, “Hey, here’s the answer; I got it from a chatbot.” The chatbot or the AI tool always has to be a means to finding that authoritative source; you’ve still got to check that source yourself.

Bernard: I’d agree with Will on that, too. You have to have resources—either outside your tax department or, if you’re a professional organization, in-house—who are experts in the field you’re actually trying to research, to make sure those answers are accurate. One thing that we try to encourage in our corporate departments is this other idea of what we normally call “organizational accuracy structure.” What we mean by that is, let’s say you’ve got a couple researchers in your department, and you give them a question on tax, and you tell them to go research it. They come back and they each write their own memos. One of the things that we think produces the most accurate approach is having a structure around that memo writing, around how you answer those questions. For example, you can come back and answer it on a technical basis for tax purposes, but would you also be able to answer it in terms of levels of authority? If you think about it, there’s “more likely than not,” there’s “should,” there’s “will.” That would be another dimension in which you would want the memo to answer. The other thing, too, is most people think once you answer the tax question, you’re done. Well, you’re not. You might have to account for that item differently, or it may need a reserve, upfront or at some point. Does it require some kind of disclosure in the financial statements if it’s material enough? And is there a point where you take all of those things into account? The thing that we’ve seen with a lot of successful tax departments is they answer this core set of questions beyond just the tax answer, so that the memos have a structure that is replicable time after time given the questions that the AI is going to answer. And then the last piece is just having the most reputable annotated databases that those questions are pointed at.

Redda: Both Will and Michael raised great points. We have a news business, Reuters, and they have a standard, which is to get two sources for a given story or data point so it can be validated before it is published. There is a similar need for accuracy when leveraging AI tools, but many times users just don’t know the sources driving the outputs. One approach to gain a higher level of confidence is to use known or trusted sources of data. Our products are trained on tax and legal research we produce, and we can tie outputs back to that source data.

Hoffmeister: Switching gears a bit with this next one. We know in-house corporate tax departments are often stretched thin, and that there’s a need for new talent. Put yourself in a hiring manager’s shoes: What kind of AI-related research skills would you be looking for in a new hire?

Bernard: What we’re seeing, and what we actually use here a little bit in our chief tax office and our chief strategy office, is we have someone who can write AI-type tools. They can actually draft these things, know how to put them together, talk to subject matter experts, and help them think about the process that a person goes through to reach a decision, not just on technical merits but from a judgment standpoint. We feel like we have to have people like that—or at least access to people like that. Maybe they won’t be in the tax department; they might be somewhere in IT or somewhere else, if we want to have those tools and evaluate them. The second thing is you also have to have somebody who can evaluate tools very well. We mentioned this earlier about libraries and making sure that your questions are pointing in the right direction. I think those skills absolutely have to be there. Thirdly, I’d say whatever the AI produces, at some level there has to be someone—or a group of people—who can ensure that the answer is correct, based upon the evaluation of the tools and the evaluation of the answer. So, there are multiple disciplines, and they don’t always have to reside in the tax department. But the one thing tax departments have to do is be much more proactive in understanding that if you want something built to solve your problems, you have to be able to scope it just like any IT project and then deliver on it if you want it to be accurate and good. Those are some thoughts. I don’t know how Will and Senay look at it as well.

Matthews: For sure, openness to new workflows is something that you just can’t avoid these days, and so you want to find people who have an open mind about technology as it’s changing and are thinking really proactively about how they can incorporate new advances into their existing work. It’s also good to look for some healthy realism about where the limits of these tools are. You don’t want people who have outsourced all their judgment to one of these tools; you want to be on the lookout for that risk for sure. Michael mentioned, too, looking for people who can work with technology teams or other teams to break down their workflows into the discrete components that might make sense to build into an agent. That’s where a lot of this stuff is going, and that’s where we’re really excited about the future. As we string research tasks together and research-enable other workflows—whether that’s drafting something, coming up with a calculation, or putting together any other kind of document that outlines the product of research—that involves a bunch of discrete things that need to happen. The more of those that people can really think through at a molecular level, the better able we’ll be to take advantage of the toolset and automate more of that in a productive and still trustworthy way.

Redda: Leveraging AI encompasses many tangible skills. Michael touched on a clear one—prompt building. We all do that when we use AI tools, and to better help our customers, we have prompt specialists who help them get the best use of our products. It’s even changing how we think about skills and who we want to hire. Our chief people officer published an article saying we’re looking at more skill-based applicants and not purely at traditional channels, like colleges, because there are discrete skills. We want efficiency and quick output from the professionals we hire, and it’s possible to go out there and get it if we focus on the skills required to perform in a role. There’s a program called Gauntlet AI in Austin that is flipping the script on traditional education and is 100 percent focused on building skills in as short a time as possible. The Gauntlet AI program is highly competitive to get into and offers an extremely intense ten-week program, with everything paid for and a guaranteed $200K job upon graduation. They are taking it to the next level by becoming very utilitarian about skill development, and AI is rapidly accelerating the possibilities of skill-based recruitment.

Hoffmeister: Any final thoughts from any of you?

Bernard: Just one last thing. I was at the TEI Tax Technology Seminar last week [May 5–7] here in Seattle. It was a good session; we had a lot of good things. But one thing that was asked on a panel was this idea for corporate tax professionals, “Hey, we’re building these things now. We’re building them today. A great deal of the success is going to be if we can codevelop these things with you.” They asked a group of about 100 people in that room of corporate tax who would be willing to do that. Very few hands went up. What I saw that was a bit disappointing about that is that AI is here, the technologies are getting better, they are emerging, but I think for those who are developing the technologies and those who are going to be purchasing, there needs to be a closer codevelopment around what you want. What is going to be of value to you? What we’re trying to tell all of our customers is if you get an opportunity to do that with a professional vendor, you need to accept that challenge, because it is coming. It is going to be something that is valuable, and you can be seen as a leader within your own organization if you can develop those types of tools. That was one thing that I would encourage for all of our professional vendors and all of our corporate tax departments, to just reach together and try and develop products together that are useful for both.

Matthews: Wholeheartedly agree. That’s certainly the best way we have to develop tools: anybody who’s involved with us in the process of thinking through and iterating on the tools we’re building is going to see an end result that fits their needs. The other piece of it that relates to that is we’ve talked a lot about the source content that goes into these tools. On the research vendor side, we’ve got access to the content, we’ve got access to the technology, and we’re trying to build tools that answer users’ questions. But so many of the questions that people come to us with involve a lot of their own information and client information, and you’ve got to make sure, as you’re working with a vendor on that, that you’ve got a trustworthy counterparty who can handle your information in a responsible way. We’ve got our own interest in making sure that our content is protected as it goes through these AI systems, and I know all of our users have the same concern. Where we can find common ground on that, I think we’ll be able to deliver better and more targeted answers for people.

Bernard: Good point.

Redda: It’s important to be early on the AI adoption curve. Everything is changing very quickly, but early adopters will better manage the transitions. At Thomson Reuters, we’re big believers that AI is going to impact all roles in all functions. We’ve made some notable acquisitions of companies like Casetext [now CoCounsel] and Materia to accelerate and lead in this space. AI is not going to replace professionals, but it will greatly improve their productivity and their ability to scale. We already see many professionals who lead with AI, and if we talk in a year’s time, you’ll see a very different picture then, too.

Bernard: I hope so!

Hoffmeister: Thank you all so much for your time.
