ChatGPT for Finance: Promise and Peril – ETF Trends
Over the past month or so, my inbox and DMs have been flooded with questions about AI, in no small part due to my interview with Professor Stuart Russell at Berkeley and our recent webinar with ROBO Global’s Zeno Mercer and Resolve Asset Management’s Adam Butler. One crystal-clear pattern has emerged. Everyone wants real-world, specific examples of how AI can and can’t enter into an advisor’s, investor’s, or creator’s workflow.
I aim to please. Today, I will peel back the curtain and give concrete examples of how to use ChatGPT effectively to accelerate investment research and content creation. Meanwhile, I’ll also dispel some of the hyperbole around what ChatGPT can and can’t do.
ChatGPT is a large language model (LLM). In plain talk, that means a bunch of nerds ran a computer program to scrape all the language in the world they could find (e.g., the internet), then built a statistical model of the relationships between words. For example: the words “blue” and “color” show up next to each other often, so the connection between those two words gets a big weight. But “blue” and “apple” do not, so they get a lower weight.
An LLM makes these connections not just between word-pairs but between huge blocks of words all in relationship to each other. Then the model is fine-tuned through a ton of human work and input until the model is released.
Once in the wild, the LLM can be asked to do work (“prompted”) based on its understanding of language. I emphasize that phrasing because, really, that’s all “it” is: An incredibly powerful language modeler. Many “LMMs For Dummies”-style guides exist, but here’s a good one I like that splits the difference between “no math” and “code” in its explanation:
In reality, the most important thing to remember is that this is a model of plausible word connections based on a training set. So if you ask ChatGPT, “What is the biggest ETF?” it’s not actually going to know the answer. It will comb its bank of word connections and say, “OK, presented with those five words, the most plausible answer I have seen before is: SPY.”
But that’s a probability, not a lookup. There’s a chance the model might say “GLD” because that ETF was certainly written about once or twice. So any fact-based request to an LMM will embed the risk of error in its answer. The model doesn’t return facts. It returns plausible replies.
Does that make it useless? No! Because ChatGPT’s real superpower is in helping you take your own words and say them differently.
Here’s an example. Imagine you’ve written your monthly client letter. Your marketing team wants you to write a quick tweet promoting it, maybe even a longer LinkedIn post. But you’re not sure how best to grab readers’ attention. Good news: ChatGPT can help.
This is hugely important. While GPT-3.5 is a novelty toy that occasionally can be coerced into useful work, GPT-4 is a completely different beast of functionality. However, you only get access by paying OpenAI $20 a month (and/or having an API key and coding directly).
So it’s worth remembering that you’re limited in how much work you can do with GPT-4. In the Chat version, you’re limited to making 25 requests every three hours. That’s frequent enough for most broad purposes, but it certainly wouldn’t replace an analyst’s workflow any time soon.
The API version, meanwhile, caps out at $120 a month in usage, pay-as-you-go. It gives you more to play with but not enough to build real business processes on yet, and no casual coder I’ve talked to (so far) has had their quotas increased.
So in GPT-4, I first prime the conversation with a base prompt, where you tell the model what persona it should adopt and how it should behave. Essentially, you’re telling the model what “voice” you want it to have, thereby telling it how to assign weight to the queries you’re about to make. This is called “role prompting,” and it’s a necessary step to getting decent results. (www.learningprompting.org is an incredible resource to really master this stuff).
Here’s my default prompt for doing ETF work.
For this step, I grabbed a short piece I wrote on the banking crisis in March. It’s about 1,000 words, which is a juicy size for ChatGPT; anything over 2500 words, and you’re likely to run out of “context window” — that is, how much the chat window can hang onto at a given time.
So I paste that in my article. Here’s ChatGPT’s reply:
That’s the equivalent of a rambling, ‘haven’t-read-the-book’ book report. Remember, ChatGPT isn’t “smart.” It provides plausible responses based on everything it’s ever seen, as well as what I’ve just shown it. I hit “stop” before I let it ramble on.
Here’s where it gets interesting. To give ChatGPT an idea of what I need, I feed it a pretty boring tweet format, then ask it to make me a new one based on the text of the article:
This is called “one-shot prompting”: I gave ChatGPT an example, and it did the work. The result isn’t awful. But if I give it more examples, it will do more refined work.
The resultant copy doesn’t say much interesting about the article’s content. So I ask it to try again with more specifics:
That’s not too bad. The main takeaway here is that iteration and specifics matter. Because we’re limiting results to a specific, provided text, each of the points ChatGPT cites is, in fact, a major feature of my article.
But let’s try a different approach for the LinkedIn post. Let’s ask ChatGPT to be smart:
Again, I’d give this a “not bad!” The LLM has taken a big, long-form document and spat out a pretty plausible LinkedIn post. Sure, it reads a little sales-y and wonky, but that’s a persistent feature of ChatGPT: Because it’s trying to chain together statistically plausible word connections, it inherently produces the most average (boring) content imaginable unless you ask it to spice things up.
Let’s do just that:
Because I’ve written a fair bit over the years, and my content is easily found on the Internet (ChatGPT’s source material), I’d say this is a pretty plausible “Nadig-ification” of the post. It uses some longer words, the voice is fairly casual, and it uses a communal call to action. I feel seen.
However, this also highlights one of many real legal issues involving using any LLM trained on public data for commercial purposes. When you feed content into ChatGPT, you can copy anybody’s voice or ‘likeness’ without their consent if it’s in the training set somewhere. There’s even a class action lawsuit about this. (While I think the “likeness” and training copyright issues are fascinating and worth discussion, this isn’t my area of direct expertise. A lawyer, I am not.)
AUTHOR’S NOTE: One quick note on prompt engineering: If you’re a ChatGPT pro, you’ll scoff at the unstructured prompts I’ve used in the above example. Phenomenal resources exist to help you become an expert prompter. Indeed, the more specific you are and the more examples you give, the better you can make this process, no matter what you’re trying to use GPT-4 to produce. I’ve deliberately used mediocre plain English prompts here to make a point. But by all means, go get good!
I’d put all of the above in the category of “editorial assistant.” Honestly, it’s how I use the service almost every day. Whether creating a quick summary of a news article, wading through a podcast transcript, or just refining my writing, an LLM can shortcut processes that might have taken me much longer by brain-and-hand alone.
But you might be wondering: Could you get ChatGPT to actually write something new for you? Forget me writing my article; could ChatGPT have written the article for me?
There be dragons.
Here’s what you get if you open a new window and just ask it to pick out what’s important in June 2023.
At first glance, this copy seems totally plausible. Until you realize it doesn’t mention the Fed pausing rate hikes, or the debt ceiling issue, or the impact of AI on earnings calls, or the fact that Japan has fallen out of bed since mid-June, etc. That’s because ChatGPT is simply making stuff up.
When you ask it to provide more specifics, the service cops to it:
If the training data only goes up to September 2021, then certainly ChatGPT wouldn’t be able to provide useful information about June 2023 data. But hey, maybe we can solve this by using OpenAI’s connection to Microsoft Bing:
Let’s try the query again:
Hey, that looks pretty good! So good, in fact, that someone who didn’t know better might just cut and paste it into a document and think “Aha! ChatGPT is writing content for me! Huzzah!”
Just a few minor problems. For example, as of this writing, the S&P is up 4% year-to-date, not 9%. Also, the May inflation data is already out. And the implied hike probabilities are 82%, not 72%. Also, Q3 earnings projections are sitting at +.8%, not -2.2%.
The reason all this data was wrong is buried within that superscript [1] at the end of the text block. That superscript links to the LLM’s sole source in constructing this response: a Forbes article from June 1st.
In other words: when it searches the live internet for data, ChatGPT results will only be as good as the first, most obvious search result. This may be stale or present an incomplete picture.
What’s worse, if you ask ChatGPT to add more detail to its result, the model simply plagiarizes complete blocks of text:
ChatGPT lifted this result straight from the Forbes article, presenting clear ethical and legal problems. Not only that, it also attributes its data point to a gentleman who did not actually talk about rate hikes.
The above example isn’t me cherry-picking results. It’s actually fairly typical. I max out my GPT4 allocation almost every day, testing the edges of things. The reality is, when it comes to finance writing, internet integration isn’t a “feature” of ChatGPT but a horrible and dangerous bug. Again, if you get very specific and learn to iterate through searches, you can get useful ideas from it. Still, to suggest it’s an “unreliable narrator” is an insult to Tyler Durden.
If ChatGPT is so good at the language part, how can users make it work better for, y’know, the reality part?
There are two main solutions. The first is to use plugins. Plugins can connect GPT-4 to known sources of truth. As of this writing, there are 422 plugins available, any three of which you can access through GPT-4 at a time (given your 25 requests every three hours).
Some of these plugins are absolutely brilliant. One of my favorites is ScholarAI, which essentially turns the process of finding academic research from a chore to a joy:
All this appears with just the first screen! If you know what you’re doing, are careful to check sources, and don’t over-extend your ChatGPT requests, you can really get into the weeds to pull sources, find citations, and save an enormous amount of research time for academic deep dives.
However, when it comes to trawling through financial data, ChatGPT isn’t quite there yet. The challenge is that financial data is almost all ‘structured’ — which is good because that means it’s easily plug-and-play in Excel or in various databases. But for a model that trains on people’s words rather than rows of numbers, interpreting these numbers using its skill at language can be an odd fit.
Even still, a variety of finance plugins exist, the best of which I’ve tried is from Daizy. It’s great for simple semantic-search type requests:
That said, in terms of actual workflow, ChatGPT is still painfully slow for these sorts of numerically-complex requests. Whereas you could pull this table from a data system, say, LOGICLY, in seconds rather than minutes:
What’s more, Daizy-enabled ChatGPT tends to fail when you ask it to perform more complex comparisons. This is true of every other Finance plugin I’ve worked with.
For example, finding an ETF or ETFs that meet specific, common investment criteria? Challenging:
The failure isn’t one of access. It’s not that Daizy, or any other plugin, lacks access to the right finance data. Rather, ChatGPT’s language-driven model isn’t a particularly good way of parsing this kind of request — today. The idea of making a comparison requires the ability to hold multiple concepts “in mind” simultaneously and evaluate among them on many vectors: in this case, performance, liquidity, exposure, and the value factor. Theoretically, this is not an impossible task, but it’s beyond the real-world capability of any live tool right now. At least for now, there’s absolutely no way ChatGPT can replace the insight of an actual human being who knows the right questions to ask and, more importantly, can intuitively judge good responses from bad ones based on subject matter expertise.
In other words, you can’t ask a machine designed to simply be “plausible” to actually generate “insight.” Insight comes from asking the right questions in the right context, making intuitive connections and knowing how to determine what’s real from what’s suspect.
That said, ChatGPT can do lots of useful things beyond making comparisons, especially once you step outside the chat window and into the Wild West of standalone applications. For me, probably the single most useful functionality is using ChatGPT to digest very large documents and/or collections of documents.
For example, let’s say I want to quickly understand the ins and outs of the proposed European AI Act (because, let’s face it, the EU has become the global regulator for the internet, whether the U.S. likes it or not). The Act is 108 pages long and vastly too big to paste wholesale into a chat window. A month or two ago, I could only use AI to help through the OpenAI Application Programming Interface. Now, I can just load the document into a website like ChatPDF.com, then ask the AI questions about the document :
For example, above I asked ChatPDF what the proposed Act has to say about knowing who authored content, and it quite nicely tells me:
The proposed Artificial Intelligence Act includes transparency obligations for certain AI systems that generate or manipulate content that appreciably resembles authentic content, such as deep fakes. According to [15] and [35], there should be an obligation to disclose that the content is generated through automated means, subject to appropriate safeguards for the rights and freedoms of third parties. Users of such AI systems should disclose that the content has been artificially generated or manipulated by labeling the artificial intelligence output accordingly and disclosing its artificial origin.
Which is a phenomenally clear summary with specific references now highlighted right in the PDF! As someone who regularly works with massive regulatory documents, this is game-changing.
If you’re willing to wade into code, then comparing and contrasting a library of multiple large documents is possible, but it isn’t easy. (Adam Butler of Resolve covered how this works at an industrial scale in our recent webinar). It’s not “hard,” either. It’s an evening’s worth of work if you’re comfortable downloading libraries and following instructions. If you’re interested, here’s one of many very “script-kiddy” tutorials I’ve been using:
To quote Jurassic Park:
Smarter legal minds than I have pointed out several big issues with using ChatGPT and other generative AI in your published commercial work (for example, here’s an IP lawyer who’s fairly negative on the idea). Some of them include:
Regardless of what might currently hide in the “Terms Of Service,” legal perspectives will likely consider ChatGPT output a “joint work” between you and a machine algorithm that your company doesn’t own. A pretty broad swath of copyright lawyers seem to think this will need to be tested in court. So… do you want to be the test case?
At a minimum, new copyright registration requirements will likely be coming soon. And while you may end up not owning LLM output, you might be on the hook for any factual errors or harm that follows from your use of it.
OpenAI is currently being sued over how well it can duplicate users, and there are both legal and ethical issues associated with how LLMs train to get as good as they are. After all, to generate their output, LLMs use published articles without the authors’ or publishers’ consent. On the other hand, those published articles also were posted on the internet. This means anybody could read them, including a machine. But on the other, other hand, publishing LLM-generated content — especially for financial gain — could violate fair use law, copyright protections, and for all anyone can tell, the Geneva Convention (that’s sarcasm).
Again, do you want to be the test case in court? The person who has to take down all your GPT4 content when someone gets an injunction? If nothing else, the plagiarism and copyright issues have given rise to my very favorite late-stage capitalism concept: algorithmic disgorgement, which is not only a great punk band name but also the mechanism for how the FTC might “unwind” OpenAI’s training by forcing the deletion of improperly acquired data and algorithms and models using it.
Even more concerning: The terms of use from LLMs all include indemnifications.That means platform users, and not OpenAI, may end up being on the hook for any content potentially deemed infringing in the future.
This could be a bottomless liability for websites or content publishers. For an RIA writing content to clients, it’s still probably best to write your own original copy and use ChatGPT as a capable assistant.
The mass media has pretty thoroughly covered that any data you feed into an AI-chatbot as a prompt is pretty much “in the wind.” It exists in a massive database, accessible by hundreds of millions of users worldwide. So as a best practice, don’t put anything confidential or proprietary into ChatGPT.
However, depending on which LLM you’re using and whether you’re using the API or not, you might also be giving up any rights to whatever data you’ve prompted the system with.
Now, for the “rewrite” type of work, I led off with, I’m not personally too worried. The machine doesn’t magically get copyrighted over my submission; it just gets access to it. It might then use it for training again (see problem 2). But you can see how a firm could wade into troubling waters quickly. In fact, some law firms are recommending new employment contracts and employee handbook upgrades that explicitly lay out what can and cant be done with ChatGPT for early-adopting firms.
The current version of the EU AI Act includes provisions that audio, video, and images created by AI will need to be flagged. It seems inevitable that this kind of “right to know” on authorship is going to become both the legal and commercial norm. Even Bloomberg’s been appending this to automated stories for a while now:
This could undermine the perceived value of your communications with your clients. It’s one thing to use AI to produce a summary of something you’ve already written for social media. But it’s another thing entirely to send to clients a weekly market recap that says, “Generated by ChatGPT4” under it. How much value would your clients assign to a button press versus an actual brain?
I’d be willing to bet your clients are payin’ for brains.
I get it: We’re all tired of endless hype cycles. So it’s OK if you haven’t ponied up for a ChatGPT subscription or installed the app on your phone yet. But don’t dismiss this incredible new tool’s very real use cases. The capability set is getting better by the day (literally). Problems that right now require code will become trivial within a matter of months. Expect every major desktop-platform provider — Microsoft, Google, Adobe, Notion, Dropbox, Slack — to roll out feature after feature. It’s already happening. And it’s exciting.
But if I could give any advice, it’s this: Focus less on the whizbang tech and more on where in your day-to-day workflow you could use a little automation.
For advisors, investors, content producers, and folks who are otherwise experts in a field, AI isn’t coming for your job. It’s not replacing what you do. But boy, it can really act as a time-saver and productivity accelerant if you lean in with clear — and skeptical — eyes.
For more news, information, and analysis, visit Vettafi | ETF Trends.