Patronus AI cofounders Anand Kannappan and Rebecca Qian

Patronus AI

Large language models, similar to the one at the heart of ChatGPT, frequently fail to answer questions derived from Securities and Exchange Commission filings, researchers from a startup called Patronus AI found.

Even the best-performing AI model configuration they tested, OpenAI’s GPT-4-Turbo, when armed with the ability to read nearly an entire filing alongside the question, only got 79% of answers right on Patronus AI’s new test, the company’s founders told CNBC.

Oftentimes, the so-called large language models would refuse to answer, or would “hallucinate” figures and facts that weren’t in the SEC filings.

“That type of performance rate is just absolutely unacceptable,” Patronus AI cofounder Anand Kannappan said. “It has to be much much higher for it to really work in an automated and production-ready way.”

The findings highlight some of the challenges facing AI models as big companies, especially in regulated industries like finance, seek to incorporate cutting-edge technology into their operations, whether for customer service or research.

The ability to extract important numbers quickly and perform analysis on financial narratives has been seen as one of the most promising applications for chatbots since ChatGPT was released late last year. SEC filings are filled with important data, and if a bot could accurately summarize them or quickly answer questions about what’s in them, it could give the user a leg up in the competitive financial industry.

In the past year, Bloomberg LP developed its own AI model for financial data, business school professors researched whether ChatGPT can parse financial headlines, and JPMorgan is working on an AI-powered automated investing tool, CNBC previously reported. Generative AI could boost the banking industry by trillions of dollars per year, a recent McKinsey forecast said.

But GPT’s entry into the industry hasn’t been smooth. When Microsoft first launched its Bing Chat using OpenAI’s GPT, one of its primary examples was using the chatbot to quickly summarize an earnings press release. Observers soon realized that the numbers in Microsoft’s example were off, and some numbers were entirely made up.

‘Vibe checks’

Part of the challenge when incorporating LLMs into actual products, say the Patronus AI cofounders, is that LLMs are non-deterministic — they’re not guaranteed to produce the same output every time for the same input. That means that companies will need to do more rigorous testing to make sure they’re operating correctly, not going off-topic, and providing reliable results.
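One simple way to picture such a check is to send the same prompt several times and measure how often the answers agree. The sketch below is illustrative only: `make_fake_model` is a hypothetical stand-in for a real LLM call, and the agreement metric is an assumption for this example, not a method the Patronus AI cofounders described.

```python
import random
from collections import Counter
from typing import Callable


def agreement_rate(ask: Callable[[str], str], prompt: str, runs: int = 10) -> float:
    """Return the fraction of runs that produced the most common answer."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    # Normalize lightly so "Yes" and "yes " count as the same answer.
    answers = [ask(prompt).strip().lower() for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs


def make_fake_model(seed: int) -> Callable[[str], str]:
    """Hypothetical model stub that answers inconsistently on purpose."""
    rng = random.Random(seed)

    def fake_model(prompt: str) -> str:
        # The canned answers are arbitrary; they are not real model output.
        return rng.choice(["Yes", "yes", "I cannot answer that."])

    return fake_model


if __name__ == "__main__":
    model = make_fake_model(seed=0)
    question = "Has CVS Health paid dividends to common shareholders in Q2 of FY2022?"
    print(f"agreement across runs: {agreement_rate(model, question):.0%}")
```

A rate of 1.0 means every run gave the same answer. Anything lower signals that a single spot check of the output would not be representative.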

The founders met at Facebook parent-company Meta, where they worked on AI problems related to understanding how models come up with their answers and making them more “responsible.” They founded Patronus AI, which has received seed funding from Lightspeed Venture Partners, to automate LLM testing with software, so companies can feel comfortable that their AI bots won’t surprise customers or workers with off-topic or wrong answers.

“Right now evaluation is largely manual. It feels like just testing by inspection,” Patronus AI cofounder Rebecca Qian said. “One company told us it was ‘vibe checks.’”

Patronus AI worked to write a set of over 10,000 questions and answers drawn from SEC filings from major publicly traded companies, which it calls FinanceBench. The dataset includes the correct answers, and also where exactly in any given filing to find them. Not all of the answers can be pulled directly from the text, and some questions require light math or reasoning.

Qian and Kannappan say it’s a test that gives a “minimum performance standard” for language AI in the financial sector.

Here are some examples of questions in the dataset, provided by Patronus AI:

  • Has CVS Health paid dividends to common shareholders in Q2 of FY2022?
  • Did AMD report customer concentration in FY22?
  • What is Coca Cola’s FY2021 COGS % margin? Calculate what was asked by utilizing the line items clearly shown in the income statement.

How the AI models did on the test

Patronus AI tested four language models: OpenAI’s GPT-4 and GPT-4-Turbo, Anthropic’s Claude 2, and Meta’s Llama 2, using a subset of 150 of the questions it had produced.

It also tested different configurations and prompts, such as one setting where the OpenAI models were given the exact relevant source text in the question, which it called “Oracle” mode. In other tests, the models were told where the underlying SEC documents would be stored, or given “long context,” which meant including nearly an entire SEC filing alongside the question in the prompt.

GPT-4-Turbo failed at the startup’s “closed book” test, where it wasn’t given access to any SEC source document. It failed to answer 88% of the 150 questions it was asked, and only produced a correct answer 14 times.
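The closed-book figures mix a percentage with a raw count, so it can help to put them on the same footing. The arithmetic below assumes that the 88% applies to all 150 questions and that every question ended as correct, incorrect, or unanswered. The number of incorrect answers is inferred from those assumptions rather than reported, and because 88% is a rounded figure the true count may differ slightly.

```python
TOTAL_QUESTIONS = 150   # size of the tested subset (from the article)
REFUSAL_RATE = 0.88     # share of questions left unanswered (from the article)
CORRECT_COUNT = 14      # correct answers (from the article)

# Convert the refusal percentage into a question count.
refused = round(TOTAL_QUESTIONS * REFUSAL_RATE)

# Inferred, not reported: whatever is left after refusals and correct answers.
incorrect = TOTAL_QUESTIONS - refused - CORRECT_COUNT

# Express the correct count as a share of all questions.
correct_rate = CORRECT_COUNT / TOTAL_QUESTIONS

print(f"unanswered: {refused} of {TOTAL_QUESTIONS}")
print(f"correct:    {CORRECT_COUNT} ({correct_rate:.1%})")
print(f"incorrect:  {incorrect} (inferred)")
```

Under these assumptions, roughly 132 questions went unanswered and about 9% were answered correctly.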

It was able to improve significantly when given access to the underlying filings. In “Oracle” mode, where it was pointed to the exact text for the answer, GPT-4-Turbo answered the question correctly 85% of the time, but still produced an incorrect answer 15% of the time.

But that’s an unrealistic test because it requires human input to find the exact pertinent place in the filing — the exact task that many hope that language models can address.

Llama 2, an open-source AI model developed by Meta, had some of the worst “hallucinations,” producing wrong answers as much as 70% of the time, and correct answers only 19% of the time, when given access to an array of underlying documents.

Anthropic’s Claude 2 performed well when given “long context,” where nearly the entire relevant SEC filing was included along with the question. It correctly answered 75% of the questions it was posed, gave the wrong answer for 21%, and failed to answer only 3%. GPT-4-Turbo also did well with long context, answering 79% of the questions correctly, and giving the wrong answer for 17% of them.
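Results like these sort each response into one of three outcomes: correct, incorrect, or no answer. A minimal sketch of that tally follows. The `Outcome` labels and the `outcome_rates` helper are hypothetical names chosen for illustration, the sample data is invented, and nothing here reflects how Patronus AI actually scored responses.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    REFUSED = "refused"


def outcome_rates(outcomes: list[Outcome]) -> dict[Outcome, float]:
    """Return each outcome's share of all graded responses."""
    if not outcomes:
        raise ValueError("no outcomes to tally")
    counts = Counter(outcomes)
    # Iterate over the enum so outcomes that never occurred still report 0.0.
    return {outcome: counts[outcome] / len(outcomes) for outcome in Outcome}


if __name__ == "__main__":
    # Invented toy data: five graded responses, not real benchmark results.
    graded = [
        Outcome.CORRECT,
        Outcome.CORRECT,
        Outcome.INCORRECT,
        Outcome.REFUSED,
        Outcome.CORRECT,
    ]
    for outcome, rate in outcome_rates(graded).items():
        print(f"{outcome.value}: {rate:.0%}")
```

Because published percentages are rounded, the three shares need not sum to exactly 100%. The Claude 2 figures above, for example, add up to 99%.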

After running the tests, the cofounders were surprised about how poorly the models did — even when they were pointed to where the answers were.

“One surprising thing was just how often models refused to answer,” said Qian. “The refusal rate is really high, even when the answer is within the context and a human would be able to answer it.”

Even when the models performed well, though, they just weren’t good enough, Patronus AI found.

“There just is no margin for error that’s acceptable, because, especially in regulated industries, even if the model gets the answer wrong one out of 20 times, that’s still not high enough accuracy,” Qian said.

But the Patronus AI cofounders believe there’s huge potential for language models like GPT to help people in the finance industry — whether that’s analysts, or investors — if AI continues to improve.

“We definitely think that the results can be pretty promising,” said Kannappan. “Models will continue to get better over time. We’re very hopeful that in the long term, a lot of this can be automated. But today, you will definitely need to have at least a human in the loop to help support and guide whatever workflow you have.”

An OpenAI representative pointed to the company’s usage guidelines, which prohibit offering tailored financial advice using an OpenAI model without a qualified person reviewing the information, and require anyone using an OpenAI model in the financial industry to provide a disclaimer informing them that AI is being used and its limitations. OpenAI’s usage policies also say that OpenAI’s models are not fine-tuned to provide financial advice.

Meta did not immediately return a request for comment, and Anthropic didn’t immediately have a comment.

Advisors ‘wary’ of bitcoin ETFs are on a slow adoption journey, says BlackRock exec

Jonathan Raa | Nurphoto | Getty Images

The long-awaited bitcoin exchange traded funds launched in January, and financial advisors are on their way – though gradually – toward adopting them, according to BlackRock’s Samara Cohen.

For now, about 80% of bitcoin ETF purchases have likely been coming from “self-directed investors who have made their own allocation, often through an online brokerage account,” she said, speaking at the Coinbase State of Crypto Summit in New York City on Thursday. The iShares Bitcoin Trust (IBIT) was among the funds to debut earlier this year.

Cohen, BlackRock’s chief investment officer of ETF and index investments, noted that hedge funds and brokerages have also been buyers, based on last quarter’s 13-F filings, but registered investment advisors have been a little more “wary.”

CNBC recently polled its Advisor Council about why they and their colleagues are so cautious about the new products, which represent a regulated and familiar investment product for a new asset class that has garnered significant interest in recent years. Responses ranged from bitcoin’s notorious price volatility to the flagship cryptocurrency being too nascent to have established a significant track record. Regulatory compliance and the crypto’s reputation for fraud and scandal were also on advisors’ minds.

“I would call them wary … that’s their job,” Cohen said of the skeptical financial advisors.

“An investment advisor is a fiduciary to their clients,” she added. “This is an asset class that has had 90% price volatility at times in history, and their job is really to construct portfolios and do the risk analysis and due diligence. They’re doing that right now.”

The iShares Bitcoin Trust (IBIT) in 2024

“This is a moment, in terms of really putting forward important data, risk analytics [and determining] the role bitcoin can play in a portfolio, what sort of allocation is appropriate given an investor’s risk tolerance, their liquidity needs,” she added. “That’s what an advisor is supposed to do, so I think this journey that we’re on is exactly the right one and they’re doing their jobs.”

Cohen said she sees bitcoin ETFs as a bridge between crypto and traditional finance – particularly for investors who may be interested in making an allocation to bitcoin without having to manage their risk across two different ecosystems. Before the ETFs, the existing onramps into crypto were insufficient for what some investors wanted to do, she said.

Coinbase chief financial officer Alesia Haas said bitcoin is “on a slow journey of adoption” – a theme echoed across the conference sessions.

Blue Macellari, head of digital assets strategy for T. Rowe Price, pointed to the 1% allocation that some investors deem to be a safe, comfortable amount. She said she sees portfolio allocations into bitcoin as binary events, where they should be greater than 1% or zero, but she also acknowledged the cautious approach toward adoption.

“There’s a psychological component where people need to test the waters and get comfortable,” Macellari said. “It’s a paradigm shift … it takes time for people to ease their way into it.”

Adobe shares surge 15% for sharpest rally since 2020

Adobe CEO Shantanu Narayen speaks during an interview with CNBC on the floor at the New York Stock Exchange on Feb. 20, 2024.

Brendan Mcdermid | Reuters

Adobe shares surged 15% on Friday, the biggest gain since March 2020, after the software maker reported earnings and revenue that beat analysts’ estimates.

After the bell on Thursday, Adobe reported adjusted earnings per share of $4.48, topping the LSEG consensus estimate of $4.39 per share. Revenue increased 10% from a year earlier to $5.31 billion, exceeding analysts’ estimates of $5.29 billion.

CEO Shantanu Narayen attributed Adobe’s record revenue to its strong growth across Creative Cloud, Document Cloud and Experience Cloud and its advancements in artificial intelligence.

“Our highly differentiated approach to AI and innovative product delivery are attracting an expanding universe of customers and providing more value to existing users,” Narayen said in a press release on Thursday.

New annualized recurring revenue for the Digital Media business, which includes Creative Cloud subscriptions, came in at $487 million, beating the StreetAccount consensus of $437.4 million.

Adobe’s results provide a contrast to what software investors have seen from many industry peers of late. Salesforce shares suffered their worst plunge since 2004 late last month after the cloud software vendor posted weaker-than-expected revenue and issued disappointing guidance. That same week, MongoDB, SentinelOne, UiPath and Veeva all pulled down their full-year revenue forecasts.

However, there were positive signs in the sector this week. Oracle shares rallied after the database company announced cloud deals with Google and OpenAI, even as fourth-quarter results fell short of Wall Street expectations. CrowdStrike jumped on Monday following the announcement after the close last Friday that the cybersecurity company would be added to the S&P 500.

JMP analysts, who have the equivalent of a hold rating on Adobe, wrote in a note after the earnings report that the company’s results were uplifting despite a challenging economic environment and increased competition in design software.

“We like how Adobe is integrating AI functionality across its product portfolio,” the analysts wrote.

Meanwhile, analysts from Piper Sandler raised their revenue estimates slightly by $73 million for fiscal 2024 and by $71 million for 2025. 

“Customer reactions to recent innovations were encouraging, as increasing availability of AI-powered solutions are expected to drive further user acquisition” and better average revenue per user, wrote the Piper Sandler analysts, who recommend buying the stock.

Even after Friday’s rally, Adobe shares remain down 12% for the year. The stock closed at $525.31.

Google-backed Tempus AI pops by as much as 15% in Nasdaq stock market debut

Tempus AI, a health-care diagnostics company that uses AI to interpret medical tests to help physicians provide more accurate treatment for their patients, rose by as much as 15% in its Nasdaq Stock Market trading debut on Friday, after going public under the ticker symbol “TEM.”

Tempus AI priced 11.1 million shares at $37 apiece on Thursday, at the top of its initial $35 to $37 target range. The company raised $410 million at an implied valuation of just over $6 billion. Its early gains, if they hold, would place the company at a valuation of roughly $7 billion.

Tempus believes that AI can help guide therapy selection and treatment decisions, in conjunction with the patient’s doctor. It generated total revenue of $531.8 million in 2023 and a net loss of $214.1 million.

“We’re on a really good trajectory,” Tempus AI CEO Eric Lefkofsky said on CNBC’s “Squawk Box” Friday morning before shares started trading. “As revenues have been growing quickly, we’re not investing all that gross profit dollar growth back into the business. We’re generating improved leverage every quarter,” he said, adding that he expects the company to be both cash flow and EBITDA positive within the next year.

Tempus AI is applying some of the most heavily-funded technology concepts — artificial intelligence and data analysis — to building a better, more informed medical profession. The lack of diagnostic testing early in the Covid-19 outbreak was an example of how a system as mature as our health-care infrastructure can still be unprepared for the future.

The Chicago-based company said in its IPO filing, “we endeavor to unlock the true power of precision medicine by creating Intelligent Diagnostics through the practical application of artificial intelligence, or AI, in healthcare. Intelligent Diagnostics use AI, including generative AI, to make laboratory tests more accurate, tailored, and personal. We make tests intelligent by connecting laboratory results to a patient’s own clinical data, thereby personalizing the results.” 

The two-time CNBC Disruptor 50 company’s at-home testing kit was quickly rolled out during the pandemic, but the problem Tempus is attacking is not Covid-specific. The Tempus idea came to Lefkofsky, also known for co-founding Groupon, during frustration with the health-care system after his wife received a breast cancer diagnosis. Oncology is a primary focus and the company’s genomic tests are designed to understand tumors at the molecular level and tailor treatment to individuals.

Morgan Stanley, J.P. Morgan and Allen & Company were the lead underwriters for Tempus AI’s offering.

Investors include Google, Baillie Gifford, Franklin Templeton, NEA and T. Rowe Price, according to PitchBook data.

— CNBC’s Bob Pisani contributed to this reporting.
