AI hallucinations

Good morning. In case you missed it, a Canadian woman named Pascale Ferrier just got a 22-year subscription to a Washington prison for sending a not-so-sweet letter to former President Trump. The catch? It was laced with the poison ricin. Ferrier, who had previously enjoyed a Texas jailhouse vacation in 2019, pleaded guilty to breaking the 'no-bio-weapons-in-the-mail' rule. The recipients of her toxic fan mail included Trump and some Texas police officials. None were injured in the process.

Pascale, you’re ruining our image over here, eh.

Let’s jump into today’s storylines.

In today’s digest:

  • Sometimes I’m right, sometimes I…hallucinate?

  • Headline Hustle: Bitcoin tumbles below $26K, UAW strike could cost $5 billion

  • Evergrande files for bankruptcy

  • Pulse Points: What’s Trending

TECH

Decoding AI: From hallucinations to math scores

In the ever-evolving world of artificial intelligence, the race to be the best is fierce. CNBC recently delved into this AI arena, reporting on research that pitted some of the industry's top models against each other. The contenders? Microsoft-backed OpenAI's GPT-4, Meta's Llama 2, Anthropic's Claude 2, and Cohere AI. And the results were, let's say…a mixed bag.

The Good, The Bad, and The Hallucinating

Researchers from Arthur AI took on the Herculean task of testing these AI giants. Their findings? If these AI models were in high school, GPT-4 would be the math whiz, Llama 2 the average Joe, Claude 2 the self-aware philosopher, and Cohere AI would be the one confidently blurting out wrong answers in class.

AI hallucinations, where models fabricate information, have become a hot topic, especially with the looming 2024 U.S. presidential election. Arthur AI's research aimed to shed light on these "hallucinations" by testing the models on various subjects, from combinatorial mathematics to Moroccan political leaders.

The Scorecard

  • OpenAI's GPT-4: The top performer overall, especially in math. It's improved from its predecessor, GPT-3.5, hallucinating less and acing most tests.

  • Meta's Llama 2: A bit of a daydreamer, it hallucinates more than GPT-4 and Claude 2. It's not the top of the class but not the bottom either.

  • Anthropic's Claude 2: A close competitor to GPT-4, it shines on questions about U.S. presidents and is quite self-aware, only answering when sure.

  • Cohere AI: The bold one, never hedging its answers. However, it's also the most likely to "hallucinate" and give overconfident wrong answers.

Hedging and self-awareness. An interesting aspect of the research was how often these models hedge their answers rather than commit to them. GPT-4, for instance, hedged 50% more than its previous version, which some users find frustrating. Cohere AI, on the other hand, never hedged its bets, always answering straight up. Claude 2 stood out for its self-awareness, accurately gauging the boundaries of its knowledge.

In closing…Adam Wenchel, co-founder and CEO of Arthur, emphasized the importance of understanding how these models perform in real-world applications. While benchmarks provide insights, the real test lies in how these AIs function in practical scenarios. So, while the race for the best AI continues, it's clear that understanding their strengths, weaknesses, and quirks is crucial for users and businesses alike.

WORLD

Headline Hustle

₿ Bitcoin drops below $26,000. Bitcoin's abrupt tumble to $26,593.68, reflecting a decline of more than 8%, sent shockwaves through the crypto market. The sharp drop came just hours after The Wall Street Journal reported that SpaceX, led by Elon Musk, wrote down the value of its bitcoin holdings by $373 million across 2021 and 2022 and has sold the cryptocurrency. Ryan Rasmussen, a researcher at Bitwise Asset Management, described the selloff as "one of the most brutal minute-by-minute selloffs we've seen in the history of bitcoin." He called the decline "short-sighted and largely retail-driven," with speculation pointing to an Elon Musk/SpaceX-driven selloff. The plunge also coincided with pressure from the Federal Reserve's recent policy meeting minutes, and it left the cryptocurrency at its lowest level in almost two months.

🚗 UAW (United Auto Workers) strike could cost $5 billion in 10 days. The union is revving up for a potential strike against Detroit's Big Three automakers (General Motors, Ford Motor, and Stellantis) when current labor contracts expire next month. According to a report by Anderson Economic Group (AEG), a Michigan-based consulting firm, the total economic loss could quickly accelerate to more than $5 billion after 10 days. The numbers don’t look pretty for the automakers themselves, either: GM risks losing $380 million in 10 days, Ford $325 million, and Stellantis $285 million. As the clock ticks down to 11:59 p.m. ET on September 14, and with billions on the line, the auto industry braces for a potential showdown that could shift gears on the economic landscape.

GLOBAL BUSINESS

Evergrande's dramatic fall from grace

In the world of real estate, few names shone as brightly as China's Evergrande Group. Once hailed as the country's second-largest property developer, Evergrande's meteoric rise was the stuff of legends. But as the old saying goes, "The bigger they are, the harder they fall." And fall Evergrande did, all the way to a bankruptcy court in New York.

A borrowing frenzy gone wrong.

Evergrande's story is a cautionary tale of ambition, risk, and the perils of over-leverage. The company, in its heyday, borrowed with almost reckless abandon. But when the bills started piling up in 2021, the cracks began to show. The resulting default sent shockwaves through China's property market, a sector that once contributed as much as 30% of the nation's GDP.

But Evergrande wasn't just about towering skyscrapers and luxury apartments. The company had a diverse portfolio that spanned electric vehicles, healthcare, and even theme parks. It's like imagining a real estate giant suddenly deciding to compete with Tesla, Universal Studios, and Mayo Clinic all at once.

With such a vast empire, the numbers were bound to be staggering. Evergrande's debt load skyrocketed to an eye-watering 2.437 trillion yuan (around $340 billion). To put that in perspective, that's roughly equivalent to 2% of China's entire GDP. And if you were one of the brave souls who invested in them, you might want to sit down for this - the company reported a jaw-dropping loss of $81 billion of shareholder money in a span of two years.

A glimmer of hope on the horizon. But it's not all doom and gloom. In the midst of this financial storm, Evergrande unveiled a debt restructuring plan, touted as China's largest on record. The company expressed optimism, stating that the restructuring would "alleviate the company’s pressure of offshore indebtedness" and help it get back on track.

On the bright side…they've got a plan. Evergrande aims to get back on its feet in three years, but it's going to need a cash injection of up to $43.7 billion. And just when things looked bleak, Dubai's NWTN swooped in with a $500 million investment for a slice of Evergrande's EV pie.

Pulse Points

SpaceX finally turns a profit. The Wall Street Journal reported on Thursday that SpaceX achieved profitability in the first quarter, thanks to a significant increase in revenue, according to documents that outline the privately held company's quarterly and annual performance.

Mortgage rates in the US are the highest they’ve been since 2002. This week, US mortgage rates soared to their loftiest point in 21 years. Data released by Freddie Mac on Thursday revealed that the 30-year fixed-rate mortgage averaged 7.09% for the week ending August 17, a climb from the previous week's 6.96%. Just a year ago, the 30-year fixed rate stood at 5.13%.

Canadian wildfires cause 20,000 to evacuate. On Thursday, Canadian fire crews were engaged in a fierce struggle to keep wildfires from advancing to the northern city of Yellowknife. Following an evacuation order, all 20,000 residents are departing the city by car and plane.

Thank you for reading! Let us know what you thought about this edition by replying to this email.