Market Minute: Weekend Spotlight on DeepSeek & Large Language Models

DeepSeek deeply upset markets this week by dropping its R1 large-language model (LLM), which can compete with OpenAI’s ChatGPT. Though ChatGPT still performs better in some ways, DeepSeek claims a training run for its model cost roughly $6 million – far cheaper than what U.S. competitors spend. In this column, I’ll explain part of why that is.
Mirella Lapata, a professor in the School of Informatics at the University of Edinburgh, pointed out in a talk for the Turing Lectures that generative A.I. has been around for a long time: Google Translate and Siri both generate answers based on user prompts.
“Size is all that matters, I’m afraid,” she says: the extreme increase in model sizes over the last few years (which requires more chips, more data centers, more power) has pushed the number of parameters GPT-4 uses past a trillion. That is more than the number of synapses in a rat’s brain (though well below the roughly 100 trillion in a human brain). On the data side, she points out that these models can digest more human writing than any person could read in a lifetime.
“We’re going to plateau at some point,” she says, as A.I. runs out of data to ingest. But the bigger the model, the more tasks it is able to take on, including unforeseen uses.
DeepSeek’s R1 model is a challenge to the idea that size is everything. It endangers the “biggest is best” thesis that American companies have been pouring money into – a thesis that has driven outsized power demands and filled Nvidia’s (NVDA) coffers. R1’s open-source nature also lets anyone build on DeepSeek’s work. Bloomberg reports that Microsoft (MSFT) is investigating whether a DeepSeek-linked group improperly accessed OpenAI’s data. Whether or not that’s true, the cat is already out of the bag – and the data may be less important than the actual structure of the A.I. model. Let’s talk about how LLMs work.
As we’ve discussed previously in this column, the A.I. we’re familiar with – like ChatGPT – is built and trained to predict the most likely answer. Remember, computers can’t read the way we do, so the model isn’t predicting just the answer as a whole; it is predicting, say, which word should come next in a sentence.
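If you like to see things concretely, here’s a toy sketch of that idea in Python. The candidate words and probabilities are made up for illustration; a real LLM computes a distribution like this over its entire vocabulary at every step.

```python
# Toy next-word prediction. The candidates and probabilities below are
# invented; a real LLM scores tens of thousands of tokens at every step.
next_word_probs = {
    "bank":  0.46,   # completing "I deposited the money at the ..."
    "river": 0.21,
    "store": 0.18,
    "moon":  0.02,
}

# The model's "answer" at this step is simply the most likely candidate.
prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)  # bank
```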
Researchers building an LLM split language into small components and tokenize them: essentially, each piece gets an identifying “tag” the computer can work with. For example, the word “recede” could be broken into tokens letter by letter or, more efficiently, into “re” and “cede”, pieces that get reused in other words like “refresh” or “decedent”. A modern LLM works with tens of thousands of these tokens.
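Here’s a minimal sketch of that splitting step. The five-piece vocabulary is invented for the example; real tokenizers (byte-pair encoding, for instance) learn their vocabularies from data.

```python
# Toy greedy tokenizer over a tiny, hand-made vocabulary. Real tokenizers
# learn tens of thousands of pieces; the idea is the same: reuse common
# chunks across many different words.
VOCAB = {"re", "de", "cede", "fresh", "nt"}

def tokenize(word: str) -> list[str]:
    """At each position, match the longest known piece."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try longest match first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])          # unknown letter: keep it alone
            i += 1
    return tokens

print(tokenize("recede"))    # ['re', 'cede']
print(tokenize("refresh"))   # ['re', 'fresh']
print(tokenize("decedent"))  # ['de', 'cede', 'nt']
```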
Now it needs to recombine these tokens to output the answer we’re looking for, and here we get into some difficult mathematical territory.
Each token is assigned a series of numbers, which is encoded as a vector. A vector – a quantity with both direction and magnitude – is often drawn as a straight arrow. The computer then plots these vectors on a set of axes, like coordinates. The resulting graph is a mathematical representation of words, where tokens with related meanings end up pointing in similar directions.
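To make that concrete, here’s a tiny example with made-up two-dimensional vectors (real models use hundreds or thousands of dimensions). Cosine similarity – effectively the angle between two arrows – is one standard way to measure how close two words sit on that map.

```python
import math

# Hand-made 2-D "embeddings" for illustration only; real models learn
# vectors with hundreds or thousands of dimensions.
embeddings = {
    "king":  [0.9, 0.7],
    "queen": [0.8, 0.8],
    "apple": [0.1, 0.9],
}

def cosine_similarity(a, b):
    """1.0 = arrows point the same way; near 0 = unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # ~0.99
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # ~0.70
```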
That’s all well and good, but the LLM has been asked a question and needs to figure out the most likely response. It has broken the input down into something it can understand; now it needs to recombine its tokens into an answer.
Anyone remember the slope-intercept equation, y = mx + b, which lets you calculate a particular point on a graph? The computer also uses equations to find particular points on its vector graph, and the numbers inside those equations are its parameters. Simply put, a parameter is a way to sort by similarity (a very simple one would be “anteater and alimony both begin with ‘a’”). Think of it like a multi-dimensional Venn diagram: the circles and their overlaps are the parameters. Venn diagrams uncover patterns within a given group: similarities and differences of any kind.
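In fact, the building blocks inside an LLM really are y = mx + b scaled up: the m becomes a grid of numbers (a matrix), the b becomes a list of numbers, and every one of those numbers is a parameter the model adjusts during training. A rough sketch, with the dimensions shrunk down to fit on the page:

```python
import random

# y = m*x + b, scaled up: m is a matrix of weights, b a list of biases.
# Every entry is one parameter. Real layers use thousands of dimensions;
# these tiny sizes are just for illustration.
IN_DIM, OUT_DIM = 4, 3

m = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]
b = [random.uniform(-1, 1) for _ in range(OUT_DIM)]

def layer(x):
    """Compute y = m @ x + b, one output number per row of m."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(m, b)]

x = [0.5, -1.0, 0.25, 2.0]   # a made-up token vector
print(layer(x))
print("parameters in this one layer:", OUT_DIM * IN_DIM + OUT_DIM)  # 15
```

Stack many layers like this, each far wider, and the parameter counts climb into the billions and beyond.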
Plotting the parameters on its vector axes, the A.I. discovers which tokens “go together” and can use that to output an answer. As mentioned above, an A.I. can use over a trillion parameters, and that takes enormous computing power. At the end of the day, however, many parameters almost never get used for any given request. While U.S. models typically run all of them anyway, DeepSeek’s researchers designed their model to activate only a small fraction of its parameters at a time – drastically lowering the energy and computing power needed.
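DeepSeek’s technical reports describe this as a “mixture-of-experts” design: the model holds many specialist sub-networks, and a router picks only a few to run for each token. Here’s an illustrative sketch of that routing idea – not DeepSeek’s actual code, and the scores are made up:

```python
# Illustrative top-k routing (the core trick behind mixture-of-experts
# models): score every expert sub-network for the current token, run only
# the best few, and leave the rest of the parameters idle.
NUM_EXPERTS, TOP_K = 8, 2

def pick_experts(scores, k=TOP_K):
    """Indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Made-up router scores for one token; a real router computes them from
# the token's vector.
scores = [0.1, 2.3, -0.4, 0.9, 1.7, -1.2, 0.0, 0.5]

active = pick_experts(scores)
print(f"run experts {active}, skip the other {NUM_EXPERTS - len(active)}")
# -> run experts [1, 4], skip the other 6
```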
This behind-the-scenes math lets A.I. produce incredible answers, but it also explains why models struggle with bad data. The model has a way to check similarity and probability, but not veracity. It is only as smart as the data it’s trained on – our critical thinking skills remain far beyond what the computer is capable of. However, because developers keep expanding its training data with user interactions and other feeds, it “learns” and refines its outputs. An A.I. could probably give a decent tarot reading.
Another efficiency DeepSeek introduced involves how the model’s data is stored on its chips, but that’s a column for another day.
DeepSeek’s model could revolutionize the A.I. industry, which has narrowed to companies with deep pockets: Elon Musk’s Grok, the OpenAI–Microsoft (MSFT) partnership, Meta Platforms’ (META) Llama. The investment thesis for Nvidia (NVDA) and other advanced semiconductor and tech companies has been that we can only be the best through massive expenditure: more chips, more parameters, more energy. DeepSeek’s model forces us to rethink that thesis – and to see whether these companies can retrench and restrategize, or whether the industry got too far over its skis and needs a correction.