2024. 3. 31. 17:05 · U.S. Economic Stock Market Outlook
Why should Nvidia be seen as the most powerful company? [Does CUDA really make Nvidia products irreplaceable in the AI market? (My answer: Not at all)]
I keep seeing comments, here and there, saying that even if AI semiconductors with better performance than Nvidia's come out, customers won't be able to leave Nvidia because of CUDA, and that no alternative product will ever be able to replace it. Of course, I agree that CUDA is a great piece of Nvidia technology and that it played a big role in the company's early growth. What I want to discuss here is how CUDA should be viewed in the current AI market, and whether the AI semiconductor market will really be decided by CUDA alone.
I think a new SW paradigm usually only takes hold when there is a big change in hardware.
For example, multi-threaded programming only took off in earnest once multi-core CPUs arrived, yet it adds genuinely headache-inducing complexity compared to conventional programming: shared resources mean semaphores, mutexes, synchronization, and then there is cache coherence... You wonder whether writing software really has to be this complicated, and why you should bother. But single-core performance had stopped improving, and sticking with the existing architecture made further gains physically impossible, so there was nothing to be done. Even though it wasn't compatible with the software written up to that point, everyone said, "Let's move on to the multi-thread model."
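To make that "headache" concrete, here is a minimal host-side C++ sketch of my own (the counter, lock, and thread counts are made up for illustration): two threads update one shared counter, and without the mutex guarding it, updates would be silently lost.

```cpp
// Minimal sketch of shared-state multi-threading: the lock is what keeps
// two threads from corrupting the shared counter. Remove it and the final
// value becomes unpredictable.
#include <cstdio>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex counter_lock;

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> guard(counter_lock);  // serialize access
        ++counter;                                        // shared resource
    }
}

int main() {
    std::thread t1(add_many, 100000);
    std::thread t2(add_many, 100000);
    t1.join();
    t2.join();
    std::printf("counter = %d\n", counter);  // 200000 only because of the lock
    return 0;
}
```

Single-threaded code never needed any of this bookkeeping, which is exactly why the transition was resisted until the hardware left no other choice.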
In most cases, it is really hard to change a system once a dominant software stack has become entrenched in it. A new system is not compatible with the existing investment and infrastructure, SW engineers have to be retrained, and the whole SW ecosystem has to be rebuilt. As with apartment-redevelopment disputes, it is not easy to negotiate when you have to persuade the people who want to stay put. Still, such a rebuild does happen when a new hardware paradigm arrives and everyone puts on new clothes to match; the problem is that new hardware paradigms do not come along often. So CPUs remain locked into the x86 ecosystem, mobile is largely dominated by the ARM ISA (though ARM's domain is expanding into servers these days), and each system has solidified.
CUDA was at least as hard to accept as a new SW model as multi-threading was, if not harder. In general, the more hardware knowledge a programming model demands, the harder it is to write software for it (multi-threading already requires some hardware knowledge). GPUs are quite troublesome to program because, like ordinary DSPs, they use scratchpad memory. On a CPU, the cache manages on-chip memory automatically and the programmer only really has to think about DRAM, which is very convenient; on a GPU, you face the fatal inconvenience of having to manage the GPU's internal memory yourself. CUDA earned its spotlight by simplifying that management to some extent, but the drawback remains that to use CUDA well you still have to understand the GPU architecture itself.
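To show what that scratchpad burden looks like, here is a minimal CUDA sketch (mine, not from any real codebase; the function name and sizes are arbitrary): the kernel has to copy data from global memory into __shared__ memory by hand, pick a block-level tile size, and synchronize the threads, chores that a CPU's cache does for you automatically.

```cpp
// Minimal illustration of explicit scratchpad (shared-memory) management in CUDA.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale_with_staging(const float* in, float* out, int n, float s) {
    __shared__ float tile[256];            // on-chip scratchpad, sized and managed by hand
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];  // explicit global -> shared copy
    __syncthreads();                       // every thread in the block must reach this point
    if (i < n) out[i] = tile[threadIdx.x] * s;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = float(i);

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale_with_staging<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);

    cudaMemcpy(host, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("out[10] = %f\n", host[10]);  // expect 20.0
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Even in this toy case the programmer chooses the tile size and the synchronization points; in a real kernel those choices are tied to the specific chip's shared-memory capacity and occupancy limits.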
That was the general background. Now let me explain why CUDA is not what makes Nvidia irreplaceable today.
1. CUDA code is (very) incompatible every time a new GPU comes out.
Nvidia's own products keep changing their hardware structure substantially. New features arrive with every generation, the A100, the H100, and so on: new tensor cores, new precisions, and because the on-chip memory is a scratchpad, the optimization story is fundamentally different every time the chip changes.
On CPUs, thanks to the convenience of cache-based memory management, software written for an older chip generally keeps working on newer ones. On GPUs, which use scratchpad memory, CUDA code that was carefully optimized for one chip very often does not carry over when a new chip comes out; this is especially true of code with tight memory management. It is more accurate to think of CUDA code as having a separate version for the H100 and a separate version for the A100. Unlike CPUs, GPUs have the disadvantage that when the hardware structure changes, the software basically has to change too. And we are not going to keep using a single product like the H100 forever, are we? So why do people claim that you must stick with CUDA because of compatibility with existing CUDA code?
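As a hedged illustration of how generation-specific tuned code becomes, the sketch below (the function name and the branch contents are my own hypothetical example, not Nvidia's code) queries the device's compute capability and shared-memory budget and then picks a per-architecture code path, which is roughly how optimized CUDA code ends up carrying a separate H100 path and A100 path in the same source tree.

```cpp
// Hypothetical dispatch-by-architecture pattern: tile sizes, tensor-core usage,
// and shared-memory budgets differ per chip, so tuned code forks per generation.
#include <cstdio>
#include <cuda_runtime.h>

void pick_kernel_for_device(int device) {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, device);
    std::printf("SM %d.%d, %zu bytes of shared memory per block\n",
                prop.major, prop.minor, prop.sharedMemPerBlock);

    if (prop.major >= 9) {
        // Hopper-class (H100) path: e.g. larger tiles, newer tensor-core precisions
    } else if (prop.major == 8) {
        // Ampere-class (A100) path: different tiling and tensor-core choices
    } else {
        // fallback path for older parts
    }
}

int main() {
    pick_kernel_for_device(0);  // inspect device 0 and choose a code path
    return 0;
}
```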
2. Demand for high-performance GPUs comes from a very limited set of buyers
Compatibility matters when there is a large installed base to protect, but high-end GPUs are not something most people use in daily life. If every home had an H100, bought at great expense and expected to last for years, then compatibility would really matter. But how many buyers of the H100 are there, really? (It would be nice if every house had two or three, of course...) In practice the main customer base is limited to a handful of players such as the CSPs. And the CSPs are still in the middle of expanding their data centers; they do not carry an enormous legacy-compatibility burden, because this build-out is only beginning. Is CUDA compatibility really a decisive issue for CSPs that do not even hesitate to build their own AI chips? I don't think so at all.
3. In a market as big as LLMs, this is not the time to talk about CUDA (only performance and power!)
The LLM market is suddenly where the big money is, and there are dozens of LLM models rather than thousands, all based on the Transformer (the architecture behind GPT). When that is the case, every kind of optimization gets poured into it. In other words, the best outcome is for someone to implement the core computations these GPT-style models need, tuned down to the assembly level, and for everyone else to use those as their basic kernels.
This is already happening. Perhaps because the open-source camp, vLLM and the like, kept leaning on the good engines in Nvidia's FasterTransformer, Nvidia now even ships parts of it only as binaries (so you can't see what is inside). Look at the latest well-made Transformer kernels: they are hard for an ordinary CUDA SW engineer to write, and the people who really know the GPU's internal hardware release fusion-based kernels. In short, a handful of experts do the hard coding, and everyone else mostly just takes the result and uses it.
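For readers wondering what "fusion" means here, below is a minimal sketch in my own words (not vLLM's or FasterTransformer's actual code): a bias add and an activation are merged into a single kernel so the intermediate tensor never makes an extra round trip to DRAM. Real fused Transformer kernels combine far more than this, which is exactly why so few people can write them.

```cpp
// Toy example of kernel fusion: two element-wise steps done in one pass,
// instead of two kernel launches with an intermediate buffer in global memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fused_bias_relu(const float* x, const float* bias,
                                float* y, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols) {
        float v = x[i] + bias[i % cols];  // bias add
        y[i] = v > 0.0f ? v : 0.0f;       // activation fused into the same kernel
    }
}

int main() {
    const int rows = 4, cols = 8, n = rows * cols;
    float hx[n], hb[cols], hy[n];
    for (int i = 0; i < n; ++i) hx[i] = -1.0f + i * 0.1f;
    for (int j = 0; j < cols; ++j) hb[j] = 0.5f;

    float *dx, *db, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&db, cols * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, cols * sizeof(float), cudaMemcpyHostToDevice);

    fused_bias_relu<<<(n + 127) / 128, 128>>>(dx, db, dy, rows, cols);

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("y[0] = %f\n", hy[0]);  // expect 0.0 (ReLU of -1.0 + 0.5)
    cudaFree(dx); cudaFree(db); cudaFree(dy);
    return 0;
}
```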
In other words, the number of people who really need to be fluent in CUDA programming is shrinking to a very small group; the days when a broad population of engineers had to know CUDA code are fading fast. In fact, among the people running LLM servers these days, I suspect a good number don't even know what CUDA is. In a market this large and this clearly targeted, a small number of experts supply the technology. There are chips whose fate is decided purely by performance; 5G modems, for example. Do we compare 5G modem chips on software convenience? No. We look at exactly two things: power and performance. The same goes for the LLM market right now. First, can the huge LLM model be run at all? Second, can the service be delivered at an overwhelmingly better cost than with other chips?
As many people have been pointing out recently, model size is growing much faster than GPUs are improving. The real questions are: is there a way to run these models at all, and can LLM services be run at a reasonable cost? In that situation, isn't it beside the point to be discussing software compatibility?
I said above that when the hardware paradigm changes, the existing SW ecosystem can be set aside. What is happening now is even more radical: the AI model paradigm is changing at a terrifying pace, far faster than any hardware paradigm ever did. In the middle of that, we should not be too obsessed with preserving the existing software ecosystem.
And as I mentioned above, GPUs fundamentally require DSP-style programming. How much software compatibility can you really expect from a structure like that...
To be clear, the concept of CUDA obviously played a really big role. Before it, DSP-style programming meant low-level C bordering on assembly, and CUDA offered a far more convenient interface at a time when GPUs were still seen as parts for the 3D graphics industry.
If CUDA is still being presented as the software that makes Nvidia irreplaceable, I can understand that as a promotional message Nvidia might push. But is it really what keeps new competitors out? What competitors have to show first is good performance and power. There are companies and people everywhere who want to replace Nvidia. So why are there still so few that actually do? (In my experience, most of them can't even get an LLM running...)
This ran long, but the claim "Nvidia is irreplaceable because of CUDA!" keeps coming up without any real discussion of what CUDA is or the context around it, so I wanted to go through it at length.