14 days ago
Some commenters on this post are clearly not aware of PTX being a part of the CUDA environment. If you know this, you aren’t who I’m trying to inform.
There seems to be some confusion here about what PTX is. It does not bypass the CUDA platform at all, nor does this diminish NVIDIA’s monopoly. CUDA is a programming environment for NVIDIA GPUs, but many people say CUDA when they mean the C/C++ extension within it (in that sense, CUDA can be thought of as a C/C++ dialect). PTX is NVIDIA-specific and sits at a similar level to LLVM’s IR. If anything, DeepSeek is more dependent on NVIDIA than everyone else, since PTX is tightly tied to NVIDIA’s specific GPUs. Things like ZLUDA (an effort to run CUDA code on AMD GPUs) won’t work. This is not a feel-good story.
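To make the point concrete, here is a minimal sketch of what hand-written PTX typically looks like in practice: an inline `asm` statement inside an ordinary CUDA C++ kernel, compiled with `nvcc` and launched through the regular CUDA runtime. The names `add_via_ptx` and `add_kernel` are made up for illustration, and this is not taken from DeepSeek’s code; it only shows that PTX lives inside the CUDA toolchain rather than outside it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: adds two ints with a single inline PTX instruction
// instead of the C++ '+' operator.
__device__ int add_via_ptx(int a, int b) {
    int result;
    // "r" constrains each operand to a 32-bit register.
    asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(a), "r"(b));
    return result;
}

// Ordinary CUDA kernel that calls the PTX helper, one element per thread.
__global__ void add_kernel(const int* a, const int* b, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = add_via_ptx(a[i], b[i]);
    }
}

int main() {
    const int n = 4;
    int host_a[n] = {1, 2, 3, 4};
    int host_b[n] = {10, 20, 30, 40};
    int host_out[n] = {0, 0, 0, 0};

    int *dev_a = nullptr, *dev_b = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_a, n * sizeof(int));
    cudaMalloc(&dev_b, n * sizeof(int));
    cudaMalloc(&dev_out, n * sizeof(int));

    cudaMemcpy(dev_a, host_a, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, host_b, n * sizeof(int), cudaMemcpyHostToDevice);

    // The launch still goes through the normal CUDA runtime.
    add_kernel<<<1, n>>>(dev_a, dev_b, dev_out, n);

    cudaError_t err = cudaMemcpy(host_out, dev_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        std::printf("%d + %d = %d\n", host_a[i], host_b[i], host_out[i]);
    }

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_out);
    return 0;
}
```

Everything around the `asm` line (the `__global__` kernel, `cudaMalloc`, the `<<<...>>>` launch) is plain CUDA, and the whole thing needs NVIDIA’s compiler and an NVIDIA GPU to run.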
The paper was published by IEEE, with professors as co-authors. Only the second author is a student. And I wouldn’t dismiss it out of hand like that because of a magazine article. Students come up with breakthroughs all the time. The paper itself says it disproves Yao’s conjecture. I personally plan to implement and benchmark this because the results seem so good. It could be another Fibonacci heap situation, but maybe not. Hash tables are so widely used that it might even be worthwhile to make special hardware to use this on servers, if our current computer architecture is the only thing holding back the performance.
Edit: author sequence