  • It appears like reasoning because the LLM is iterating over material that has been previously reasoned out. An LLM can’t reason through a problem that it hasn’t previously seen

    This also isn’t an accurate characterization IMO. LLMs and ML algorithms in general can generalize to unseen problems, even if they aren’t perfect at this; for instance, you’ll find that LLMs can produce commands to control robot locomotion, even on different robot types.

    “Reasoning” here is based on chains of thought, where the model generates intermediate steps that then help it produce more accurate results. You can fairly argue that this isn’t reasoning, but it’s not like it’s traversing a fixed knowledge graph or something.
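    As a concrete illustration, a chain-of-thought prompt just asks the model to emit intermediate steps before the final answer; a minimal sketch, where `query_llm` is a hypothetical stand-in for any LLM completion API (not a real library call):

```python
def query_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model here.
    # This canned response mimics the shape of a chain-of-thought output.
    return "Step 1: 17 * 3 = 51. Step 2: 51 + 6 = 57. Answer: 57"

direct_prompt = "What is 17 * 3 + 6?"
cot_prompt = direct_prompt + "\nThink step by step, then state the final answer."

# The intermediate steps condition the later tokens, which is what tends
# to improve accuracy on multi-step problems.
response = query_llm(cot_prompt)
final_answer = response.rsplit("Answer:", 1)[-1].strip()
print(final_answer)  # -> 57
```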


  • All of the “AI” garbage that is getting jammed into everything is merely scaled up from what has been before. Scaling up is not advancement.

    I disagree. Scaling might seem trivial now, but the state-of-the-art architectures for NLP a decade ago (LSTMs) would not have been able to scale to the degree that our current methods can. Designing new architectures to perform better on GPUs (such as attention and Mamba) is a legitimate advancement. Furthermore, the viability of this level of scaling wasn’t really understood for a while, until phenomena like double descent (in which test error surprisingly goes down, rather than up, after increasing model complexity past a certain point) were discovered.

    Furthermore, lots of advancements were necessary to train deep networks at all. Better optimizers like Adam instead of pure SGD, and tricks like residual layers and batch normalization, were all necessary to scale up even small ConvNets, working around issues such as vanishing gradients and covariate shift that tend to appear when naively training deep networks.
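    For example, a residual layer just adds the input back to the layer’s output, so the signal (and gradient) has a direct path through the identity branch; a minimal numpy sketch of the idea:

```python
import numpy as np

def layer(x, W):
    """A plain nonlinear layer."""
    return np.tanh(x @ W)

def residual_layer(x, W):
    """Residual version: output = x + f(x); the identity path
    carries the signal even when f contributes almost nothing."""
    return x + layer(x, W)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
W = rng.normal(size=(16, 16)) * 0.01  # near-zero weights

deep_plain = x
deep_res = x
for _ in range(20):
    deep_plain = layer(deep_plain, W)
    deep_res = residual_layer(deep_res, W)

# With tiny weights, stacking plain layers crushes activations toward
# zero, while the residual stack passes the input through nearly intact.
print(np.abs(deep_plain).mean())  # collapses toward 0
print(np.abs(deep_res).mean())    # stays on the order of the input
```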


    I agree that pickle works well for storing arbitrary metadata, but my main gripe is that there isn’t an exact standard for how the metadata should be formatted. For FITS, for example, there are keywords for metadata such as the row order, CFA matrices, etc. that all FITS processing and displaying programs need to follow to properly read the image. So to make working with multi-spectral data easier, it’d definitely be helpful to have a standard set of keywords and encoding format.

    It would be interesting to see if photo editing software will pick up multichannel JPEG. As of right now there are very few sources of multi-spectral imagery for consumers, so I’m not sure what the target use case would be. The closest thing I can think of is narrowband imaging in astrophotography, but normally you process those in dedicated astronomy software (e.g. Siril, PixInsight), though you can also re-combine different wavelengths in traditional image editors.

    I’ll also add that HDF5 and Zarr are good options to store arrays in Python if standardized metadata isn’t a big deal. Both of them have the benefit of user-specified chunk sizes, so they work well for tasks like ML where you may have random accesses.
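    As a sketch of what chunked storage looks like in practice with h5py (the file name, chunk shape, and attribute keys here are all made up, which is exactly the no-standard problem):

```python
import h5py
import numpy as np

data = np.random.rand(512, 512, 31).astype("f4")  # e.g. 31 spectral bands

with h5py.File("cube.h5", "w") as f:
    # Chunking by spatial tile (full spectral depth per chunk) means a
    # random crop read only touches the chunks it overlaps.
    dset = f.create_dataset(
        "cube", data=data, chunks=(64, 64, 31), compression="gzip"
    )
    # Attribute names here are ad hoc -- there is no enforced standard.
    dset.attrs["axis_order"] = "y,x,band"
    dset.attrs["band_centers_nm"] = np.linspace(400, 700, 31)

with h5py.File("cube.h5", "r") as f:
    crop = f["cube"][100:164, 100:164, :]  # reads only overlapping chunks
    print(crop.shape)  # (64, 64, 31)
```

Zarr exposes essentially the same shape/chunks/attrs model, with the store split across many small files instead of one.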


  • I guess part of the reason is to have a standardized format for multi- and hyperspectral images, especially for storing things like metadata. Simply storing a numpy array may not be ideal if you don’t keep metadata on what is being stored and in what order (e.g. axis order, what channel corresponds to each frequency band, etc.). Plus it seems like they extend lossy compression to this modality, which could be useful in some circumstances (though for scientific use you’d probably want lossless).

    If compression isn’t the concern, certainly other formats could work to store metadata in a standardized way. FITS, the image format used in astronomy, comes to mind.
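    If you do go the plain-array route, even a JSON sidecar beats nothing; a rough sketch (the keys here are invented on the spot, which is precisely the standardization problem):

```python
import json
import numpy as np

cube = np.random.rand(4, 128, 128)  # band-first hyperspectral stack

# Invented keys -- every lab picks different ones, which is the problem.
meta = {
    "axis_order": "band,y,x",
    "band_centers_nm": [450.0, 550.0, 650.0, 850.0],
    "dtype": str(cube.dtype),
}

np.save("cube.npy", cube)
with open("cube.json", "w") as f:
    json.dump(meta, f)

# A reader has to know these exact keys to reconstruct the image.
loaded = np.load("cube.npy")
with open("cube.json") as f:
    loaded_meta = json.load(f)
print(loaded.shape, loaded_meta["axis_order"])  # (4, 128, 128) band,y,x
```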



  • It really depends on what you’re looking for. Are you just looking to learn how to print new materials, or do you have specific requirements for a project?

    If it’s the former, I’d say the easiest thing to try is PETG. It prints pretty reasonably on most printers, though it has stringing issues. It also has mechanical properties that make it suitable for more demanding applications (for example, better temperature resistance and impact strength). It’ll be much less frustrating than trying to dial in ABS for the first time.

    ABS and TPU are both a pretty large step up in difficulty, but both are quite good for functional parts. If you insist on learning one of these, pick whichever fits your projects better. For ABS you’ll want an enclosure and a well-ventilated room (IMO I wouldn’t stay in the same room as the printer), as it emits harmful chemicals during printing.


  • Useless is a strong term. I do a fair amount of research on a single 4090. Lots of problems can fit in <32 GB of VRAM. Even my 3060 is good enough to run small scale tests locally.

    I’m in CV, and even with enterprise grade hardware, most folks I know are limited to 48GB (A40 and L40S, substantially cheaper and more accessible than A100/H100/H200). My advisor would always say that you should really try to set up a problem where you can iterate in a few days’ worth of time on a single GPU, and lots of problems are still approachable that way. Of course you’re not going to make the next SOTA VLM on a 5090, but not every problem is that big.


  • Exactly, the assumption (known as the inductive hypothesis) is completely fine by itself and doesn’t represent circular reasoning. The issue in the “proof” actually arises from the logic coming after this, in which they assume that they can form two different overlapping sets by removing a different horse from the total set of horses, which fails if n=1 (as then they each have a single, distinct horse).
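    Spelled out, the inductive step takes the n+1 horses and removes a different horse twice, then needs the two resulting sets to share a horse; a quick check of where that breaks:

```latex
H = \{h_1,\dots,h_{n+1}\},\qquad
A = H \setminus \{h_{n+1}\},\qquad
B = H \setminus \{h_1\}

% Each of A, B has n horses, so each is monochromatic by hypothesis.
% The step then needs a shared horse to link their colors:
A \cap B = \{h_2,\dots,h_n\},\qquad |A \cap B| = n-1

% Fine for n \ge 2, but for n = 1 the intersection is empty:
n = 1 \implies A = \{h_1\},\; B = \{h_2\},\; A \cap B = \emptyset
```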


  • I’m fairly certain blockchain GPUs have very different requirements than those used for ML, let alone LLMs. In particular, they don’t need anywhere near as much VRAM, generally don’t require floating point math, and don’t need features like tensor cores. Those “blockchain GPUs” likely didn’t turn into ML GPUs.

    ML has been around for a long time. People have been using GPUs in ML since AlexNet in 2012, not just after blockchain hype started to die down.


  • This was also one of my concerns with the hype surrounding low cost SLS printers like Micronics, especially if they weren’t super well designed. The powder is incredibly dangerous to inhale, so I wouldn’t want a home hobbyist buying that type of machine without realizing how harmful it could be. My understanding is that even commercial SLS machines like HP’s MJF and Formlabs’ Fuse need substantial ventilation (HEPA filters, full room ventilation, etc.) in order to be operated safely.

    Metal is of course even worse. It has all the same respiratory hazards (the fine particles will likely cause all sorts of long-term lung damage) but it also presents a massive fire and explosion risk.

    I can’t see these technologies making it into the home hobbyist sphere anytime soon as a result, unfortunately.


  • I have tried them, and to be honest I was not surprised. The hosted service was better at longer code snippets, and in particular I found that it was consistently better at producing valid chains of thought (a lot of simpler models, including the distills, tend to produce shallow reasoning chains even when they get the final answer right).

    I’m aware of how these models work; I work in this field and have been developing a benchmark for reasoning capabilities in LLMs. The distills are certainly still technically impressive and it’s nice that they exist, but the gap between them and the hosted version is unfortunately nontrivial.