I keep seeing NSFW posts about “araffe women” and I have no idea what that means. Google has no idea either.

• IvanOverdrive@lemm.ee · 4 months ago

    There’s some AI model that uses it; its meaning is related to “relaxed and lounging around”. I think it appears in automated posts: BLIP can interrogate an image and produce a text description, and I suspect that’s what’s happening here.
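
    For anyone curious, here’s a minimal sketch of that “interrogation” workflow using the Hugging Face transformers library and the public Salesforce BLIP captioning checkpoint (the model name is real; “photo.jpg” is just a placeholder path):

    ```python
    # Minimal sketch: BLIP image captioning via Hugging Face transformers.
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    image = Image.open("photo.jpg").convert("RGB")  # placeholder image path
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(out[0], skip_special_tokens=True))
    # Automated posters presumably pipe the resulting caption straight into
    # the post title, odd words ("araffe", "arafed") and all.
    ```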

• Lvxferre@mander.xyz · 4 months ago

    It seems to me to be a word that image description generators (and potentially image generators, too) “believe” to exist. If that’s correct, it was likely caused by parsing chunks of actual words, such as “arabesque”, “Arabic”, “coffee”, “giraffe”, as if “ara” + “ffe” were two actual morphemes (units of meaning).
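
    To illustrate the fragment-parsing idea, here’s a quick sketch using GPT-2’s BPE tokenizer purely as an example (exact splits vary from tokenizer to tokenizer):

    ```python
    # Sketch: how a subword (BPE) tokenizer breaks real words into fragments.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    for word in ["arabesque", "Arabic", "coffee", "giraffe", "araffe"]:
        print(f"{word!r} -> {tokenizer.tokenize(word)}")
    # A model trained over fragments like these can recombine them into
    # strings such as "ara" + "ffe" that never occurred as words in the
    # training text.
    ```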

    For reference, this site claims that (I think?) diffusion models believe that a similar word, “arafed”, exists, and that it basically means going slow, taking one’s time, leisurely.

    It isn’t necessarily NSFW by the way.

• fullofredgoo@lemmy.world · 4 months ago

    This sounds like it might be something similar to a ‘noken’, or maybe just a regular token that represents a word fragment; both are concepts I picked up from this article: https://www.lesswrong.com/posts/c6uTNm5erRrmyJvvD/mapping-the-semantic-void-strange-goings-on-in-gpt-embedding

    “TL;DR: GPT-J token embeddings inhabit a zone in their 4096-dimensional embedding space formed by the intersection of two hyperspherical shells. This is described, and then the remaining expanse of the embedding space is explored by using simple prompts to elicit definitions for non-token custom embedding vectors (so-called “nokens”). The embedding space is found to naturally stratify into hyperspherical shells around the mean token embedding (centroid), with noken definitions depending on distance-from-centroid and at various distance ranges involving a relatively small number of seemingly arbitrary topics (holes, small flat yellowish-white things, people who aren’t Jews or members of the British royal family, …) in a way which suggests a crude, and rather bizarre, ontology. Evidence that this phenomenon extends to GPT-3 embedding space is presented. No explanation for it is provided, instead suggestions are invited.”

    In particular I was reminded of the list of tokens near the beginning of the article, and how it contains not just words, but also fragments of words, prefixes, and things like that. I’m also reminded of another article (which I can’t find right now) about people finding ways to bypass word filters by using nonsense words that the LLM has mistakenly associated with some meaning. From what others have said in this thread, ‘araffe’ sounds like it might be something like that.
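
    To make the article’s “distance-from-centroid” idea concrete, here’s a rough sketch of the measurement using GPT-2’s embedding matrix as a small stand-in (the article itself works with GPT-J’s much larger embedding space; this is just an illustration under that substitution):

    ```python
    # Rough sketch of the "distance from centroid" measurement from the
    # linked article, using GPT-2's token embeddings as a small stand-in.
    from transformers import AutoModel

    model = AutoModel.from_pretrained("gpt2")
    emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)

    centroid = emb.mean(dim=0)
    dists = (emb - centroid).norm(dim=1)

    print(f"mean distance from centroid: {dists.mean():.3f}")
    print(f"std of distances:            {dists.std():.3f}")
    # If the std is small relative to the mean, the embeddings sit in a thin
    # hyperspherical shell around the centroid, which is the structure the
    # article reports for GPT-J.
    ```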