• theluddite@lemmy.mlOP
    22 hours ago

    No need to apologize for length with me basically ever!

    I was thinking of something like how you did it in the second paragraph, but even more stripped down. The algorithm has N content buckets to choose from, and once it chooses, success is measured by how much of the video the user watched. For simplicity, users can only keep watching or log off. For small N, I think that @kersplomp@programming.dev is right that it's the multi-armed bandit problem, if we assume that user preferences are static. Now introduce one complication, which I think is pretty fair: users prefer familiar things, so they're more likely to keep watching from a bucket if it's a familiar one. I assume that exploration then gets heavily disincentivized and exhibits some pretty weird behavior, while exploitation becomes much more favorable. What I like about this is that, with only a small deviation from a classic problem, it would help explain what you also explain, which is getting stuck in corners.
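
    To make that concrete, here is a minimal sketch of the stripped-down model, not a claim about how any real recommender works. Everything specific in it is an assumption for illustration: the epsilon-greedy selection rule, the `simulate` function and its parameter names, and the form of the familiarity effect (the watched fraction rises linearly with the share of past views that came from the same bucket). Rewards are noise-free to keep it short.

```python
import random


def simulate(base, bonus, epsilon, steps, seed=0):
    """Epsilon-greedy bandit over content buckets with a familiarity effect.

    base    -- assumed static watch fraction per bucket, each in [0, 1]
    bonus   -- assumed extra watch fraction for a fully familiar bucket
    epsilon -- probability of exploring a random bucket on each turn
    steps   -- number of videos served (must be > 0)
    """
    rng = random.Random(seed)
    n = len(base)
    counts = [0] * n        # how often each bucket was served
    estimates = [0.0] * n   # running mean of watch fraction per bucket
    total_watched = 0.0

    for t in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit

        # Familiarity = share of everything served so far that came
        # from this bucket (0 on the very first turn).
        familiarity = counts[arm] / t if t > 0 else 0.0
        watched = min(1.0, base[arm] + bonus * familiarity)

        counts[arm] += 1
        estimates[arm] += (watched - estimates[arm]) / counts[arm]
        total_watched += watched

    return counts, total_watched / steps


if __name__ == "__main__":
    prefs = [0.2, 0.5, 0.3]
    for bonus in (0.0, 0.5):
        counts, mean = simulate(prefs, bonus=bonus, epsilon=0.1, steps=5000)
        print(f"bonus={bonus}: counts={counts}, mean watched={mean:.3f}")
```

    With `bonus=0.0` this is the ordinary static bandit. With `bonus > 0`, whichever bucket gets served early looks better the more it is served, so comparing the two runs is one way to poke at the "stuck in corners" intuition.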

    Once you allow user choice beyond consume/log off, I think your way of thinking about it, as a turn-based game, is exactly right. Your point about bin refinement is great, and I hadn't thought of that.