• Rayspekt@lemmy.world

    When will scientists just self-publish? I mean seriously, nowadays there is nothing between a researcher and publishing their stuff on the web. The only missing piece would be peer review, if you want it, but you could organize that without Elsevier. Reviewers get paid jack shit anyway, so you could just run a peer-review fediverse instance where only the mods know people's identities, keeping it double-blind.

    This system exists just to dangle carrots in front of young researchers chasing their PhDs.

    • GingaNinga@lemmy.world

      Because of "impact factor," the journal your work gets placed in has a huge effect on future funding. It's a very frustrating process, and trying to route around it is career suicide for your lab, so it has to be more of a top-down fix; the bottom-up version is never going to happen.

      That's why everyone uses Sci-Hub. These publishers are terrible companies, up there with EA in unpopularity.

      • WhatAmLemmy@lemmy.world

        It sounds like all it would take to destroy the predatory for-profit publication oligarchs is for a majority of the top few hundred scientists, across the major disciplines, to reject it and switch to a completely decentralized peer-to-peer open-source system in protest… The publication companies seem to gatekeep and provide no value. It's like Reddit: the site is essentially worthless; all of the value is generated by the content creators.

        • GingaNinga@lemmy.world

          Ya, that would be awesome, and I think that movement would gain momentum really fast, since most high profile labs have all had to deal with this nonsense.

          That, or legislation/open-access rules to make these papers more accessible. One can dream.

          • Rolando@lemmy.world

            most high profile labs have all had to deal with this nonsense.

            It's even worse for low-profile labs, because those publication fees eat up a greater proportion of our budget.

        • skillissuer@discuss.tchncs.de

          The thing they're supposed to provide is peer review; solve that and we're good to go. It would be easier with some kind of central oversight and stable funding; we're not talking about a shitposting instance for 250 people that nobody will notice if it goes down.

      • Rayspekt@lemmy.world

        I know about impact factor, but this system is still shit and only works because people keep contributing to it.

    • macarthur_park@lemmy.world

      When will scientists just self-publish?

      It's commonplace in my field (nuclear physics) to share the preprint version of your article, typically on arxiv.org. You can update the article as you respond to peer reviewers, too. The only difference between this and the paywalled publisher version is that the latter has additional formatting edits from the journal.

      If you search for articles on Google Scholar, it groups the preprint and published versions together, so it's easy to find the non-paywalled copy. The standard journals I publish in even sort of encourage this; you can submit the LaTeX documents and figures by just providing the URL of an arXiv manuscript.

      The US Department of Energy now requires that any research it funds be made publicly available. So any article I publish is also automatically posted to osti.gov one year after its initial publication. This version is also grouped into the Google Scholar search results.

      It's an imperfect system, but it's much better than it was even just a decade ago.

      • Rayspekt@lemmy.world

        Yeah, I know about this, but sadly I don't see anybody in our field bothering with preprints. Maybe we should, though; it sounds like the first step.

    • half coffee@lemy.lol

      We (I'm a CS researcher) already kind of do; I upload almost everything to arxiv.org and ResearchGate. Some fields support this more than others, though.

      • Rayspekt@lemmy.world

        We should just self-publish and then openly argue about the findings like the OG scientists. It didn't stop them from discovering anything.

        • ✺roguetrick✺@lemmy.world

          Bone Wars 2: Electric Boogaloo. In the end you really do need a way to discern who is having an appreciable impact in a field in order to know whom to fund. I have yet to hear a meaningful metric for that, though.

          Edit: I should clarify that the other option, strictly political allocation through an academy of sciences, has historical awfulness associated with it as well.

        • VeganPizza69 Ⓥ@lemmy.world

          Editors can act as filters, which is necessary when dealing with an excess of incoming information. Just as when you follow celebrities on social media or pseudo-forums like this one, you're getting an information-filtering service that increases the concentration of useful knowledge.

          In the early days of modern science, the rate of publication was small, making it easier to "digest" entire fields even with self-publishing. The number of published papers grows exponentially, as does the number of journals. https://www.researchgate.net/publication/333487946_Over-optimization_of_academic_publishing_metrics_Observing_Goodhart’s_Law_in_action/figures

          Just like with these forums, the need for moderators (editors, reviewers) grows with the number of users who add content.

    • galoisghost@aussie.zone

      I agree, but if it were that easy it would have been done already, and there would already be another evil gatekeeper to hate.

  • Passerby6497@lemmy.world

    That's where you print the downloaded PDF to a new PDF. New hash, same content; good luck tracing it back to me, fucko.

    • Syn_Attck@lemmy.today

      Unfortunately that wouldn't work, as this is information inside the PDF itself, so it has nothing to do with the file hash (although that is one way to track).

      Now that this is known, it's not enough to remove metadata from the PDF itself. Each image inside a PDF, for example, can contain its own metadata. I say this because this is apparently the start of a game of whack-a-mole; it won't stop here.

      There are multiple ways of removing ALL metadata from a PDF; here are most of them.

      It will be slow-ish and will probably make the file larger, but if you're sharing a PDF that only you are supposed to have access to, it's worth it. MAT or exiftool should work; see the sketch below.
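
      Here is a minimal sketch of that cleanup, assuming exiftool and qpdf are installed (the file names are placeholders). exiftool blanks the metadata it can write, and qpdf then rewrites the file so the superseded data isn't still sitting inside it:

      ```python
      #!/usr/bin/env python3
      """Best-effort PDF metadata scrub. A sketch, not a guarantee: it assumes
      exiftool and qpdf are on PATH, and it does NOT defeat watermarks hidden
      in page content or images (see the steganography edit below)."""
      import subprocess
      import sys

      def scrub(src: str, dst: str) -> None:
          # 1) Blank out all writable metadata tags (XMP, Info dictionary, ...).
          #    exiftool edits PDFs by appending an update, so the old values are
          #    still physically present in the file after this step.
          subprocess.run(["exiftool", "-all=", "-overwrite_original", src], check=True)

          # 2) Rewrite the whole file with qpdf so the superseded objects
          #    (including the old metadata) are actually dropped.
          subprocess.run(["qpdf", "--linearize", src, dst], check=True)

      if __name__ == "__main__":
          scrub(sys.argv[1], sys.argv[2])  # e.g. python scrub.py paper.pdf clean.pdf
      ```

      Even then, treat the output as "document metadata removed", not "fingerprint removed".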

      Edit: as discussed in another comment thread here, there is also PDF/image steganography as a technique they can use.

        • sandbox@lemmy.world

          It's possible, using steganographic techniques, to embed digital watermarks that would not be stripped by simply printing to PDF.

            • Syn_Attck@lemmy.today

              You should spread that idea around more; it's pretty ingenious. I'd add first converting to B&W if possible.

          • Syn_Attck@lemmy.today

            This is a great point. Image watermarking steganography is nearly impossible to defeat unless you can obtain multiple copies of the 'same' file from multiple users and look for differences. It could be a change to as few as 5-15 pixels, each shifted by a single RGB value, e.g. from

            rgb(255, 251, 0)

            to

            rgb(255, 252, 0)

            That change would be imperceptible to the human eye. Depending on the number of users, it may need to change more or fewer pixels.

            There is a ton of work in this field and it's very interesting; worth a look for anyone considering majoring in computer science / information security. A naive version of the idea is sketched below.
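
            A toy illustration of that kind of per-copy pixel watermark (this is not any publisher's actual scheme; Pillow, the pixel positions, and the 5-bit user ID are just assumptions for the demo):

            ```python
            """Toy per-copy watermark: nudge a handful of pixels by 1 in one channel.
            Purely illustrative; real schemes are far more robust and covert. Assumes
            the image is large enough for the hard-coded coordinates and is saved in a
            lossless format (PNG), since JPEG recompression would destroy the marks."""
            from PIL import Image

            # Hypothetical pixel positions reserved for the watermark bits.
            WATERMARK_PIXELS = [(10, 10), (200, 50), (321, 400), (75, 620), (500, 13)]

            def embed(src: str, dst: str, user_id: int) -> None:
                """Encode a user ID (< 32) as +1 nudges in the green channel."""
                img = Image.open(src).convert("RGB")
                for bit, (x, y) in zip(format(user_id, "05b"), WATERMARK_PIXELS):
                    r, g, b = img.getpixel((x, y))
                    if bit == "1":
                        g = min(255, g + 1)  # one step in one channel: invisible to the eye
                    img.putpixel((x, y), (r, g, b))
                img.save(dst)

            def compare(a: str, b: str) -> list[tuple[int, int]]:
                """Diff two copies of the 'same' image: differing pixels give the scheme away."""
                ia, ib = Image.open(a).convert("RGB"), Image.open(b).convert("RGB")
                return [(x, y) for x in range(ia.width) for y in range(ia.height)
                        if ia.getpixel((x, y)) != ib.getpixel((x, y))]
            ```

            As noted above, the practical attack is exactly what compare() does: obtain several independently watermarked copies and diff them.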

            Another 'neat' technology everyone should know about is machine identification codes: the tiny, near-invisible tracking dots that color printers print on every page to identify the specific make, model, and serial number (I think?) of the printer the page came from. I don't believe B&W printers have tracking dots, which were originally used to track creators of counterfeit currency. The EFF has a page listing color printers that do not include tracking dots on printed pages; this includes color LaserJets along with inkjets, although I would not be surprised if a similar tracking feature were in place now, or appeared in the future "for safety and privacy reasons", but none that I am aware of.

        • Syn_Attck@lemmy.today

          Good question. I believe the browser's "Print to PDF" function simply saves the loaded PDF to a local PDF file, so it wouldn't work (if I'm correct).

          I'm not an expert in this field, but you could ask on StackExchange or ask the authors of MAT or exiftool. You can also test it yourself (I'll explain how) by making a PDF from a JPG that carries your metadata, opening it and printing to PDF, and then extracting the image. Do let us know your findings! I'm on a smartphone, so I can't do it myself.

          If you do try it yourself, one note from the linked SE page: you won't be able to recover the original file extension (it's unknown, so you either have to know what it is, look at the file headers, or try all extensions). So if you use your own .jpg with your own EXIF data, rename the extracted file back to .jpg when finished (I believe EXIF is handled differently based on file type).

          There are multiple tools for adding EXIF data to an image, but the exiftool website has some easy examples for our purpose.

          (do this as the first step before adding to the PDF)

          (command line here, but there are exiftool GUIs)

          exiftool -artist="Phil Harvey" -copyright="2011 Phil Harvey" YourFile.jpg

          This adds "Phil Harvey" and the copyright information to the file. If you're on a smartphone, have the time, and really have to know, then hypothetically there should be web-based tools for every step needed; I'm just not familiar with any, and it's possible a web-based tool would strip the metadata while creating or extracting the PDF.
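
          For anyone at a computer, here is a rough sketch of that whole test. img2pdf and pdfimages (poppler-utils) are my assumptions for the wrap/extract steps, not something from this thread, and the file names are placeholders:

          ```python
          """DIY test described above: does printing a PDF to PDF carry
          embedded-image metadata through? Assumes exiftool, img2pdf and
          pdfimages are installed."""
          import subprocess

          # 1) Tag a test image with recognizable metadata (first step, per above).
          subprocess.run(["exiftool", "-artist=Test Artist", "-overwrite_original",
                          "probe.jpg"], check=True)

          # 2) Wrap it into a PDF.
          subprocess.run(["img2pdf", "probe.jpg", "-o", "probe.pdf"], check=True)

          # 3) Interactive step: open probe.pdf and use the browser/OS "Print to PDF",
          #    saving the result as printed.pdf.
          input("Print probe.pdf to printed.pdf, then press Enter... ")

          # 4) Pull the images back out of the printed copy and inspect their metadata.
          #    (Output name/extension may differ depending on how the image was re-encoded.)
          subprocess.run(["pdfimages", "-all", "printed.pdf", "extracted"], check=True)
          subprocess.run(["exiftool", "-artist", "extracted-000.jpg"], check=False)
          # If "Test Artist" survives, print-to-PDF did not strip the image metadata.
          ```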

    • ChaoticNeutralCzech@feddit.de

      I know PDF providers who visibly print the customer's name or number in the header of every page, along with a short copyright notice. I use qpdf --stream-decompress to make the PDF's content streams human-readable (the operators look a lot like PostScript), and then Python + regex to remove each header text, which stands out a bit from the other PDF elements. The script throws an error if more or fewer elements than pages have been removed, but that hasn't happened yet. Processed documents sometimes end up with screwed-up non-ASCII characters in the table of contents for some reason, but I don't have the originals anymore, so I don't know if that's my fault. Still, I wouldn't share the PDFs except in text-only or printed form, because of any other steganographic shenanigans in the file. I would absolutely torrent them if I could repurchase them under a new identity and verify that the files are identical.
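
      A bare-bones sketch of that workflow; only qpdf --stream-decompress comes from the comment above, while the watermark pattern, file names, and page-counting are made-up placeholders:

      ```python
      """Decompress a PDF's content streams with qpdf, then strip a per-page
      header watermark with a regex. Sketch only: adapt the pattern to the
      real header text, and expect to tweak it per publisher."""
      import re
      import subprocess

      SRC, DST = "watermarked.pdf", "clean.pdf"

      # Make the content streams plain text so the text-drawing operators are greppable.
      subprocess.run(["qpdf", "--stream-decompress", SRC, "decompressed.pdf"], check=True)
      data = open("decompressed.pdf", "rb").read()

      # Hypothetical header: a text block like  BT (Licensed to user #1234) Tj ET
      pattern = re.compile(rb"BT\s*\(Licensed to user #\d+\)\s*Tj\s*ET")

      # Crude page count (assumes "/Type /Page" is written with a space).
      pages = data.count(b"/Type /Page") - data.count(b"/Type /Pages")

      data, removed = pattern.subn(b"", data)
      if removed != pages:
          raise SystemExit(f"expected {pages} headers, removed {removed}; aborting")

      open(DST, "wb").write(data)
      # The edit leaves stream lengths and xref offsets stale; running
      # `qpdf clean.pdf repaired.pdf` afterwards usually rewrites them cleanly.
      ```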

      BTW, has anyone figured out how to embed Python code in a PDF? The whitespace always gets re-encoded as x-coordinates, so copying and pasting it never preserves indentation. No, you can't use the Ogham space mark (Unicode's only non-blank character classified as a space) for indentation in Python; I tried.

    • Olgratin_Magmatoe@lemmy.world

      You’d be safer IRL printing it on a printer without yellow ink, then scanning it, then deleting the metadata from the scan.

  • tuna@discuss.tchncs.de

    Imagine they have an internal tool that checks whether a hash exists in their database, something like

    "SELECT user FROM downloads WHERE hash = '" + hash + "';"
    

    You set the PDF's hash to 1'; DROP TABLE books;--, they scan it, and it effectively deletes their entire business lmfaoo.
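
    Purely to illustrate the joke (the table and column names are invented anyway): naive string concatenation turns that value into a second statement, which is exactly what parameterized queries exist to prevent.

    ```python
    # What the imagined lookup would send to the database if it concatenates
    # the scanned value straight into the SQL string.
    scanned_hash = "1'; DROP TABLE books;--"
    query = "SELECT user FROM downloads WHERE hash = '" + scanned_hash + "';"
    print(query)
    # SELECT user FROM downloads WHERE hash = '1'; DROP TABLE books;--';
    #   -> one harmless SELECT, then a DROP TABLE, with the trailing quote commented out.

    # The boring fix on their side is to pass the value as a parameter, e.g.
    # cursor.execute("SELECT user FROM downloads WHERE hash = ?", (scanned_hash,))
    ```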

    Another idea might be to duplicate the PDF many times and insert bogus metadata into each copy, then submit requests saying that you found an illegal distribution of the PDF. If their process isn't automated, it would waste a lot of their time to find the culprit. Lol

    I think it's more interesting to think about how to weaponize their own hash than about deleting it.

    • thesporkeffect@lemmy.world

      That’s using your ass. This is an active threat to society and it demands active countermeasures.

      I'd bet they have a SaaS 'partner' that trawls Sci-Hub and other similar sites. I'll try to remember to check over the next few days whether there's any hint of how this is being accomplished.

    • NeatNit@discuss.tchncs.de

      I feel like this will cause quality degradation, like repeatedly re-compressing a JPEG. Relevant xkcd.

      Edit: though obviously for most use cases it shouldn’t matter

      • Passerby6497@lemmy.world

        Why would it cause degradation? You're not recompressing anything; you're taking the visible content and writing it to a new PDF file.

        • NeatNit@discuss.tchncs.de

          You're pushing it through one system that converts a PDF file into printer instructions, and then through another system that converts printer instructions back into a PDF file. Each step probably has to make adjustments to the data it's pushing through.

          Without looking deeply into the systems involved, I have to assume it’s not a lossless process.

          • TomSelleck@lemm.ee

            You should maybe look a bit more into it. How do you think commercial printers, or even hobbyists, maintain fidelity in their images? Most images pass through multiple programs during the printing process and still maintain their quality. It's not just copy/paste.

            • NeatNit@discuss.tchncs.de

              They maintain high quality, but it's not lossless.

              As a trivial example, if you use the wrong paper size (like Letter instead of A4), it might crop parts of the page, add borders, or resize everything. Again, I'll admit that in 99% of cases it doesn't matter, but it might matter if, say, an embedded picture was meant to be exactly to scale.

              • TomSelleck@lemm.ee

                My friend, I worked in commercial printing for two decades. You're still making assumptions that are wrong. There are ways to transfer files that are lossless, and even ways to improve and upscale artwork. Why do you care so much about this?

          • 4am@lemm.ee

            Those printer instructions are called PostScript, and they're the basis of PDF.

            You're thinking that the printing process will rasterize the PDF and then essentially OCR/vector-map it back. It's (usually) not that complicated.

          • Diplomjodler@lemmy.world

            You're still wrong. The only place where it could cause quality loss is if embedded bitmap images are recompressed with lower quality settings (which you can adjust). PDF is a vector format, i.e. a mathematical description of what is to be rendered on screen. It was explicitly designed to be scalable, transmittable, and renderable on a wide variety of devices without quality loss.

            • NeatNit@discuss.tchncs.de

              No point discussing this if neither of us is going to prove it one way or the other.

              Bitmaps are actually a key part of what I was thinking about, so it seems you agree with me there. There's also the issue of using the wrong paper size: IIRC Windows usually defaults to Letter for printing, even in places where A4 is the only common size and no one has heard of Letter, and most people don't realise their prints are cropped/resized. This would still apply when printing to PDF.

              • Diplomjodler@lemmy.world

                My point is that all of these things can be controlled in the settings of your PDF printer driver. So it's not completely straightforward, but it's definitely doable.

      • Turun@feddit.de

        I don't understand the "that's not how PDFs work" criticism.

        Removing data from the original file is the whole point of the exercise! Of course unique tokens can be hidden in plain sight in images, letter spacing, etc. If we want to be sure to remove them, we need to degrade the quality of the PDF so that this information is lost in a lossy conversion.

    • xenoclast@lemmy.world

      There are tools for this already, but it sure would be nice to have a Firefox add-on that scrubs all metadata from downloads by default.

      (Note I’m hoping this exists and someone will Um, Actually me)

      • lastweakness@lemmy.world

        You could write a script that watches a folder for new files and strips the metadata from each one, I guess; see the sketch below. I did something like that for images a while back.
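
        A minimal polling version of that idea; the folder path is a placeholder, it assumes exiftool is on PATH, and a real version would probably use inotify/watchdog instead of a sleep loop:

        ```python
        """Watch a downloads folder and strip metadata from newly finished files.
        Simple polling sketch; best effort only (not every format is writable)."""
        import pathlib
        import subprocess
        import time

        WATCH_DIR = pathlib.Path.home() / "Downloads"   # placeholder path
        seen = set(WATCH_DIR.iterdir())

        while True:
            for path in WATCH_DIR.iterdir():
                # Skip files we already handled and browsers' in-progress downloads.
                if path in seen or path.suffix in {".part", ".crdownload", ".tmp"}:
                    continue
                subprocess.run(["exiftool", "-all=", "-overwrite_original", str(path)],
                               check=False)
                seen.add(path)
            time.sleep(2)
        ```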

    • 4am@lemm.ee

      Print to PDF might just convert the PDF into PostScript instructions and back again without the original PDF's metadata, but that probably depends on the Print-to-PDF software being used and its settings.

  • NeatNit@discuss.tchncs.de

    I kind of assume this with any digital media. Games, music, ebooks, stock videos, whatever - embedding a tiny unique ID is very easy and can allow publishers to track down leakers/pirates.

    Honestly, even though as a consumer I don’t like it, I don’t mind it that much. Doesn’t seem right to take the extreme position of “publishers should not be allowed to have ANY way of finding out who is leaking things”. There needs to be a balance.

    Online phone-home DRM is a huge fuck no, but a benign little piece of metadata that doesn’t interact with anything and can’t be used to spy on me? Whatever, I can accept it.

    • grue@lemmy.world

      Doesn’t seem right to take the extreme position of “publishers should not be allowed to have ANY way of finding out who is leaking things”. There needs to be a balance.

      Nah, fuck that; that's both the opposite of an extreme position and exactly the one we should take!

      Copyright itself is a privilege and only exists in the first place “to promote the progress of science and the useful arts.” Any entity that doesn’t respect that purpose doesn’t deserve to benefit from it at all.

      • NeatNit@discuss.tchncs.de

        You are arguing that Elsevier shouldn’t exist at all, or needs to be forcibly changed into something more fair and more free. I 100% agree with this.

        But my point was about digital publications of any kind in general, not just Elsevier. That includes indie publications and indie games. If an indie developer makes a game and it sells maybe 20 copies but gets pirated thousands of times, do you still say "fuck that" to figuring out which "customer" shared the game?

        I agree with "fuck that" for huge publishers, and by all means pirate all their shit, but the smaller guys need some way to safeguard themselves, and there's no way to decide that small guys can use a certain tool and big guys cannot.

    • cron@feddit.de

      Definitely better than some of the DRM-riddled proprietary eBook formats.