Rose here. Also @umbraroze for non-kbin stuff.

  • 5 Posts
  • 70 Comments
Joined 1 year ago
Cake day: June 14th, 2023




  • Reddit has a user data checkout feature (IIRC; check the user settings or maybe the Reddit help pages to find it).

    It’s a bit crap though.

    It takes a long time to process, especially if you happened to post in the era when the Reddit data infrastructure was horribly terrible instead of merely ordinarily terrible, and apparently in the worst cases this involves some manual work by the staff.

    Some data may be missing or truncated. It doesn’t give you data from privated/banned subreddits (which was a fun thing to discover because last time I tried to do this the blackouts were on), and even for legit stuff, long comments/posts may be truncated. Even so, I’m pretty sure that the dumps just straight up didn’t have all of my posts from several years ago, even if those were on public subreddits. So you need to make sure the checked out data is sensible.

    In conjunction with the official dumps, I recommend a few other tools, especially since the dumps aren’t really magnificently usable on their own. One tool that I personally found invaluable is reddit-user-to-sqlite, which imports Reddit data dumps and whatever live user data is available (I think it does this by scraping or something; I’m sure it worked despite the API being shut down) into an SQLite database, and Datasette is a nice frontend for browsing the posts.

    As for scrubbing, there are tools for that which are supposed to work. I think.
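
For the "make sure the checked out data is sensible" part, here is a minimal sketch of the kind of sanity check I mean. It is not how reddit-user-to-sqlite works internally. The column names (`id`, `subreddit`, `body`) and the helper function names are assumptions for illustration, so check the header of your own export file first.

```python
import csv
import io
import sqlite3

# Assumption: the export has a comments CSV with at least "id",
# "subreddit" and "body" columns. Verify against your own file.
SAMPLE = """id,subreddit,body
c1,linux,Xfce is fine.
c2,linux,
c3,worldbuilding,A tiny old withered tree.
"""


def load_comments(csv_text: str, conn: sqlite3.Connection) -> int:
    """Load comment rows from CSV text into SQLite and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments "
        "(id TEXT PRIMARY KEY, subreddit TEXT, body TEXT)"
    )
    rows = [
        (r["id"], r["subreddit"], r["body"])
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT OR REPLACE INTO comments VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def suspicious_comments(conn: sqlite3.Connection, min_length: int = 1) -> list:
    """Return ids of comments whose body is missing or shorter than min_length."""
    cur = conn.execute(
        "SELECT id FROM comments "
        "WHERE body IS NULL OR length(trim(body)) < ? ORDER BY id",
        (min_length,),
    )
    return [row[0] for row in cur]


def comments_per_subreddit(conn: sqlite3.Connection) -> dict:
    """Count comments per subreddit, to spot communities missing from the dump."""
    return dict(
        conn.execute("SELECT subreddit, COUNT(*) FROM comments GROUP BY subreddit")
    )


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print("rows loaded:", load_comments(SAMPLE, db))
    print("empty or missing bodies:", suspicious_comments(db))
    print("per subreddit:", comments_per_subreddit(db))
```

Comparing the per-subreddit counts against what you remember posting is a crude but quick way to notice that a whole community or a whole year is missing.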



  • Yup. The robots.txt file is not only meant to block robots from accessing the site, it’s also meant to block bots from accessing resources that are not interesting for human readers, even indirectly.

    For example, MediaWiki installations are pretty clever in that by default, /w/ is blocked and /wiki/ is encouraged. Because nobody wants technical pages and wiki histories in search results, they only want the current versions of the pages.

    Fun tidbit: in the late 1990s, there was a real epidemic of spammers scraping the web pages for email addresses. Some people developed wpoison.cgi, a script whose sole purpose was to generate garbage web pages with bogus email addresses. Real search engines ignored these, thanks to robots.txt. Guess what the spam bots did?

    Do the AI bros really want to go there? Are they asking for model collapse?
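
To make the /w/ versus /wiki/ split concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` agent name and the `can_crawl` helper are made up for illustration; a real wiki's file will have more rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki that serves articles under /wiki/
# and scripts (history, diffs, edit forms) under /w/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def can_crawl(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved bot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    paths = (
        "/wiki/Main_Page",
        "/w/index.php?title=Main_Page&action=history",
    )
    for path in paths:
        verdict = "allowed" if can_crawl(path) else "blocked"
        print(path, "->", verdict)
```

A polite crawler runs a check like this before every request. Nothing enforces it, which is exactly why ignoring the file walks a bot straight into whatever the site owner parked behind it.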



  • I’m using the Finnish keyboard layout (basically the same as the Swedish one).

    I like how AltGr+7/8/9/0 gives me { [ ] }, it’s a very nice grouping. The key next to Z is < > and you get | with AltGr, which is very handy.

    The only thing that’s mildly annoying from a programming viewpoint is that for tilde and backtick, the keys do diacritics: you need to press the diacritic key and then space. Backtick is especially fun, because it’s shift+acute, space. Meanwhile, the key next to 1 does § ½, which aren’t that handy most of the time. I often just stick backtick on that key if I’m particularly assed to customise keyboard layouts. Similarly, shift+4 is ¤, which is another not particularly useful character (but I don’t mind that, because £ $ € all need to be produced with AltGr, which is at least consistent).



  • I’m, like, OK, nuclear power isn’t necessarily a bad thing.
    But power plants like that should probably serve wider municipal needs.

    Building a private nuclear power plant just to power a data center? Well that’s clearly stupid.
    Building a private nuclear power plant just to power a data center focused on a niche application? Well you know how that goes.

    Also, look up SL-1. Disturbingly few Americans I’ve talked to have heard about that. It’s generally a good argument for why not every single thing should be powered by a tiny dedicated nuclear reactor.




  • We don’t really have this whole tipping thing here.

    I’ve had coffee in two places recently. One was in a hypermarket. I don’t remember what the coffee costs there, because it came free with the meal. If the restaurant staff feel they don’t get paid enough, I don’t care if they get inspiration from France and torch every car in the parking lot. You see, I go to the hypermarket on foot. It’s not that far away.

    The other place I had coffee recently was on the train. 2.80€. I certainly hope the restaurant car staff gets paid well. They’re technically railroad employees, after all. You don’t fuck with railroad workers.


  • In the middle of a couple of worldbuilding projects. Haven’t really had many good ideas for the fantasy project lately.

    Ah HA! Maybe I’ll do some mild subversion of expectations.
    Maybe one of the most famous sites in this world, where people come to visit from far and wide, has a tiny old withered tree.
    …I mean, there could be a lot of legitimate logical reasons why this site could be important. Maybe the tree has a really fascinating story behind it.
    Heck, there are probably many such places in our world too! Can think of at least one off the top of my head.
    I should write this down.

    Last year I felt really crappy as far as my writing projects go, but in the last few months, if there’s one thing I’ve learned, it’s that even the smallest ideas can sometimes break the writer’s block. Keep writing them down!




  • umbraroze@kbin.social to Linux@lemmy.ml: Linux Boomers (9 months ago)

    So yeah, Xfce looks the same as it did 10 years ago.

    And?

    A desktop environment is meant to launch apps and give me windows and maybe have a file manager. Xfce does that. It’s a desktop environment.

    Hey, “modern” desktop environment enthusiasts, if you bring Compiz back from the dead, give us luddites a call, will you? Ohhhh you kids should have seen it back in the day. Windows and Mac users saw Compiz in action and were, like, “wat.” You don’t get them to react that way to modern Linux desktops, no. And all that is lost now. Thanks Wayland.