@ricecake

ricecake@sh.itjust.works · 24 hours ago

Ha! I didn’t see that at first. I love “fuck you so hard that we can and will put a significant dollar value on it being more humiliating”.

ricecake@sh.itjust.works · 1 day ago

The assets were auctioned off to pay his debt to the families of the Sandy Hook shooting.
So effectively they gave money to the families of children killed in a school shooting that he slandered in cruel and vile ways.

Given that the families pretty reasonably dislike him, the added bonus of his creation being used to openly mock him and promote a message they endorse is quality icing on the cake.

ricecake@sh.itjust.works · 1 month ago

Oh, certainly. But common language has a term for high latency already, it’s just not speed related. Everyone knows about a laggy connection on a phone or video call.

Fun fact: TCP has some implicit design considerations around the maximum cost of packet retransmission on a viable link that only work on a roughly local planetary scale.
When NASA started to get out to Mars with the space Internet, they needed to tweak TCP to fit retransmission being proportionally much more expensive and let connections live longer before being “broken”.
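
To put rough numbers on that, here's a back-of-the-envelope sketch of best-case round-trip times at the speed of light. The distances are approximate round figures I'm assuming for illustration, not mission data, and `round_trip_seconds` is just a made-up helper name.

```python
# Rough sketch: best-case round-trip time for a signal at the speed of light.
SPEED_OF_LIGHT_KM_S = 299_792.458

# Approximate distances in km (assumed round figures, for illustration only).
DISTANCES_KM = {
    "geostationary satellite": 35_786,
    "Moon": 384_400,
    "Mars (closest)": 54_600_000,
    "Mars (farthest)": 401_000_000,
}


def round_trip_seconds(distance_km: float) -> float:
    """Signal out and back, ignoring all processing and queueing delay."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S


if __name__ == "__main__":
    for name, km in DISTANCES_KM.items():
        print(f"{name:>24}: {round_trip_seconds(km):10.2f} s")
```

Even at closest approach the round trip is around six minutes, and at the farthest it's closer to 45, so a retransmission timer tuned for Earth-scale links would give up long before an acknowledgment could physically arrive.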

ricecake@sh.itjust.works · 1 month ago

When talking about communication, most people think of the speed with which a unit quantity of information is transmitted, not the latency of that transmission.
Referring to bandwidth as the speed of a communication system is pretty normal, even for people who know how to use the term bandwidth.
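
A quick sketch of why the two get blurred together: the time until the last bit arrives is roughly latency plus size divided by bandwidth. The two links and their numbers below are hypothetical, picked only to make the contrast obvious.

```python
def transfer_seconds(size_bits: float, bandwidth_bps: float, latency_s: float) -> float:
    """Time until the last bit arrives: one-way delay plus time to push the bits."""
    return latency_s + size_bits / bandwidth_bps


# Hypothetical links, numbers chosen purely for illustration.
fat_but_far = {"bandwidth_bps": 1e9, "latency_s": 0.6}      # 1 Gbit/s, 600 ms delay
thin_but_near = {"bandwidth_bps": 10e6, "latency_s": 0.01}  # 10 Mbit/s, 10 ms delay

small_message = 8 * 1_000          # 1 kB, in bits
large_file = 8 * 1_000_000_000     # 1 GB, in bits

if __name__ == "__main__":
    for label, size in (("1 kB message", small_message), ("1 GB file", large_file)):
        far = transfer_seconds(size, **fat_but_far)
        near = transfer_seconds(size, **thin_but_near)
        print(f"{label}: fat-but-far {far:.4f} s, thin-but-near {near:.4f} s")
```

For the small message the low-latency link wins easily; for the big file the high-bandwidth link does. Most everyday "is it fast?" questions are about the second case, which is why bandwidth ends up being called speed.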

ricecake@sh.itjust.works · 1 month ago

Yes, to a degree. A VPN protects you from an attacker on the same WiFi network as you and that’s about it.

Most assaults on your privacy don’t happen like that, and for the most part the attacks that do happen like that are stopped by the website using https and proper modern security.
The benefit of the VPN is that it puts some of that protection under your control, but only as far as your VPN provider.

A VPN is about as much protection from most cyber attacks as a gun is.

They’re not a security tool, they’re a networking tool. They let you do some network stuff securely, and done correctly they can protect from some things, but the point of them is “this looks like a small, simple LAN, but it’s not”.

It’s much easier to package and sell network tools than security tools, and they’re much more accepted by users, since security tools have a tendency to say “no” a lot, particularly when you might be doing something dumb, and users hate being told no, particularly when they’re doing something dumb.

ricecake@sh.itjust.works · 2 months ago

Yeah, it’s definitely faster, but I’m not sure it’s going to make too much of a difference for a Minecraft server.

With setting it up being a bit annoying by hand, I’d still rank the router option higher even if it’s a worse VPN. Otherwise you risk ending up in that yak shaving situation where you’re fighting with routing tables and DNS when you wanted a Minecraft server.

ricecake@sh.itjust.works · 2 months ago

Oh for sure. What I meant was “check router for a built in VPN and use it if it has one, otherwise use WireGuard because it’s the easiest”.

The specific VPN doesn’t really matter so much. The built-in one would be the easiest, so checking for a solution that took a few clicks is worth it. :)

ricecake@sh.itjust.works · 2 months ago

I would use something like WireGuard, or another VPN service you can host yourself if your router supports it natively.

From the looks of it Minecraft servers seem to have dogshit authentication, so using some form of private network setup is going to be your best move.

ricecake@sh.itjust.works · 2 months ago

Eeeh, I still think diving into the weeds of the technical is the wrong way to approach it. Their argument is that training isn’t copyright violation, not that sufficient training dilutes the violation.

Even if trained only on one source, it’s quite unlikely that it would generate copyright infringing output. It would be vastly less intelligible, likely to the point of overtly garbled words and sentences lacking much in the way of grammar.

Whether what they’re doing is technically an infringement, or how it works, is entirely aside from a discussion of whether it should be infringement or permitted.

ricecake@sh.itjust.works · edit-2 2 months ago

Basing your argument around how the model or training system works doesn’t seem like the best way to frame your point to me. It invites a lot of mucking about in the details of how the systems do or don’t work, how humans learn, and what “learning” and “knowledge” actually are.

I’m a human as far as I know, and it’s trivial for me to regurgitate my training data. I regularly say things that are either directly references to things I’ve heard, or accidentally copy them, sometimes with errors.
Would you argue that I’m just a statistical collage of the things I’ve experienced, seen or read? My brain has as many copies of my training data in it as the AI model, namely zero, but “Captain Picard of the USS Enterprise sat down for a rousing game of chess with his friend Sherlock Holmes, and then Shakespeare came in dressed like Mickey Mouse and said ‘to be or not to be, that is the question, for tis nobler in the heart’ or something”. Direct copies of someone else’s work, as well as multiple copyright infringements.
I’m also shit at drawing with perspective. It comes across like a drunk toddler trying their hand at cubism.

Arguing about how the model works or the deficiencies of it to justify treating it differently just invites fixing those issues and repeating the same conversation later. What if we make one that does work how humans do in your opinion? Or it properly actually extracts the information in a way that isn’t just statistically inferred patterns, whatever the distinction there is? Does that suddenly make it different?

You don’t need to get bogged down in the muck of the technical to say that even if you concede every technical point, we can still say that a non-sentient machine learning system can be held to different standards with regard to copyright law than a sentient person. A person gets to buy a book, read it, and then carry around that information in their head and use it however they want. Not-A-Person does not get to read a book and hold that information without consent of the author.
Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.

I think all the AI stuff is cool, fun and interesting. I also think that letting it train on everything regardless of the creators’ wishes has too much opportunity to make everything garbage. Same for letting it produce content that isn’t labeled or cited.
If they can find a way to do and use the cool stuff without making things worse, they should focus on that.

ricecake@sh.itjust.works · 2 months ago

As written the headline is pretty bad, but it seems their argument is that they should be able to train from publicly available copyrighted information, like blog posts and social media, and not from private copyrighted information like movies or books.

You can certainly argue that “downloading public copyrighted information for the purposes of model training” should be treated differently from “downloading public copyrighted information for the intended use of the copyright holder”, but it feels disingenuous to put this comment itself, to which someone has a copyright, into the same category as something not shared publicly like a paid article or a book.

Personally, I think it’s a lot like search engines. If you make something public someone can analyze it, link to it, or take other derivative actions, but they can’t copy it and share the copy with others.

ricecake@sh.itjust.works · 3 months ago

So that’s what third party cookies are. What this does is make it so that when you go to example.com and you get a Google cookie, that cookie is only associated with example.com, and your random.org Google cookie will be specific to that site.

A site will be able to use Google to track how you use their site, which is a fine and valid thing, but neither they nor Google get to see how you use a different site. (Google doesn’t actually share specifics, but they can see stuff like “behavior on one site led to sale on the other”.)
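
As a toy model of the idea (not how any browser actually implements it, and real partition keys involve more than this), you can think of the cookie jar as being keyed by the pair of top-level site and cookie domain instead of just the cookie domain. The class and method names here are made up for illustration.

```python
class PartitionedCookieJar:
    """Toy model: cookies are stored per (top-level site, cookie domain) pair."""

    def __init__(self):
        # Maps (top_level_site, cookie_domain) -> {cookie name: value}
        self._store = {}

    def set_cookie(self, top_level_site, cookie_domain, name, value):
        partition = self._store.setdefault((top_level_site, cookie_domain), {})
        partition[name] = value

    def get_cookies(self, top_level_site, cookie_domain):
        # Only cookies set while visiting this same top-level site are visible.
        return dict(self._store.get((top_level_site, cookie_domain), {}))


jar = PartitionedCookieJar()
jar.set_cookie("example.com", "google.com", "id", "abc")
jar.set_cookie("random.org", "google.com", "id", "xyz")

print(jar.get_cookies("example.com", "google.com"))  # the example.com partition
print(jar.get_cookies("random.org", "google.com"))   # a separate partition
```

The third party still gets a cookie on each site, but the two never line up into a single identifier that follows you across both.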

ricecake@sh.itjust.works · 3 months ago

https://daniel.haxx.se/blog/2020/12/17/curl-supports-nasa/

https://daniel.haxx.se/blog/2023/02/07/closing-the-nasa-loop/

Their process for validating software doesn’t have a box for “open source”, and basically assumes it’s either purchased, or contracted. So someone in risk assessment just gets a list of software libraries and goes down it checking that they have the required forms.

As the referenced talk mentions, the people using the software understand that all the testing and everything is entirely on them, and that sending these messages is bothersome and unfair, and they’re working on it. Unfortunately, NASA is also a massive government bureaucracy and so process changes are slow, at best.
The TLAs don’t generally help NASA, and getting them involved would unfortunately only result in more messages being sent.

As for contributions, I think that turns into an even worse can of worms, since generally software developed by or for the US government isn’t just open source, but public domain. I think you’d end up with a big mess of licensing horror if you tried to get money or official relationships involved. It’s why SQLite is public domain, since it was developed at the behest of the US.

Mostly just context for what you said. NASA isn’t being arrogant, they’re being gigantic. One group does its due diligence in-house while another branch goes down a checklist, sees they don’t have a form, and pops off an email, embarrassing the hell out of the first group.

The time limit thing is weird, but it’s a common practice in bureaucracies, public or private. You stick a timeline on the request to convey your level of urgency and to establish some manner of timeline for the other person to work with. Read the line again, but extremely literally: “we have a time frame of 5 days for a response”. “Our audit timeline guessed that it would take a business week for you to reply, so if you take longer we’re behind schedule”. The threatening version is “your response is required on or before five business days from the date of this message”.
The presumption is that the person on the other end is also working through a task queue that they don’t have much personal investment in, and is generally good natured, so you’re telling them “I don’t expect you to jump on this immediately, but wherever you can find a moment to reply this week would keep anyone from bothering me, and me from needing to send another email or trying to find a phone number”.

ricecake@sh.itjust.works · 3 months ago

It has organizational support from ICANN, so it’s not done in total isolation.

ricecake@sh.itjust.works · 3 months ago

Paul Eggert is the primary maintainer for tzdb, and has been for the past 20 years.
Tzdb is the database that maintains all of the information about timezones, timezone changes, leap whatevers and everything else. It’s present on just about every computer on the planet and plays an important role in making sure all of the things do time correctly.

If he gets hit by a bus, ICANN is responsible for finding someone else to maintain the list.

SQLite is the most widely used database engine, and is primarily developed by a small handful of people.

ImageMagick is probably the most iconic example. Primarily developed by John Cristy since 1987, it’s used in a hilarious number of places for basic image operations. When a security bug was found in it a bit ago, basically every server needed to be patched because they all do something with images.

ricecake@sh.itjust.works · 3 months ago

I’m not sure I’m hearing anyone saying diversity is a bad thing.

People used “diversity hire” as an attack on Harris, but no one is using it as an attack on Walz, even though everyone basically immediately knew that the VP pick was going to be an older white man if only to make the ticket less of a “leap”.

That an all woman ticket, a ticket with two not-white people, or anything else not “default American politician” would face issues is kinda OP’s point that we still have a long way to go to overcome those institutional barriers you mention.

Needing to consider diversity or representation when picking people is a sign that something has already gone wrong.
If the system were just and those barriers didn’t exist, people wouldn’t consider diversity, they’d just pick the best person and the diversity would just be there as consequence of demographics. (In a fair system, the top N% of the population will have a comparable demographic breakdown to the population at large).

It’s a sign of a cultural hangup that we definitely consider diversity, and need to in order to have decent representation, when making these choices, and even more sad that it’s only used as a cudgel against minorities, even when they were the first pick and others are being used to offset their “riskiness”.

ricecake@sh.itjust.works · 3 months ago

Yeah, it definitely might still be a bad data source, and it’s shady either way, just pointing out that “not public data” has a few meanings, and not all of them are synonymous with “private data”.

ricecake@sh.itjust.works · 3 months ago

I feel like that might be bad phrasing on the part of the article. They mainly aggregate public records, like legal document style public records, and they also scraped data from not-(public record) data, which isn’t the same as (not-public) record data.

I feel like I would want more details to be sure though, but scraping usually refers to “generally available” data.

ricecake@sh.itjust.works · 3 months ago

Totally. It’s double weird, because it’s not a petitionable issue, it’s a form where you make your case and a committee decides, and they already have the symbol and they just seem to want it to be usable like 💲, which isn’t a thing.

ricecake@sh.itjust.works · 3 months ago

I am aware of the lists and guidelines, I’ve been linking and quoting them to you. :)

It’s their report on the standards that highlights that they don’t think there’s a clear distinction between “emoji” and “character”, and that it’s mostly a matter of user expectation.
Hence some pictograph characters having a default “text” presentation, and some having a default “emoji” presentation. They also clarify that some things with a default “emoji” presentation aren’t in the set of characters people would associate with emoji and shouldn’t be counted if you’re trying.

I understand what you’re saying, which is that the selection criteria are different for a “language symbol” as opposed to a “pictographic symbol”, so they’re different things.
I disagree and think that “default presentation” might be a better metric, but that ultimately it’s about user and platform expectations. The same character can be presented “emoji” style or “text” style depending on context.
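
A small sketch of what that looks like at the code point level: the same base character followed by one of the two presentation variation selectors, U+FE0E (text) or U+FE0F (emoji). How it actually renders depends on your font and platform, so this only shows what's in the string; `describe` is a made-up helper.

```python
import unicodedata

TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: requests text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation

heart = "\u2764"  # HEAVY BLACK HEART, the same base character in both cases


def describe(s):
    """List each code point in the string with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


if __name__ == "__main__":
    for label, s in (("text", heart + TEXT_STYLE), ("emoji", heart + EMOJI_STYLE)):
        print(label, s, describe(s))
```

Both strings start with the exact same character; the only difference is the trailing selector asking the platform for one presentation or the other.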

In any case, I’d also agree that there’s no viability to the notion that people use the Bitcoin symbol in a way that’s independent of the one meaning that it has, so a colorful cartoony rendition becoming an option doesn’t really fit. “His Christmas gift was $$$” is a sentiment people might express. “The hotel is ₿₿₿” just … isn’t.