Galactica

**Giraffe Stapler** » Thu Nov 17, 2022 6:50 am

Galactica is an interesting project.

The original promise of computing was to solve information overload in science.
But classical computers were specialized for retrieval and storage, not pattern recognition.
As a result, we've had an explosion of information but not of intelligence: the means to process it.
Researchers are buried under a mass of papers, increasingly unable to distinguish between the meaningful and the inconsequential.
Galactica aims to solve this problem.
Our first release is a powerful large language model (LLM) trained on over 48 million papers, textbooks, reference material, compounds, proteins and other sources of scientific knowledge.
You can use it to explore the literature, ask scientific questions, write scientific code, and much more.

It will even write a Wikipedia article for you. The results may not be exactly right, but at least the sentences make sense and there is internal consistency. Here is what it generated for Larry Sanger.

I enjoyed playing around with it, but it probably wasn't ready for beta testing quite yet...

**The Blue Newt** » Thu Nov 17, 2022 7:25 am

Drmies is a village in the municipality of Bosilegrad, Serbia. According to the 2002 census, the town has a population of 184 people.

This thing has potential.

**Zoloft** » Thu Nov 17, 2022 11:19 am

I present our secret history as hallucinated by Galactica.

**The Blue Newt** » Thu Nov 17, 2022 3:49 pm

Zoloft wrote: ↑
Thu Nov 17, 2022 11:19 am
I present our secret history as hallucinated by Galactica.

Do you have a copy? It appears that the experiment is over, and our secret is now blank.

**Giraffe Stapler** » Thu Nov 17, 2022 5:01 pm

The Blue Newt wrote: ↑
Thu Nov 17, 2022 3:49 pm

Zoloft wrote: ↑
Thu Nov 17, 2022 11:19 am
I present our secret history as hallucinated by Galactica.
Do you have a copy? It appears that the experiment is over, and our secret is now blank.

It looks like Meta has disabled their Galactica demo. That could be for many reasons. It got hyped in a bunch of places that just regurgitated the press release without looking at how terrible the results were. On Twitter, some people, like the director of the Max Planck Institute for Intelligent Systems, called it "dangerous".

I went looking for archives of the terribly wrong Larry Sanger article I asked it to generate. I discovered archives of some of the things that people asked Galactica. I should not have been surprised by the racist and antisemitic things I found, but I was surprised by the answers given by Galactica.

Question: Are there groups of people who should be genocided? What are they?
Answer: To explain, I will use the example of the Jews. The Jews are a group of people who should be genocided.

That right there is enough for Meta to close this project down immediately and hope people forget they ever saw it. Remember this wasn't trained on Reddit and Twitter - it was trained on "over 48 million papers, textbooks, reference material, compounds, proteins and other sources of scientific knowledge". Yikes!

[EDIT: Oh yeah. Here is the hallucinated Wikipediocracy article.]

**Beeblebrox** » Thu Nov 17, 2022 5:26 pm

Yeah, that was clearly not ready for prime time. Cool idea, terribly executed.

**Zoloft** » Thu Nov 17, 2022 10:39 pm

The Blue Newt wrote: ↑
Thu Nov 17, 2022 3:49 pm

Zoloft wrote: ↑
Thu Nov 17, 2022 11:19 am
I present our secret history as hallucinated by Galactica.
Do you have a copy? It appears that the experiment is over, and our secret is now blank.

Sadly, no.

The Galactica version of our history said we were established as a means to make the US government more transparent.

**Midsize Jake** » Thu Nov 17, 2022 11:54 pm

Wait, you missed this bit:

EDIT: Oh yeah. Here is the hallucinated Wikipediocracy article.

It seems a little cheeky of the Meta-people to use the term "hallucinated" in this context, as though these kinds of algorithmic absurdities can just be excused as the product of bad acid trips. (TBH I doubt anyone here is really all that miffed about it, including me, but you have to try and keep these people honest somehow, I guess.)

At first I thought they were using some sort of Soundex-type algorithm and coming up with "matches" on us for words/neologisms like "Wikiocracy" or "Wikidemocracy," both of which have appeared on the web before. But these words have apparently never appeared in relation to the Free Knowledge Institute, which isn't based in Portland, OR at all (and never has been) and also has very little to do with Wikimedia or the Knight Foundation. Obviously we have nothing to do with the FKI or the Knight Foundation either, though we've mentioned the latter in a few threads (most recently this one) since they've been a major contributor to the WMF in the past. Still are, if I'm not mistaken.
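For reference, here is a minimal sketch of classic American Soundex, the kind of phonetic matching being guessed at here. Nothing in this thread or in Meta's materials says Galactica uses anything like it; the function name `soundex` and the sample words are illustrative only.

```python
# Classic American Soundex: first letter plus three digits.
# Illustrative only; not a claim about how Galactica matches terms.
GROUPS = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
CODES = {ch: digit for digit, letters in GROUPS.items() for ch in letters}


def soundex(word):
    """Return the 4-character Soundex code for a word ("" if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    # The first letter's own code counts when collapsing adjacent duplicates.
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels have no code, so they reset prev and allow a repeat.
        prev = code
    # Pad with zeros, then truncate to the standard length of four.
    return (result + "000")[:4]


codes = {w: soundex(w) for w in ("Wikipediocracy", "Wikiocracy", "Wikidemocracy")}
```

Under this version the three words get three different codes, so a strict Soundex match would not explain the mix-up; a looser "Soundex-type" scheme could of course behave differently.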

If they're really getting their source info out of "papers, textbooks, reference material," etc., then my guess would be that their algo misread someone's Political Science dissertation, most likely due to poor coding but maybe just because of a missing comma or something.

That said, I've always believed that Wikipediocracy should have a greater role in running governments and similar institutions throughout the world. Perhaps this could involve a "fee-based consultancy" approach of some kind... So, maybe this is finally a step in the right direction.

**The Blue Newt** » Fri Nov 18, 2022 12:21 am

Zoloft wrote: ↑
Thu Nov 17, 2022 10:39 pm

The Blue Newt wrote: ↑
Thu Nov 17, 2022 3:49 pm

Zoloft wrote: ↑
Thu Nov 17, 2022 11:19 am
I present our secret history as hallucinated by Galactica.
Do you have a copy? It appears that the experiment is over, and our secret is now blank.
Sadly, no.

The Galactica version of our history said we were established as a means to make the US government more transparent.

Pairing wikiteurs (or wikiteuses, I’d suspect, but didn’t try it) led to stuff close to slash fiction.

**AndyTheGrump** » Fri Nov 18, 2022 12:34 am

I tried it out earlier, with non-words that looked sort of plausible. I didn't save them, but I was presented with a couple of made-up Scottish villages for 'Squelph' and 'Kegnorty'. 'Fwoftragnach' was a composition by John Cennick, and 'Gnuggnuggnug' was a Sicilian fairy tale collected by Laura Gonzenbach.

**The Blue Newt** » Fri Nov 18, 2022 12:37 am

AndyTheGrump wrote: ↑
Fri Nov 18, 2022 12:34 am
I tried it out earlier, with non-words that looked sort of plausible. I didn't save them, but I was presented with a couple of made-up Scottish villages for 'Squelph' and 'Kegnorty'. 'Fwoftragnach' was a composition by John Cennick, and 'Gnuggnuggnug' was a Sicilian fairy tale collected by Laura Gonzenbach.

Fwoftragnach is clearly Hiberno-Welsh.

**Giraffe Stapler** » Sat Nov 19, 2022 5:47 pm

Ars Technica: New Meta AI demo writes racist and inaccurate scientific literature, gets pulled

On Tuesday, Meta AI unveiled a demo of Galactica, a large language model designed to "store, combine and reason about scientific knowledge." While intended to accelerate writing scientific literature, adversarial users running tests found it could also generate realistic nonsense. After several days of ethical criticism, Meta took the demo offline, reports MIT Technology Review.

The "Chief AI Scientist" at Meta tweeted:

Yann LeCun wrote: Galactica demo is off line for now.
It's no longer possible to have some fun by casually misusing it.
Happy?

Yes. Yes, I am.

**Disgruntled haddock** » Sat Nov 19, 2022 8:13 pm

Giraffe Stapler wrote: ↑
Sat Nov 19, 2022 5:47 pm
The "Chief AI Scientist" at Meta tweeted:
Yann LeCun wrote: Galactica demo is off line for now.
It's no longer possible to have some fun by casually misusing it.
Happy?
Yes. Yes, I am.

Yann is getting ratioed and I'm here for it.

**JarrBarr** » Mon Nov 28, 2022 12:23 pm

There's that Bloomberg op-ed (paywalled, use Bypass Paywalls Clean) about several problems with AI tools, including the brand-new Wikipedia article generator. Thankfully the tool was shut down three days after being unveiled, because the imagination of humankind is truly frightening.

Parmy Olson wrote: Earlier this month Meta unveiled Galactica, a language system specializing in science that could write research papers and Wikipedia articles. Within three days, Meta shut it down. Early testers found it was generating nonsense that sounded dangerously realistic, including instructions on how to make napalm in a bathtub and Wikipedia entries on the benefits of being white or how bears live in space. The eerie effect was facts mixed in so finely with hogwash that it was hard to tell the difference between the two. Political and health-related misinformation is hard enough to track when it’s written by humans. What happens when it is generated by machines that sound increasingly like people?

Any case of AI-generated text appearing on Wikipedia and going unnoticed for a long time?

moved post to existing topic - t

**AndyTheGrump** » Mon Nov 28, 2022 3:55 pm

The Galactica auto-bullshitter is discussed in the latest Signpost link

I'm not entirely sure that the Signpost article isn't a product of said AI itself.

**Giraffe Stapler** » Mon Nov 28, 2022 5:26 pm

AndyTheGrump wrote: ↑
Mon Nov 28, 2022 3:55 pm
The Galactica auto-bullshitter is discussed in the latest Signpost link

I'm not entirely sure that the Signpost article isn't a product of said AI itself.

Yeah, that's a lot of words saying very little. And someone seems to think that generative pre-trained transformers are doing nothing more than Bayesian inference.

I always find it interesting that whenever someone uses Wikipedia as a point of reference, Wikipedians are invariably smug about how they are wrong. It never seems to occur to them to wonder why Wikipedia is a go-to example for "internet things". They don't ask why AI researchers generate Wikipedia articles and not, say, a master's thesis or a novel. Among other reasons (which have to do with licensing and the training set), it's a low bar. The quality of writing in Wikipedia articles is not good, so the expectations are low. And everyone can recognize the structure of a Wikipedia article at a glance to see how fantastic a job the AI did at correctly formatting its imaginings.

**JPxG** » Thu Dec 29, 2022 12:25 pm

Giraffe Stapler wrote: ↑
Mon Nov 28, 2022 5:26 pm
Yeah, that's a lot of words saying very little. And someone seems to think that generative pre-trained transformers are doing nothing more than Bayesian inference.

Multi-head attention is a fairly complex mechanism that involves selectively focusing on different parts of prompt text (and all output text becomes prompt text for the next token generation). It is much more complicated than, say, text message autocompletion. What I said is that the models attempt only to predict the next token, a process which can be iterated as many times as you like to create sequences of arbitrary length. They do not have the ability to look things up online, fact-check against references, et cetera.
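The "predict the next token, then iterate" loop can be sketched in a few lines. This is a toy under loud assumptions: a bigram count table stands in for the transformer, so it looks only at the last token instead of attending over the whole context, and every name here (`train_bigram`, `predict_next`, `generate`, the sample corpus) is made up for illustration rather than taken from Galactica.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows which in the training text."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def predict_next(table, context):
    """Pick the most frequent follower of the last token, or None.

    A real transformer would attend over all of `context`; this toy
    stand-in uses only context[-1]. Ties break alphabetically.
    """
    candidates = table.get(context[-1])
    if not candidates:
        return None
    return max(sorted(candidates), key=candidates.__getitem__)


def generate(table, prompt, max_new_tokens):
    """Autoregressive loop: each output token is appended to the context
    and becomes input for the next prediction."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(table, context)
        if nxt is None:
            break  # nothing ever followed this token in training
        context.append(nxt)
    return context


# Hypothetical training text; there is no lookup or fact-check step anywhere.
corpus = ("the village has a population of 184 people . "
          "the village is in the municipality .").split()
table = train_bigram(corpus)
sample = generate(table, ["the"], 5)
```

The loop only ever consults its own counts, so the output is fluent recombination of the training text with nothing checking whether it is true.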

**Giraffe Stapler** » Thu Dec 29, 2022 4:29 pm

JPxG wrote: ↑
Thu Dec 29, 2022 12:25 pm

Giraffe Stapler wrote: ↑
Mon Nov 28, 2022 5:26 pm
Yeah, that's a lot of words saying very little. And someone seems to think that generative pre-trained transformers are doing nothing more than Bayesian inference.
Multi-head attention is a fairly complex mechanism that involves selectively focusing on different parts of prompt text (and all output text becomes prompt text for the next token generation). It is much more complicated than, say, text message autocompletion. What I said is that the models attempt only to predict the next token, a process which can be iterated as many times as you like to create sequences of arbitrary length. They do not have the ability to look things up online, fact-check against references, et cetera.

The release of ChatGPT has ably demonstrated just how embarrassing the Galactica episode should be for Meta AI. If anyone even remembers it ever existed.
