"Presented to you in the form of unedited screenshots, the following is a 'conversation' I had with Chat GPT upon asking whether it could help me choose several of my own essays to link in a query letter I intended to send to an agent."
"What ultimately transpired is the closest thing to a personal episode of Black Mirror I hope to experience in this lifetime."
Spoiler: Amanda Guinzburg asks ChatGPT to help her select a couple of her own essays to send to an agent, as described, but ChatGPT pretends to read them without actually reading them, then lies about it -- ChatGPT, liar liar pants on fire. But what gets really creepy is how staggeringly apologetic ChatGPT gets. I guess "reinforcement learning with human feedback" can lead to "sycophancy," and what's a "sycophantic" chatbot to do when it gets caught in its own self-contradictions?
Wayne Radinsky
in reply to Wayne Radinsky
@Syl, the way LLMs work is that they are next-token predictors. So you tokenize input like "Step on a crack," and the model predicts the next token: whichever one is most likely given everything it has seen in its massive text training data. (Which might be the word "break" in this example.)
When you understand this, you understand why, when it sees an http link, it tries to predict the text that would come after it, and that text won't be "I'm unable to follow that link." Well, probably not.
Having said that, they have layered on "reinforcement learning with human feedback" (RLHF), which gives the chatbot its back-and-forth conversational style and is what makes it so eager to please you. If there were any possibility it could say "I'm unable to follow that link," it would come from that part of the training.
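To make "next-token prediction" concrete, here is a minimal sketch using the small open GPT-2 model via Hugging Face transformers. GPT-2 is only an illustrative stand-in, not what ChatGPT actually runs; the point is just to show the model scoring every token in its vocabulary as a possible continuation of the prompt.

```python
# A toy illustration of next-token prediction with GPT-2 (an open model used
# here only as a stand-in; ChatGPT's actual models are different and closed).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Step on a crack,"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The logits at the last position score every vocabulary token as a possible
# continuation; softmax turns those scores into probabilities.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}  p={p.item():.3f}")
```

Run as-is, it prints the five most likely continuations of "Step on a crack," with their probabilities. Chat products add instruction tuning, RLHF, and sampling strategies on top, but this score-the-next-token loop is the core step being described above.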
Syl
in reply to Wayne Radinsky
Wait, it's supposed to be the smartest dude on earth, it's told to read a link, and instead of acknowledging its incapacity to follow the link, it just keeps on predicting hallucinatory comments, brazenly lying...
It seems to me that it should be fundamental, mandatory, that such an intelligence have at least some understanding of its limits and recognize them with the required humility. That shouldn't be such a big deal to implement, no?
Will
in reply to Wayne Radinsky
@Syl What Wayne said.
You have to step back and realize a chatbot is not anything like a human. You can't judge them the way you just did. They don't "understand" things, and they don't have feelings, so they aren't affected by "humility". Chatbots have only one skill: they are good at producing narrative text in the manner they have seen during training, text that the bot infers will be appropriate for a given context. Context is critical because it helps the bot interpret what the user is prompting for and focus the response on what the user seems to be interested in.
What you are expecting is called "alignment" in the technical field. This is the effort to specifically train the bot to behave in ways that match human expectations.
medium.com/@abletobetable/over…
Overview and Development of LLM Alignment: History and Current Practices
Mr. Pitch (Medium)
Syl
in reply to Wayne Radinsky
Ok, but what we are speaking about here is simply a matter of executing a simple task the bot was prompted to do: read a file. It could analyze thousands of data points in a second, but it is unable to simply warn of its failure to access the file... and keeps lying about it.
I don't get it, as it is such a basic feature of operating system management.
Wayne Radinsky
in reply to Wayne Radinsky
Ha. That's the hype. LLMs are not dudes, though. Like @Will says.
They can be extremely smart, though. There are models now that can be given PhD level math problems and can come up with solutions to a decent percentage of them.
There are a lot of tricks that go into that: not just the predictive "pretraining" (remember, the "GPT" in "ChatGPT" stands for "generative pretrained transformer") and the reinforcement learning with human feedback (RLHF) I was describing earlier, but also "chain-of-thought" reasoning, where the model is allowed to talk to itself in an internal monologue. (This used to be done with prompting, to try to get the model to "think the problem through step by step," but has now been incorporated directly into the models themselves.)
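For anyone who hasn't seen the prompting version of this, here is a minimal sketch of the old "think step by step" trick using the OpenAI Python SDK. The model name and the toy problem are illustrative assumptions only; the newer "reasoning" models run this kind of internal monologue on their own without being asked.

```python
# A rough sketch of the old "think step by step" prompting trick, using the
# OpenAI Python SDK. Model name and problem are illustrative choices; newer
# reasoning models do this internally without the explicit instruction.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[{
        "role": "user",
        "content": (
            "A train leaves at 3 p.m. going 60 mph; another leaves the same "
            "station at 4 p.m. going 80 mph on the same track. When does the "
            "second train catch up? Think the problem through step by step "
            "before giving your final answer."
        ),
    }],
)
print(response.choices[0].message.content)
```

The only "trick" is the last sentence of the prompt: asking for intermediate steps tends to make the generated reasoning, and therefore the final answer, more reliable on multi-step problems.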
These tests are done with problems that are small and self-contained enough for the whole thing to fit in the model's context window. There's still no guarantee that if it's told to read a link, it'll follow the link and read the contents at the other end, or acknowledge its incapacity to follow the link.