This is not an exhaustive list of LLM literature. This is an opinionated collection of papers from the LLM landscape useful for hackers.
This document will keep getting updated.
If you have any questions, DM me on Twitter (@nishantiam), and follow for general updates.
I presume you already know Attention Is All You Need, GPT-3, and GPT-4.
We can ask LLMs a question and get an answer.
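As a minimal sketch of this question-answer interface (the `ask_llm` helper below is hypothetical — it stands in for any chat-completion client, and the canned reply is only for illustration):

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (e.g. an OpenAI or
    # local-model client); returns a canned answer for illustration.
    return "Paris"

answer = ask_llm("Question: What is the capital of France?\nAnswer:")
print(answer)  # → Paris
```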
In Zero-shot Chain of Thought, after a question is asked you append the phrase “Let’s think step by step”, and the GPT models output a better result. For example, a prompt looks like:
Question: What is the elevation range for the area that the eastern sector of the
Colorado orogeny extends into?
Thought: **Let’s think step by step**. The eastern sector of Colorado orogeny extends
into the High Plains. High Plains rise in elevation from around 1,800 to
7,000 ft, so the answer is 1,800 to 7,000 ft.
Answer: 1,800 to 7,000 ft
Question: <Your Question>
Thought: **Let’s think step by step.** <Agent Writes>
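The zero-shot trick above is just string manipulation — a sketch (the helper name is mine, not from the paper):

```python
COT_TRIGGER = "Let's think step by step."

def make_zero_shot_cot_prompt(question: str) -> str:
    # Append the CoT trigger as the start of the Thought line, matching
    # the Question/Thought format shown above; the model continues from
    # the trigger, writing its reasoning before the answer.
    return f"Question: {question}\nThought: {COT_TRIGGER}"

print(make_zero_shot_cot_prompt("What is 17 * 24?"))
```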
Why does it work? In For effective CoT it takes two to tango, the authors say:
From the OpenAI Cookbook:
While the "Let's think step by step" trick works well on math problems, it's not effective on all tasks. The authors found that it was most helpful for multi-step arithmetic problems, symbolic reasoning problems, strategy problems, and other reasoning problems. It didn't help with simple math problems or common sense questions, and presumably wouldn't help with many other non-reasoning tasks either.

You can also give an example Chain of Thought in the prompt, to guide the model towards explaining its answer.
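Few-shot CoT can be sketched the same way — prepend a worked Question/Thought/Answer example (the Colorado orogeny one from earlier) before the new question; the helper name here is illustrative:

```python
# A worked example the model can imitate, in the same
# Question/Thought/Answer format used throughout this document.
FEW_SHOT_EXAMPLE = (
    "Question: What is the elevation range for the area that the eastern "
    "sector of the Colorado orogeny extends into?\n"
    "Thought: Let's think step by step. The eastern sector of Colorado "
    "orogeny extends into the High Plains. High Plains rise in elevation "
    "from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.\n"
    "Answer: 1,800 to 7,000 ft\n\n"
)

def make_few_shot_cot_prompt(question: str) -> str:
    # Prepend the worked example so the model copies its structure,
    # then start the Thought line for the new question.
    return (FEW_SHOT_EXAMPLE
            + f"Question: {question}\nThought: Let's think step by step.")
```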
In MRKL, the prompt is:
Answer the following questions as best you can. You have access to the following tools:
bash: run bash commands in a sandbox machine
search: search the internet for keyword
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [bash, search]
Action Input: the input to the action
Observation: the result of the action
... (this **Thought/Action/Action Input/Observation** can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: <Your Question>
Thought: <Agent writes>
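A minimal sketch of the loop an MRKL-style agent runs around this prompt: parse the model's Action/Action Input lines, run the matching tool, and feed the result back as an Observation. The tool stubs below are placeholders (a real agent would sandbox `bash` and call a search API), and the parsing is deliberately simplistic:

```python
import re

# Placeholder tools keyed by the names listed in the prompt.
TOOLS = {
    "search": lambda q: f"(search results for {q!r})",
    "bash": lambda cmd: f"(output of {cmd!r})",
}

def step(model_output: str):
    """Handle one model turn: run the requested tool and return the
    Observation line to append to the prompt, or None once the model
    has emitted a Final Answer."""
    if "Final Answer:" in model_output:
        return None
    action = re.search(r"Action: (\w+)", model_output).group(1)
    action_input = re.search(r"Action Input: (.+)", model_output).group(1)
    return "Observation: " + TOOLS[action](action_input)
```

The outer loop would append each Observation to the prompt and call the model again until `step` returns None.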
In ReAct, the prompt is: