I cover five other papers, including WizardCoder, Scaling Data-Constrained Language Models (on how training for more epochs could stretch limited data), TinyStories, and more, to give context to the results, and I end with what I think timelines might be and how public messaging could be targeted.
With extracts from Sarah Constantin in Asterisk, Carl Shulman on the Dwarkesh Patel podcast, Andrej Karpathy, Jack Clark (co-founder of Anthropic), and Ronen Eldan, co-author of both the Textbooks and TinyStories papers, I hope you get something from this one. And yes, the title of the paper isn’t the best.
Textbooks Paper: https://arxiv.org/pdf/2306.11644.pdf
Karpathy Tweet: https://twitter.com/karpathy/status/1671587087542530049
TinyStories: https://arxiv.org/pdf/2305.07759.pdf
GPT-4 Self-Repair: https://arxiv.org/pdf/2306.09896.pdf
Yao Fu Tweet on Emergent Self-Repair: https://twitter.com/Francis_YAO_/status/1670618013089820674
WizardCoder: https://arxiv.org/pdf/2306.08568.pdf
Evol-Instruct (WizardLM) paper: https://arxiv.org/pdf/2304.12244.pdf
Scaling Data Constrained Language Models: https://arxiv.org/pdf/2305.16264.pdf
Sarah Constantin, Asterisk Magazine: https://asteriskmag.com/issues/03/the-transistor-cliff
Jack Clark Tweet: https://twitter.com/jackclarkSF/status/1673369486869811201
Carl Shulman, Intelligence Explosion, Dwarkesh Patel: https://www.youtube.com/watch?v=_kRg-ZP1vQc
LLMs and BDTs, Oxford: https://arxiv.org/ftp/arxiv/papers/2306/2306.13952.pdf
HumanEval: https://arxiv.org/pdf/2107.03374v2.pdf
Decoder Piece (if anyone wants to know, I think George Hotz is super-naïve on safety): https://the-decoder.com/gpt-4-is-1-76-trillion-parameters-in-size-and-relies-on-30-year-old-technology/#google_vignette
https://www.patreon.com/AIExplained