Top Resources: Getting started with NLP and LLMs
Natural Language Processing (NLP) and Large Language Models (LLMs) have transformed how we interact with technology, enabling machines to understand and generate human language with remarkable accuracy. As the field evolves, it offers a wealth of opportunities for developers, researchers, and enthusiasts alike. This blog post provides a curated list of essential resources for those starting their journey into NLP and LLMs. Whether you are a beginner or looking to deepen existing knowledge, these materials will give you a solid foundation for your exploration.
Stanford CS224N: Natural Language Processing with Deep Learning
Stanford offers free lectures exploring fundamental concepts and ideas in Natural Language Processing in the context of Deep Learning. Their most recent 2024 lecture series will help you develop an in-depth understanding of both the algorithms available for processing linguistic information and the underlying computational properties of natural languages. The focus is on deep learning approaches: implementing, training, debugging, and extending neural network models for a variety of language understanding tasks.
Topics include:
Computational properties of natural languages
Coreference, question answering, and machine translation
Processing linguistic information
Syntactic and semantic processing
Modern quantitative techniques in NLP
Neural network models for language understanding tasks
Stanford CS229 | Machine Learning | Building Large Language Models
This lecture provides a concise overview of building a ChatGPT-like LLM, covering both pretraining (language modeling) and post-training via supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). For each component, it explores common practices in data collection, algorithms, and evaluation methods. This guest lecture was delivered by Yann Dubois in Stanford’s CS229: Machine Learning course in Summer 2024.
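The pretraining stage mentioned above comes down to next-token prediction: estimate the probability of each token given the tokens before it. As a toy illustration (a counting-based bigram model, not a neural network, and not code from the lecture), this sketch shows the objective in its simplest form:

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Estimate P(next_token | current_token) by counting bigrams —
    the simplest possible instance of the language-modeling objective."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    # Normalize counts into conditional probabilities.
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_lm(corpus)
print(model["the"])  # "cat" follows "the" in 2 of 3 bigrams
```

Real LLMs replace the count table with a neural network and maximize the same conditional likelihood over trillions of tokens, but the training signal is the same.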
Stanford CS324: Large Language Models (2022 Lecture Notes)
Massive pre-trained language models have transformed the field of Natural Language Processing. They form the basis of all state-of-the-art systems across a wide range of tasks and have shown an impressive ability to generate fluent text and perform few-shot learning. At the same time, these models are hard to understand and give rise to new ethical and scalability challenges. In this Stanford course, students will learn the fundamentals about the modeling, theory, ethics, and systems aspects of large language models, as well as gain hands-on experience working with them.
The LLM Evaluation Guidebook
This GitHub repository explores LLM evaluation. It’s designed for both beginners and advanced users, and covers the following topics:
Automatic benchmarks
Human evaluation
LLM-as-a-judge
Go to: LLM Evaluation Guidebook
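Of the topics listed, LLM-as-a-judge is the least self-explanatory: a strong model is prompted to grade another model's answer on a fixed scale. As a rough sketch of the pattern (the helper names `build_judge_prompt` and `parse_rating` are our own, not from the guidebook):

```python
def build_judge_prompt(question, answer, scale=(1, 5)):
    """Format an LLM-as-a-judge prompt: the judge model is asked to
    grade another model's answer on a numeric scale."""
    lo, hi = scale
    return (
        f"You are an impartial judge. Rate the answer below from {lo} to {hi} "
        "for correctness and helpfulness. Reply with the number only.\n\n"
        f"Question: {question}\nAnswer: {answer}\nRating:"
    )

def parse_rating(reply, scale=(1, 5)):
    """Extract the numeric rating from the judge's reply, clamped to the scale."""
    lo, hi = scale
    digits = "".join(ch for ch in reply if ch.isdigit())
    return min(max(int(digits), lo), hi) if digits else None

prompt = build_judge_prompt("What is the capital of France?", "Paris")
rating = parse_rating("Rating: 5")  # the judge model's reply would go here
```

In practice the prompt is sent to a judge model via an API, and the guidebook discusses the pitfalls (position bias, verbosity bias, score calibration) that this naive version ignores.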
A Guide to Prompt Engineering
OpenAI
This OpenAI guide provides excellent tips and best practices for effective prompt engineering, sharing strategies and tactics for getting better results from LLMs like GPT-4o. The methods described range from writing clear instructions and supplying reference text to using tools and employing systematic testing. The guide encourages experimentation and combining best practices for greater effect.
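Two of those tactics, clear instructions and delimited reference text, can be combined in a single prompt. The sketch below (our own `make_messages` helper, shown only as one way to apply the guide's advice) builds the message list that would be passed to a chat-completion endpoint:

```python
def make_messages(task, reference_text, question):
    """Combine two tactics from the guide: a clear system instruction,
    and reference text set off with triple-quote delimiters."""
    system = (
        f"{task} Use only the provided article to answer. "
        "If the answer is not in the article, say you don't know."
    )
    user = f'Article: """{reference_text}"""\n\nQuestion: {question}'
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = make_messages(
    "Answer questions about the article.",
    "The transformer architecture was introduced in 2017.",
    "When was the transformer architecture introduced?",
)
# `messages` can then be sent to a chat-completion API of your choice.
```

Telling the model to admit when the answer is absent is one of the guide's simplest hallucination-reduction tips; the delimiters keep the reference text clearly separated from the instructions.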
Go to: OpenAI Guide to Prompt Engineering
HuggingFace
This guide covers prompt engineering best practices to help you craft better LLM prompts and solve various NLP tasks. You’ll learn: