Top Resources: Getting started with NLP and LLMs

Natural Language Processing (NLP) and Large Language Models (LLMs) have revolutionized the way we interact with technology, enabling machines to understand and generate human language with remarkable accuracy. As the field continues to evolve, it offers a wealth of opportunities for developers, researchers, and enthusiasts alike. This blog post aims to provide a curated list of essential resources for those looking to get started on their journey into NLP and LLMs. Whether you are a beginner or looking to deepen your existing knowledge, these resources and reading materials will serve as a solid foundation for your exploration of NLP and LLMs.

Stanford CS224N: Natural Language Processing with Deep Learning

Stanford offers free lectures exploring fundamental concepts and ideas in Natural Language Processing in the context of Deep Learning. Their most recent 2024 lecture series will help you develop an in-depth understanding of both the algorithms available for processing linguistic information and the underlying computational properties of natural languages. The focus is on deep learning approaches: implementing, training, debugging, and extending neural network models for a variety of language understanding tasks.

Topics Include

  • Computational properties of natural languages

  • Coreference, question answering, and machine translation

  • Processing linguistic information

  • Syntactic and semantic processing

  • Modern quantitative techniques in NLP

  • Neural network models for language understanding tasks

Go to: Lecture Series on YouTube

Stanford CS229 | Machine Learning | Building Large Language Models

This lecture provides a concise overview of building a ChatGPT-like LLM, covering both pretraining (language modeling) and post-training (SFT/RLHF). For each component, it explores common practices in data collection, algorithms, and evaluation methods. This guest lecture was delivered by Yann Dubois in Stanford’s CS229: Machine Learning course, in Summer 2024.

Go to: Lecture recording on YouTube
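To give a loose feel for the pretraining objective the lecture covers, language modeling boils down to predicting the next token from context. The toy bigram model below is a hypothetical, minimal sketch (not from the lecture): it just counts which token tends to follow which.

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Count, for each token, how often each next token follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(model, token):
    """Return the most frequent continuation of `token`, or None if unseen."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]

# Tiny illustrative "corpus"; real pretraining uses billions of tokens
# and a neural network instead of raw counts.
corpus = "the cat sat on the mat and the cat ran".split()
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # → "cat" (follows "the" twice vs. once for "mat")
```

Real LLM pretraining replaces the count table with a Transformer and the argmax with a full probability distribution over the vocabulary, but the objective, predict the next token, is the same.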

Stanford CS324: Large Language Models (2022 Lecture Notes)

Massive pre-trained language models have transformed the field of Natural Language Processing. They form the basis of all state-of-the-art systems across a wide range of tasks and have shown an impressive ability to generate fluent text and perform few-shot learning. At the same time, these models are hard to understand and give rise to new ethical and scalability challenges. In this Stanford course, students will learn the fundamentals about the modeling, theory, ethics, and systems aspects of large language models, as well as gain hands-on experience working with them.

Go to: Lecture Notes

The LLM Evaluation Guidebook

This GitHub repository explores LLM Evaluation. It’s designed for beginners and advanced users, and covers the following topics:

  • Automatic benchmarks

  • Human evaluation

  • LLM-as-a-judge

Go to: LLM Evaluation Guidebook
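To make the "automatic benchmarks" idea concrete, here is a minimal, hypothetical sketch (not taken from the guidebook) of exact-match scoring, one of the simplest automatic metrics: compare model predictions against gold answers after light normalization.

```python
def normalize(text):
    """Lowercase and strip whitespace so trivial formatting
    differences are not counted as errors."""
    return text.strip().lower()

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match their gold reference exactly
    after normalization."""
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["Paris", " paris ", "Lyon"]
golds = ["Paris", "Paris", "Marseille"]
print(exact_match_accuracy(preds, golds))  # → 0.666... (2 of 3 correct)
```

Exact match is deliberately crude; the guidebook's other two topics (human evaluation and LLM-as-a-judge) exist precisely because many tasks have answers that are correct without matching the reference verbatim.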

A Guide to Prompt Engineering

OpenAI

This OpenAI guidebook provides excellent tips and best practices for effective prompt engineering. It shares strategies and tactics for getting better results from LLMs like GPT-4o. The methods described range from writing clear instructions to providing reference text, using external tools, and testing changes systematically. The guide encourages experimentation and combining tactics for greater effect.

Go to: OpenAI Guide to Prompt Engineering
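One tactic in the "write clear instructions" family, using delimiters to separate the instruction from user-supplied text, can be sketched as a plain prompt-building function. The function name and template below are illustrative, not from the guide:

```python
def build_prompt(instruction, document):
    """Wrap user-supplied text in triple-quote delimiters so the model
    can clearly distinguish the instruction from the content to act on."""
    return (
        f"{instruction}\n\n"
        f'Text: """{document}"""'
    )

prompt = build_prompt(
    "Summarize the following text in one sentence.",
    "LLMs are neural networks trained on large text corpora.",
)
print(prompt)
```

Delimiters matter most when the input text might itself contain instruction-like sentences; the fencing signals to the model which part is data rather than a command.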

HuggingFace

This guide covers prompt engineering best practices to help you craft better LLM prompts and solve various NLP tasks with open-source models hosted on the Hugging Face Hub.

Go to: Hugging Face Guide to Prompt Engineering
