General Information

Time and Location
Tues/Thurs 11:30 AM-12:45 PM, Carr 102

Heather Pon-Barry
Office: Clapp 226
Office Hours: Tues/Thurs 2-3 PM

Mahima Ghale
Office Hours: Tues 5-7 PM, Thurs 7-8 PM in Clapp 218

Course Overview
Natural Language Processing (NLP) is the scientific and engineering discipline concerned with getting computers to understand and process human language. Speech recognition, machine translation, and search engines are all NLP systems that have revolutionized how we work with information.

This course introduces the fundamental techniques for automated text and speech analysis and understanding. It covers computational algorithms, hands-on practice, and insights from linguistics.

Provisional topics include language modeling, part-of-speech tagging, speech recognition, speech synthesis, prosodic analysis, conversational dialogue, context-free grammars, syntactic parsing, coreference, text classification, sentiment analysis, and machine translation.


Prerequisites

CS 211 (Data Structures) or permission from the instructor. You should be comfortable designing programs, writing code, and reasoning mathematically. If you have any questions about whether you have the right background, please talk to me.

Class Format
This class is a mixture of traditional lectures, small-group activities, and hands-on lab activities (in Kendade 307). In the lab, you will gain experience with Unix tools, text analysis in Python, and software for speech analysis, recognition, and synthesis.

Reading Materials
All readings will be available online or on Moodle. Most are chapters from the draft 3rd edition of Speech and Language Processing by Dan Jurafsky and James Martin. The library has a copy of Speech and Language Processing (2nd edition) on reserve; it is an excellent resource.

Course Piazza Site
We will use Piazza for announcements, Q&A, discussions, and reading responses.

Homework
There will be 4 or 5 homework assignments. They will be a mixture of conceptual and programming exercises.

Final Project
The final project is an integral part of this course. It will give you an opportunity to creatively extend one of the homeworks or explore a topic of interest to you. There will be milestones throughout the semester to guide your project development.

Participation
Participation includes constructive contributions to small-group and whole-class discussions, a presentation of your final project, informal presentations throughout the semester, giving structured feedback to your peers, and arriving to class on time.


Late Days
You can use up to 3 free late days on the homework assignments (submitting up to 24 hours late counts as 1 day). Once your late days are used up, late homework will be penalized 10% per day late.

Accommodations
If you have a disability for which you require accommodations, please make an appointment to see the instructor within the first two weeks of class so that we can make appropriate arrangements. You will need a letter from the AccessAbility Services Office, located in Wilder Hall B4 (phone: 413-538-2634).

Academic Integrity
In all your work for this class, it is very important that you follow the Honor Code: I will honor myself, my fellow students, and Mount Holyoke College by acting responsibly, honestly, and respectfully in both my words and my deeds. Collaboration on homework assignments is encouraged. However, when you write up your work, write only what you understand, and write it in your own words. If you are not sure how the Honor Code applies in a particular context, or what constitutes an Honor Code violation in this class, please ask your instructor for clarification.


Schedule
This is a provisional schedule and will be updated throughout the semester.

Unless otherwise noted, chapters listed in the Reference column refer to Jurafsky & Martin’s Speech and Language Processing (3rd edition draft). SLP2 refers to chapters of Speech and Language Processing (2nd edition), available on Moodle.

Date | Format | Topic | Reference | Assignment Due
Jan 24 | Lecture | Introduction [slides] | Ch. 1 |
Jan 26 | Lab | Unix/Regex Lab [slides] | Ch. 2.1; Unix for Poets |
Jan 31 | Lecture | Words; N-grams [slides] | Ch. 2.2-2.3; SLP2 Ch. 5.1 | HW1 due
Feb 2 | Lab | N-gram Lab [Moodle] | |
Feb 7 | Lecture | Language modeling [slides] | Ch. 4.1 | HW2 due
Feb 14 | Lab | Building LMs [slides] | |
Feb 16 | Lecture | Smoothing [slides] | Ch. 4.3-4.4 | Piazza post on Green, Heer, and Manning (2015)
Feb 21 | Lecture | HMMs [slides] | Ch. 9.1-9.4 | HW3 due
Feb 23 | Discussion | Phi Beta Kappa Visiting Scholar Barbara Grosz: Chatbots, Barbie and the ethical challenges they raise | “Barbie wants to get to know your child” (2015); Green, Heer, and Manning (2015) | Piazza post on “Barbie wants to get to know your child” (2015)
Feb 28 | Lecture | Viterbi algorithm [handout] | |
Mar 2 | Lab | POS Lab [slides] | |
Mar 7 | Guest Lecture | Su Lin Blodgett: Demographic Dialectal Variation in Social Media: A Case Study of African-American English | | HW4 due
Mar 9 | No class | | |
Mar 21 | Lecture | Phonetics, Speech [slides] | SLP2 Ch. 7 |
Mar 23 | Lab | Dialogue; Spectrograms [slides] | |
Mar 28 | Discussion | Lit Review Presentations [slides] | | Literature Review due; see past examples
Mar 30 | Discussion | Final Project Pitches | |
Apr 4 | Lecture | Syntactic Parsing [slides] | | Project Proposal due
Apr 6 | Lab | Sentiment Analysis [slides] | Pang and Lee (2008) |
Apr 11 | Lab | Naive Bayes; Project work day | |
Apr 13 | Lab | Precision/Recall; Project work day | |
Apr 18 | Lab | Project work day | | Project Progress Report due
Apr 20 | Lecture | Wrap-up [slides] | |
Apr 25 | Discussion | Final Project Presentations | |
Apr 27 | Discussion | Final Project Presentations | | Project Write-up due 5/4