Time and Location
Tues/Thurs 11:30 AM-12:45 PM, Carr 102
Office Hours: Tues 5-7 PM, Thurs 7-8 PM in Clapp 218
Natural Language Processing (NLP) is the scientific and engineering discipline concerned with getting computers to understand and process human language. Speech recognition, machine translation, and search engines are all NLP systems that have revolutionized how we work with information.
This course introduces the fundamental techniques for automated analysis and understanding of text and speech. It combines computational algorithms, hands-on practice, and insights from linguistics.
Provisional topics include: language modeling, part-of-speech tagging, speech recognition, speech synthesis, prosodic analysis, conversational dialogue, context-free grammars, syntactic parsing, coreference, text classification, sentiment analysis, and machine translation.
- Understand the challenges of processing natural language; appreciate the basic linguistic issues that underlie NLP problems.
- Recognize and understand NLP terminology and methods in widespread use in modern NLP systems.
- Gain experience implementing several components of NLP systems.
- Be able to read and understand current research papers published in conferences such as ACL, Interspeech, and SIGDial.
CS 211 (data structures) or permission from the instructor. You should be comfortable designing programs and writing code, as well as with mathematical reasoning. If you have any questions about whether you have the right background, please talk to me.
This class is a mixture of traditional lectures, small group activities, and hands-on lab activities (in Kendade 307). In the lab, you will gain experience with Unix tools, text analysis with Python, and software for speech analysis, recognition, and synthesis.
All readings will be available online or on Moodle. Most of these are chapters from Speech and Language Processing (3rd edition draft) by Dan Jurafsky and James Martin. The library has a copy of Speech and Language Processing (2nd edition) on reserve; it is an excellent resource.
Course Piazza site: https://piazza.com/mtholyoke/spring2017/cs341nl/home
We will use Piazza for announcements, Q&A, discussions, and reading responses.
There will be 4 or 5 homework assignments. They will be a mixture of conceptual and programming exercises.
The final project is an integral part of this course. It will give you an opportunity to creatively extend one of the homeworks or explore a topic of interest to you. There will be milestones throughout the semester to guide your project development.
Participation includes constructive contributions to small group and whole-class discussions, a presentation of your final project, informal presentations throughout the semester, structured feedback to your peers, and arriving to class on time.
- Homework assignments: 45%
- Labs and participation: 15%
- Literature review: 10%
- Final Project: 30%
You can use up to 3 free late days on the homework assignments (each late period of up to 24 hours counts as 1 day). After your late days are spent, late homework will be penalized 10% per day late.
If you have a disability for which you require accommodations, please make an appointment to see the instructor within the first two weeks of classes so that we can make appropriate arrangements. You will need to have a letter from the AccessAbility Services Office, located in Wilder Hall B4 (phone: 413-538-2634).
In all your work for this class, it is very important for you to follow the Honor Code: I will honor myself, my fellow students, and Mount Holyoke College by acting responsibly, honestly, and respectfully in both my words and my deeds. If you are not sure how this applies in a particular context, please ask for clarification. Collaboration on homework assignments is encouraged. However, when you write up your work, it is important that you only write what you understand, and that it is in your own words. If you have any questions about what constitutes an Honor Code violation in this class, please ask your instructor.
This is a provisional schedule and will be updated throughout the semester.
Unless otherwise noted, chapters listed in the reference column refer to Jurafsky & Martin’s Speech and Language Processing (3rd edition draft). SLP2 refers to chapters of Speech and Language Processing (2nd edition), available on Moodle.
| Date | Type | Topic | Reference | Due |
|---|---|---|---|---|
| Jan 24 | Lecture | Introduction [slides] | Ch. 1 | |
| Jan 26 | Lab | Unix/Regex Lab [slides] | Ch. 2.1; Unix for Poets | |
| Jan 31 | Lecture | Words; N-grams [slides] | Ch. 2.2-2.3; SLP2 Ch. 5.1 | HW1 due |
| Feb 2 | Lab | N-gram Lab [moodle] | | |
| Feb 7 | Lecture | Language modeling [slides] | Ch. 4.1 | HW2 due |
| Feb 9 | Snow day | | | |
| Feb 14 | Lab | Building LMs [slides] | | |
| Feb 16 | Lecture | Smoothing [slides] | Ch. 4.3-4.4 | Piazza post on Green, Heer, and Manning (2015) |
| Feb 21 | Lecture | HMMs [slides] | Ch. 9.1-9.4 | HW3 due |
| Feb 23 | Discussion | Phi Beta Kappa Visiting Scholar Barbara Grosz: Chatbots, Barbie and the ethical challenges they raise | “Barbie wants to get to know your child” (2015); Green, Heer, and Manning (2015) | Piazza post on “Barbie wants to get to know your child” (2015) |
| Feb 28 | Lecture | Viterbi algorithm [handout] | | |
| Mar 2 | Lab | POS Lab [slides] | | |
| Mar 7 | Guest Lecture | Su Lin Blodgett: Demographic Dialectal Variation in Social Media: A Case Study of African-American English | | HW4 due |
| Mar 9 | No class | | | |
| Mar 21 | Lecture | Phonetics, Speech [slides] | SLP2 Ch. 7 | |
| Mar 23 | Lab | Dialogue; Spectrograms [slides] | | |
| Mar 28 | Discussion | Lit Review Presentations [slides] | | Literature Review due; see past examples |
| Mar 30 | Discussion | Final Project Pitches | | |
| Apr 4 | Lecture | Syntactic Parsing [slides] | | Project Proposal due |
| Apr 6 | Lab | Sentiment Analysis [slides] | Pang and Lee (2008) | |
| Apr 11 | Lab | Naive Bayes; Project work day | | |
| Apr 13 | Lab | Precision/Recall; Project work day | | |
| Apr 18 | Lab | Project work day | | Project Progress Report due |
| Apr 20 | Lecture | Wrap-up [slides] | | |
| Apr 25 | Discussion | Final Project Presentations | | |
| Apr 27 | Discussion | Final Project Presentations | | Project Write-up due May 4 |