
6.345/HST.728  Automatic Speech Recognition

Spring 2017


Instructor: James R Glass

TA: David Harwath

Lecture:  MW9.30-11  (32-144)
Office Hours (David):  T1-2, R3-4  (32-G440)      

Information: 

This course introduces students to the rapidly developing field of automatic speech recognition and spoken language processing. Its content is divided into four parts. Part I deals with the fundamentals of speech, including signal representation, acoustic theory of speech production, and acoustic-phonetics. Part II deals with the basics of acoustic modeling and search, including vector quantization, Gaussian mixtures, dynamic time-warping, and search. Part III deals with fundamentals of lexical access, including hidden Markov models, language modeling, and finite state transducers. Part IV deals with more advanced methods, including adaptation, discriminative techniques, and neural-network-based modeling. Lectures will intersperse theory with applications.

There will be two 90-minute lectures per week, along with office hours. Lecture material will be front-loaded into the first half of the class, so that students are prepared for the term project, which will start after spring break. There will be invited guest lectures on related speech and language topics in the latter half of the class.

There will also be five assignments spread over the first half of the class, which include problems, a laboratory, and a brief in-class test. The assignments will be closely linked with lecture material. The laboratories will involve speech recognition experiments that can be performed on student Athena accounts. All material will be made available on the MIT Stellar course website.

Each of the five assignments will count for 10% of the final grade. The final term project will count for 50% of the final grade.
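As a rough illustration of the weighting above, the following sketch combines five assignment scores and a project score into one number. The function name, the 0-100 scale, and the sample scores are assumptions made for the example; the course does not specify how individual scores are reported.

```python
# Weights come from the grading description; everything else is illustrative.
ASSIGNMENT_WEIGHT = 0.10  # each of the five assignments
PROJECT_WEIGHT = 0.50     # final term project


def final_grade(assignment_scores, project_score):
    """Combine scores (assumed to be on a 0-100 scale) into a weighted total."""
    if len(assignment_scores) != 5:
        raise ValueError("expected exactly five assignment scores")
    return ASSIGNMENT_WEIGHT * sum(assignment_scores) + PROJECT_WEIGHT * project_score


if __name__ == "__main__":
    # Hypothetical scores, not real course data.
    print(final_grade([90, 80, 100, 70, 85], 88))
```

The five assignment weights sum to 50%, so together with the project they account for the whole grade.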

The particular topics covered in the first half of the class include: digital signal processing, acoustic theory of speech production, acoustic phonetic properties of speech sounds, vector quantization, Gaussian mixture models, dynamic time warping, Viterbi and A* search, hidden Markov models, language modeling, finite state transducers, speaker adaptation, artificial neural networks, sequential training, and neural end-to-end ASR.

OCW archive available

Announcements

Engaging cluster feedback

Hi Class,

I just posted an assignment in the Homework section for submitting feedback regarding your experiences using the Engaging cluster. We, along with the cluster administrators, are very interested in hearing about what went well with the cluster, what difficulties you encountered, and any ways that you think the computing experience could be improved for future generations of students.

You will not be graded on this, and of course it is optional since you were not required to use the cluster for your project. However, if you did take advantage of the Engaging resources, please take just a few moments to write and upload a small document (a paragraph or two) detailing your experiences. Even if you had a seamless experience with no difficulties, please submit a document saying so!

Best,
Dave

Announced on 16 May 2017 1:24 p.m. by David Harwath

Final project presentation/meeting schedule

Hi Class,



Here is the schedule for the final project presentation meetings. As a reminder, meetings will be 15 minutes long and will take place between 9:30 AM and noon on Monday, May 15 through Thursday, May 18. In the times below, Monday is denoted by M, Tuesday by T, Wednesday by W, and Thursday by R.

Meetings will be held in 32-G431. Please arrive at least 5-10 minutes before your scheduled meeting time so that you are ready to go as soon as your timeslot begins.



Tyson and Nikhil : M9:30

Chen Gu : M9:45

Hongyin Luo : M10:00

Jen Drexler : M10:15

Dana Gretton : M10:30

Guadalupe Fabre : M10:45

Dan : M11:00

Nick : M11:30

Logan Martin, Logan Ford : M11:45

Kimberly Leon, Leopoldo Calderas, Abhinav Venigalla : T9:30

Andrew Titus : T9:45

Wei-Ning Hsu : T10:00

Matt McEachern : T10:15

Ken Leidal : T10:30

Hsin-Yu : W9:30

Denis Li : W9:45

Liz Salesky : W10:00

Yonatan Belinkov : W10:15

Tina Quach, Amanda Ke : W10:30

jecs : W10:45

Miaorong Wang : W11:00

Shai and boyan : W11:15

Tarfah, Leilani, Danielle : W11:30

Enes Kocabey : T11:45

Hsin-Wei (Wanda) : R10:30

Emmanuel Azuh : R10:45

Iveel Tsogsuren : R11:00

Nazmus Saquib : R11:15

Mohamed AlHajri : R11:30

Di-Chia Chueh : R11:45

Announced on 10 May 2017 11:46 a.m. by David Harwath

End-of-term subject evaluations

Hi Class,

Your feedback is important to us, so please take a moment to fill out your end-of-term course evaluations here:

http://web.mit.edu/subjectevaluation

Your suggestions help to improve the course for future generations of students!

Jim and Dave

Announced on 08 May 2017 2:56 p.m. by David Harwath

Final project presentation specifics

Hi Class,

For your final project presentations, you will have an in-person meeting with Jim and me (10-15 minutes long), during which we'd like you to talk through your project writeup. At minimum, you should describe: the problem you worked on and how it relates to previous work, what dataset(s) you used, the experiments you performed, and the results and conclusions that you reached. If you have more than one person in your group, we'd like to hear each person talk about their individual contributions to the project.

We understand that people are going to be working on finishing the writing of the report during the last week of classes, so your report doesn't need to be 100% ready to hand in at the time of your meeting (especially if you are scheduled for a Monday or Tuesday timeslot). However, it should have most of your content written, especially if you've been gradually filling it in over the course of the project. As stated in the final project handout, you should submit your final project writeup to Stellar before 11:59 PM on 5/18.

Best,
Dave

Announced on 04 May 2017 2:34 p.m. by David Harwath

Scheduling final project presentations

Hi Class,

If you are registered for credit and are doing a final project, please mark every timeslot in which you are available to present your work in the following Doodle poll:

https://doodle.com/poll/6gbqanymgf2bqqwr

When filling out the poll, please fill in the "name" field with the names of all of your group members. Presentations will be 10-15 minutes long.

Best,
Dave

Announced on 04 May 2017 12:19 p.m. by David Harwath
