From: <smmalina@ATHENA.MIT.EDU>
Message-Id: <8906061433.AA19547@M1-142-4.MIT.EDU>
To: smm%nc.mit.edu@mc.lcs.mit.edu
Cc: iagarcia@fenchurch.mit.edu, iagarcia@ATHENA.MIT.EDU
Subject: what's left of my athena account
Date: Tue, 06 Jun 89 10:32:55 EDT


Hi.  Could you please store this file for me over the summer,
and then mail it to me when I get an account at Stanford?

Thank you very much.  I'll write to you soon.

interviews.tex, nsf.mss, stanford.tex, summerres.mss, thesis.tex, two.tex, xin


interviews.tex:
\documentstyle [12pt]{article}

\begin{document}

\begin{center}
{\huge Interviews}

\vspace{3em}

{\large Consulting}
\end{center}

\begin{tabbing}
123456789\=123\=123456789012345678901234567890\=12345\=\kill

24 Jan\>\>	Temple, Barker \& Sloane\>	Management\\
30 Jan\>\>	Monitor\>			Strategy\\
31 Jan\>\>	Arthur Andersen\>		Information Systems\\
\\
8 Feb\>\>	Booz$\cdot$Allen \& Hamilton\>	Management\\
9 Feb\>\>	Braxton Associates\>		Strategy\\
10 Feb\>\>	Boston Consulting Group\>	Strategy\\
10 Feb\>\>	McKinsey Camb. Systems Ctr.\>	Information Systems\\
13 Feb\>\>	LEK Partnership\>		Strategy\\
14 Feb\>\>	Network Strategies\>		Telecommunications\\
\\
6 Mar\>*\>	Design Options\>		Software\\
6 Mar\>*\>	Cambridge Technology Group\>	Information Systems\\
6 Mar\>*\>	Bear Stearns\>			Mathematical Arbitrage\\
13 Mar\>\>	Fletcher \& Company\>		Startups\\
22 Mar\>*\>	Arthur D. Little\>		Automotive\\
\\
\\
\end{tabbing}

\begin{center}
{\large Engineering}
\end{center}

\begin{tabbing}
123456789\=123\=123456789012345678901234567890\=12345\=\kill
\\
12 Oct\>*\>	Oracle\>			Software\\
19 Oct\>*\>	AT\&T\>				Systems Engineering\\
19 Oct\>*\>	HP\>				EE\\
3 Nov\>\>	NCR\>				EE\\
17 Nov\>X\>	IBM\>				Internal Telecomm\\
\\
7 Feb\>\>	GTE\>				Lighting\\
9 Feb\>\>	Xerox\>				Copiers\\
14 Feb\>*\>	Boeing\>			Systems Engineering\\
16 Feb\>\>	JPL\>				Space Exploration\\
16 Feb\>X\>	GE\>				EE\\
28 Feb\>\>	Control Data\>			Fast Computers\\
28 Feb\>\>	VMX/OPCOM\>			Voice Mail\\
\\
1 Mar\>*\>	DEC\>				Computer Design\\
3 Mar\>\>	GM\>				Control Systems\\
10 Mar\>\>	Bellcore\>			Telecommunications\\
16 Mar\>\>	ROLM\>				Communications\\
\\
\\
\end{tabbing}
\begin{center}
{\large Graduate Schools}
\end{center}

\begin{tabbing}
123456789\=123\=123456789012345678901234567890\=12345\=\kill\\
\>\>		MIT\>				Systems, Communications \& Control\\
\>\>		Stanford\>			Engineering -- Economic Systems\\
\>\>		Stanford\>			Communications \& Control\\
\>\>		UC Berkeley\>			Communications \& Control\\
\>\>		Univ. of Illinois\>		Communications \& Control\\
\\
17 Feb\>*\>	Hertz Foundation\>		Fellowship\\
31 Mar\>\>	MIT Lincoln Laboratories\>	Research Assistantship\\
\end{tabbing}

\begin{center}
{\large Offers}
\end{center}

\begin{tabbing}
123456789012345678901234567\=8901234567\=8901234567890\=12345\=\kill\\
\\
HP San Diego\> \$32,760\\
Oracle\> \$33,000\\
Cambridge Technology Group\> \$33,000\\
Boeing Electronics\> \$35,000\\
\\
\end{tabbing}
\end{document}

nsf.mss:
@device (postscript)
@make (article)
@style [fontfamily = TimesRoman,
	fontscale  = 12]
@style (spacing 2, indent 5, spread 1, leftmargin 1.2 inches, rightmargin 1.2 inches, topmargin 1.5 inches)
@set(page=1)
@begin(MajorHeading) 
NSF essays
@end(MajorHeading)
@BlankSpace(1)


In a concise statement, summarize the objectives of your educational
program and your long-range professional goals.

----------------------------------------------

I want to apply the ideas and techniques of electrical engineering to
practical problems in interdisciplinary situations.  My proposed plan
of studying Systems, Communications, and Control will provide the
modeling and problem solving skills I will need to accomplish my
professional goal.


----------------------------------------------

----------------------------------------------

Describe any educational and personal experiences that you consider
relevant to the goals and objectives stated above and to the proposal
presented in your @i(Proposed Plan of Study and/or Research).  Also
describe below: participation in relevant volunteer and
extracurricular activities; professional work experience or
work-training experience; any other significant accomplishments and
background information.

----------------------------------------------

My summer work experiences and conversations with engineering
professionals have demonstrated that I like what is loosely known as
"systems engineering" -- the process of pulling together knowledge
from many disciplines and determining an approach to a complex
problem.  As an undergraduate and a coop student I have found that I
enjoy adapting to an unfamiliar situation and quickly acquainting
myself with a new field.  For instance, one of my favorite classes as
an undergraduate was Acoustics, because it required the incorporation
of electrical, mechanical, and acoustical systems into a single model.
Working in this particular field at Bose Corporation and becoming an
assistant teaching assistant for the Acoustics class have given me
excellent experience in dealing with interdisciplinary systems.
Another class which has fascinated me and which applies directly to my
proposed graduate studies is Probabilistic Systems Analysis.
Studying systems involving randomness requires analyzing a problem
logically and recognizing when certain techniques will work much better
than other equally valid techniques.  In probability, as in acoustics,
I learned the importance of breaking a problem into smaller and more
familiar problems to apply insight from similar problems I have
already modeled.  Classes such as these have sparked my interest in
applying the ideas of electrical engineering to seemingly unrelated
situations.

I have rounded out my undergraduate experience with several
extracurricular activities and part-time jobs.  As a Projection
Subdirector for the MIT Lecture Series Committee I have learned much
about how motion pictures and their soundtracks work.  I also have a
strong interest in music, and I am learning to play the piano.  To
help finance my education I have worked as a clerical assistant at the
Deans' office of the Sloan School of Management.  I also spent two
summers helping my father run his automotive repair shop: I performed
tasks such as delivering customers' cars, picking up parts, answering
the phone, and scheduling appointments.  All of these activities have
given me the opportunity to work with people whose backgrounds were
quite different from mine and to learn from their methods of
approaching problems.

----------------------------------------------

----------------------------------------------

In no more than two pages describe your proposed plan of study and/or
research for the period covered by this fellowship.  A listing of
courses without a written statement is not sufficient.  If you have
not yet formulated a plan of study or research, state how you expect
your study program to further your educational objectives.

----------------------------------------------

As an MIT Electrical Engineering undergraduate, I have appreciated the
variety of topics I have studied.  After much thought, I have
determined that I do not want to restrict myself to a narrow subset of
electrical engineering directed toward a specific product or
technology.  Instead, I want to look at the "big picture."  In
graduate school, I would like to explore how the ideas and techniques
of electrical engineering can be applied to practical problems in many
fields.

Consequently, the research area of Systems, Communications, and
Control fascinates me because it not only provides a firm base in
theory and mathematics, but also applies this theory to a large number
of practical situations.  I am especially interested in many
mathematically oriented fields related to operations research
(considered part of Electrical Engineering at MIT), including
statistics, queueing theory, mathematical programming, and control of
stochastic systems.  Operations research, with its mathematical models
and optimization techniques, offers an excellent opportunity to pursue
my professional interest of applying electrical engineering concepts
to complex interdisciplinary situations.  Furthermore, studying
communication systems would be exciting because modeling these systems
involves understanding concepts from many disciplines, such as
physics, engineering, and mathematics.  I am also interested in signal
processing and its numerous applications; for instance, my senior
thesis deals with analyzing musical data.  Studying and doing research
in these strongly interrelated disciplines in the area of Systems,
Communications, and Control will strengthen my modeling and problem
solving abilities.

-----------------------------

-----------------------------

In no more than two pages, describe any scientific research activities
in which you have participated as an undergraduate, such as experience
in undergraduate research participation programs and student-oriented
studies programs, or research experience gained through summer or
part-time employment, or in work-study programs.

-----------------------------

As an undergraduate, I have worked in a variety of research
environments.  I will start my largest project, my senior thesis, in
January.  Last summer I did some headphone research at Bose
Corporation.  Furthermore, I have done some "research support" work at
two MIT labs as an undergraduate research assistant.

My thesis will be a study of musical timbre based on a broad-band
filter system.  Previous studies of musical timbre recognition have
assumed that the ear separates the input sound signal into narrow,
overlapping frequency bands.  Current research suggests that the
auditory system may be better modeled at normal sound levels as a
collection of broad-band, overlapping filters.  My thesis will consist
of three major parts.  First, I will decide on a (fairly simplistic)
model of the auditory system.  A good starting place would be a
computer-implemented collection of overlapping broad-band filters.
Later, I may investigate filters which change their bandwidth as a
function of the intensity of the incoming signal.  Next, I will use
the model to study the timbre of various musical instruments.  Time
domain analysis will reveal features such as onset patterns,
risetimes, and formants.  The results will be compared to traditional
studies of musical timbre.  Finally, I may use my data to create and
test an algorithm for recognition of instruments.

At Bose Corporation I did research on the stability of Bose's active
noise cancelling communication headsets.  These headsets use a
microphone to listen to the noise in the ear cavity, change the phase
of the signal by 180 degrees, and then broadcast the negative signal
through a driver in the ear cavity to cancel the noise.
Unfortunately, this feedback system occasionally becomes unstable.
Suspected causes of this instability include leaks between the earcup
and the head of the user.  A coworker took frequency responses of the
open-loop feedback system from the microphone to the driver, and I
attempted to model this data with complex pole/zero analysis.  Using a
proprietary software package to analyze the frequency responses, I first
decided how many pole/zero pairs were needed to model the response
within two decibels between 20 Hz and 1 kHz (the operating range of
the unit).  Studying several responses showed that 12 sets would be
optimal since more sets added to the complexity of the model
without adding any more accuracy.  I made pole/zero plots of the
system with a good seal to the head and with a leaky seal caused by a
pencil behind the ear.  I intended to see how the poles and zeros
moved around when the leak was introduced.  Unfortunately, there was
too much variation among the data from different people's
heads to generate clearly recognizable and repeatable groups of poles.
Eventually my supervisor and I decided that this system was simply too
complex to model in this manner.  From this experience I learned that
useful results from research are not guaranteed even if the researcher
is thorough and careful.

At the MIT Center for Space Research's Charge-Coupled Device Lab, I
adapted an image processing software package to run on our computer
system.  Our display system was incompatible with the software, so I
wrote some programs to patch this new software package to the display
routines from another package we were using.  I then wrote a user's
guide which explained how to use the package on our system.  I used this
software package, called FOCAS (Faint Object Classification and
Analysis System) to analyze CCD images of "calibration"
stars to check the accuracy of the software's location algorithms.
Later, as a new project during winter break, I examined some
charge-coupled device X-ray detectors to determine their serial charge
transfer efficiency.

Finally, at the MIT Laboratory for Computer Science I taught myself
the C programming language using the well-known Kernighan and Ritchie
book.  I then wrote a low-resolution graphics program to plot
histograms given data in ASCII format.  The graduate student for whom
I worked integrated this graphics program as a display of resource
utilization in a software package of his which modeled a tagged-token
dataflow architecture.


stanford.tex:
\documentstyle[doublespace,12pt]{article}

\begin{document}


\begin{center} 
{\huge Stanford essay}
\end{center}

Write a brief statement concerning both your past work in your
intended field of study and allied fields, your plans for graduate
study (at Stanford) and your subsequent career plans.  This is a very
important part of your application, so please pay particular attention
to clarity and a careful exposition of the relevance of your past work
and future intentions to the program for which you are applying.

----------------------------------------------

As an MIT freshman I decided to major in Electrical Engineering
because I knew that its techniques are applicable to many types of
professions.  My career plans center on applying analytical skills
such as those used in engineering to complex interdisciplinary
problems.  Graduate studies in Engineering-Economic Systems will
provide the ``portable concepts'' necessary for my professional goals.

My undergraduate experiences have demonstrated that I like what is
known as ``systems engineering'' --- the process of pulling together
the information needed for an interdisciplinary project.  This fall I
thoroughly enjoyed being a tutorial instructor for acoustics, one of
my favorite classes.  Understanding acoustical problems requires the
incorporation of electrical, mechanical, and acoustical systems into a
single model.  At Bose Corporation last summer I had the opportunity
to apply acoustical theory to practical problems.  As an engineering
coop student I constructed, tested, and evaluated prototype products
including active noise reduction communication headsets and consumer
electronics.  My thesis research in audio signal processing
consolidated concepts from my math, engineering, and music classes: I
used a broad-band filter system to study musical timbre.  Much of my
work has dealt with systems involving a great deal of uncertainty.
Probabilistic systems analysis, another of my favorite classes, helped
me develop insight into situations where an exact answer cannot be
found.

In graduate school I plan to build upon my past work with
interdisciplinary systems and probabilistic analysis.  I especially
want to focus on the complex problems which arise in a business
environment.  For instance, my father runs an independent automotive
repair shop, and he has struggled for thirty years because of his
difficulties with long-term strategies.  I want to help dedicated
people like my father to succeed through better planning.  I am
fascinated by the idea of applying systems engineering concepts to
business problems like strategic planning.  Consequently, I want to
study Business Systems, Decision Analysis, and Economic Analysis.

I want to do my graduate research at Stanford because of the
excellence of the faculty and the outstanding approach of the
Department of Engineering-Economic Systems.  The core classes of this
department combine the topics that interest me in engineering (dynamic
systems), operations research (probability, optimization), and
business (economics, decision analysis).  I am also attracted by the
opportunity to apply theoretical concepts to ``real-life'' problems
through project courses and the PhD internship program.  I am
interested in a variety of professions including consulting,
university teaching and research, and project management.  Challenging
graduate study and research in Engineering-Economic Systems at
Stanford will bring me much closer to these professional goals.

\end{document}


summerres.mss:
@device(Postscript)
@make(text)
@style(indent 0)
@style(justification off)
@style(rightmargin 0.2inches)
@style(leftmargin 0.6inches)
@style(bottommargin 0.1inches)
@style(topmargin 0.7inches)
@style(fontfamily=TimesRoman, fontscale=10)
@display<
@tabclear
@tabset(5.55inches)
@MajorHeading(Stephen M. Malinak)

Campus Address: (until 6/5/89)@\Home Address: 
362 Memorial Drive@\3321 Rowena Road
Cambridge, MA 02139@\Barberton, OH 44203
(617) 225-7275@\(216) 644-7894
(617) 253-3161 (messages)

@tabclear
@tabset(1.3 inches, 4.2 inches,)


@b[OBJECTIVE:] @\A challenging summer position in electrical engineering or a related field.


@b[EDUCATION:] @\MASSACHUSETTS INSTITUTE OF TECHNOLOGY.
@\Candidate for S.B., Electrical Engineering and Computer Science, in June 1989.
@\Thesis: Teaching a computer to recognize musical instruments.
@\Member of Tau Beta Pi and Eta Kappa Nu.
@\GPA: 5.00/5.00 in major, 4.96/5.00 overall.

@\Awarded National Science Foundation fellowship to continue studies at Stanford University
@\in the Department of Engineering-Economic Systems.

@\ST. VINCENT-ST. MARY HIGH SCHOOL.
@\Valedictorian, 1985.
@\Departmental awards in Science, Mathematics, English, and Foreign Languages.
@\-Akron's "Teenager of the Year" for 1985@\-High School Newspaper, Editor-in-Chief
@\-National Merit Scholarship@\-National Honor Society


@b[EXPERIENCE:]
Fall, 1988@\MIT DEPARTMENT OF ELECTRICAL ENGINEERING,  @i(Cambridge, MA)
@\@i[Tutorial Instructor:] Conducted one-on-one tutorials in acoustics focusing on problem-solving
@\techniques. Held office hours, corrected problem sets.

Summer, 1988@\BOSE CORPORATION,  @i(Framingham, MA)
@\@i[Engineering Coop Student:] Construction, testing, and evaluation of prototype products including
@\active noise reduction communication headsets and consumer electronics.

Fall, 1987@\MIT LECTURE SERIES COMMITTEE,  @i(Cambridge, MA)
to present@\@i[Projection Subdirector:] Training and supervising projectionists. Troubleshooting and operating
@\motion picture projectors and sound equipment.

Fall, 1987@\MIT CENTER FOR SPACE RESEARCH,  @i(Cambridge, MA)
Summer, 1987@\@i[Undergraduate Research Assistant:] Participation in characterization of charge-coupled devices
@\for X-ray astronomy. Adaptation and development of image processing software.

Spring, 1987@\MIT LABORATORY FOR COMPUTER SCIENCE,  @i(Cambridge, MA)
@\@i[Undergraduate Research Assistant:] Programming in C to develop a graphics tool to display
@\experimental data from simulation of parallel computer architectures.

Spring, 1987@\MIT SLOAN SCHOOL OF MANAGEMENT,  @i(Cambridge, MA)
Fall, 1986@\@i[Clerical Assistant:] Messenger service, sorting and filing expense accounts, assistance with
Fall, 1985@\preparation of class notes and assignments, requisition of supplies, document reproduction,
@\preparation of bulk mailings, miscellaneous clerical duties.

Summer, 1986@\MAIN CARBURETOR AND IGNITION,  @i(Akron, OH)
Summer, 1985@\@i[Managerial Assistant:]  Purchase of parts, delivery of customer cars, answering the phone,
@\scheduling of appointments, organization of repair stalls.


@b[INTERESTS:] @\Playing piano, chess, travel, and golf.
>


thesis.tex:
\documentstyle[11pt,eecsthesis]{report}

\addtolength{\topmargin}{.125in}
\addtolength{\textheight}{-.25in}

\addtolength{\oddsidemargin}{.125in}
\addtolength{\evensidemargin}{.125in}
\addtolength{\textwidth}{-.25in}

\begin{document}

\title{Teaching a Computer to Recognize Musical Instruments}

\author{Stephen M. Malinak}
\department{Department of Electrical Engineering and Computer Science}

\degree{Bachelor of Science in Electrical Science and Engineering}

\degreemonth{May}
\degreeyear{1989}
\thesisdate{May 16, 1989}

\copyrightnotice{Stephen M. Malinak 1989}

\supervisor{Campbell L. Searle}{Professor of Electrical Engineering}

\chairman{Leonard A. Gould}
  {Chairman, Department Committee on Undergraduate Theses\\$\;$}

\maketitle

\begin{abstractpage}
Previous studies of musical timbre recognition have assumed that the
human ear filters the input sound signal into {\em narrow},
overlapping frequency bands.  Current research suggests that the
auditory system may be better modeled at normal sound levels as a
collection of {\em broad} bandwidth, overlapping filters.  In this
thesis, a twelve-channel, wide bandwidth filter bank model of the
auditory system is used to analyze a collection of notes from two
woodwind instruments (clarinet and flute) and three brass instruments
(French horn, trombone, and trumpet).  Certain transient features of the
attack and spectral features of the steady-state enable the
construction of a musical instrument recognition algorithm.  Other
instrumental characteristics suggest more reliable alternative
algorithms which would use more criteria gained from a statistical
study of many more notes.
\end{abstractpage}


\section*{Acknowledgments}

Professor Searle, my supervisor, was nearly always available (!)\ to
assist me in focusing my research.  I really appreciate his patience
and encouragement.  His wonderful attitude made the entire thesis
experience less stressful and more enjoyable.

\vspace{1.6em}

\noindent Richard Kim and Hugh Secker-Walker have written numerous programs for
filtering, statistical analysis, and graphical display of signals.
The twelve-channel, broad bandwidth filter bank used in this thesis
comes directly from Richard's semi-linear filter bank.  Richard and Hugh
showed me how to use the digital signal processing library upon which
this thesis was built.  I am grateful for their eagerness to help me
immediately any time I asked.

\vspace{1.6em}

\noindent Gill Pratt provided the computer facilities.  Thanks for all of the
disk space.

\vspace{1.6em}

\noindent I would especially like to thank Ken Malsky for giving me the
digitized musical notes.  Using his recordings helped me avoid the
time sink involved in recording new notes.

\vspace{1.6em}

\noindent Thanks to the financial aid office for making MIT financially
possible.

\vspace{1.6em}

\noindent I would like to thank Mom and Dad for their endless support.

\vspace{1.6em}

\noindent Finally, I want to thank my friends who have helped me to survive MIT
and learn much about life, the universe, and everything.

\tableofcontents
\listoffigures
\listoftables


\chapter{Background and Motivation}


\section{Modeling the peripheral auditory system}

Decades of research have shown that the signal processing done by the
peripheral auditory system includes a complicated filtering function.
The bandwidth of the filtering appears to increase with stimulus
intensity.  To model this phenomenon, Richard Kim created a
``semi-linear'' filter bank which consists of twenty sets of linear
fifty-channel filter banks.  Different stimulus levels trigger
different sets of filters: narrow bandwidth at the threshold of
hearing and wide bandwidth at normal ``conversational'' levels
\cite{Kim}.  This study of timbre is based upon a subset of Kim's
model with the filters set at their widest bandwidths.


\section{Previous studies of musical timbre}

Studying musical timbre produces information useful for a variety of
research areas.  Musical notes provide simpler data than speech
waveforms for evaluating auditory system models such as Richard Kim's
filter bank.  Moreover, timbre research leads to better synthesis of
musical sounds.  In fact, John Grey states that timbre analysis will
not be complete until synthesized tones become {\em indistinguishable}
from the original instrument tones \cite[page 16]{Grey}.  Furthermore,
many timbre researchers hope to extend pitch detection and instrument
recognition to computer transcription of orchestral scores.

The human auditory system gathers three kinds of information from
sounds.  A listener detects loudness from the signal's intensity and
pitch from the signal's periodicity.  The auditory system then
determines the timbre of the signal \cite[page 138]{Roe}.  According
to the American National Standards Institute (1960), ``{\em Timbre} is
that attribute of auditory sensation in terms of which a listener can
judge two sounds similarly presented and having the same loudness and
pitch as dissimilar \cite[page 113]{Ross}.''  Because of its
multi-faceted nature, timbre perception depends on a tremendous
variety of qualities from the signal.  Pitch and loudness can be
graded on one-dimensional scales, but timbre requires several
``dimensions.''  Grey studied sixteen instruments and created a
sketchy three-dimensional model for ``timbral space'':

\begin{quote}
One dimension related to the {\em spectral energy distribution}, while
the other two related to the temporal pattern of the attack and decay
of the tones, namely, the presence of {\em low-amplitude,
high-frequency energy}\/ in the initial attack segment and the
presence of {\em synchronicity}\/ in the attacks and decays of the
higher harmonics.  This interpretation is qualified with the following
remarks: 1) it may be important that the energy referred to in the
former dimension was {\em inharmonic}\/; 2) it is also possible that
musical instrument {\em family}\/ relationships were the basis of the
latter dimension \cite[page 67]{Grey}.
\end{quote}


	\subsection{Importance of the attack}

Most timbre researchers have studied individual musical notes played
on one instrument at a time.  Musical notes consist of three temporal
sections: the attack, the steady-state, and the decay.  Generally, the
amplitude of the signal's envelope increases during the attack,
remains relatively constant during the steady-state, and decreases
during the decay.  Figure 1-1 shows a musical note.

\begin{figure}[htbp]
\vspace*{2.75in}
\caption{A musical note}  This is ``Middle C'' (262 Hz) played on a
clarinet.  The attack is from 0.00 sec to 0.15 sec, steady state from
0.15 sec to 1.1 sec, and decay from 1.1 sec to 1.35 sec.
\end{figure}

Early timbre studies focused on the steady-state portion of musical
notes.  Hermann von Helmholtz determined in 1885 that the steady-state
spectrum correlates strongly with the timbre of a note \cite[page
52]{Dodge}.  The pitch of the tone appears at the {\em fundamental}\/
frequency in the spectrum, although the amplitude of this fundamental
is not necessarily the maximum amplitude of the spectrum.  Energy is
also usually present at several {\em harmonics}\/ --- frequencies
which are integral multiples of the fundamental.  Spectra often
contain some inharmonic energy caused by excitation actions such as
bowing or blowing.  Like the human voice, many instruments have {\em
formants}.  Formants are spectral maxima that tend to remain in a
fixed frequency range regardless of the pitch of the note.  All of
these features give an instrumental signature to the steady-state
spectrum of individual notes.
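
As a small worked illustration (the numbers below are computed here
from the definition and are not taken from the cited studies), the
$n$th harmonic of a note with fundamental frequency $f_1$ lies at
\[
f_n = n f_1, \qquad n = 1, 2, 3, \ldots
\]
For the 262 Hz ``Middle C'' of the clarinet note shown earlier, the
second, third, and fourth harmonics would therefore fall at
$2 \times 262 = 524$ Hz, $3 \times 262 = 786$ Hz, and
$4 \times 262 = 1048$ Hz.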

Recent studies, however, have determined that the attack may be at
least as important as the steady-state for recognition of instruments.
For instance, in 1967 Strong and Clark interchanged the spectra of
notes from different wind instruments while retaining the same time
envelopes, and they presented the hybrid waveforms to listeners.  They
found the spectrum to be more important for identification of certain
instruments, the time envelope for others, and both of equal
importance for the rest \cite[page 119]{Ross}.  Grey's study, quoted
above, noted that the attack plays a major role in two dimensions of
timbral space.  John Bourne found risetimes and temporal onset
patterns of harmonics to be useful for a recognition algorithm
\cite[page 74]{Bourne}.


	\subsection{Characteristics of five instruments}

In 1978 J\"{u}rgen Meyer conducted a study of numerous orchestral
instruments, and the following subsections summarize his findings for
the five instruments analyzed in the next chapter.  Several of these
characteristics demonstrate how the timbre of a type of instrument
depends on several variables such as pitch and loudness of the note
and actions of the human performer.


		\subsubsection{Clarinet}

\begin{itemize}
\item in the low register the odd harmonics are much stronger than the
even harmonics; in the middle register the second harmonic is still
weaker than the fundamental and third, although higher harmonics are more
nearly equal \cite[page 54]{umlatt}.
\item risetime varies from 15 ms for a ``hard'' attack to more than 50
ms for a ``soft'' attack \cite[page 55]{umlatt}.
\item entrance of higher harmonics is delayed in a soft attack
\cite[page 56]{umlatt}.
\end{itemize}


		\subsubsection{Flute}

\begin{itemize}
\item has a longer attack than any other orchestral instrument
\cite[page 50]{umlatt}.
\item harmonic content varies tremendously with dynamics: {\em pp}
notes sound like pure sinusoids while {\em ff} notes contain numerous
strong overtones \cite[page 50]{umlatt}.
\item notes start with ``preliminary tones'' --- high frequency resonances
caused by the start of blowing \cite[page 50]{umlatt}.
\end{itemize}

		\subsubsection{French horn}

\begin{itemize}
\item 10 -- 30 ms preliminary impulse in harmonics below 1000 Hz \cite[page
39]{umlatt}.
\item poor attacks can be characterized by repeated blips which sound like a
rolled ``r'' \cite[page 39]{umlatt}.
\item main formant occurs around 340 Hz \cite[page 37]{umlatt}.
\end{itemize}

		\subsubsection{Trombone}

\begin{itemize}
\item fast risetime: final amplitude reached in about 20 ms for notes
in the high register \cite[page 47]{umlatt}.
\item biggest formant at 480 Hz in low register, 600 Hz in high
register \cite[page 46]{umlatt}.
\item preliminary impulse which sounds harsh in strong attacks and
becomes barely detectable in smooth attacks \cite[page 47]{umlatt}.
\end{itemize}

		\subsubsection{Trumpet}

\begin{itemize}
\item richer in overtones than any other orchestral instrument
\cite[page 42]{umlatt}.
\item main formant at about 1200 Hz \cite[page 42]{umlatt}.
\item ``incisive'' attack lasts 25 -- 30 ms and is marked by high
frequencies and a 5 ms preliminary impulse \cite[page 44]{umlatt}.
\end{itemize}


\section{Timbre recognition: a ``conditioned response''}

Recognition of timbral sources is a complicated and learned process.
A person who hears a note from an unfamiliar type of instrument can
determine pitch, loudness, and a few vague timbral characteristics
(``bright,'' ``nasal,'' ``muddy''), but he cannot identify the source
of the sound.  Juan Roederer describes instrument identification as a
``conditioned response'' consisting of (1) learning: ``{\em
storage}\/ in the memory with an adequate label of identification;''
and (2) responding: ``{\em comparison}\/ with previously stored
and identified information \cite[page 139]{Roe}.''  Hence, a
reasonable approach to teaching a machine to recognize musical
instruments includes: (1) studying and labeling information derived
from various types of musical signals, and (2) constructing an
algorithm to recall this stored information and to identify the tone
source.


\input{two}


\chapter{An Algorithm for Recognition of Musical Instruments}


\section{Creation of algorithm}

The data analysis explained in the previous chapter reveals several
characteristics of musical notes.  These characteristics provide
enough useful information for the creation of a recognition algorithm
which can identify an instrument from the first 320 milliseconds of a
single note.  The simple algorithm presented here uses only three
characteristics: high frequency inharmonic bursts, presence of a
second harmonic, and spectral distribution of energy.  Each test
distributes a certain number of points to each possible instrument,
and the algorithm chooses the instrument with the most points at the
end of these tests.

The decision program includes several of the subroutines used to
analyze the original seventeen notes.  This program first determines
the fundamental pitch of the raw note.  Next, it finds the maximum
value and ``onset point'' of each channel.  Finally, the program
detects the inharmonic bursts found at the start of woodwind notes
(the bursts must be present in at least two channels) and examines the
harmonics of the three channels above the fundamental channel.


\begin{table}[tbp]

	\caption{Points for decision criteria}  The values of channel
maxima for the energy criteria were picked as an approximate best
boundary for the data from the seventeen notes.

\centering

\vspace{1em}

\begin{tabular}{|l||c|c|c|c|c|}
\hline
Instrument & 	Clarinet & 	Flute & 	French horn & 	Trombone & 
	Trumpet\\
\hline \hline
Inharmonic bursts & 15 &	15 &		0 &		0 &
	0\\
\hline
No inharmonic bursts &	0 &	0 &		10 &		10 &
	10\\
\hline \hline
No 2nd harmonic & 20 &		0 &		0&		0 &
	0\\
\hline
2nd harmonic & 	0 &		5 &		5 &		5 &
	5\\
\hline \hline
Channel 10 max $\leq$ 0.04 & 0 &	0 &		20 &		0 &
	0\\
\hline
Channel 10 max $>$ 0.04 & 5 &	5 &		0 &		5 &
	5\\
\hline \hline
Channel 12 max $\geq$ 0.15 & 0 &	0 &		0 &		0 &
	20\\
\hline
Channel 12 max $<$ 0.15 & 5 &	5 &		5 &		5 &
	0\\
\hline
\end{tabular}

\end{table}

The decision algorithm awards points to the five possible instruments
according to the values in Table~3.1.  Although it is somewhat
arbitrary, this method of awarding points provides a reasonably
efficient way to utilize the small amount of data gathered from the
seventeen notes.  Three of the four tests single out one instrument.
If the note passes such a test, that instrument receives all twenty
points; otherwise, the twenty points are divided evenly among the
other four instruments.  Since two instruments can have bursts, the
burst criterion distributes 30 points: 15 to each woodwind if bursts
are present, or 10 to each brass instrument if they are not.  With
these point values the algorithm correctly identifies all seventeen
notes studied.
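
The point values in Table~3.1 can be sketched in a few lines of
Python.  This is only an illustration, not the program used in this
study: the names \verb|score_note| and \verb|identify| are
hypothetical, and the sketch assumes that a second harmonic counts as
``present'' when any of the three channels above the fundamental
reports the value 2.

\begin{verbatim}
```python
INSTRUMENTS = ("clarinet", "flute", "french horn", "trombone", "trumpet")


def score_note(bursts, second_harmonic, ch10_max, ch12_max):
    """Return {instrument: points} using the point values of the table."""
    points = dict.fromkeys(INSTRUMENTS, 0)

    def award(names, value):
        for name in names:
            points[name] += value

    # Burst test: 30 points shared by the woodwinds or by the brasses.
    if bursts:
        award(("clarinet", "flute"), 15)
    else:
        award(("french horn", "trombone", "trumpet"), 10)

    # Second-harmonic test: only the clarinet lacks a second harmonic.
    if second_harmonic:
        award(("flute", "french horn", "trombone", "trumpet"), 5)
    else:
        award(("clarinet",), 20)

    # Channel 10 test: only the French horn stays at or below 0.04.
    if ch10_max <= 0.04:
        award(("french horn",), 20)
    else:
        award(("clarinet", "flute", "trombone", "trumpet"), 5)

    # Channel 12 test: only the trumpet reaches 0.15.
    if ch12_max >= 0.15:
        award(("trumpet",), 20)
    else:
        award(("clarinet", "flute", "french horn", "trombone"), 5)
    return points


def identify(bursts, harmonics, ch10_max, ch12_max):
    """harmonics: values measured in the three channels above the fundamental."""
    points = score_note(bursts, 2 in harmonics, ch10_max, ch12_max)
    return max(points, key=points.get), points
```
\end{verbatim}

For example, the measurements of the 262 Hz clarinet note (bursts
present, harmonics 3, 3, 3, channel maxima 0.1250 and 0.0286) give the
clarinet 45 points and the flute 25.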

To speed and automate the processing of new edited notes, a single
shell script controls all of the filtering, data collection, and
decision making.  Hence, once a note has been edited from its
recording, all further processing is done without any human input.
This shell script can run in the ``background'' of a UNIX environment.
Processing a single note takes approximately six minutes on a VAX 11/750
if no one else is using the computer at the same time.

\section{Testing}

The algorithm was first applied to the seventeen notes that had been
carefully analyzed.  Since the decision tests were derived from
characteristics measured for these notes, the algorithm matched all
seventeen notes to the appropriate instruments.  Tables 2.3 and 2.4
summarize the scoring for these notes.

Creating a more challenging test required editing more notes from the
recordings.  These new notes had the same fundamental pitches as the
first notes, but this time the notes were extracted from the downward
runs of the arpeggios rather than the upward runs.  These notes had
not been examined before the creation of the decision criteria.  The
algorithm matched fifteen of these seventeen new notes to the
appropriate instrument.  Tables 3.2 and 3.3 contain the results of
this testing.  Table 4.1 shows a confusion matrix for the
identification of all 34 notes.

The algorithm's major flaw is that its errors can be severe: a
misidentified note may be assigned to an instrument from the wrong
orchestral family.  One flute note lacked the inharmonic attack
transients, so the algorithm classified it as a trombone note.  The
next chapter suggests a more intelligent algorithm which would fail
more gracefully when it does not correctly identify an instrument.

Two other problems, however, were handled more successfully.  One
clarinet note had a detectable second harmonic, so the algorithm
called it a flute note.  The flute falls into the same orchestral
category as the clarinet.  Finally, one French horn note's preliminary
impulses circumvented the temporal criteria designed to catch them and
were considered woodwind bursts.  Nevertheless, the other tests
enabled the algorithm to correctly identify this note.


\begin{table}[p]

\centering

\caption{Testing of algorithm on new woodwind notes}

\begin{large}

\vspace{1em}

\begin{tabular}{|l||l|l|l|l||l|l|l|}
\hline
\multicolumn{8}{|c|}{{\sc characteristics}}\\
\hline
instrument &	   \multicolumn{4}{c||}{Clarinet} &
		   \multicolumn{3}{c|}{Flute}\\
\hline
raw note &         C 262 & 	E 330 &		G 392 &		C 523 &
				E 330 & 	G 392 & 	C 523\\
\hline \hline
fundamental &	   262.3 &	326.5 &		395.1 &		524.6 &
				326.5 &		395.1 &		524.6\\
\hline
bursts? &	   yes &	yes &		yes &		yes &
				no &		yes &		yes\\
\hline
harmonic[1] &	   1 &		1 &		1 &		1 &
				2 &		2 &		2\\
\hline
harmonic[2] &	   3 &		1 &		1 &		3 &
				2 &		2 &		3\\
\hline
harmonic[3] &	   3 &		3 &		2 &		3 &
				2 &		3 &		3\\
\hline
channel 10 max &   0.1199 &	0.1284 &	0.0501 &	0.0720 &
				0.0778 &	0.1246 &	0.1121\\
\hline
channel 12 max &   0.0413 &	0.0323 &	0.0090 &	0.0147 &
				0.0317 &	0.0310 &	0.0340\\
\hline \hline
\multicolumn{8}{|c|}{{\sc scores}}\\
\hline
Clarinet &	   45 $\ast\ast$ &	45 $\ast\ast$ &		25 &		45 $\ast\ast$ &
				10 &		25 &		25\\
\hline
Flute & 	   25 &		25 &		30 $\ast\ast$ &		25 &
				15 &		30 $\ast\ast$ &		30 $\ast\ast$\\
\hline
French horn &	   5 &		5 &		10 &		5 &
				20 &		10 &		10\\
\hline
Trombone &	   10 &		10 &		15 &		10 &
				25 $\ast\ast$ &		15 &		15\\
\hline
Trumpet &	   5 &		5 &		10 &		5 &
				20 &		10 &		10\\
\hline
\end{tabular}
\end{large}
\end{table}

\begin{table}[p]

\caption{Testing of algorithm on new brass notes}

\begin{large}

\centering

\vspace{1em}
\begin{tabular}{|l||l|l|l||l|l|l|}
\hline
\multicolumn{7}{|c|}{{\sc characteristics}}\\
\hline
instrument &	   \multicolumn{3}{c||}{French Horn} &
		   \multicolumn{3}{c|}{Trombone}\\
\hline
raw note &      	C 262 &		E 330 &		G 392 &
	      		C 262 & 	E 330 & 	G 392\\
\hline \hline
fundamental &	   	262.3 &		333.3 &		395.1 &
			262.3 &		329.9 &		400.0\\
\hline
bursts? &	   	no &		yes &		no &
			no &		no &		no\\
\hline
harmonic[1] &	   	2 &		1 &		1 &
			2 &		2 &		2\\
\hline
harmonic[2] &		2 &		2 &		2 &
			3 &		2 &		2\\
\hline
harmonic[3] &	   	2 &		3 &		3 &
			3 &		3 &		2\\
\hline
channel 10 max &   	0.0214 &	0.0228 &	0.0205 &
			0.1141 &	0.1732 &	0.1160\\
\hline
channel 12 max &  	0.0050 &	0.0039 &	0.0043 &
			0.0188 &	0.0263 &	0.0201\\
\hline \hline
\multicolumn{7}{|c|}{{\sc scores}}\\
\hline
Clarinet &		5 &		20 &		5 &
			10 &		10 &		10\\
\hline
Flute & 		10 &		25 &		10 &
			15 &		15 &		15\\
\hline
French horn &		40 $\ast\ast$ &		30 $\ast\ast$ &		40 $\ast\ast$ &
			20 &		20 &		20\\
\hline
Trombone &		20 &		10 &		20 &
			25 $\ast\ast$ &		25 $\ast\ast$ &		25 $\ast\ast$\\
\hline
Trumpet &		15 &		5 &		15 &
			20 &		20 &		20\\
\hline
\end{tabular}

\vspace{3em}

\begin{tabular}{|l||l|l|l|l|}
\hline
\multicolumn{5}{|c|}{{\sc characteristics}}\\
\hline
instrument &	   \multicolumn{4}{c|}{Trumpet}\\
\hline
raw note &         C 262 & 	E 330 & 	G 392 & 	C 523\\
\hline \hline
fundamental &	   260.2 &	329.9 &		390.2 &		516.1\\
\hline
bursts? &	   no & 	no &		no &		no\\
\hline
harmonic[1] &	   2 &		2 &		2 &		2\\
\hline
harmonic[2] &	   2 &		3 &		3 &		2\\
\hline
harmonic[3] &	   4 &		3 &		4 &		3\\
\hline
channel 10 max &   0.9019 &	0.9268 &	0.9455 &	0.8304\\
\hline
channel 12 max &   0.3013 &	0.3603 &	0.2943 &	0.3196\\
\hline \hline
\multicolumn{5}{|c|}{{\sc scores}}\\
\hline
Clarinet &	   5 &  	5 &		5 &		5\\
\hline
Flute & 	   10 &		10 &		10 &		10\\
\hline
French horn &	   15 &		15 &		15 &		15\\
\hline
Trombone &	   20 &		20 &		20 &		20\\
\hline
Trumpet &	   40 $\ast\ast$ &	40 $\ast\ast$ &		40 $\ast\ast$ &		40 $\ast\ast$\\
\hline
\end{tabular}

\end{large}

\end{table}


\chapter{Conclusions and Suggestions for Further Work}


\section{Conclusions}

This project was motivated by the desire to see if a wide bandwidth
filter bank simulation of the auditory system can help a computer to
recognize musical instruments.  The filtered notes show numerous
characteristics of their instruments.  Some of these characteristics
enabled the creation of a simple algorithm for instrument recognition
based on the first 320 milliseconds of a single note.  Table 4.1
presents a confusion matrix for the identification of the 34 notes
tested.


\begin{table}[htb]
	\caption{Confusion matrix}
\centering

\begin{picture}(0,0)(0,0)
\put(-176,-88){{\sc \shortstack{A\\c\\t\\u\\a\\l}}}
\end{picture}

\begin{tabular}{|c|l||c|c|c|c|c|}
\cline{3-7}
\multicolumn{2}{l}{} & \multicolumn{5}{|c|}{{\sc Algorithm Guesses}}\\
\cline{3-7}
\multicolumn{2}{l}{} & \multicolumn{1}{|c|}{Clarinet} & Flute & French horn & Trombone & Trumpet\\
\cline{3-7} \hline
	&	Clarinet &	7 & 1 & 0 & 0 & 0\\
\cline{2-2} \cline{3-7}
$\;\;\;$ &Flute &		0 & 5 & 0 & 1 & 0\\
\cline{2-7}
	&	French horn &	0 & 0 & 6 & 0 & 0\\
\cline{2-7}
$\;\;\;$ &	Trombone &	0 & 0 & 0 & 6 & 0\\
\cline{2-7}
	&	Trumpet &	0 & 0 & 0 & 0 & 8\\
\hline
\end{tabular}

\end{table}

\newpage


The project was mostly successful, although the usefulness of the
simple recognition algorithm is rather limited.  Given the restricted
choice of five instruments played by a single performer at one
loudness over a single octave, the algorithm correctly identified the
instruments from which 32 of 34 notes came.  Hence, a wide bandwidth
filter bank can assist in a practical recognition task.
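
As a quick check on these figures, the counts reported above
correspond to the following recognition rates, where the second ratio
covers only the seventeen notes that were not used to derive the
decision criteria:
\[ \frac{32}{34} \approx 94\%, \qquad \frac{15}{17} \approx 88\%. \]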


\section{Suggestions for further work}

	\subsection{Possible extensions for algorithm}

Numerous simple extensions could expand this instrument recognition
system without changing its basic approach.  Analyzing more than one
performer's notes would result in a more robust decision process.
Notes from the same instruments but from different octaves could be
analyzed.  Furthermore, the present algorithm could easily be extended
to new instruments after a study of several of their notes.  The study
of new instruments would lead to the introduction of new criteria for
decisions.

	\subsection{Statistical studies}

The twelve-channel filter bank could also support a rigorous
statistical study of many notes.  The resulting statistics could be
used to construct a more
``intelligent'' decision-making system.  A better algorithm would
assign points differently.  Instead of awarding points on an
all-or-nothing basis for each test, different measurement levels would
result in different awarding of points.  Probabilistic decision
analysis models are available for this kind of algorithm.  These
models would make decisions on the basis of assumed probabilistic
distributions derived from extensive study of several thousand notes.
Characteristics such as harmonics, spectral distribution of energy,
attack transients, risetimes, onset patterns, and pitch range could
prove quite helpful for this kind of study.
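
One way to picture graded point awards is sketched below in Python.
This is a hypothetical illustration, not a method developed in this
study: the logistic curve, the name \verb|graded_points|, and the
\verb|width| parameter are all assumptions, and a real system would
derive its curves from measured distributions.

\begin{verbatim}
```python
import math


def graded_points(value, boundary, width, total=20.0):
    """Points for a 'value below boundary' test, graded by distance.

    A logistic curve replaces the all-or-nothing step: far below the
    boundary the instrument receives nearly `total` points, far above
    it nearly none, and exactly half at the boundary.  `width` is a
    hypothetical parameter that sets how gradual the transition is.
    """
    return total / (1.0 + math.exp((value - boundary) / width))
```
\end{verbatim}

With a boundary of 0.04 on the channel 10 maximum, a measurement well
below the boundary would still earn the French horn most of the twenty
points, while a borderline measurement would earn only half.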


\begin{thebibliography}{99}

\bibitem{Bourne} Bourne, John B.
{\em Musical Timbre Recognition Based on a Model of the Auditory
\mbox{System.}} Master's Thesis, Massachusetts Institute of
Technology, 1972.

\bibitem{Dodge} Dodge, Charles and Thomas A. Jerse.
{\em Computer Music: Synthesis, Composition, and Performance.}  New
York: Schirmer Books, 1985.

\bibitem{Grey} Grey, John M.
{\em An Exploration of Musical Timbre.}  Doctoral dissertation,
Stanford University, 1975.

\bibitem{Kim} Kim, Richard Y.
{\em A Semi-Linear Filter Bank Model of the Peripheral Auditory
System.}  Master's Thesis, Massachusetts Institute of Technology, 1988.

\bibitem{umlatt} Meyer, J\"{u}rgen.
{\em Acoustics and the Performance of Music.}  Frankfurt: Verlag Das
Musikinstrument, 1978.

\bibitem{Roe} Roederer, Juan G.
{\em Introduction to the Physics and Psychophysics of Music,} Second
Edition.  New York: Springer-Verlag, 1975.

\bibitem{Ross} Rossing, Thomas D.
 {\em The Science of Sound.}  Reading, Massachusetts: Addison-Wesley
Publishing Company, 1982.


\end{thebibliography}


\end{document}

Alternate titlepage:

\begin{titlepage}

\begin{Large}

{\bf Teaching a Computer to Recognize Musical Instruments}

by

Stephen M. Malinak

\end{Large}

\vspace{1em}

Submitted to the Department of Electrical Engineering and Computer Science

In Partial Fulfillment of the Requirements for the Degree of

Bachelor of Science in Electrical Science and Engineering

at the Massachusetts Institute of Technology

May 1989

\copyright\ Stephen M. Malinak 1989


The author hereby grants to M.I.T. permission to reproduce \\
and to distribute copies of this thesis document in whole or in part.


\begin{flushright}
Author \hrulefill\\
Department of Electrical Engineering and Computer Science\\
May 14, 1989

Certified by \hrulefill\\
Campbell L. Searle\\
Thesis Supervisor

Accepted by \hrulefill\\
Leonard A. Gould\\
Chairman, Department Committee on Undergraduate Theses
\end{flushright}
\end{titlepage}


two.tex:
\chapter{Data Analysis}

\section{The notes}

A study of musical timbre based on a digital bandpass filter system
requires a set of digitized musical recordings.  The ideal data
would be a collection of notes played on only one instrument at a time
and separated by rests.  Furthermore, availability of the same notes
on several instruments allows direct comparisons of color for the
same fundamental tone.

	\subsection{Recording}

Fortunately, Ken Malsky needed precisely the same kinds of data when
he conducted a psychoacoustics experiment in 1987.  He could not find
an appropriate collection of notes, so he and Peter Andrews recorded
and digitized their own set.  These recordings contain arpeggios of
ten orchestral instruments: clarinet, flute, French horn, trombone,
trumpet, alto sax, tenor sax, violin, viola, and cello.  The arpeggios
are in the key of C and cover two octaves of each instrument's range.
The recordings have a sampling rate of 32 kHz, so they contain energy
up to about 16 kHz.  These recordings provided the data used in this
study of musical timbre.

Because selecting instruments with completely different ranges would
make the identification problem too simple, this study focuses on five
instruments with similar ranges: two woodwinds (clarinet and flute)
and three brasses (French horn, trombone, and trumpet).  All four
pitches studied --- middle C (262 Hz), E (330 Hz), G (392 Hz), and C
(523 Hz) --- come from a single octave.  Some of these pitches,
however, are just outside the ranges of certain instruments.  There is
no middle C (262 Hz) for the flute, and there is no C (523 Hz) for the
French horn and trombone.  Thus, a total of seventeen notes from these
five instruments provide raw data for timbre analysis.


	\subsection{Editing}

Previous studies of musical timbre found that most of the information
about the identity of an instrument comes from the attack and
steady-state sections of notes.  Consequently, the raw notes are
edited down to the first 320 milliseconds.  Although risetimes vary
significantly, this duration generally includes the entire attack and
a section of the steady-state.  This study completely ignores the
decay.  The original digitized recordings contain all of the notes
from one instrument in a single file.  Visual examination determined
the precise location of each attack.  Each edited note contains
approximately 500 samples (16 ms) of background noise before the
attack begins.  A total of 20,000 samples (640 ms) are stored on disk
for each note, although nearly all of the analytical programs use only
the first 10,000 samples.  Each edited note is individually normalized
to make the peak absolute amplitude equal to one.  Figures 2-3 through
2-7 at the end of this chapter show time plots of representative notes
from each of the five instruments.
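
The sample counts and durations above are related by $t = N / f_s$,
where $N$ is the number of samples and $f_s = 32$ kHz is the sampling
rate.  As a worked check:
\[ \frac{500}{32{,}000\ \mbox{Hz}} \approx 15.6\ \mbox{ms}, \qquad
   \frac{10{,}000}{32{,}000\ \mbox{Hz}} = 312.5\ \mbox{ms}, \qquad
   \frac{20{,}000}{32{,}000\ \mbox{Hz}} = 625\ \mbox{ms}. \]
The 16 ms, 320 ms, and 640 ms durations quoted for these sample counts
are therefore best read as rounded values.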


\section{Twelve-channel, wide bandwidth filter bank}

The edited notes are passed through a twelve-channel filter bank which
uses the widest bandwidth setting of Richard Kim's semi-linear filter
bank.  Set at such wide bandwidths, adjacent channels contain mostly
the same information for a given musical note.  Visual analysis of all
fifty channel outputs for a clarinet note showed that an
evenly-spaced selection of twelve channels contains almost as much
information as all fifty.  Figure 2-1 shows the frequency response of
the twelve-channel filter bank used in this experiment.  Kim based the
shape of the wide bandwidth filters on data from a study of cats'
auditory systems: \begin{quote} The high frequency skirt slope is very
high, up to several hundred dB per octave.  The lower frequency skirt
is much less steep. It is as low as 10 dB per octave \cite[page
18]{Kim}. \end{quote} Because of this asymmetry, the peak of a filter
is closer to the high end of the pass band.  Signals at frequencies
below the ``center frequency'' are attenuated less than those at
higher frequencies.  Table 2.1 presents simplified pass bands whose
upper endpoints are at these center frequencies.  The analysis programs use
this table to determine which channel contains the fundamental
frequency of a note.


\begin{figure}[p]
\vspace{4.5in}
\caption{Twelve-channel filter bank}
\end{figure}

\begin{table}[p]
\caption{Boundaries of the twelve channels}
\centering
\vspace{1em}
\begin{tabular}{|c|c|c||c|c|c||c|c|c|}
\hline
band & low & high & band & low & high & band & low & high\\
\hline \hline
1 &	0.0&      126.8&	5&	372.6&	522.5&	9&	1370&	1868\\
\hline
2 & 	126.9&  181.8&	6&	522.6&	726.0&	10&	1869&	2534\\
\hline
3 &	181.9&  262.1&	7&	726.1&	999.3&	11&	2535&	3432\\
\hline
4 &	262.2&  372.5&	8&	999.4&	1369&	12&	3433&	4597\\
\hline
\end{tabular}
\end{table}


The filtered signals do not contain any energy above 8 kHz.
Consequently, the filtered outputs are downsampled to 16 kHz (by
removing every other sample) to save disk space and allow faster
analysis.  Figures 2-3 through 2-7 show the outputs of the
twelve-channel filter bank for a representative note from each
instrument.  Table 2.2 shows how these notes and their harmonics fall
into the twelve channels of the bandpass filter model.
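
The channel lookup can be sketched in Python as a search over the
upper band edges of Table 2.1.  This is a minimal illustration; the
name \verb|pick_band| and the use of a binary search are assumptions,
not a description of the original program.

\begin{verbatim}
```python
import bisect

# Upper band edges in Hz, copied from the table of channel boundaries.
UPPER_EDGES = [126.8, 181.8, 262.1, 372.5, 522.5, 726.0,
               999.3, 1369, 1868, 2534, 3432, 4597]


def pick_band(frequency_hz):
    """Return the channel number (1-12) whose pass band holds the frequency."""
    if not 0.0 <= frequency_hz <= UPPER_EDGES[-1]:
        raise ValueError("frequency lies outside the twelve channels")
    # Index of the first upper edge that is >= the frequency.
    return bisect.bisect_left(UPPER_EDGES, frequency_hz) + 1


def harmonic_bands(fundamental_hz, count=5):
    """Channels holding the fundamental and its first harmonics."""
    return [pick_band(k * fundamental_hz) for k in range(1, count + 1)]
```
\end{verbatim}

Calling \verb|harmonic_bands(329.6)| reproduces the row for the note E
in Table 2.2: channels 4, 6, 7, 8, and 9.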

\begin{table}[p]
 \caption{Notes, harmonics, and channels}
\centering
\vspace{1em}
\begin{tabular}{|l||r|l|r|l|r|l|r|l|r|l|}
\hline
note & \multicolumn{2}{c|}{fund.} & \multicolumn{2}{c|}{2nd} & 
\multicolumn{2}{c|}{3rd} & \multicolumn{2}{c|}{4th} & 
\multicolumn{2}{c|}{5th}\\
\hline \hline
	C &	261.6 &3 &  523.3 &6 &  784.9 &7 &  1047 &8 &  1308 &8\\
\hline
	E &	329.6 &4 &  659.3 &6 &  988.9 &7 &  1319 &8 &  1648 &9\\
\hline
	G &	392.0 &5 &  784.0 &7 & 1176 &8 &  1568 &9 &  1960 &10\\
\hline
	C &	523.3 &6 & 1046.5 &8 & 1570 &9 &  2093 &10 & 2616 &11\\
\hline
\end{tabular}
\end{table}


\section{Methods for analyzing filtered notes}

	\subsection{Visual analysis}

The filtered notes provide much valuable information for
identification of the five instruments.  After studying filter bank
outputs such as those in Figures 2-3 through 2-7, a person can usually
determine which notes belong to which instruments.  Visual analysis of
these filter bank outputs reveals several timbral qualities found in
previous studies.  These characteristics include risetimes, onset
patterns, presence of harmonics, distribution of energy, and
inharmonic attack transients.


	\subsection{Computational analysis}

Getting useful data for computational analysis requires repeatedly
reworking measurement programs.  The values of various parameters
strongly depend on how these parameters are measured.  Consequently,
the measurement techniques described below resulted from several
generations of trial programs.  These techniques provide the most
distinct and reliable contrasts between instruments while retaining
the most consistency among the notes of a single instrument.

Pitch is one of the most important characteristics of a note.  The
subroutine {\tt findpitch} looks at the raw note's 20,000 samples to
determine its fundamental frequency.  This subroutine counts the
number of samples between consecutive positive peaks and examines the
resulting array of interpeak distances.  The fundamental period is the
interpeak distance that occurs most often (the mode).  The subroutine
attempts to ignore harmonics and noise by requiring each new peak to
be greater than a threshold of 90\% of the previous peak.  The
following formula determines the fundamental frequency:
\[\mbox{frequency (Hz)} = \frac{32{,}000\ \mbox{(samples/sec)}}{\mbox{period (samples)}}\]
Another subroutine, {\tt pickband}, determines which of the twelve
bands in Table 2.1 contains this fundamental frequency.
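
The peak-counting idea can be sketched as follows.  This is a minimal
Python illustration rather than the original {\tt findpitch} code; the
function name \verb|find_pitch|, the peak test, and the treatment of
the very first peak are assumptions.

\begin{verbatim}
```python
from collections import Counter

SAMPLE_RATE = 32_000  # samples per second in the raw recordings


def find_pitch(samples, threshold=0.9):
    """Estimate the fundamental frequency (Hz) from interpeak distances."""
    prev_idx = None
    prev_val = None
    distances = []
    for i in range(1, len(samples) - 1):
        s = samples[i]
        # A positive local maximum.
        if not (s > 0 and samples[i - 1] < s >= samples[i + 1]):
            continue
        if prev_idx is None:
            prev_idx, prev_val = i, s
        elif s >= threshold * prev_val:
            # Accept only peaks at least 90% as high as the previous
            # accepted peak, to skip harmonics and noise.
            distances.append(i - prev_idx)
            prev_idx, prev_val = i, s
    if not distances:
        return None
    # The fundamental period is the most common interpeak distance.
    period = Counter(distances).most_common(1)[0][0]
    return SAMPLE_RATE / period
```
\end{verbatim}

A period of 122 samples, for instance, corresponds to
$32{,}000 / 122 \approx 262.3$ Hz.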

Finding the harmonics present in a signal requires more complicated
analysis.  A modified version of the {\tt findpitch} routine with a
different threshold failed because of the phase of the harmonics; for
instance, the second harmonic often peaks somewhere other than the
middle of the fundamental period.  The final version of the harmonic
detector first tracks the fundamental: it finds a peak, looks for
another peak within the fundamental period plus-or-minus two samples,
then looks for the next peak, etc.  If the chain of peaks breaks
before the end of the signal, the wrong starting peak has been
selected ({\em i.e.,} it is not part of the fundamental chain), so the
algorithm starts with the peak immediately following its original
starting point and tries to track again.  This tracking method worked
on all of the bands in which the fundamental period dominates --- the
three bands above the fundamental band.  The {\tt findharm} subroutine
tallies the number of maxima between each pair of fundamental peaks.
The harmonic of the signal is defined to be the most often repeated
number of maxima per period.
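
The tracking and tallying steps can be sketched in Python as shown
below.  This is an illustration under stated assumptions, not the
original {\tt findharm} code: the function names are hypothetical, and
a chain is taken to have reached the end of the signal when the search
window for its next peak no longer fits inside the signal.

\begin{verbatim}
```python
from collections import Counter


def local_maxima(x):
    """Indices of the local maxima of a sampled signal."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]


def track_fundamental(peaks, period, n_samples, tol=2):
    """Chain of peaks spaced by period +/- tol that reaches the signal end."""
    peak_set = set(peaks)
    offsets = sorted(range(-tol, tol + 1), key=abs)  # try the closest first
    for start in peaks:
        chain = [start]
        while True:
            target = chain[-1] + period
            nxt = next((target + d for d in offsets
                        if target + d in peak_set), None)
            if nxt is None:
                break
            chain.append(nxt)
        # If the chain broke early, restart from the following peak.
        if chain[-1] + period + tol >= n_samples:
            return chain
    return []


def find_harmonic(x, period, tol=2):
    """Most common number of maxima per fundamental period."""
    peaks = local_maxima(x)
    chain = track_fundamental(peaks, period, len(x), tol)
    counts = [sum(1 for p in peaks if a <= p < b)
              for a, b in zip(chain, chain[1:])]
    if not counts:
        return None
    return Counter(counts).most_common(1)[0][0]
```
\end{verbatim}

A channel dominated by the second harmonic shows two maxima in each
fundamental period, so \verb|find_harmonic| returns 2 for it.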

Determining the risetime of the envelope of a filter bank output
presents a much more difficult problem than harmonic detection.  These
envelopes are generally not simply rising exponentials.  Some
envelopes steadily increase during the entire period under study.
Other envelopes undulate throughout the ``steady-state''.  For
instance, the ``risetime'' of the trumpet note in Figure 2-2 is quite
debatable.  The filter bank outputs for this 262 Hz trumpet note
appear in Figure 2-7.  Note that the fundamental 262 Hz channel
finishes its rise at 30 ms, but the amplitudes of the higher channels
grow tremendously around 260 ms.  Envelope modulations such as these
tend to confuse computer programs.  Ideally, a decision tree would
allow different criteria to be applied to different types of rises.
In this study, however, the risetime of a signal was defined to be the
time between the first occurrence of 20\% of the peak value and the
first occurrence of 80\% of the peak value.  These thresholds are less
susceptible than traditional 10\% -- 90\% endpoints to noise and
signal variability.

\begin{figure}[t]
\vspace*{2.75in}
\caption{What is the risetime of this note?}
\end{figure}

Good risetime measurements should make harmonic onset patterns trivial
to find.  The onset point of a filter bank output channel is merely
the number of the sample at which the signal's rise starts.  In this
analysis the ``onset point'' was the first occurrence of 20\% of the
signal's maximum.
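
Both the onset point and the 20\% -- 80\% risetime reduce to finding
the first sample that reaches a given fraction of the peak.  The
Python sketch below illustrates this; the function names are
hypothetical, and it assumes that the absolute value of each sample is
compared with the threshold and that the filtered outputs are sampled
at 16 kHz.

\begin{verbatim}
```python
def first_crossing(signal, fraction):
    """Index of the first sample whose magnitude reaches fraction * peak."""
    level = fraction * max(abs(v) for v in signal)
    for i, v in enumerate(signal):
        if abs(v) >= level:
            return i
    return None


def onset_point(signal):
    """Sample number at which the rise is taken to start (20% of peak)."""
    return first_crossing(signal, 0.2)


def rise_time_ms(signal, sample_rate=16_000):
    """Time in ms between the first 20% and first 80% crossings."""
    start = first_crossing(signal, 0.2)
    end = first_crossing(signal, 0.8)
    return (end - start) * 1000 / sample_rate
```
\end{verbatim}

Because only the first crossings are used, later undulations of the
envelope do not change the result, which is also why this definition
cannot describe notes whose higher channels keep growing late in the
note.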

Like the risetime, the energy of a signal can be measured several
ways.  Two alternatives were studied: (1) finding the short-term
root-mean-squared ({\sc rms}) value of the signal, or (2) finding the
signal's maximum.  An {\sc rms} measurement of a steady-state waveform
would present the ideal way to measure energy, but several of the
notes studied have fluctuating ``steady-states''.  The amplitude
fluctuations lead to significantly different short-term {\sc rms}
values in different sections of the steady-state.  The short-term {\sc
rms} could be used if it were measured in several sections of the
signal and analyzed in some way.  The peak value of a signal, however,
gives a simpler indication of the relative energies of the channels.
In all of the notes that have been tested, measuring the peak values
of the twelve channels correctly determines whether the note comes
from a trumpet, a French horn, or something else.
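
The difference between the two energy measures can be sketched in
Python as follows.  The function names and the window arguments are
hypothetical and serve only as an illustration.

\begin{verbatim}
```python
import math


def short_term_rms(signal, start, length):
    """Root-mean-squared value of one window of the signal."""
    window = signal[start:start + length]
    return math.sqrt(sum(v * v for v in window) / len(window))


def peak_value(signal):
    """Largest absolute sample value in the signal."""
    return max(abs(v) for v in signal)
```
\end{verbatim}

For a signal whose amplitude changes between sections, the short-term
{\sc rms} depends on which window is chosen, whereas the peak value is
a single number for the whole channel.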

Finally, high frequency inharmonic attack transients are detected on a
present/not present basis.  A subroutine checks each channel above the
fundamental for bursts of samples exceeding 10\% of the signal's
maximum value.  Early attempts at burst detection had found the
inharmonic transients but also found the fundamental blips present in
the rise of some French horn notes.  To enable detection of inharmonic
woodwind transients and suppression of brass blips, the inharmonic
burst criterion only includes bursts which start more than 550 and
finish more than 250 samples before the fundamental's onset point.
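
A possible form of this test is sketched below in Python.  It is an
illustration only: the function names are hypothetical, and the rule
that groups above-threshold samples into one burst (the
\verb|max_gap| parameter) is an assumption, since the grouping rule is
not specified here.

\begin{verbatim}
```python
def find_bursts(channel, fraction=0.10, max_gap=32):
    """Return (start, end) sample ranges that exceed fraction * maximum.

    Samples above the threshold are merged into one burst when they
    are separated by no more than max_gap samples (an assumed rule).
    """
    level = fraction * max(abs(v) for v in channel)
    bursts, start, last = [], None, None
    for i, v in enumerate(channel):
        if abs(v) <= level:
            continue
        if start is None:
            start = i
        elif i - last > max_gap:
            bursts.append((start, last))
            start = i
        last = i
    if start is not None:
        bursts.append((start, last))
    return bursts


def has_inharmonic_burst(channel, fundamental_onset):
    """True if a burst starts > 550 and ends > 250 samples before onset."""
    return any(fundamental_onset - s > 550 and fundamental_onset - e > 250
               for s, e in find_bursts(channel))
```
\end{verbatim}

A blip that occurs within a few hundred samples of the fundamental's
onset, as in some French horn notes, fails the timing conditions and
is not counted as a woodwind burst.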


\section{Conclusions of analyses}

Some characteristics prove to be immediately useful for timbre
recognition based on a single note.  The other characteristics apply
better to a statistical study of several notes from the same
instrument.  A ``useful'' characteristic is defined here to be a
quality of a note which places it into one of two mutually exclusive
groups; for instance, only clarinets have no second harmonic.
Characteristics good for statistical analysis do not individually lead
to definite conclusions; instead, they increase or decrease the
probability of selecting an instrument. Tables 2.3 and 2.4 at the end
of this chapter summarize measurements of characteristics from the
seventeen notes.  The next chapter explains the scores in these
tables.

	\subsection{Immediately useful characteristics}

		\subsubsection{Inharmonic attack transients}

Only woodwind notes contain high frequency inharmonic sputterings
before the start of the tone.  These bursts look like ``noise'' at the
start of the raw notes in Figures 2-3 and 2-4.  In the filtered
waveforms, the bursts appear in the high frequency channels (above 523
Hz), and they occur before or during the rise of the signal in the
fundamental channel.  The flute bursts tend to be longer and more
energetic than the clarinet bursts.  This burst characteristic
provides an excellent way to separate the instruments into two
categories: woodwinds (clarinet and flute) and brass (French horn,
trombone, and trumpet).

		\subsubsection{Presence of harmonics}

As expected, all of the clarinet notes lack a second harmonic.  All of
the notes from the other instruments have a second harmonic in at least
one of the three channels above the fundamental channel.

		\subsubsection{Distribution of energy}

The energy characteristics allow easy differentiation of the brasses.
French horns have very little energy in the high frequency channels.
Channel 10 (2534 Hz center frequency) supplies the French horn
criterion because the French horn's energy differs more from other
instruments' energies in this particular frequency range than in any
other.  The trumpet, on the other hand, has most of its energy at high
frequencies.  For notes in the octave studied, the trumpet has far
more energy in channel 12 (4597 Hz center frequency) than any other
instrument.


	\subsection{Characteristics suitable for statistical analysis}

		\subsubsection{Risetimes}

Measuring risetimes can assist recognition of instruments, but
risetimes of notes from the same instrument can vary significantly.
Furthermore, the risetime is not well defined for several of the notes
studied.  Risetimes of a trumpet and a clarinet differ greatly, but
unsophisticated risetime measurements could place a wavering French
horn into either category.  Risetimes have provided an important
criterion in other studies of musical timbre, and they could certainly
aid classification of a group of notes.  Studying many more notes
to determine probabilistic distributions would make the risetime
measurement a more reliable way to identify which instrument produced
a single note.


		\subsubsection{Onset patterns}

The clarinet shows delayed entry of harmonics, while the trumpet
produces energy at all frequencies instantaneously.  Unfortunately,
the wide bandwidth filters allow the fundamental to obscure the rise
of some of the harmonics.  The filter bank outputs for the clarinet
note in Figure 2-3 demonstrate the problem of obscured onset patterns.
Teaching a computer to see one waveform rising inside another presents
more difficulties than merely determining the outer risetime.
Adaptive filters which start at narrow bandwidths and widen as the
stimulus intensity rises would help to eliminate the problem of
obscured onsets.

		\subsubsection{Preliminary impulses}

Most of the brass notes show preliminary impulses in the filter bank
outputs.  These impulses generate discontinuities in the slope of a
signal's envelope.  The strength and shape of the impulses differ from
one note to the next.


		\subsubsection{Ranges}

Finding the pitch range of a collection of notes from a single
instrument would assist classification of instruments with different
ranges.  In this single octave study, however, ranges of all
instruments are the same except for a few missing high and low notes.


\clearpage

\begin{figure}[b]

\vspace*{7.5in}

	\caption{Clarinet note: C 262 Hz}

\end{figure}

\begin{figure}[b]

\vspace*{7.5in}

	\caption{Flute note: G 392 Hz}

\end{figure}

\begin{figure}[b]

\vspace*{7.5in}

	\caption{French horn note: C 262 Hz}

\end{figure}

\begin{figure}[b]

\vspace*{7.5in}

	\caption{Trombone note: C 262 Hz}

\end{figure}

\begin{figure}[b]

\vspace*{7.5in}

	\caption{Trumpet note: C 262 Hz}

\end{figure}


\clearpage

\begin{table}[p]

\caption{Woodwind data}

\begin{large}

\centering

\vspace{1em}

\begin{tabular}{|l||l|l|l|l||l|l|l|}
\hline
\multicolumn{8}{|c|}{{\sc characteristics}}\\
\hline
instrument &	   \multicolumn{4}{c||}{Clarinet} &
		   \multicolumn{3}{c|}{Flute}\\
\hline
raw note &         C 262 & 	E 330 &		G 392 &		C 523 &
				E 330 & 	G 392 & 	C 523\\
\hline \hline
fundamental &	   262.3 &	326.5 &		395.1 &		524.6 &
				329.9 &		395.1 &		524.6\\
\hline
bursts? &	   yes &	yes &		yes &		yes &
				yes &		yes &		yes\\
\hline
harmonic[1] &	   3 &		1 &		1 &		1 &
				2 &		2 &		2\\
\hline
harmonic[2] &	   3 &		1 &		3 &		3 &
				2 &		2 &		2\\
\hline
harmonic[3] &	   3 &		3 &		4 &		3 &
				2 &		3 &		3\\
\hline
channel 10 max &   0.1250 &	0.1378 &	0.1071 &	0.0925 &
				0.1382 &	0.1619 &	0.0727\\
\hline
channel 12 max &   0.0286 &	0.0253 &	0.0220 &	0.0152 &
				0.0352 &	0.0807 &	0.0214\\
\hline \hline
\multicolumn{8}{|c|}{{\sc scores}}\\
\hline
Clarinet &	   45 $\ast\ast$ &	45 $\ast\ast$ &		45 $\ast\ast$ &		45 $\ast\ast$ &
				25 &		25 &		25\\
\hline
Flute & 	   25 &		25 &		25 &		25 &
				30 $\ast\ast$ &		30 $\ast\ast$ &		30 $\ast\ast$\\
\hline
French horn &	   5 &		5 &		5 &		5 &
				10 &		10 &		10\\
\hline
Trombone &	   10 &		10 &		10 &		10 &
				15 &		15 &		15\\
\hline
Trumpet &	   5 &		5 &		5 &		5 &
				10 &		10 &		10\\
\hline
\end{tabular}

\end{large}

\end{table}


\begin{table}[p]

	\caption{Brass data}

\begin{large}

\centering

\vspace{1em}

\begin{tabular}{|l||l|l|l||l|l|l|}
\hline
\multicolumn{7}{|c|}{{\sc characteristics}}\\
\hline
instrument &	   \multicolumn{3}{c||}{French Horn} &
		   \multicolumn{3}{c|}{Trombone}\\
\hline
raw note &      	C 262 &		E 330 &		G 392 &
	      		C 262 & 	E 330 & 	G 392\\
\hline \hline
fundamental &	   	260.2 &		333.3 &		395.1 &
			262.3 &		329.9 &		400.0\\
\hline
bursts? &	   	no &		no &		no &
			no &		no &		no\\
\hline
harmonic[1] &	   	2 &		1 &		1 &
			2 &		2 &		2\\
\hline
harmonic[2] &		2 &		2 &		2 &
			3 &		2 &		2\\
\hline
harmonic[3] &	   	2 &		3 &		3 &
			3 &		3 &		2\\
\hline
channel 10 max &	0.0194 &	0.0234 &	0.0200 &
			0.1232 &	0.1150 &	0.1155\\
\hline
channel 12 max &	0.0044 &	0.0036 &	0.0052 &
			0.0214 &	0.0204 &	0.0198\\
\hline \hline
\multicolumn{7}{|c|}{{\sc scores}}\\
\hline
Clarinet &		5 &		5 &		5 &
			10 &		10 &		10\\
\hline
Flute & 		10 &		10 &		10 &
			15 &		15 &		15\\
\hline
French horn &		40 $\ast\ast$ &		40 $\ast\ast$ &		40 $\ast\ast$ &
			20 &		20 &		20\\
\hline
Trombone &		20 &		20 &		20 &
			25 $\ast\ast$ &		25 $\ast\ast$ &		25 $\ast\ast$\\
\hline
Trumpet &		15 &		15 &		15 &
			20 &		20 &		20\\
\hline
\end{tabular}

\vspace{3em}

\begin{tabular}{|l||l|l|l|l|}
\hline
\multicolumn{5}{|c|}{{\sc characteristics}}\\
\hline
instrument &	   \multicolumn{4}{c|}{Trumpet}\\
\hline
raw note &         C 262 & 	E 330 & 	G 392 & 	C 523\\
\hline \hline
fundamental &	   260.2 &	329.9 &		390.2 &		524.6\\
\hline
bursts? &	   no & 	no &		no &		no\\
\hline
harmonic[1] &	   2 &		2 &		2 &		2\\
\hline
harmonic[2] &	   2 &		2 &		3 &		3\\
\hline
harmonic[3] &	   3 &		3 &		4 &		4\\
\hline
channel 10 max &   0.6825 &	0.9500 &	0.9162 &	0.7664\\
\hline
channel 12 max &   0.3457 &	0.3864 &	0.3404 &	0.2512\\
\hline \hline
\multicolumn{5}{|c|}{{\sc scores}}\\
\hline
Clarinet &	   5 &  	5 &		5 &		5\\
\hline
Flute & 	   10 &		10 &		10 &		10\\
\hline
French horn &	   15 &		15 &		15 &		15\\
\hline
Trombone &	   20 &		20 &		20 &		20\\
\hline
Trumpet &	   40 $\ast\ast$ &	40 $\ast\ast$ &		40 $\ast\ast$ &		40 $\ast\ast$\\
\hline
\end{tabular}

\end{large}

\end{table}


xin:
attach sipb
xclock -analog -rv =120x120-4-4 &
xterm =80x24-20+50 &
xterm =80x24-20+400 &
xload =240x120-128-4 &