SoundFont 2.00a Technical Specification - Page 1 - Printed 26/05/97 at 11.16
Sound Font Technical Specification
Version 2.00a
October 18, 1995
0 About This Document
0.1 Revision History
Revision Issue Date Comments
2.00a 10/18/95 First publicly released draft
0.2 Disclaimers
THIS SPECIFICATION IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER
INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR
PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL,
SPECIFICATION, OR SAMPLE.
A LICENSE IS HEREBY GRANTED TO COPY, REPRODUCE, AND DISTRIBUTE THIS
SPECIFICATION FOR INTERNAL USE ONLY. NO OTHER LICENSE EXPRESS OR
IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY OTHER INTELLECTUAL PROPERTY
RIGHTS IS GRANTED OR INTENDED HEREBY.
AUTHORS OF THIS SPECIFICATION DISCLAIM ALL LIABILITY, INCLUDING
LIABILITY FOR INFRINGEMENT OF PROPRIETARY RIGHTS, RELATING TO
IMPLEMENTATION OF INFORMATION IN THIS SPECIFICATION. AUTHORS OF THIS
SPECIFICATION ALSO DO NOT WARRANT OR REPRESENT THAT SUCH
IMPLEMENTATION(S) WILL NOT INFRINGE ON SUCH RIGHTS.
This preliminary document is being distributed solely for the purpose
of review and solicitation of comments. It will be updated
periodically. No products should rely on the content of this version
of the document.
SoundFont is a registered trademark of E-mu Systems, Inc. E-mu
Systems licenses a "SoundFont Compatibility" logo for a nominal fee;
please contact E-mu's SoundFont administrator by FAX at (408) 439-0392
for more information. Users of the information contained herein
should refer to files conforming to the specification as "SoundFont
Compatible," with appropriate acknowledgement of trademark ownership.
0.3 Comments
Please send comments via e-mail to [email protected]
0.4 Table of Contents
0 About This Document
0.1 Revision History
0.2 Disclaimers
0.3 Comments
0.4 Table of Contents
0.5 Illustrations
1 Introduction
1.1 Scope and Intended Purpose of this Document
1.2 Document Organization
1.3 SoundFont 2 Objectives
1.4 SoundFont 1.x
1.5 Future Enhancements to the SoundFont 2 Standard
2 Terms and Abbreviations
2.1 Data Structure Terminology
2.2 Synthesizer Terminology
2.3 Parameter Terminology
3 RIFF Structure
3.1 General RIFF File Structure
3.2 The SoundFont 2 Chunks and Subchunks
3.3 Redundancy and Error Handling in the RIFF structure
4 SoundFont 2 RIFF File Format
4.1 SoundFont 2 RIFF File Format Level 0
4.2 SoundFont 2 RIFF File Format Level 1
4.3 SoundFont 2 RIFF File Format Level 2
4.4 SoundFont 2 RIFF File Format Level 3
4.5 SoundFont 2 RIFF File Format Type Definitions
5 The INFO-list Chunk
5.1 The ifil Subchunk
5.2 The isng Subchunk
5.3 The INAM Subchunk
5.4 The irom Subchunk
5.5 The iver Subchunk
5.6 The ICRD Subchunk
5.7 The IENG Subchunk
5.8 The IPRD Subchunk
5.9 The ICOP Subchunk
5.10 The ICMT Subchunk
5.11 The ISFT Subchunk
6 The sdta-list Chunk
6.1 Sample Data Format in the smpl Subchunk
6.2 Sample Data Looping Rules
7 The pdta-list Chunk
7.1 The HYDRA Data Structure
7.2 The PHDR Subchunk
7.3 The PBAG Subchunk
7.4 The PMOD Subchunk
7.5 The PGEN Subchunk
7.6 The INST Subchunk
7.7 The IBAG Subchunk
7.8 The IMOD Subchunk
7.9 The IGEN Subchunk
7.10 The SHDR Subchunk
8 Enumerators
8.1 Generator Enumerators
8.1.1 Kinds of Generator Enumerators
8.1.2 Generator Enumerators Defined
8.1.3 Generator Summary
8.2 Source Enumerators
8.3 Transform Enumerators
8.4 Default Modulators
8.5 Precedence and Absolute and Relative values.
9 Parameters and Synthesis Model
9.1 Synthesis Model
9.1.1 Wavetable Oscillator
9.1.2 Sample Looping
9.1.3 Lowpass Filter
9.1.4 Final Gain Amplifier
9.1.5 Effects Sends
9.1.6 Low Frequency Oscillators
9.1.7 Envelope Generators
9.1.8 Modulation Interconnection Summary
9.2 MIDI Functions
9.3 Parameter Units
9.4 On Implementation Accuracy
10 Error Handling
10.1 Structural Errors
10.2 Unknown Chunks
10.3 Unknown Enumerators
10.4 Illegal Parameter Values
10.5 Unusual Values
10.6 Missing Required Parameter or Terminator
10.7 Illegal enumerator
11 Silicon SoundFonts
12 Glossary
0.5 Illustrations
Figure 1 - Ideal Filter Response Section 9.1.3
Figure 2 - Modulation Structure Section 9.1.8
1 Introduction
1.1 Scope and Intended Purpose of this Document
This document is the definitive source for the SoundFont 2 standard.
This document should provide complete and accurate information to
allow any user to correctly construct and interpret SoundFont 2
compatible banks. This document is not intended to provide any
information on the design or implementation of music synthesizers.
1.2 Document Organization
This document is organized such that sections 1 and 2 give
introductory information about the SoundFont 2 standard. Both new and
seasoned musical engineers will get value from the review of
terminology provided in section 2. Sections 3 through 8 provide
increasingly detailed descriptions of the SoundFont 2 standard data
structures. The sections will ultimately serve as reference, but can
be scanned in order to provide sufficient detail for any level of
understanding. Section 9 deals with the Synthesis model supported by
the SoundFont standard, and will be of interest to anyone involved
with the synthesis engine or bank creation. Section 10 specifies
error handling when dealing with SoundFont compatible banks, and will
be of interest primarily to programmers using the SoundFont standard.
The alphabetical glossary in section 12 can be used as a reference for
any unfamiliar or confusing terminology.
1.3 SoundFont 2 Objectives
The SoundFont 2 standard is intended to provide an extensible,
portable, universal interchange format for wavetable synthesizer
"samples" and articulation data. The standard is made extensible
largely by the use of enumerated "generators" and "modulators" so that
additional function units can be added as requirements dictate. The
standard is made portable and universal by the use of precisely
defined and hardware independent parameters, as well as by specific
practices designed to provide support to a broad range of
technologies.
1.4 SoundFont 1.x
The SoundFont standard was originally released in its 1.0 embodiment
with the Creative Technology AWE32 product using the EMU8000 music
synthesis chip. This proprietary format proved very successful, but
experience brought a number of refinements. These initially were
performed in an upward compatible manner to revision 1.5.
However, due to increasing demand for a public downloadable sound
interchange format, Creative Technology determined that a public
disclosure of the SoundFont format would be in its best interest.
Because there were still more improvements required, many of which
could not be supported in a completely compatible manner, Creative
decided to combine public disclosure with the step to a revised
format. The result is the SoundFont 2 standard.
There are several key enhancements contained in the SoundFont 2
standard. The first is the use of relative parameters in the Preset
level. This allows instruments to be adjusted without altering their
self-consistency, providing easy and effective user editing of
instruments. The second is an improvement in the data structures
associated with the samples themselves, again providing key
information which will allow the sound designer to re-use samples with
a minimum of difficulty. An increased specificity in the rules for
sample data produces enhanced portability across various sound
engines. Finally, the addition of modulators produces a robust
structure which can express all the typical functions in current and
future wavetable synthesizers.
1.5 Future Enhancements to the SoundFont 2 Standard
The SoundFont 2 standard is designed to allow for enhancements based
on future wavetable synthesis technology capabilities by additional
enumerations of generators and modulators. This will be done as
required in an upwardly compatible manner. Suggestions for additions
can be made via e-mail to [email protected]. In general, our policy
for updating the specification will be based on consumer need, rather
than technological idealism.
It is our expectation to maintain bidirectional compatibility within
the SoundFont 2 standard for some years.
2 Terms and Abbreviations
The following sections introduce terms used within this specification
in a logical order. They are provided both as an introduction to
readers unfamiliar with wavetable synthesis implementation details, as
well as a review and reference for the expert. These and other terms
and abbreviations can also be found arranged alphabetically for
reference in the glossary at the end of this specification.
2.1 Data Structure Terminology
bag - A SoundFont data structure element containing a list of layers
(preset bag) or splits (instrument bag).
big endian - Refers to the organization in memory of bytes within a
word such that the most significant byte occurs at the lowest address.
Contrast "little endian."
byte - A data structure element of eight bits without definition of
meaning to those bits.
BYTE - A data structure element of eight bits which contains an
unsigned value from 0 to 255.
case-insensitive - Indicates that an ASCII character or string treats
alphabetic characters of upper or lower case as identical. Contrast
"case-sensitive."
case-sensitive - Indicates that an ASCII character or string treats
alphabetic characters of upper or lower case as distinct. Contrast
"case-insensitive."
CHAR - A data structure of eight bits which contains a signed value
from -128 to +127.
chunk - The top level division of a RIFF file.
doubleword - A data structure element of 32 bits without definition of
meaning to those bits.
DWORD - A data structure of 32 bits which contains an unsigned value
from zero to 4,294,967,295.
enumerated - Said of a data element whose symbols correspond to
particular assigned functions.
global - Refers to parameters which affect all associated structures.
See "global layer" and "global split."
global layer - A layer whose generators and modulators affect all
other layers within the preset.
global split - A split whose generators and modulators affect all
other splits within the instrument.
header - A data structure element which describes several aspects of a
SoundFont element.
hydra - A. A nine-headed mythical beast. B. The nine "pdta"
subchunks which make up the SoundFont articulation data.
instrument - In the SoundFont standard, a collection of splits which
represents the sound of a single musical instrument or sound effect
set.
instrument split - A sample and associated articulation data defined
to play over certain key numbers and velocities. Also simply called a
split.
layer - A subset of a preset containing generators, modulators, and an
instrument. Also termed "preset layer."
level - In the SoundFont structure, this refers either to the preset
and layers (the preset level) or the instrument and splits (the
instrument level).
little endian - A method of ordering bytes within larger words in
memory in which the least significant byte is at the lowest address.
Contrast "big endian."
orphan - Said of a data structure which under normal circumstances is
referenced by a higher level, but in this particular instance is no
longer linked. Specifically, it is an instrument which is not
referenced by any preset layer, or a sample which is not referenced by
any instrument split.
preset - A keyboard full of sound. Typically the collection of
samples and articulation data associated with a particular MIDI preset
number.
preset layer - A subset of a preset containing generators, modulators,
and an instrument. Also simply termed a "layer."
record - A single instance of a data structure.
RIFF - Acronym for Resource Interchange File Format. The recommended
form for interchange files such as SoundFont compatible files within
Microsoft operating systems.
SHORT - A data structure element of sixteen bits which contains a
signed value from -32,768 to +32,767.
split - A sample and associated articulation data defined to play over
certain key numbers and velocities. Also called an instrument split.
subchunk - A division of a RIFF file below that of the chunk.
terminator - A data structure element indicating the final element in
a sequence.
WORD - A data structure of 16 bits which contains an unsigned value
from zero to 65,535.
word - A data structure element of 16 bits without definition of
meaning to those bits.
2.2 Synthesizer Terminology
articulation - The process of modulation of amplitude, pitch, and
timbre to produce an expressive musical note.
artifact - A (typically undesirable) sonic event which is recognizable
as not being present in the original sound.
attack - That phase of an envelope or sound during which the amplitude
increases from zero to a peak value.
attenuation - A decrease in volume or amplitude of a signal.
AWE32 - The original Creative Technology Sound Blaster product which
contained an EMU8000 wavetable synthesizer and supported the SoundFont
standard.
balance - A form of stereo volume control in which both left and right
channels are at maximum when the control is centered, and which
attenuates only the opposite channel when taken to either extreme.
bank - A collection of presets. See also MIDI bank.
chorus - An effects processing algorithm which involves cyclically
shifting the pitch of a signal and remixing it with itself to produce
a time varying comb filter, giving a perception of motion and fullness
to the resulting sound.
cutoff frequency - The frequency of a filter function at which the
attenuation reaches a specified value.
data points - The individual values comprising a sample. Sometimes
also called sample points. Contrast "sample."
decay - The portion of an envelope or sound during which the amplitude
declines from a peak to steady state value.
delay - The portion of an envelope or LFO function which elapses from
a key-on event until the amplitude becomes non-zero.
DC gain - The degree of amplification or attenuation a system
presents to a static or zero frequency signal.
digital audio - Audio represented as a sequence of quantized values
spaced evenly over time. The values are called "sample data points."
downloadable - Said of samples which are loaded from a file into RAM,
in contrast to samples which are maintained in ROM.
dry - Refers to audio which has not received any effects processing
such as reverb or chorus.
EMU8000 - A wavetable synthesizer chip designed by E-mu Systems for
use in Creative Technology products.
envelope - A time varying signal which typically controls the pitch,
volume, and/or filter cutoff frequency of a note, and comprises
multiple phases including attack, decay, sustain, and release.
flat - A. Said of a tone that is lower in pitch than another
reference tone. B. Said of a frequency response that does not
deviate significantly from a single fixed gain over the audio range.
interpolator - A circuit or algorithm which computes intermediate
points between existing sample data points. This is of particular use
in the pitch shifting operation of a wavetable synthesizer, in which
these intermediate points represent the output samples of the waveform
at the desired pitch transposition.
key number - See MIDI key number.
LFO - Acronym for Low Frequency Oscillator. A slow periodic
modulation source.
linear coding - The most common method of encoding amplitudes in
digital audio in which each step is of equal size.
loop - In wavetable synthesis, a portion of a sample which is repeated
many times to increase the duration of the resulting sound.
loop points - The sample data points at which a loop begins and ends.
lowpass - Said of a filter which attenuates high frequencies but does
not attenuate low frequencies.
MIDI - Acronym for Musical Instrument Digital Interface. The standard
protocol for sending performance information to a musical synthesizer.
MIDI bank - A group of up to 128 presets selected by a MIDI "change
bank" command.
MIDI continuous controller - A construct in the MIDI protocol.
MIDI key number - A construct in the MIDI protocol which accompanies a
MIDI key-on or key-off command and specifies the key of the musical
instrument keyboard to which the command refers.
MIDI pitch bend - A special MIDI construct akin to the MIDI continuous
controllers which controls the realtime value of the pitch of all
notes played in a MIDI channel.
MIDI preset - A "preset" selected to be active in a particular MIDI
channel by a MIDI "change preset" command.
MIDI velocity - A construct in the MIDI protocol which accompanies a
MIDI key-on or key-off command and specifies the speed with which the
key was pressed or released.
mono - Short for "monophonic." Indicates a sound comprising only one
channel or waveform. Contrast with "stereo."
oscillator - In wavetable synthesis, the wavetable interpolator is
considered an oscillator.
pan - Short for "panorama." This is the control of the apparent
azimuth of a sound source over 180 degrees from left to right. It is
generally implemented by varying the volume at the left and right
speakers.
pitch - The perceived value of frequency. Generally can be used
interchangeably with frequency.
pitch shift - A change in pitch. Wavetable synthesis relies on
interpolators to cause pitch shift in a sample to produce the notes of
the scale.
pole - A mathematical term used in filter transform analysis.
Traditionally in synthesis, a pole is equated with a rolloff of 6dB
per octave, and the rolloff of a filter is specified in "poles."
Preditor - E-mu Systems' proprietary SoundFont 2.00 compatible bank
editing software.
preset - A keyboard full of sound. Typically the collection of
samples and articulation data associated with a particular MIDI preset
number.
Q - A mathematical term used in filter transform analysis. Indicates
the degree of resonance of the filter. In synthesis terminology, it
is synonymous with resonance.
release - The portion of an envelope or sound during which the
amplitude declines from a steady state to zero value or inaudibility.
resonance - Describes the aspect of a filter in which particular
frequencies are given significantly more gain than others. The
resonance can be measured in dB above the DC gain.
resonant frequency - The frequency at which resonance reaches its
maximum.
reverb - Short for reverberation. In synthesis, a synthetic signal
processor which adds artificial spaciousness and ambience to a sound.
sample - This term is often used both to indicate a "sample data
point" and to indicate a collection of such points comprising a
digital audio waveform. The latter meaning is exclusively used in
this specification.
soft - The pedal on a piano, so named because it causes the damper to
be lowered in such a way as to soften the timbre and loudness of the
notes. In MIDI, continuous controller #67 which behaves in a similar
manner.
sostenuto - The pedal on a piano which causes the dampers on all keys
depressed to be held until the pedal is released. In MIDI, continuous
controller #66 which behaves in a similar manner.
sustain - The pedal on a piano which prevents all dampers on keys as
they are depressed from being released. In MIDI, continuous
controller #64 which behaves in a similar manner.
SoundFont - A registered trademark of E-mu Systems, Inc, indicating
files produced by E-mu which conform to the SoundFont standard file
format.
stereo - Literally indicating three dimensions. In this
specification, the term is used to mean two channel stereophonic,
indicating that the sound is composed of two independent audio
channels, dubbed left and right. Contrast monophonic.
synthesis engine - The hardware and software associated with the
signal processing and modulation path for a particular synthesizer.
synthesizer - A device capable of producing ideally arbitrary musical
sound.
tremolo - A periodic change in amplitude of a sound, typically
produced by applying a low frequency oscillator to the final volume
amplifier.
triangular - A waveform which ramps upward to a positive limit, then
downward at the opposite slope to the symmetrically negative limit
periodically.
unpitched - Said of a sound which is not characterized by a perceived
frequency. This would be true of noise-like musical instruments and
of many sound effects.
velocity - In synthesis, the speed with which a keyboard key is
depressed, typically in proportion to the impact delivered by the
musician. See also MIDI velocity.
vibrato - A periodic change in the pitch of a sound, typically
produced by applying a low frequency oscillator to the oscillator
pitch.
volume - The loudness or amplitude of a sound, or the control of this
parameter.
wavetable - A music synthesis technique wherein musical sounds are
recorded or computed mathematically and stored in a memory, then
played back at a variable rate to produce the desired pitch.
Additional timbral adjustments are often made to the sound thus
produced using amplifiers, filters, and effect processing such as
reverb and chorus.
2.3 Parameter Terminology
absolute - Describes a parameter which gives a definitive real-world
value. Contrast to relative.
additive - Describes a parameter which is to be numerically added to
another parameter.
attenuation - A decrease in volume or amplitude of a signal.
cent - A unit of pitch ratio corresponding to the twelve hundredth
root of two, or one hundredth of a semitone, approximately
1.000577790.
centibel - A unit of amplitude ratio corresponding to the two
hundredth root of ten, or one tenth of a decibel, approximately
1.011579454.
cutoff frequency - The frequency of a filter function at which the
attenuation reaches a specified value.
decibel - A unit of amplitude ratio corresponding to the twentieth
root of ten, approximately 1.122018454.
octave - A factor of two in ratio, typically applied to pitch or
frequency.
pitch - The perceived value of frequency. Generally can be used
interchangeably with frequency.
pitch shift - A change in pitch. Wavetable synthesis relies on
interpolators to cause pitch shift in a sample to produce the notes of
the scale.
relative - Describes a parameter which merely indicates an offset from
an otherwise established value. Contrast to absolute.
resonance - Describes the aspect of a filter in which particular
frequencies are given significantly more gain than others. The
resonance can be measured in dB above the DC gain.
sample rate - The frequency, in Hertz, at which sample data points are
taken when recording a sample.
semitone - A unit of pitch ratio corresponding to the twelfth root of
two, or one twelfth of an octave, approximately 1.059463094.
sharp - Said of a tone that is higher in pitch than another reference
tone.
timecent - A unit of duration ratio corresponding to the twelve
hundredth root of two, or one twelve hundredth of an octave,
approximately 1.000577790.
3 RIFF Structure
3.1 General RIFF File Structure
The RIFF (Resource Interchange File Format) is a tagged file structure
developed for multimedia resource files, and is described in some
detail in the Microsoft Windows 3.1 SDK Multimedia Programmer's
Reference. The tagged-file structure is useful because it helps
prevent compatibility problems which can occur as the file definition
changes over time. Because each piece of data in the file is
identified by a standard header, an application that does not
recognize a given data element can skip over the unknown information.
A RIFF file is constructed from a basic building block called a
"chunk." In "C" syntax, a chunk is defined:
typedef DWORD FOURCC;        // Four-character code
typedef struct {
    FOURCC ckID;             // A chunk ID identifies the type of
                             // data within the chunk.
    DWORD ckSize;            // The size of the chunk data in bytes,
                             // excluding any pad byte.
    BYTE ckDATA[ckSize];     // The actual data plus a pad byte
                             // if req'd to word align.
};
Two types of chunks, the "RIFF" and "LIST" chunks, may contain nested
chunks called subchunks as their data.
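As an illustration of the chunk layout described above, the following C sketch reads one chunk header from a memory buffer and computes where the next chunk begins, including the pad byte when ckSize is odd. It is a sketch under stated assumptions, not a normative implementation: the ChunkHeader type and the function names are hypothetical, the size field is read as little endian (the RIFF byte order), and a missing pad byte is treated as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a chunk header; not a type from the specification. */
typedef struct {
    char     id[5];   /* ckID: four-character code plus a terminating NUL */
    uint32_t size;    /* ckSize: length of the data, excluding any pad byte */
} ChunkHeader;

/* Read a 32-bit little-endian value, independent of host byte order. */
static uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the chunk header at buf[off]. On success, fill *hdr, store the
 * offset of the following chunk in *next, and return 1. Return 0 if the
 * header, the data, or the pad byte would run past len.
 */
int riff_read_chunk(const uint8_t *buf, size_t len, size_t off,
                    ChunkHeader *hdr, size_t *next)
{
    assert(buf != NULL && hdr != NULL && next != NULL);
    if (off > len || len - off < 8)
        return 0;                          /* no room for ckID + ckSize */

    memcpy(hdr->id, buf + off, 4);
    hdr->id[4] = '\0';
    hdr->size = rd_u32le(buf + off + 4);

    size_t avail = len - off - 8;          /* bytes left after the header */
    if (hdr->size > avail)
        return 0;                          /* data truncated */
    size_t padded = (size_t)hdr->size + (hdr->size & 1u);
    if (padded > avail)
        return 0;                          /* pad byte missing */

    *next = off + 8 + padded;
    return 1;
}
```

A reader that does not recognize hdr.id can simply continue parsing at the offset returned in next, which is how unknown data elements are skipped.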
The ordering requirements of chunks and subchunks within a RIFF file
are not well documented in the RIFF file format. In SoundFont 2.0, the
order of the subchunks within the INFO chunk is arbitrary, but for
consistency it is recommended that the subchunks be ordered as
presented in this document. The order of all other chunks and
subchunks is strictly defined and must be maintained as presented in
this document.
3.2 The SoundFont 2 Chunks and Subchunks
A SoundFont 2 compatible RIFF file comprises three chunks: an INFO-
list chunk containing a number of required and optional subchunks
describing the file, its history, and its intended use, an sdta-list
chunk comprising a single subchunk containing any referenced digital
audio samples, and a pdta-list chunk containing nine subchunks which
define the articulation of the digital audio data.
The SoundFont 2 standard allows that the subchunks within the INFO-
list chunk may appear in arbitrary order. However, the order of the
three chunks, and the order of the subchunks within the pdta-list
chunk, is fixed.
The SoundFont 2 specification requires that implementations ignore
unknown subchunks within the INFO-list chunk. Note, however, that
until such subchunks become defined in the specification, inclusion of
additional INFO-list subchunks will preclude the file from conforming
to the SoundFont standard.
A detailed description of the SoundFont 2 RIFF structure is provided
in Section 4.
3.3 Redundancy and Error Handling in the RIFF structure
The RIFF file structure contains redundant information regarding the
length of the file and the length of the chunks and subchunks. This
fact enables any reader of a SoundFont compatible file to determine if
the file has been damaged by loss of data.
If any such loss is detected, the SoundFont compatible file is termed
"structurally unsound" and in general should be rejected. SoundFont
compatible software developers may produce utilities to recover data
from structurally unsound files, producing with or without user
assistance a corrected and structurally sound SoundFont 2 compatible
file.
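One way to exploit this redundancy is to confirm that the subchunks inside a parent chunk fill it exactly. The C sketch below is a minimal example under stated assumptions: the function name is hypothetical, data is assumed to point just past the four-character form or list type of the enclosing RIFF or LIST chunk, sizes are little endian, and a missing pad byte counts as damage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value, independent of host byte order. */
static uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk the subchunks stored in data[0..size) and check that they tile the
 * region exactly. Returns the number of subchunks found, or -1 if any
 * header or data block is truncated (a structurally unsound region).
 */
int riff_count_subchunks(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    size_t off = 0;
    int count = 0;

    while (off < size) {
        if (size - off < 8)
            return -1;                 /* truncated subchunk header */
        size_t ck = rd_u32le(data + off + 4);
        size_t avail = size - off - 8;
        if (ck > avail)
            return -1;                 /* subchunk larger than its parent allows */
        ck += ck & 1u;                 /* account for the pad byte */
        if (ck > avail)
            return -1;                 /* pad byte missing */
        off += 8 + ck;
        count++;
    }
    return count;
}
```

A result of -1 corresponds to the "structurally unsound" case; a loader would normally reject the file at that point.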
4 SoundFont 2 RIFF File Format
4.1 SoundFont 2 RIFF File Format Level 0
<SFBK-form> -> RIFF ('sfbk'    ; RIFF form header
    {
    <INFO-list>                ; Supplemental Information
    <sdta-list>                ; The Sample Binary Data
    <pdta-list>                ; The Preset, Instrument, and Sample Header data
    }
)
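The Level 0 grammar above implies that a SoundFont 2 compatible file begins with a RIFF header whose form type is 'sfbk'. The following C sketch checks that header and compares the declared RIFF size with the buffer length. It is illustrative only: the function name is hypothetical, and the strict size comparison is an assumption of this sketch rather than a stated requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value, independent of host byte order. */
static uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return 1 if buf starts with "RIFF", carries the form type "sfbk", and
 * declares a size equal to everything after the 8-byte chunk header.
 * Return 0 otherwise.
 */
int sf2_is_sfbk_form(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 12)
        return 0;                      /* too short for ckID, ckSize, form type */
    if (memcmp(buf, "RIFF", 4) != 0)
        return 0;
    if (memcmp(buf + 8, "sfbk", 4) != 0)
        return 0;
    /* The RIFF size covers the form type and all nested chunks. */
    return (size_t)rd_u32le(buf + 4) == len - 8;
}
```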
4.2 SoundFont 2 RIFF File Format Level 1
<INFO-list> -> LIST ('INFO'
    {
    <ifil-ck>      ; Refers to the version of the Sound Font RIFF file
    <isng-ck>      ; Refers to the target Sound Engine
    <INAM-ck>      ; Refers to the Sound Font Bank Name
    [<irom-ck>]    ; Refers to the Sound ROM Name
    [<iver-ck>]    ; Refers to the Sound ROM Version
    [<ICRD-ck>]    ; Refers to the Date of Creation of the Bank
    [<IENG-ck>]    ; Sound Designers and Engineers for the Bank
    [<IPRD-ck>]    ; Product for which the Bank was intended
    [<ICOP-ck>]    ; Contains any Copyright message
    [<ICMT-ck>]    ; Contains any Comments on the Bank
    [<ISFT-ck>]    ; The SoundFont tools used to create and alter the bank
    }
)
<sdta-list> -> LIST ('sdta'
    {
    [<smpl-ck>]    ; The Digital Audio Samples
    }
)
<pdta-list> -> LIST ('pdta'
    {
    <phdr-ck>      ; The Preset Headers
    <pbag-ck>      ; The Preset Index list
    <pmod-ck>      ; The Preset Modulator list
    <pgen-ck>      ; The Preset Generator list
    <inst-ck>      ; The Instrument Names and Indices
    <ibag-ck>      ; The Instrument Index list
    <imod-ck>      ; The Instrument Modulator list
    <igen-ck>      ; The Instrument Generator list
    <shdr-ck>      ; The Sample Headers
    }
)
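Because the order of the nine pdta subchunks is fixed, a reader can validate it with a simple table comparison. The C sketch below assumes the caller has already collected the four-character IDs in file order; the table and function names are hypothetical and the code is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The nine pdta subchunk IDs in the order given by the Level 1 grammar. */
static const char *const SF2_PDTA_ORDER[9] = {
    "phdr", "pbag", "pmod", "pgen", "inst", "ibag", "imod", "igen", "shdr"
};

/*
 * ids holds the four-character IDs found inside a pdta-list chunk, in file
 * order. Return 1 if exactly nine IDs are present in the required order,
 * otherwise 0. The comparison is case-sensitive.
 */
int sf2_pdta_order_ok(const char *const *ids, size_t count)
{
    assert(ids != NULL || count == 0);
    if (count != 9)
        return 0;
    for (size_t i = 0; i < 9; i++) {
        if (strncmp(ids[i], SF2_PDTA_ORDER[i], 4) != 0)
            return 0;
    }
    return 1;
}
```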
4.3 SoundFont 2 RIFF File Format Level 2
<ifil-ck> -> ifil(<iver-rec>) ; e.g. 2.00
<isng-ck> -> isng(szSoundEngine:ZSTR) ; e.g. "EMU8000"
<irom-ck> -> irom(szROM:ZSTR) ; e.g. "1MGM"
<iver-ck> -> iver(<iver-rec>) ; e.g. 2.08
<INAM-ck> -> INAM(szName:ZSTR) ; e.g. "General MIDI"
<ICRD-ck> -> ICRD(szDate:ZSTR) ; e.g. "July 15, 1995"
<IENG-ck> -> IENG(szName:ZSTR) ; e.g. "John Q. Engineer"
<IPRD-ck> -> IPRD(szProduct:ZSTR) ; e.g. "SBAWE32"
<ICOP-ck> -> ICOP(szCopyright:ZSTR) ; e.g. "Copyright (c) 1995 E-mu Systems, Inc."
<ICMT-ck> -> ICMT(szComment:ZSTR) ; e.g. "This is a comment"
<ISFT-ck> -> ISFT(szTools:ZSTR) ; e.g. "Preditor 2.00a:Preditor 2.00a"
<smpl-ck> -> smpl(<sample:SHORT>) ; 16 bit Linearly Coded Digital Audio Data
<phdr-ck> -> phdr(<phdr-rec>)
<pbag-ck> -> pbag(<pbag-rec>)
<pmod-ck> -> pmod(<pmod-rec>)
<pgen-ck> -> pgen(<pgen-rec>)
<inst-ck> -> inst(<inst-rec>)
<ibag-ck> -> ibag(<ibag-rec>)
<imod-ck> -> imod(<imod-rec>)
<igen-ck> -> igen(<igen-rec>)
<shdr-ck> -> shdr(<shdr-rec>)
4.4 SoundFont 2 RIFF File Format Level 3
<iver-rec> -> struct sfVersionTag
{
WORD wMajor;
WORD wMinor;
};
<phdr-rec> -> struct sfPresetHeader
{
CHAR achPresetName[20];
WORD wPreset;
WORD wBank;
WORD wPresetBagNdx;
DWORD dwLibrary;
DWORD dwGenre;
DWORD dwMorphology;
};
<pbag-rec> -> struct sfPresetBag
{
WORD wGenNdx;
WORD wModNdx;
};
<pmod-rec> -> struct sfModList
{
SFModulator sfModSrcOper;
SFGenerator sfModDestOper;
SHORT modAmount;
SFModulator sfModAmtSrcOper;
SFTransform sfModTransOper;
};
<pgen-rec> -> struct sfGenList
{
SFGenerator sfGenOper;
genAmountType genAmount;
};
<inst-rec> -> struct sfInst
{
CHAR achInstName[20];
WORD wInstBagNdx;
};
<ibag-rec> -> struct sfInstBag
{
WORD wInstGenNdx;
WORD wInstModNdx;
};
<imod-rec> -> struct sfInstModList
{
SFModulator sfModSrcOper;
SFGenerator sfModDestOper;
SHORT modAmount;
SFModulator sfModAmtSrcOper;
SFTransform sfModTransOper;
};
<igen-rec> -> struct sfInstGenList
{
SFGenerator sfGenOper;
genAmountType genAmount;
};
<shdr-rec> -> struct sfSample
{
CHAR achSampleName[20];
DWORD dwStart;
DWORD dwEnd;
DWORD dwStartloop;
DWORD dwEndloop;
DWORD dwSampleRate;
BYTE byOriginalKey;
CHAR chCorrection;
WORD wSampleLink;
SFSampleLink sfSampleType;
};
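Because compilers may insert padding and hosts may differ in byte order, reading these records by casting a file buffer to the structs above is fragile. The sketch below decodes one sfSample record field by field instead. It assumes the records are stored packed with no padding, with little-endian integers and a 16-bit sfSampleType, which gives 46 bytes per record; none of that is stated in this section. The names SfSampleRec, SF_SHDR_REC_SIZE, rd16, rd32 and sf_decode_shdr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed packed record size: 20 + 5*4 + 1 + 1 + 2 + 2 bytes. */
#define SF_SHDR_REC_SIZE 46

/* In-memory mirror of sfSample; its layout need not match the file. */
typedef struct {
    char     achSampleName[21];   /* 20 file bytes plus a guaranteed NUL */
    uint32_t dwStart, dwEnd, dwStartloop, dwEndloop, dwSampleRate;
    uint8_t  byOriginalKey;
    int8_t   chCorrection;
    uint16_t wSampleLink;
    uint16_t sfSampleType;
} SfSampleRec;

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one record from SF_SHDR_REC_SIZE raw bytes. */
static void sf_decode_shdr(const uint8_t *rec, SfSampleRec *out)
{
    assert(rec != NULL && out != NULL);
    memcpy(out->achSampleName, rec, 20);
    out->achSampleName[20] = '\0';
    out->dwStart       = rd32(rec + 20);
    out->dwEnd         = rd32(rec + 24);
    out->dwStartloop   = rd32(rec + 28);
    out->dwEndloop     = rd32(rec + 32);
    out->dwSampleRate  = rd32(rec + 36);
    out->byOriginalKey = rec[40];
    out->chCorrection  = (int8_t)rec[41];
    out->wSampleLink   = rd16(rec + 42);
    out->sfSampleType  = rd16(rec + 44);
}
```

The same pattern extends to the other record types by changing the offsets and field list.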
4.5 SoundFont 2 RIFF File Format Type Definitions
The SFModulator, SFGenerator, and SFTransform types are all
enumeration types whose values are defined in subsequent sections.
The genAmountType is a union which allows signed 16 bit, unsigned 16
bit, and two unsigned 8 bit fields:
typedef struct
{
BYTE byLo;
BYTE byHi;
} rangesType;
typedef union
{
rangesType ranges;
SHORT shAmount;
WORD wAmount;
} genAmountType;
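The union gives three views of the same two bytes. As an illustration only, the sketch below derives all three views from the raw file bytes without relying on how the host lays out a union. It assumes the first byte in the file is the low byte (byLo), consistent with little-endian storage; the names SfGenAmountViews and sf_gen_amount_views are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All three interpretations of one genAmount field, side by side. */
typedef struct {
    uint8_t  byLo, byHi;   /* ranges view */
    int16_t  shAmount;     /* signed 16 bit view */
    uint16_t wAmount;      /* unsigned 16 bit view */
} SfGenAmountViews;

static SfGenAmountViews sf_gen_amount_views(const uint8_t *raw)
{
    assert(raw != NULL);
    SfGenAmountViews v;
    v.byLo = raw[0];
    v.byHi = raw[1];
    v.wAmount = (uint16_t)(raw[0] | (raw[1] << 8));
    /* Two's-complement reinterpretation done arithmetically, so the
     * result does not depend on the host's integer conversion rules. */
    int32_t wide = (int32_t)v.wAmount;
    v.shAmount = (int16_t)(wide >= 0x8000 ? wide - 0x10000 : wide);
    return v;
}
```

Which view is meaningful depends on the generator in question; the decoder itself cannot tell.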
The SFSampleLink is an enumeration type which describes both the type
of sample (mono, stereo left, etc.) and whether the sample is
located in RAM or ROM memory:
typedef enum
{
monoSample = 1,
rightSample = 2,
leftSample = 4,
linkedSample = 8,
RomMonoSample = 0x8001,
RomRightSample = 0x8002,
RomLeftSample = 0x8004,
RomLinkedSample = 0x8008
} SFSampleLink;
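In the enumeration above, each ROM value equals the corresponding RAM value with bit 15 set (for example 0x8001 is 1 with 0x8000 added). A minimal sketch of helpers built on that observation follows; the names SF_ROM_FLAG, sf_sample_is_rom and sf_sample_kind are hypothetical, and treating any other bit pattern as invalid is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SF_ROM_FLAG 0x8000u   /* bit 15 distinguishes ROM from RAM samples */

/* Returns 1 if the sample type refers to ROM, 0 if it refers to RAM. */
static int sf_sample_is_rom(uint16_t type)
{
    return (type & SF_ROM_FLAG) != 0;
}

/* Returns a short label for the sample kind, or NULL if the value is
 * not one of the eight enumerators listed above. */
static const char *sf_sample_kind(uint16_t type)
{
    switch (type & 0x7FFFu) {       /* strip the ROM flag */
    case 1:  return "mono";
    case 2:  return "right";
    case 4:  return "left";
    case 8:  return "linked";
    default: return NULL;
    }
}
```

For instance, a record whose sfSampleType is 0x8004 would be reported as a ROM sample of kind "left".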
5 The INFO-list Chunk
The INFO-list chunk in a SoundFont 2 compatible file contains three
mandatory and a variety of optional subchunks as defined below. The
INFO-list chunk gives basic information about the SoundFont compatible
bank contained in the file.
5.1 The ifil Subchunk
The ifil subchunk is a mandatory subchunk identifying the SoundFont
specification version level to which the file complies. It is always
four bytes in length, and contains data according to the structure:
struct sfVersionTag
{
WORD wMajor;
WORD wMinor;
};
The WORD wMajor contains the value to the left of the decimal point in
the SoundFont specification version; the WORD wMinor contains the
value to the right of the decimal point. For example, wMajor=2 and
wMinor=11 indicate version 2.11.
These values can be used by applications which read SoundFont
compatible files to determine if the format of the file is usable by
the program. Within a fixed wMajor, the only changes to the format
will be the addition of Generator, Source and Transform enumerators,
and additional info subchunks. These are all defined as being ignored
if unknown to the program. Consequently, many applications can be
designed to be fully upward compatible within a given wMajor. In the
case of editors or other programs in which all enumerators should be
known, the value of wMinor may be of consequence. Generally the
application program will either accept the file as usable (possibly
with appropriate transparent translation), reject the file as
unusable, or warn the user that there may be uneditable data in the
file.
If the ifil subchunk is missing, or its size is not four bytes, the
file should be rejected as structurally unsound.
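The acceptance rules above can be sketched as a small check on the ifil payload. This is one possible policy, not the specification's: the reader is assumed to know the format up to some known_major.known_minor, to reject any other wMajor, and to warn when wMinor is newer than it knows. The names SfVersion, SfVersionVerdict and sf_check_ifil are hypothetical, and the two WORDs are assumed to be stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t wMajor, wMinor; } SfVersion;

typedef enum { SF_ACCEPT, SF_WARN_NEWER_MINOR, SF_REJECT } SfVersionVerdict;

/* data/size are the ifil subchunk payload; pass NULL/0 if it is missing. */
static SfVersionVerdict sf_check_ifil(const uint8_t *data, size_t size,
                                      uint16_t known_major,
                                      uint16_t known_minor,
                                      SfVersion *out)
{
    assert(out != NULL);
    if (data == NULL || size != 4)
        return SF_REJECT;           /* missing or not four bytes */
    out->wMajor = (uint16_t)(data[0] | (data[1] << 8));
    out->wMinor = (uint16_t)(data[2] | (data[3] << 8));
    if (out->wMajor != known_major)
        return SF_REJECT;           /* assumed policy for other majors */
    if (out->wMinor > known_minor)
        return SF_WARN_NEWER_MINOR; /* may contain unknown enumerators */
    return SF_ACCEPT;
}
```

A playback engine that ignores unknown enumerators could treat SF_WARN_NEWER_MINOR as acceptable, whereas an editor would more likely surface it to the user.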
5.2 The isng Subchunk
The isng subchunk is a mandatory subchunk identifying the wavetable
sound engine for which the file was optimized. It contains an ASCII
string of 256 or fewer bytes including one or two terminators of value
zero, so as to make the total byte count even. The default isng field
is the eight bytes representing "EMU8000" as seven ASCII characters
followed by a zero byte.
The ASCII should be treated as case-sensitive. In other words,
"emu8000" is not the same as "EMU8000".
The isng string can be optionally used by chip drivers to vary their
synthesis algorithms to emulate the target sound engine.
If the isng subchunk is missing, not terminated in a zero valued byte,
or its contents are an unknown sound engine, the field should be
ignored and EMU8000 assumed.
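The fallback rule above can be sketched as follows. The function name sf_resolve_isng and the idea of passing in a list of engine names the driver recognizes are assumptions made for illustration; the comparison is case-sensitive, as the text requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SF_DEFAULT_ENGINE "EMU8000"

/* data/size are the isng payload (NULL/0 if the subchunk is missing).
 * known is a caller-supplied list of recognized engine names.
 * Returns the engine name to use, falling back to the default. */
static const char *sf_resolve_isng(const char *data, size_t size,
                                   const char *const *known, size_t n_known)
{
    assert(known != NULL || n_known == 0);
    /* Missing, or not terminated by a zero byte: ignore the field. */
    if (data == NULL || size == 0 || data[size - 1] != '\0')
        return SF_DEFAULT_ENGINE;
    /* The final zero byte bounds strcmp within the payload. */
    for (size_t i = 0; i < n_known; i++) {
        if (strcmp(data, known[i]) == 0)    /* case-sensitive match */
            return known[i];
    }
    return SF_DEFAULT_ENGINE;               /* unknown engine */
}
```

Because the match is case-sensitive, a payload of "emu8000" is treated as unknown and resolves to the default engine through the fallback path.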
5.3 The INAM Subchunk
The INAM subchunk is a mandatory subchunk providing the name of the
SoundFont compatible bank. It contains an ASCII string of 256 or
fewer bytes including one or two terminators of value zero, so as to
make the total byte count even. A typical INAM subchunk would be the
fourteen bytes representing "General MIDI" as twelve ASCII characters
followed by two zero bytes.
The ASCII should be treated as case-sensitive. In other words,
"General MIDI" is not the same as "GENERAL MIDI".
The INAM string is typically used for the identification of banks
even if the file names are altered.
If the INAM subchunk is missing, or not terminated in a zero valued
byte, the field should be ignored and the user supplied with an
appropriate error message if the name is queried. If the file is
rewritten, a valid name should be placed in the INAM field.
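When a file is rewritten, the string must be stored with one or two zero terminators so that the total byte count is even. The sketch below shows one way to compute that size and write the padded string; the names sf_zstr_size and sf_write_zstr are hypothetical, and returning 0 on failure is a convention of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Payload size for a string of n_chars characters: one terminator,
 * plus a second one when needed to make the total even. */
static size_t sf_zstr_size(size_t n_chars)
{
    return n_chars + ((n_chars % 2 == 0) ? 2 : 1);
}

/* Writes s followed by its terminators into out (capacity cap).
 * Returns the number of bytes written, or 0 if the result would
 * exceed cap or the 256-byte limit. */
static size_t sf_write_zstr(const char *s, unsigned char *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s);
    size_t total = sf_zstr_size(n);
    if (total > 256 || total > cap)
        return 0;
    memcpy(out, s, n);
    memset(out + n, 0, total - n);   /* one or two zero bytes */
    return total;
}
```

This reproduces the sizes quoted in the text: "General MIDI" (twelve characters) occupies fourteen bytes, and "EMU8000" (seven characters) occupies eight.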
5.4 The irom Subchunk
The irom subchunk is an optional subchunk identifying a particular
wavetable sound data ROM to which any ROM samples refer. It contains
an ASCII string of 256 or fewer bytes including one or two terminators