JULIUS(1)                                               JULIUS(1)



NAME
       Julius - open source multi-purpose LVCSR engine

SYNOPSIS
       julius [-C jconffile] [options ...]

DESCRIPTION
       Julius  is  a high-performance, multi-purpose, free speech
       recognition engine for researchers and developers.  It  is
       capable of performing almost real-time recognition of con-
       tinuous speech with over 60k-word vocabulary on most  cur-
       rent PCs.

       Julius needs an N-gram language model, word dictionary and
       an acoustic model  to  execute  a  recognition.   Standard
       model  formats  (i.e.  ARPA  and  HTK) with any word/phone
       units and sizes are supported, so users can build a recog-
       nition  system for various target using their own language
       model and acoustic models.  For details about basic models
       and their availability, please see the documents contained
       in this package.

       Julius can perform recognition on audio files, live micro-
       phone  input,  network  input and feature parameter files.
       The maximum size of vocabulary is 65,535 words.

RECOGNITION MODELS
       Julius supports the following models.

       Acoustic Models
                 Sub-word HMM (Hidden Markov Model) in HTK format
                 are supported.  Phoneme models (monophone), con-
                 text dependent phoneme models (triphone),  tied-
                 mixture  and phonetic tied-mixture models of any
                 unit can be used.  When using context  dependent
                 models, interword context is also handled.

       Language model
                 2-gram  and  reverse  3-gram language models are
                 used.  The Standard ARPA  format  is  supported.
                 In addition, a binary format N-gram is also sup-
                 ported for efficiency.  The tool mkbingram.  can
                 convert  binary  N-gram  from  the ARPA language
                 models.

SPEECH INPUT
       Both live speech input and recorded speech file input  are
       supported.   Live  input  stream  from  microphone device,
       DatLink (NetAudio) device and tcpip  network  input  using
       adintool  is  supported.  Speech waveform files (16bit WAV
       (no compression), RAW format, and many other formats  will
       be  acceptable if compiled with libsndfile library).  Fea-
       ture parameter files in HTK format are also supported.

       Note that Julius itself can only extract MFCC_E_D_N_Z fea-
       tures  from  speech  data.   If  you  use  an acoustic HMM
       trained by other feature type, only the HTK parameter file
       of the same feature type can be used.

SEARCH ALGORITHM OVERVIEW
       Recognition  algorithm  of  Julius  is based on a two-pass
       strategy.  Word 2-gram and reverse word 3-gram is used  on
       the  respective  passes.  The entire input is processed on
       the first pass, and again the final searching  process  is
       performed  again  for  the  input, using the result of the
       first pass to narrow the search space.  Specifically,  the
       recognition algorithm is based on a tree-trellis heuristic
       search combined with left-to-right frame-synchronous  beam
       search and right-to-left stack decoding search.

       When using context dependent phones (triphones), interword
       contexts are taken into consideration.   For  tied-mixture
       and  phonetic  tied-mixture  models,  high-speed  acoustic
       likelihood calculation is possible using gaussian pruning.

       For  more  details,  see  the related document or web page
       below.

OPTIONS
       The options below specify the models, system behaviors and
       various search parameters.  These option can be set all at
       once at the command line, but it is recommended  that  you
       write  them  in a text file as a "jconf file", and specify
       the file with "-C" option.

   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
              Select speech  data  input  source.   'rawfile'  is
              waveform  file,  and  specified  after startup from
              stdin).  'mic' means microphone device, and  'adin-
              net'  means  receiving waveform data via tcpip net-
              work from an adinnet  client.  'netaudio'  is  from
              DatLink/NetAudio  input,  and  'stdin'  means  data
              input from standard input.

              WAV (no  compression)  and  RAW  (noheader,  16bit,
              BigEndian)  are  supported for waveform file input.
              Other  format  can  be  supported  using   external
              library.  To see what format is actually supported,
              see the help message  using  option  "-help".   For
              stdin input, only WAV and RAW is supported.
              (default: mfcfile)

       -filelist file
              (With  -input  rawfile|mfcfile) perform recognition
              on all files listed in the file.

       -adport portnum
              (With -input adinnet) adinnet port number (default:
              5530)

       -NA server:unit
              (With -input netaudio) set the server name and unit
              ID of the Datlink unit.

       -record directory
              Auto-save input speech data successively under  the
              directory.  Each segmented inputs are recorded to a
              file each by one.  The file name  of  the  recorded
              data  is  generated from system time when the input
              starts, in a style of "YYYY.MMDD.HHMMSS.wav".  File
              format  is  16bit monoral WAV.  Invalid for mfcfile
              input.

   Speech Detection
       Options in this section is invalid for mfcfile input.



       -cutsilence

       -nocutsilence
              Force silence cutting (=speech  segment  detection)
              to  ON/OFF.  (default:  ON for mic/adinnet, OFF for
              files)

       -lv threslevel
              Level threshold (0 - 32767) for speech  triggering.
              If  audio  input amplitude goes over this threshold
              for a period, Julius begin the  1st  pass  recogni-
              tion.   If  the  level  goes below this level after
              triggering, it is the end of  the  speech  segment.
              (default: 2000)

       -zc zerocrossnum
              Zero crossing threshold per a second (default: 60)

       -headmargin msec
              Margin  at the start of speech segment in millisec-
              onds. (default: 300)

       -tailmargin msec
              Margin at the end of speech  segment  in  millisec-
              onds. (default: 400)

       -nostrip
              Julius  by  default  removes  zero samples in input
              speech data.  In some cases, such invalid data  may
              be recorded at the start or end of recording.  This
              option inhibit this automatic removal.

   Acoustic Analysis
       -smpFreq frequency
              Set sampling frequency of input speech in Hz.  Sam-
              pling  rate  can  also  be specified using "-smpPe-
              riod".  Be careful that this  frequency  should  be
              the  same  as  the  trained  conditions of acoustic
              model you use.  This should be specified for micro-
              phone  input  and  RAW  file input when using other
              than default rate.  Also see  "-fsize",  "-fshift",
              "-delwin".
              (default: 16000 (Hz) = 625ns).

       -smpPeriod period
              Set  sampling frequency of input speech by its sam-
              pling period (nanoseconds).  The sampling rate  can
              also  be  specified  using  "-smpFreq".  Be careful
              that the input frequency should be the same as  the
              trained  conditions of acoustic model you use. This
              should be specified for microphone  input  and  RAW
              file  input  when  using  other  than default rate.
              Also see "-fsize", "-fshift", "-delwin".
              (default: 625 (ns) = 16000Hz).

       -fsize sample
              Analysis  window  size  in   number   of   samples.
              (default: 400).

       -fshift sample
              Frame shift in number of samples (default: 160).

       -delwin frame
              Delta  window  size  in number of samples (default:
              2).

       -lofreq frequency
              Enable band-limiting for MFCC  filterbank  computa-
              tion:   set  lower  frequency  cut-off.   Also  see
              "-hifreq".
              (default: -1 = disabled)

       -hifreq frequency
              Enable band-limiting for MFCC  filterbank  computa-
              tion:   set  upper  frequency  cut-off.   Also  see
              "-lofreq".
              (default: -1 = disabled)

       -sscalc
              Perform spectral subtraction  using  head  part  of
              each  file.   With this option, Julius assume there
              are certain length of silence at each  input  file.
              Valid   only  for  rawfile  input.   Conflict  with
              "-ssload".

       -sscalclen
              With "-sscalc", specify the  length  of  head  part
              silence in milliseconds (default: 300)

       -ssload filename
              Perform spectral subtraction for speech input using
              pre-estimated noise spectrum from file.  The  noise
              spectrum  data  should  be  computed  beforehand by
              mkss.  Valid for all speech input.   Conflict  with
              "-sscalc".

       -ssalpha value
              Alpha   coefficient  of  spectral  subtraction  for
              "-sscals" and "-ssload".  Noise will be  subtracted
              stronger  as this value gets larger, but distortion
              of the resulting signal  also  becomes  remarkable.
              (default: 2.0)

       -ssfloor value
              Flooring  coefficient of spectral subtraction.  The
              spectral parameters that go under zero  after  sub-
              traction  will  be substituted by the source signal
              with this coefficient multiplied. (default: 0.5)

   Language Model (word N-gram)
       -nlr 2gram_filename
              2-gram language model file in standard ARPA format.

       -nrl rev_3gram_filename
              Reverse   3-gram  language  model  file.   This  is
              required for the second search pass.   If  this  is
              not  defined  then  only  the  first pass will take
              place.

       -d bingram_filename
              Use binary format language model  instead  of  ARPA
              formats.   The  2-gram and 3-gram model can be com-
              bined and converted to  this  binary  format  using
              mkbingram.  Julius can read this format much faster
              than ARPA format.

       -lmp lm_weight lm_penalty

       -lmp2 lm_weight2 lm_penalty2
              Language model score  weights  and  word  insertion
              penalties   for   the   first   and  second  passes
              respectively.

              The hypothesis language scores are scaled as  shown
              below:

              lm_score1  =  lm_weight * 2-gram_score + lm_penalty
              lm_score2 = lm_weight2 * 3-gram_score + lm_penalty2

              The defaults are dependent on acoustic model:

                First-Pass | Second-Pass
               --------------------------
                5.0 -1.0   |  6.0  0.0 (monophone)
                8.0 -2.0   |  8.0 -2.0 (triphone,PTM)
                9.0  8.0   | 11.0 -2.0 (triphone,PTM, setup=v2.1)

       -transp float
              Additional insertion penalty for transparent words.
              (default: 0.0)

   Word Dictionary
       -v dictionary_file
              Word dictionary file (required).

       -silhead {WORD|WORD[OUTSYM]|#num}

       -siltail {WORD|WORD[OUTSYM]|#num}
              Sentence  start  and end silence word as defined in
              the dictionary.  (default: "<s>" / "</s>")

              Julius deal these words  as  fixed  start-word  and
              end-word  of  recognition.   They can be defined in
              several formats as shown below.

                                       Example
           Word_name                     <s>
           Word_name[output_symbol]   <s>[silB]
           #Word_ID                      #14

            (Word_ID is the word position in the dictionary
             file starting from 0)

       -forcedict
              Ignore dictionary errors and force running.   Words
              with  errors  will  be  dropped  from dictionary at
              startup.

   Acoustic Model (HMM)
       -h hmmfilename
              HMM definition file to use. (required)

       -hlist HMMlistfilename
              HMMList file to use.  Required when using  triphone
              based  HMMs.   This file provides a mapping between
              the logical triphones names genertated  from  phone
              sequence  in  the dictionary and the HMM definition
              names.

       -iwcd1 {max|avg}
              When using a triphone model, select method to  han-
              dle  inter-word  triphone  context on the first and
              last phone of a word in the first pass.

              max: use maximum likelihood of the same
                   context triphones (default)
              avg: use average likelihood of the same
                   context triphones

       -force_ccd / -no_ccd
              Normally Julius determines  whether  the  specified
              acoustic  model  is  a context-dependent model from
              the model names, i.e., whether the model names con-
              tain  character  '+'  and  '-'.  You can explicitly
              specify by these options  to  avoid  mis-detection.
              These will override the automatic detection result.

       -notypecheck
              Disable checking of input parameter type. (default:
              enabled)

   Acoustic Computation
       Gaussian  Pruning will be automatically enabled when using
       tied-mixture based  acoutic  model.   It  is  disabled  by
       default  for non tied-mixture models, but you can activate
       pruning  to  those   models   by   explicitly   specifying
       "-gprune".   Gaussian  Selection  needs  a monophone model
       converted by mkgshmm.

       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default:    'safe'    (setup=standard),     'beam'
              (setup=fast) for tied mixture model, 'none' for non
              tied-mixture model)

       -tmix K
              With Gaussian Pruning, specify the number of  Gaus-
              sians  to compute per mixture codebook. Small value
              will speed up  computation,  but  likelihood  error
              will grow larger. (default: 2)

       -gshmm hmmdefs
              Specify monophone hmmdefs to use for Gaussian  Mix-
              ture  Selectio.   Monophone model for GMS is gener-
              ated from an ordinary  monophone  HMM  model  using
              mkgshmm.   This  option is disabled by default. (no
              GMS applied)

       -gsnum N
              When using GMS, specify number of  monophone  state
              to  select  from  whole monophone states. (default:
              24)

   Inter-word Short Pause Handling
       -iwspword
              Add a word entry to the dictionary that should cor-
              respond  to  inter-word short pauses that may occur
              in input  speech.   This  may  improve  recognition
              accuracy  in some language model that has no inter-
              word pause modeling.  The word entry can be  speci-
              fied by "-iwspentry".

       -iwspentry
              Specify  the  word  entry  that  will  be  added by
              "-iwspword".  (default: "<UNK> [sp] sp sp")

       -iwsp  (Multi-path version only)  Enable  inter-word  con-
              text-free   short   pause  handling.   This  option
              appends a skippable short  pause  model  for  every
              word  end.   The  added  model  will  be skipped on
              inter-word context handling.  The HMM model  to  be
              appended can be specified by "-spmodel" option.

       -spmodel
              Specify short-pause model name that will be used in
              "-iwsp". (default: "sp")

   Short-pause Segmentation
       The short pause segmentation can  be  used  for  sucessive
       decoding  of a long utterance.  Enabled when compiled with
       '--enable-sp-segment'.

       -spdur Set the short-pause duration threshold in number of
              frames.   If  a  short-pause  word  has the maximum
              likelihood in successive frames  longer  than  this
              value,  then interrupt the first pass and start the
              second pass. (default: 10)

   Search Parameters (First Pass)
       -b beamwidth
              Beam width (number of HMM nodes) on the first pass.
              This  value  defines  search width on the 1st pass,
              and has great effect on the total processing  time.
              Smaller  width  will speed up the decoding, but too
              small value will result in a  substantial  increase
              of   recognition  errors  due  to  search  failure.
              Larger value will make the search stable  and  will
              lead  to  failure-free  search, but processing time
              and memory usage will grow  in  proportion  to  the
              width.

              Default value is acoustic model dependent:
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM, setup=v2.1)

       -sepnum N
              Number of high frequency words to be separated from
              the lexicon tree. (default: 150)

       -1pass Only perform the first pass search.  This  mode  is
              automatically set when no 3-gram language model has
              been specified (-nlr).

       -realtime

       -norealtime
              Explicitly  specify  whether  real-time  (pipeline)
              processing  will  be done in the first pass or not.
              For file input, the default is  OFF  (-norealtime),
              for  microphone,  adinnet  and  NetAudio input, the
              default is ON (-realtime).  This option relates  to
              the  way  CMN  is performed: when OFF CMN is calcu-
              lated for each input independently, when the  real-
              time option is ON the previous 5 second of input is
              always used.  Also refer to "-progout".

       -cmnsave filename
              Save last CMN parameters computed while recognition
              to  the  specified  file.   The  parameters will be
              saved to the file in each time a  input  is  recog-
              nized, so the output file always keeps the last CMN
              parameters.  If output file already exist, it  will
              be overridden.

       -cmnload filename
              Load  initial  CMN parameters previously saved in a
              file by "-cmnsave".  This option enables Julius  to
              recognize  the first utterance of a live microphone
              input or adinnet input with CMN.

   Search Parameters (Second Pass)
       -b2 hyponum
              Beam width (number of hypothesis) in  second  pass.
              If  the count of word expantion at a certain length
              of hypothesis  reaches  this  limit  while  search,
              shorter  hypotheses are not expanded further.  This
              prevents search to fall in breadth-first-like  sta-
              tus  stacking  on  the  same  position, and improve
              search failure.  (default: 30)

       -n candidatenum
              The search continues till 'candidate_num'  sentence
              hypotheses  have been found.  The obtained sentence
              hypotheses are sorted by score, and final result is
              displayed  in  the  order  (see  also the "-output"
              option).

              The possibility that the optimum hypothesis is cor-
              rectly   found   increases   as   this  value  gets
              increased, but the  processing  time  also  becomes
              longer.

              Default  value depends on the  engine setup on com-
              pilation time:
                10  (standard)
                 1  (fast, v2.1)

       -output N
              The top N sentence hypothesis will be Output at the
              end of search.  Use with "-n" option. (default: 1)

       -cmalpha float
              This  parameter  decides  smoothing  effect of word
              confidence measure.  (default: 0.05)

       -sb score
              Score envelope width for enveloped  scoring.   When
              calculating  hypothesis  score  for  each generated
              hypothesis, its trellis expansion and viterbi oper-
              ation will be pruned in the middle of the speech if
              score on a frame goes under [current maximum  score
              of the frame- width].  Giving small value makes the
              second  pass  faster,  but  computation  error  may
              occur.  (default: 80.0)

       -s stack_size
              The maximum number of hypothesis that can be stored
              on the stack during the search.  A larger value may
              give  more stable results, but increases the amount
              of memory required. (default: 500)

       -m overflow_pop_times
              Number of expanded hypotheses required  to  discon-
              tinue  the  search.   If  the  number  of  expanded
              hypotheses is greater then this threshold then, the
              search  is  discontinued at that point.  The larger
              this value is, The longer Julius gets  to  give  up
              search (default: 2000)

       -lookuprange nframe
              When  performing word expansion on the second pass,
              this option sets the number of  frames  before  and
              after  to  look up next word hypotheses in the word
              trellis.   This  prevents  the  omission  of  short
              words,  but  with  a  large  value,  the  number of
              expanded hypotheses increases  and  system  becomes
              slow. (default: 5)

   Forced Alignment



       -walign
              Do viterbi alignment per word units from the recog-
              nition result.  The word boundary  frames  and  the
              average acoustic scores per frame are calculated.

       -palign
              Do viterbi alignment per phoneme (model) units from
              the  recognition  result.   The  phoneme   boundary
              frames  and  the  average acoustic scores per frame
              are calculated.

       -salign
              Do viterbi alignment per HMM state from the  recog-
              nition  result.   The state boundary frames and the
              average acoustic scores per frame are calculated.

   Server Module Mode
       -module [port]
              Run Julius on "Server Module Mode".  After startup,
              Julius  waits  for  tcp/ip  connection from client.
              Once connection is established, Julius start commu-
              nication  with  the client to process incoming com-
              mands from the client,  or  to  output  recognition
              results, input trigger information and other system
              status to the client.  The  multi-grammar  mode  is
              only  supported  at  this  Server Module Mode.  The
              default port number is 10500.  jcontrol  is  sample
              client contained in this package.

       -outcode [W][L][P][S][C][w][l][p][s]
              (Only  for Server Module Mode) Switch which symbols
              of recognized words to be sent to client.   Specify
              'W'  for  output  symbol, 'L' for N-gram entry, 'P'
              for phoneme sequence, 'S' for score,  and  'C'  for
              confidence  score,  respectively.   Capital letters
              are for the second pass (final result),  and  small
              letters  are  for  results  of the first pass.  For
              example, if you want to send only the  output  sym-
              bols and phone sequences as a recognition result to
              a client, specify "-outcode WP".

   Message Output
       -separatescore
              Output the language and acoustic scores separately.

       -quiet Omit  phoneme  sequence  and score, only output the
              best word sequence hypothesis.

       -progout
              Enable progressive output of the partial results on
              the first pass.

       -proginterval msec
              set  the output time interval of "-progout" in mil-
              liseconds.

       -demo  Equivalent to "-progout -quiet"

   OTHERS
       -debug (For debug) output  enoumous  internal  status  and
              debug information.

       -C jconffile
              Load  the  jconf  file.  The options written in the
              file are included and expanded at the point.   This
              option can also be used within other jconf file for
              recursive expansion.

       -check wchmm
              (For debug) turn on interactive check mode of  tree
              lexicon structure at startup.

       -check triphone
              (For debug) turn on interactive check mode of model
              mapping between Acoustic model, HMMList and dictio-
              nary at startup.

       -setting
              Display compile-time engine configuration and exit.

       -help  Display a brief description of all options.

EXAMPLES
       For examples of system usage, refer to the  tutorial  sec-
       tion in the Julius documents.

NOTICE
       Note about jconf files: relative paths in a jconf file are
       interpreted as relative to the jconf file itself,  not  to
       the current directory.

SEE ALSO
       julian(1),  jcontrol(1),  adinrec(1),  adintool(1), mkbin-
       gram(1), mkgsmm(1), wav2mfcc(1), mkss(1)

       http://julius.sourceforge.jp/

DIAGNOSTICS
       Julius normally will return the  exit  status  0.   If  an
       error  occurs, Julius exits abnormally with exit status 1.
       If an input file cannot be found or cannot be  loaded  for
       some  reason  then  Julius  will  skip processing for that
       file.

BUGS
       There are some restrictions to the type and  size  of  the
       models  Julius  can use.  For a detailed explanation refer
       to the Julius documentation.   For  bug-reports,  inquires
       and  comments  please contact julius@kuis.kyoto-u.ac.jp or
       julius@is.aist-nara.ac.jp.

COPYRIGHT
       Copyright (c) 1991-2003 Kyoto University, Japan
       Copyright (c) 1997-2000  Information-technology  Promotion
       Agency, Japan
       Copyright  (c)  2000-2003  Nara  Institute  of Science and
       Technology, Japan

AUTHORS
       Rev.1.0 (1998/02/20)
              Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
              University)

              Development by Akinobu LEE (Kyoto University)

       Rev.1.1 (1998/04/14)

       Rev.1.2 (1998/10/31)

       Rev.2.0 (1999/02/20)


       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.0 (2000/02/14)

       Rev.3.1 (2000/05/11)
              Development of above versions by Akinobu LEE (Kyoto
              University)

       Rev.3.2 (2001/08/15)

       Rev.3.3 (2002/09/11)

       Rev.3.4 (2003/10/01)
              Development of above versions by Akinobu LEE  (Nara
              Institute of Science and Technology)

THANKS TO
       From  rev.3.2, Julius is released by the "Information Pro-
       cessing Society, Continuous Speech Consortium".

       The Windows DLL version  was  developed  and  released  by
       Hideki BANNO (Nagoya University).

       The  Windows  Microsoft  Speech API compatible version was
       developed by Takashi SUMIYOSHI (Kyoto University).



                              LOCAL                     JULIUS(1)
