JULIAN(1)                                               JULIAN(1)



NAME
       Julian  -  grammar  based  continuous  speech  recognition
       parser

SYNOPSIS
       julian [-C jconffile] [options ...]

DESCRIPTION
       Julian is a high-performance, multi-purpose,  free  speech
       recognition  parser  based on finite state grammar.  It is
       capable of performing real-time recognition of continuous
       speech with a vocabulary of several thousand words.

       Julian is derived from Julius, and almost all components
       are the same except for the language model related parts.

       To run recognition, Julian needs an acoustic model and a
       finite state grammar that describes the sentence patterns
       to be recognized.  The grammar format is original to
       Julian, and tools to create a recognition grammar are
       included in the distribution.  For the acoustic model, the
       standard format (i.e. HTK) is supported, with any
       word/phone units and sizes.  Users can therefore build a
       recognition system customized for a specific task using
       their own task grammar and acoustic models.  For details
       about models and how to write a grammar, please see the
       documents contained in this package.

       Julian can perform recognition on audio files, live micro-
       phone input, network input and  feature  parameter  files.
       The maximum size of vocabulary is 65,535 words.

RECOGNITION MODELS
       Julian supports the following models.

       Acoustic Models
                 Same as Julius: sub-word HMMs (Hidden Markov
                 Models) in HTK format are supported.  Phoneme
                 models (monophone), context dependent phoneme
                 models (triphone), tied-mixture and phonetic
                 tied-mixture models of any unit can be used.
                 When using context dependent models, interword
                 context is also handled.

       Language model
                 The grammar format is original to Julian, and
                 tools to create a recognition grammar are
                 included in the distribution.  A grammar
                 consists of two files: one is a 'grammar' file
                 that describes sentence structures in a BNF
                 style, using word 'categories' as terminal
                 symbols.  The other is a 'voca' file that
                 defines the words of each category with their
                 pronunciations (i.e. phoneme sequences).  They
                 should be converted by mkdfa.pl(1) to a
                 deterministic finite automaton file (.dfa) and
                 a dictionary file (.dict), respectively.

SPEECH INPUT
       Same as Julius: both live speech input and recorded speech
       file input are supported.  Live input streams from a
       microphone device, a DatLink (NetAudio) device and tcpip
       network input using adintool are supported.  Speech
       waveform files are accepted in 16bit WAV (no compression)
       and RAW format; many other formats are accepted if Julian
       is compiled with the libsndfile library.  Feature
       parameter files in HTK format are also supported.

       Note that Julian itself can only extract MFCC_E_D_N_Z
       features from speech data.  If you use an acoustic HMM
       trained with another feature type, only HTK parameter
       files of the same feature type can be used as input.

SEARCH ALGORITHM OVERVIEW
       The recognition algorithm of Julian is based on a two-pass
       strategy.  In the first pass, a high-speed approximate
       search is performed using weaker constraints than the
       given grammar: an LR beam search using only the
       inter-category constraints extracted from the grammar.
       The second pass re-searches the input, using the original
       grammar rules and the intermediate results from the first
       pass, to obtain a high precision result quickly.  In the
       second pass the optimal solution is theoretically
       guaranteed by the A* search.

       When using context dependent phones (triphones), interword
       contexts  are  taken into consideration.  For tied-mixture
       and  phonetic  tied-mixture  models,  high-speed  acoustic
       likelihood calculation is possible using Gaussian pruning.

       For more details, see the related  document  or  web  page
       below.

OPTIONS
       The options below specify the models, system behaviors and
       various search parameters.  These options can all be given
       on the command line, but it is recommended that you write
       them in a text file called a "jconf file" and specify the
       file with the "-C" option.
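
       As a rough illustration of this recommendation, the Python
       sketch below writes a minimal jconf file and builds the
       matching command line.  The file names (sample.dfa,
       sample.dict, hmmdefs, task.jconf) are placeholders, and
       the one-option-per-line layout is an assumption made for
       readability, not a documented requirement.

```python
from pathlib import Path
import tempfile


def write_jconf(path, options):
    """Write (flag, value) pairs as 'flag value' lines and return the text."""
    lines = [f"{flag} {value}".rstrip() for flag, value in options]
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text


# Only options documented in this manual are used; file names are placeholders.
options = [
    ("-dfa", "sample.dfa"),   # finite state automaton grammar (required)
    ("-v", "sample.dict"),    # word dictionary (required)
    ("-h", "hmmdefs"),        # HMM definition file (required)
    ("-input", "rawfile"),    # recognize waveform files
]

jconf_path = Path(tempfile.gettempdir()) / "task.jconf"
jconf_text = write_jconf(jconf_path, options)

# The command line then only needs to name the jconf file.
command = ["julian", "-C", str(jconf_path)]
print(jconf_text, end="")
print(" ".join(command))
```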

       Most are the same as Julius.
       Options only in Julian: -dfa, -penalty1, -penalty2,
       -looktrellis
       Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2,
       -transp, -silhead, -siltail, -spdur, -sepnum,
       -separatescore


   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
              Select the speech data input source.  'rawfile' is
              a waveform file and 'mfcfile' is a feature
              parameter file in HTK format (the file name is
              specified after startup from stdin).  'mic' means
              a microphone device, and 'adinnet' means receiving
              waveform data via tcpip network from an adinnet
              client.  'netaudio' is DatLink/NetAudio input, and
              'stdin' means data input from standard input.

              WAV (no compression) and RAW (no header, 16bit,
              BigEndian) are supported for waveform file input.
              Other formats can be supported using an external
              library.  To see which formats are actually
              supported, see the help message using option
              "-help".  For stdin input, only WAV and RAW are
              supported.
              (default: mfcfile)

       -filelist file
              (With  -input  rawfile|mfcfile) perform recognition
              on all files listed in the file.


       -adport portnum
              (with -input adinnet) adinnet port number (default:
              5530)

       -NA server:unit
              (with -input netaudio) set the server name and unit
              ID of the Datlink unit.

       -record directory
              Auto-save input speech data successively under the
              directory.  Each segmented input is recorded to a
              separate file.  The file name of the recorded data
              is generated from the system time when the input
              starts, in the style "YYYY.MMDD.HHMMSS.wav".  The
              file format is 16bit monaural WAV.  Invalid for
              mfcfile input.
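
       The file name style used by "-record" can be reproduced
       with a few lines of Python, for instance to locate a
       recording made at a known time.  This is only a sketch: it
       assumes zero-padded fields, which the style string
       suggests but this manual does not state.

```python
from datetime import datetime


def record_file_name(start_time):
    """Format a start time in the "YYYY.MMDD.HHMMSS.wav" style."""
    return start_time.strftime("%Y.%m%d.%H%M%S") + ".wav"


print(record_file_name(datetime(2003, 10, 1, 12, 34, 56)))  # 2003.1001.123456.wav
```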

   Speech Detection
       Options in this section are invalid for mfcfile input.

       -cutsilence

       -nocutsilence
              Force silence cutting (=speech  segment  detection)
              to  ON/OFF.  (default:  ON for mic/adinnet, OFF for
              files)

       -lv threslevel
              Level threshold (0 - 32767) for speech triggering.
              If the audio input amplitude goes over this
              threshold for a period, Julian begins the 1st pass
              recognition.  If the level goes below this
              threshold after triggering, it is the end of the
              speech segment.  (default: 2000)

       -zc zerocrossnum
              Zero crossing threshold per second (default: 60)

       -headmargin msec
              Margin  at  the start of the speech segment in mil-
              liseconds. (default: 300)

       -tailmargin msec
              Margin at the end of the  speech  segment  in  mil-
              liseconds. (default: 400)

       -nostrip
              Julian  by  default  removes  zero samples in input
              speech data.  In some cases, such invalid data  may
              be recorded at the start or end of recording.  This
              option inhibits this automatic removal.
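
       To give a feel for how the "-lv" and "-zc" thresholds
       interact, here is a deliberately simplified Python sketch
       of a level and zero-crossing trigger.  It is not Julian's
       actual detector: the function names, the way crossings are
       counted and the per-window decision are assumptions made
       for illustration, and margins are not modelled.

```python
def count_zero_cross(samples, level):
    """Count sign changes between samples whose magnitude exceeds level."""
    count = 0
    last_sign = 0
    for s in samples:
        if s > level:
            sign = 1
        elif s < -level:
            sign = -1
        else:
            continue  # samples inside the +/-level band are ignored
        if last_sign != 0 and sign != last_sign:
            count += 1
        last_sign = sign
    return count


def is_speech(samples, sample_rate, level=2000, zc_per_sec=60):
    """Return True when the crossing rate of the window reaches the threshold."""
    if not samples:
        return False
    seconds = len(samples) / sample_rate
    return count_zero_cross(samples, level) >= zc_per_sec * seconds


loud = [3000, -3000] * 8000   # one second of a loud alternating signal
quiet = [100, -100] * 8000    # one second that stays below the level
print(is_speech(loud, 16000), is_speech(quiet, 16000))  # True False
```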

   Acoustic Analysis
       -smpFreq frequency
              Set the sampling frequency of input speech in Hz.
              The sampling rate can also be specified using
              "-smpPeriod".  Note that this frequency must match
              the conditions under which the acoustic model was
              trained.  This should be specified for microphone
              input and RAW file input when using a rate other
              than the default.  Also see "-fsize", "-fshift",
              "-delwin".
              (default: 16000 (Hz = period of 625 x 100 ns))



       -smpPeriod period
              Set the sampling frequency of input speech by its
              sampling period, in units of 100 nanoseconds.  The
              sampling rate can also be specified using
              "-smpFreq".  Note that the input frequency must
              match the conditions under which the acoustic model
              was trained.  This should be specified for
              microphone input and RAW file input when using a
              rate other than the default.  Also see "-fsize",
              "-fshift", "-delwin".
              (default: 625 (x 100 ns = 16000 Hz))

       -fsize sample
              Analysis  window  size  in   number   of   samples.
              (default: 400).

       -fshift sample
              Frame shift in number of samples (default: 160).

       -delwin frame
              Delta window size in number of frames (default:
              2).
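
       The relation between these defaults can be checked with a
       little arithmetic.  The Python sketch below assumes that
       the period value counts units of 100 ns (the HTK
       convention), which is the only reading under which 625
       corresponds to 16000 Hz; the helper names are made up for
       this example.

```python
def period_from_freq(freq_hz):
    """Sampling period in units of 100 ns for a frequency in Hz."""
    return 1e7 / freq_hz


def samples_to_ms(n_samples, freq_hz):
    """Duration in milliseconds of n_samples at the given sampling rate."""
    return 1000.0 * n_samples / freq_hz


print(period_from_freq(16000))    # 625.0  (default -smpPeriod)
print(samples_to_ms(400, 16000))  # 25.0   (default -fsize as a window length)
print(samples_to_ms(160, 16000))  # 10.0   (default -fshift as a frame shift)
```

       When the sampling rate changes, "-fsize" and "-fshift" have
       to be rescaled by hand if the same window length and shift
       in milliseconds are wanted.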

       -lofreq frequency
              Enable band-limiting for MFCC  filterbank  computa-
              tion: set lower frequency cut-off.
              (default: -1 = disabled)

       -hifreq frequency
              Enable  band-limiting  for MFCC filterbank computa-
              tion: set upper frequency cut-off.
              (default: -1 = disabled)

       -sscalc
              Perform spectral subtraction using the head part
              of each file.  With this option, Julian assumes
              that there is a certain length of silence at the
              head of each input file.  Valid only for rawfile
              input.  Conflicts with "-ssload".

       -sscalclen
              With "-sscalc", specify the  length  of  head  part
              silence in milliseconds (default: 300)

       -ssload filename
              Perform spectral subtraction for speech input using
              a pre-estimated noise spectrum from a file.  The
              noise spectrum data should be computed beforehand
              by mkss.  Valid for all speech input.  Conflicts
              with "-sscalc".

       -ssalpha value
              Alpha coefficient of spectral subtraction.  Noise
              is subtracted more strongly as this value gets
              larger, but distortion of the resulting signal also
              becomes more noticeable.  (default: 2.0)

       -ssfloor value
              Flooring coefficient of spectral subtraction.
              Spectral parameters that fall below zero after
              subtraction are replaced by the source signal
              multiplied by this coefficient.  (default: 0.5)
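
       The roles of "-ssalpha" and "-ssfloor" can be sketched in
       Python as follows.  This follows the wording of the two
       option descriptions only; whether Julian works on power or
       magnitude spectra, and how the noise estimate is obtained,
       is not covered here, so treat it as an illustration rather
       than the actual implementation.

```python
def spectral_subtract(source, noise, alpha=2.0, floor=0.5):
    """Subtract alpha * noise from each spectral value of source.

    Values that fall below zero are replaced by floor * source value.
    """
    result = []
    for s, n in zip(source, noise):
        v = s - alpha * n
        if v < 0.0:
            v = floor * s
        result.append(v)
    return result


# First bin survives the subtraction, second bin is floored.
print(spectral_subtract([10.0, 3.0], [2.0, 2.0]))  # [6.0, 1.5]
```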

   Language Model (Finite State Grammar)



       -dfa dfa_filename
              Finite state automaton grammar file. (required)

       -penalty1 float
              Word   insertion   penalty   for  the  first  pass.
              (default: 0.0)

       -penalty2 float
              Word  insertion  penalty  for  the   second   pass.
              (default: 0.0)

   Word Dictionary
       -v dictionary_file
              Word dictionary file (required)

       -spmodel {WORD|WORD[OUTSYM]|#num}
              Name of the short pause model as defined in the
              hmmdefs.  In Julian, a word whose pronunciation
              consists of only this short pause model is called
              a 'short pause word' and is handled specially in
              recognition: even if its appearance in a sentence
              is explicitly specified in the grammar, it can be
              skipped while parsing.  This behavior deals with
              the insertion and deletion of short pauses that
              often appear unintentionally in user utterances.
              The argument can be given in any of the styles
              shown below (default: "sp").

                                       Example
           Word_name                     <s>
           Word_name[output_symbol]   <s>[silB]
           #Word_ID                      #14

            (Word_ID is the word position in the dictionary
             file starting from 0)

       -forcedict
              Ignore  dictionary errors and force running.  Words
              with errors will  be  dropped  from  dictionary  at
              startup.

   Acoustic Model (HMM)
       -h hmmfilename
              HMM definition file to use. (required)

       -hlist HMMlistfilename
              HMMList file to use.  Required when using triphone
              based HMMs.  This file provides a mapping between
              the logical triphone names generated from the
              phonetic representation in the dictionary and the
              HMM definition names.

       -iwcd1 {max|avg}
              When  using a triphone model, select method to han-
              dle inter-word triphone context on  the  first  and
              last phone of a word in the first pass.

              max: use maximum likelihood of the same
                   context triphones
              avg: use average likelihood of the same
                   context triphones (default)

       -force_ccd / -no_ccd
              Normally Julian determines whether the specified
              acoustic model is a context-dependent model from
              the model names, i.e., whether the model names
              contain the characters '+' and '-'.  You can
              specify this explicitly with these options to avoid
              mis-detection.  They override the automatic
              detection result.

       -notypecheck
              Disable  checking  of  the  input  parameter  type.
              (default: enabled)

   Acoustic Computation
       Gaussian pruning is automatically enabled when using a
       tied-mixture based acoustic model.  It is disabled by
       default for non tied-mixture models, but you can activate
       pruning for those models by explicitly specifying
       "-gprune".  Gaussian Selection needs a monophone model
       converted by mkgshmm.

       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default:     'safe'    (setup=standard),    'beam'
              (setup=fast) for tied mixture model, 'none' for non
              tied-mixture model)

       -tmix K
              With Gaussian pruning, specify the number of
              Gaussians to compute per mixture codebook.  A small
              value speeds up computation, but the likelihood
              error grows larger.  (default: 2)

       -gshmm hmmdefs
              Specify monophone hmmdefs to use for Gaussian
              Mixture Selection (GMS).  The monophone model for
              GMS is generated from an ordinary monophone HMM
              model using mkgshmm.  This option is disabled by
              default (no GMS applied).

       -gsnum N
              When using GMS, specify the number of monophone
              states to select from all monophone states.
              (default: 24)
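
       The trade-off controlled by "-tmix" can be illustrated
       with a small Python sketch that evaluates a mixture using
       only its K best Gaussians.  Note the assumptions: the
       numbers are made up, and the sketch computes every density
       before discarding the weak ones, so it shows the size of
       the approximation error rather than how the pruning
       methods of "-gprune" avoid the computation.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_likelihood(log_weights, log_densities, k=None):
    """Log-likelihood of one mixture using only the k best Gaussians.

    With k=None every Gaussian in the codebook is used.
    """
    order = sorted(range(len(log_densities)),
                   key=lambda i: log_densities[i], reverse=True)
    if k is not None:
        order = order[:k]
    return log_sum_exp([log_weights[i] + log_densities[i] for i in order])


log_w = [math.log(0.5), math.log(0.3), math.log(0.2)]
log_b = [-10.0, -12.0, -30.0]  # made-up Gaussian log densities
full = mixture_log_likelihood(log_w, log_b)
top2 = mixture_log_likelihood(log_w, log_b, k=2)
print(full - top2)  # small, non-negative approximation error
```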

   Inter-word Short Pause Handling
       -iwsp  (Multi-path version only) Enable inter-word
              context-free short pause handling.  This option
              appends a skippable short pause model to every
              word end.  The added model is skipped in inter-word
              context handling.  The HMM model to be appended
              can be specified by the "-spmodel" option.

   Search Parameters (First Pass)
       -b beamwidth
              Beam width (number of HMM nodes) on the first pass.
              This value defines the search width on the 1st
              pass and has a great effect on the total processing
              time.  A smaller width speeds up decoding, but too
              small a value results in a substantial increase in
              recognition errors due to search failure.  A larger
              value makes the search more stable and less prone
              to failure, but processing time and memory usage
              grow in proportion to the width.

              default value: acoustic model dependent
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM, setup=v2.1)

       -1pass Only perform the first pass search.

       -realtime

       -norealtime
              Explicitly specify whether real-time (pipeline)
              processing is done in the first pass or not.  For
              file input, the default is OFF (-norealtime); for
              microphone, adinnet and NetAudio input, the default
              is ON (-realtime).  This option relates to the way
              CMN is performed: when OFF, CMN is calculated for
              each input independently; when ON, the previous 5
              seconds of input are always used.  Also refer to
              -progout.

       -cmnsave filename
              Save the last CMN parameters computed during
              recognition to the specified file.  The parameters
              are saved each time an input is recognized, so the
              output file always holds the latest CMN
              parameters.  If the output file already exists, it
              will be overwritten.

       -cmnload filename
              Load  initial  CMN parameters previously saved in a
              file by "-cmnsave".  This option enables Julian  to
              recognize  the first utterance of a live microphone
              input or adinnet input with CMN.
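
       CMN (cepstral mean normalization) subtracts a mean vector
       from every feature frame.  The Python sketch below shows
       the per-input case described for -norealtime, where the
       mean comes from the input itself.  In real-time operation
       the mean would instead come from earlier input or from a
       file given by "-cmnload"; the format of that file is not
       described here and is not modelled.

```python
def cepstral_mean(frames):
    """Per-coefficient mean over a list of equal-length feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def apply_cmn(frames, mean):
    """Subtract the mean vector from every frame."""
    return [[x - m for x, m in zip(frame, mean)] for frame in frames]


frames = [[1.0, 4.0], [3.0, 8.0]]  # two frames, two coefficients each
mean = cepstral_mean(frames)
print(mean)                      # [2.0, 6.0]
print(apply_cmn(frames, mean))   # [[-1.0, -2.0], [1.0, 2.0]]
```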

   Search Parameters (Second Pass)
       -b2 hyponum
              Beam width (number of hypotheses) in the second
              pass.  If the count of word expansions at a certain
              hypothesis length reaches this limit during the
              search, shorter hypotheses are not expanded
              further.  This prevents the search from falling
              into a breadth-first-like state, stacking up at the
              same position, and reduces search failures.
              (default: 30)

       -n candidatenum
              The search continues until 'candidatenum' sentence
              hypotheses have been found.  The obtained sentence
              hypotheses are sorted by score, and the final
              result is displayed in that order (see also the
              "-output" option).

              The possibility that the optimum hypothesis is
              correctly found increases with this value, but the
              processing time also becomes longer.

              The default value depends on the engine setup at
              compilation time:
                10  (standard)
                 1  (fast, v2.1)

       -output N
              The top N sentence hypotheses will be output at the
              end of the search.  Use with the "-n" option.
              (default: 1)

       -cmalpha float
              This  parameter  decides  smoothing  effect of word
              confidence measure.  (default: 0.05)

       -sb score
              Score envelope width for enveloped scoring.  When
              calculating the score of each generated hypothesis,
              its trellis expansion and Viterbi operation are
              pruned in the middle of the speech if the score on
              a frame goes under [current maximum score of the
              frame - width].  A small value makes the second
              pass faster, but computation errors may occur.
              (default: 80.0)

       -s stack_size
              The maximum number of hypotheses that can be stored
              on the stack during the search.  A larger value may
              give  more stable results, but increases the amount
              of memory required. (default: 500)

       -m overflow_pop_times
              Number of expanded hypotheses required to
              discontinue the search.  If the number of expanded
              hypotheses exceeds this threshold, the search is
              discontinued at that point.  The larger this value
              is, the longer Julian takes to give up the search.
              (default: 2000)

       -lookuprange nframe
              When performing word expansion on the second pass,
              this option sets the number of frames before and
              after in which to look up next word hypotheses in
              the word trellis.  This prevents the omission of
              short words, but with a large value the number of
              expanded hypotheses increases and the system
              becomes slow.  (default: 5)

       -looktrellis
              Expand only the words that survived the first pass
              instead of expanding all the words predicted by the
              grammar.  This option makes second pass decoding
              slightly faster, especially for large vocabulary
              conditions, but may increase deletion errors of
              short words.  (default: disabled)
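
       The envelope test described for "-sb" amounts to a simple
       per-frame comparison.  The Python sketch below marks the
       frames at which a hypothesis would fall outside the
       envelope; the scores are invented, and how Julian tracks
       the per-frame maximum is not shown.

```python
def frames_outside_envelope(scores, frame_max, width=80.0):
    """Return indices of frames where score < frame maximum - width."""
    return [
        t
        for t, (score, best) in enumerate(zip(scores, frame_max))
        if score < best - width
    ]


scores = [-100.0, -250.0, -300.0]     # scores of one hypothesis per frame
frame_max = [-90.0, -150.0, -290.0]   # current maximum score per frame
print(frames_outside_envelope(scores, frame_max))  # [1]
```

       A smaller width makes the condition fire on more frames,
       which is why it speeds up the second pass at the risk of
       pruning a hypothesis that would have recovered later.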

   Forced Alignment
       -walign
              Do Viterbi alignment per word unit from the
              recognition result.  The word boundary frames and
              the average acoustic scores per frame are
              calculated.

       -palign
              Do Viterbi alignment per phoneme (model) unit from
              the recognition result.  The phoneme boundary
              frames and the average acoustic scores per frame
              are calculated.

       -salign
              Do Viterbi alignment per HMM state from the
              recognition result.  The state boundary frames and
              the average acoustic scores per frame are
              calculated.

   Server Module Mode
       -module [port]
              Run Julian in "Server Module Mode".  After startup,
              Julian waits for a tcp/ip connection from a client.
              Once the connection is established, Julian starts
              communicating with the client to process incoming
              commands from the client, or to output recognition
              results, input trigger information and other system
              status to the client.  The multi-grammar mode is
              supported only in Server Module Mode.  The default
              port number is 10500.  jcontrol is a sample client
              contained in this package.

       -outcode [W][L][P][S][C][w][l][p][s]
              (Only for Server Module Mode) Select which symbols
              of recognized words are sent to the client.
              Specify 'W' for output symbol, 'L' for grammar
              entry, 'P' for phoneme sequence, 'S' for score, and
              'C' for confidence score.  Capital letters are for
              the second pass (final result), and small letters
              are for results of the first pass.  For example, if
              you want to send only the output symbols and phone
              sequences as a recognition result to a client,
              specify "-outcode WP".
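
       A client-side script may want to check an "-outcode"
       string before starting Julian.  The Python sketch below
       decodes the letters listed in the option synopsis; the
       function name and the error handling are choices made for
       this example only.

```python
SECOND_PASS = {
    "W": "output symbol",
    "L": "grammar entry",
    "P": "phoneme sequence",
    "S": "score",
    "C": "confidence score",
}
# The synopsis lists w, l, p and s for the first pass, but no lowercase c.
FIRST_PASS = {k.lower(): v for k, v in SECOND_PASS.items() if k != "C"}


def describe_outcode(code):
    """Map an -outcode string to (pass, field) pairs; raise on unknown letters."""
    result = []
    for ch in code:
        if ch in SECOND_PASS:
            result.append(("second pass", SECOND_PASS[ch]))
        elif ch in FIRST_PASS:
            result.append(("first pass", FIRST_PASS[ch]))
        else:
            raise ValueError(f"unknown -outcode letter: {ch!r}")
    return result


print(describe_outcode("WP"))
```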

   Message Output
       -quiet Omit phoneme sequence and score; output only the
              best word sequence hypothesis.

       -progout
              Enable progressive output of the partial results on
              the first pass.

       -proginterval msec
              Set the output time interval of "-progout" in
              milliseconds.

       -demo  Equivalent to "-progout -quiet"

   Others
       -debug (For debug) output a large amount of internal
              status and debug information.

       -C jconffile
              Load the jconf file.  The options written in the
              file are included and expanded at that point.  This
              option can also be used within another jconf file.

       -check wchmm
              (For  debug) turn on interactive check mode of tree
              lexicon structure at startup.

       -check triphone
              (For debug) turn on interactive check mode of model
              mapping between Acoustic model, HMMList and dictio-
              nary at startup.

       -setting
              Display compile-time engine configuration and exit.

       -help  Display a brief description of all options.

EXAMPLES
       For  examples  of system usage, refer to the tutorial sec-
       tion in the Julian documents.

NOTICE
       Note about jconf files: relative paths in a jconf file are
       interpreted  as  relative to the jconf file itself, not to
       the current directory.
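
       The rule can be written down as a short Python sketch,
       which may help when generating jconf files from scripts.
       The paths are examples, POSIX-style paths are assumed, and
       the helper name is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_jconf_path(jconf_file, path_in_jconf):
    """Resolve a path written inside a jconf file.

    Relative paths are taken relative to the directory of the jconf
    file; absolute paths are returned unchanged.
    """
    p = PurePosixPath(path_in_jconf)
    if p.is_absolute():
        return str(p)
    return str(PurePosixPath(jconf_file).parent / p)


print(resolve_jconf_path("/home/user/task/main.jconf", "model/hmmdefs"))
# /home/user/task/model/hmmdefs
```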

SEE ALSO
       julius(1), jcontrol(1), adinrec(1), adintool(1), mkdfa(1),
       mkgshmm(1), wav2mfcc(1), mkss(1)

       http://julius.sourceforge.jp/

DIAGNOSTICS
       Julian normally returns exit status 0.  If an error
       occurs, Julian exits abnormally with exit status 1.  If an
       input file cannot be found or cannot be loaded for some
       reason, Julian skips processing of that file.
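
       A wrapper script can act on the exit status as in the
       Python sketch below.  So that the sketch runs without
       Julian installed, it uses the Python interpreter as a
       stand-in command; in practice the list would hold
       something like "julian", "-C" and a jconf file name.

```python
import subprocess
import sys


def run_ok(command):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0


# Stand-in commands that exit with status 1 and 0 respectively.
print(run_ok([sys.executable, "-c", "raise SystemExit(1)"]))  # False
print(run_ok([sys.executable, "-c", "pass"]))                 # True
```

       Since a file that cannot be loaded is skipped rather than
       treated as an error, a status of 0 alone may not show that
       every listed file was recognized.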

BUGS
       There are some restrictions on the type and size of the
       models Julian can use.  For a detailed explanation, refer
       to the Julius documentation.  For bug reports, inquiries
       and comments please contact julius@kuis.kyoto-u.ac.jp or
       julius@is.aist-nara.ac.jp.

COPYRIGHT
       Copyright (c) 1991-2003 Kyoto University, Japan
       Copyright  (c)  2000-2003  Nara  Institute  of Science and
       Technology, Japan

AUTHORS
       Rev.1.0 (1998/07/20)
              Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
              University)

       Rev.2.0 (1999/02/20)

       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.1 (2000/05/11)
              Development of above versions by Akinobu LEE (Kyoto
              University)

       Rev.3.2 (2001/08/15)

       Rev.3.3 (2002/09/11)

       Rev.3.4 (2003/10/01)
              Development of above versions by Akinobu LEE  (Nara
              Institute of Science and Technology)

THANKS TO
       From  rev.3.2, Julian is released in the "Information Pro-
       cessing Society, Continuous Speech Consortium".

       The Windows Microsoft Speech API  compatible  version  was
       developed by Takashi SUMIYOSHI (Kyoto University).



                              LOCAL                     JULIAN(1)
