Monday, November 3, 2014

Fixing boost::iostreams errors when installing Moses

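1) If the linker cannot find the library and reports /usr/bin/ld: cannot find -lboost_iostreams
reinstall Boost and make sure that files named libboost_iostreams.* exist in /usr/local/lib/ (or in your custom Boost installation directory).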
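
The check described above can be scripted. This is a minimal sketch, not part of Moses or Boost: the function name has_boost_iostreams is hypothetical, and the default directory is simply the /usr/local/lib location mentioned above.

```shell
# Hypothetical helper: succeed if any libboost_iostreams.* file exists in
# the given directory (default: /usr/local/lib, as assumed in the text).
has_boost_iostreams() {
    dir=${1:-/usr/local/lib}
    for f in "$dir"/libboost_iostreams.*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Usage: has_boost_iostreams /usr/local/lib || echo "libboost_iostreams not found, reinstall Boost"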


2) If you see errors like the following
/usr/include/boost/iostreams/filter/zlib.hpp:345: undefined reference to `boost::iostreams::detail::zlib_base::before(char const*&, char const*, char*&, char*)'
/usr/include/boost/iostreams/filter/zlib.hpp:346: undefined reference to `boost::iostreams::zlib::no_flush'
/usr/include/boost/iostreams/filter/zlib.hpp:346: undefined reference to `boost::iostreams::detail::zlib_base::xdeflate(int)'
/usr/include/boost/iostreams/filter/zlib.hpp:347: undefined reference to `boost::iostreams::detail::zlib_base::after(char const*&, char*&, bool)'

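check whether zlib was configured correctly when Boost was installed. If installing Boost (e.g. running ./b2 install) prints the following line,

    - zlib                     : no  (cached)
then zlib was not found. The fix is to download zlib from http://www.zlib.net/ (e.g. zlib-1.2.8.tar.gz) and unpack it.
Set the zlib path on the command line:
 export ZLIB_SOURCE=/........./zlib-1.2.8
Rerun the Boost installation command.

Then rerun the Moses installation command.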
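
To avoid scanning the whole build output by eye, the "zlib : no" line can be detected in a saved log. This is only a sketch: the function name boost_zlib_missing and the log file name b2.log are hypothetical, and the pattern assumes the log line looks like the one quoted above.

```shell
# Hypothetical helper: succeed if a saved b2 log reports that zlib was
# not found, i.e. contains a line of the form "- zlib : no".
boost_zlib_missing() {
    grep -Eq '^[[:space:]]*-[[:space:]]*zlib[[:space:]]*:[[:space:]]*no' "$1"
}
```

Usage: ./b2 install 2>&1 | tee b2.log; boost_zlib_missing b2.log && echo "set ZLIB_SOURCE and reinstall Boost"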

Friday, May 23, 2014

Annotation bugs in ctb8.0 and LDC2012E109_BOLT_Phase_1_Chinese_Treebank_DF_Part_1

1) 
LDC2012E109_BOLT_Phase_1_Chinese_Treebank_DF_Part_1/data/bolt-cmn-NG-21-181152-105489.cmn.su.fid
( (IP (NP-SBJ (NN 面子) )
      (VP (VP (VC 是)
              (NP-PRD (CP (WHNP-1 (-NONE- *OP*))
                          (CP (IP (NP-SBJ (PN 自己) )
                                  (VP (VV 挣)
                                      (NP-OBJ (-NONE- *T*-1))) )
                              (DEC 的) ))) )
          (PU ,)
          (VP (ADVP (AD 不) )
              (VP (VC 是)
                  (NP-PRD (CP (WHNP-2 (-NONE- *OP*))
                              (CP (IP (NP-SBJ-3 (-NONE- *T*-2))
                                      (VP (PP-LGS (P 靠)
                                                  (NP (NP (PN 别人) )
                                                      (NP (NN 施舍) ) ) )
                                          (VP (VV 给)
                                              (NP-OBJ (-NONE- *)-3)) ) )  |||  --> (NP-OBJ (-NONE- *T*-3)) ) ) )
                                  (DEC 的) ))) ) ) ) ) )


2) LDC2012E109_BOLT_Phase_1_Chinese_Treebank_DF_Part_1/data/bolt-cmn-NG-21-181152-105489.cmn.su.fid
( (IP (IP-ADV (NP-SBJ (-NONE- *pro*))
              (VP (ADVP (AD 或) )
                  (ADVP (AD 正) )
                  (VP (VC 是)
                      (PP-PRD (P 由于)
                              (NP (PN 此) ) ) ) ) )
      (PU ,)
      (PP-LOC (P 在)
              (LCP (NP (DP (DT 这)
                           (CLP (M 个) ) )
                       (NP (NN 讨论组) ) )
                   (LC 中) ))
      (PU ,)
      (NP-SBJ (DP (DT 其他) )
              (NP (NN 与会者) ) )
      (VP (ADVP (AD 也) )
          (VP (VV 认为)
              (IP-OBJ (PP-LOC (P 在)
                              (LCP (NP-TMP (NT 当前)
                                           (NN 危机) )
                                   (LC 中) ) )
                      (NP-SBJ (-NONE- *pro*))
                      (VP (VV 存在)
                          (NP-OBJ (CP (WHNP-1 (-NONE- *OP*))
                                      (CP (IP (NP-SBJ-2 (-NONE- *T*-1))
                                              (VP (VV 可以)
                                                  (VP (SB 被)
                                                      (VP (VV 利用)
                                                          (NP-OBJ (-NONE- *)-2)) ) ) ) ||| --> (NP-OBJ (-NONE- *T*-2)) ) ) ) )
                                          (DEC 的) ))
                                  (NP (NN 机会) ) ) ) ) ) )
      (PU 。) ) )

3) ctb8.0/data/bracketed/chtb_3095.bn  (the same error is present in ctb7.0)

( (IP (NP-TPC (CP ((WHNP-4 (-NONE- *OP*))    ||| --> ( (IP (NP-TPC (CP (WHNP-4 (-NONE- *OP*))
                  CP (IP (NP-SBJ (NN 地方)   ||| --> (CP (IP (NP-SBJ (NN 地方)
                                  (NN 当局))
                          (VP (ADVP (AD 正在))
                              (VP (VV 组织)
                                  (NP-OBJ (-NONE- *T*-4)))))
                      (DEC 的)))
              (ADJP (JJ 紧急))
              (NP (NN 求援)))
      (PU ,)
      (ADVP (AD 但是))
      (PP-MNR (P 据)
              (NP (NN 报道)))
      (PU ,)
      (NP-SBJ (CP (WHNP-1 (-NONE- *OP*))
                  (CP (IP (NP-SBJ (-NONE- *T*-1))
                          (VP (QP-ADV (CD 1多))
                              (VP (VA 深))))
                      (DEC 的)))
              (NP (NN 大雪)))
      (VP (PP-BNF (P 给)
                  (NP (NN 求援)
                      (NN 工作)))
          (VP (VV 造成)
              (AS 了)
              (NP-OBJ (CP (WHNP-2 (-NONE- *OP*))
                          (CP (IP (NP-SBJ (-NONE- *T*-2))
                                  (VP (ADVP (AD 极))
                                      (VP (VA 大))))
                              (DEC 的)))
                      (NP (NN 困难)))))
      (PU 。)))
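
The three errors above share simple surface patterns: a trace written as *)-N, an opening "((" with no label, and a line that starts with a bare label instead of "(". A rough way to look for similar cases in other files is a grep over the bracketed data. This is a heuristic sketch only; the function name find_suspect_brackets is hypothetical, and the patterns are derived from these three examples, so they may miss other errors or flag valid lines.

```shell
# Hypothetical helper: print (with line numbers) lines that resemble the
# three bracketing errors listed above.
find_suspect_brackets() {
    # 1st pattern: trace written as "(-NONE- *)-3)" instead of a valid trace.
    # 2nd pattern: two opening brackets with no label between them, "((".
    # 3rd pattern: line starts with a bare label such as "CP (" with no "(".
    grep -nE \
        -e '\(-NONE- \*\)-[0-9]+\)' \
        -e '\(\(' \
        -e '^[[:space:]]*[A-Z]+(-[A-Z0-9]+)* \(' \
        "$@"
}
```

Usage: find_suspect_brackets ctb8.0/data/bracketed/*.bn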

Friday, April 11, 2014

(Linux Terminal) Command overwrites itself if it is too long

Run the following command before typing the long one:

shopt -s checkwinsize
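
With checkwinsize set, bash rechecks the window size after each command and updates LINES and COLUMNS. To keep the setting across sessions it can be added to a startup file. This is a small sketch; the function name enable_checkwinsize is hypothetical, and using ~/.bashrc as the startup file is an assumption about your setup.

```shell
# Hypothetical helper: append "shopt -s checkwinsize" to a bash startup
# file (default: ~/.bashrc) unless that exact line is already present.
enable_checkwinsize() {
    rc=${1:-$HOME/.bashrc}
    grep -qxF 'shopt -s checkwinsize' "$rc" 2>/dev/null ||
        printf '%s\n' 'shopt -s checkwinsize' >> "$rc"
}
```

Usage: enable_checkwinsize (calling it a second time leaves the file unchanged).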
 
 

Monday, March 17, 2014

Automatically generating Makefiles


1) File layout
SynPhrase consists of two directories:
a) synphrase/
collapseverb.cpp  collapseverb.h  expandtemplate.txt  expandverb.cpp  expandverb.h  main.cpp  nohup.out  train.cpp  train.h                                  

b) utility/
alignment.h  morphor.cpp  morphor.h  postagger.h  srl_sentence.h  tree.h  tsuruoka_maxent.h  utility.cpp  utility.h
lapos-0.1.2/
common.h  crf.cpp  crf.h  crfpos.cpp  lookahead.cpp  strdic.h  tokenize.cpp
maxent-3.0/
lbfgs.cpp  lbfgs.h  libtsuruoka_maxent.a  mathvec.h  maxent.cpp  maxent.h  owlqn.cpp  sgd.cpp

2) Run autoscan
This generates configure.scan. Rename it to configure.in; its contents are as follows
-----------------------------------------------------------------------------------------------------------
#                                               -*- Autoconf -*-
# Process this file with autoconf to produce a configure script.

AC_PREREQ([2.63])
AC_INIT([FULL-PACKAGE-NAME], [VERSION], [BUG-REPORT-ADDRESS])
AC_CONFIG_SRCDIR([utility/utility.cpp])
AC_CONFIG_HEADERS([config.h])

# Checks for programs.
AC_PROG_CXX
AC_PROG_CC

# Checks for libraries.

# Checks for header files.
AC_CHECK_HEADERS([stdlib.h string.h sys/time.h])

# Checks for typedefs, structures, and compiler characteristics.
AC_HEADER_STDBOOL
AC_C_INLINE
AC_TYPE_SIZE_T

# Checks for library functions.
AC_FUNC_MALLOC
AC_FUNC_REALLOC
AC_CHECK_FUNCS([pow sqrt strchr strrchr strstr])

AC_OUTPUT
-----------------------------------------------------------------------------------------------------------
Change it to
-----------------------------------------------------------------------------------------------------------
#                                               -*- Autoconf -*-
# Process this file with autoconf to produce a configure script.

AC_PREREQ([2.63])
AC_INIT(syntheticphrase, 1.0)
AM_INIT_AUTOMAKE
AC_CONFIG_SRCDIR([synphrase/main.cpp])
AC_CONFIG_HEADERS([config.h])

# Checks for programs.
AC_PROG_CXX
AC_PROG_CC
AC_PROG_LIBTOOL
AC_PROG_RANLIB

# Checks for libraries.

# Checks for header files.
AC_CHECK_HEADERS([stdlib.h string.h sys/time.h])

# Checks for typedefs, structures, and compiler characteristics.
AC_HEADER_STDBOOL
AC_C_INLINE
AC_TYPE_SIZE_T

# Checks for library functions.
AC_FUNC_MALLOC
AC_FUNC_REALLOC
AC_CHECK_FUNCS([pow sqrt strchr strrchr strstr])

# core stuff
AC_CONFIG_FILES([Makefile])
AC_CONFIG_FILES([utility/maxent-3.0/Makefile])
AC_CONFIG_FILES([utility/lapos-0.1.2/Makefile])
AC_CONFIG_FILES([utility/Makefile])
AC_CONFIG_FILES([synphrase/Makefile])

AC_OUTPUT
-----------------------------------------------------------------------------------------------------------
The main changes are:
AM_INIT_AUTOMAKE
AC_CONFIG_SRCDIR([synphrase/main.cpp])
where synphrase/main.cpp is the main program of the application

AC_CONFIG_FILES([Makefile])
AC_CONFIG_FILES([utility/maxent-3.0/Makefile])
AC_CONFIG_FILES([utility/lapos-0.1.2/Makefile])
AC_CONFIG_FILES([utility/Makefile])
AC_CONFIG_FILES([synphrase/Makefile])
which name the Makefile to generate in each directory

3) Add a Makefile.am file to each directory
utility/lapos-0.1.2/Makefile.am
--------------------------------------------------
noinst_LIBRARIES = liblapos.a

liblapos_a_SOURCES = \
  common.h \
  crf.h \
  strdic.h \
  tokenize.cpp \
  lookahead.cpp \
  crfpos.cpp
 
AM_CPPFLAGS = -W -Wall
---------------------------------------------------

utility/maxent-3.0/Makefile.am
---------------------------------------------------------------------
noinst_LIBRARIES = libtsuruoka_maxent.a

libtsuruoka_maxent_a_SOURCES = \
  lbfgs.cpp \
  maxent.cpp \
  owlqn.cpp \
  sgd.cpp
 
AM_CPPFLAGS = -W -Wall
-----------------------------------------------------------------------

utility/Makefile.am
-----------------------------------------------------------------------
noinst_LIBRARIES = libutility.a

libutility_a_SOURCES = \
  alignment.h \
  srl_sentence.h \
  tree.h \
  utility.h \
  postagger.h \
  morphor.h \
  tsuruoka_maxent.h \
  argument_reorder_model.h \
  utility.cpp \
  morphor.cpp
 
AM_CPPFLAGS = -W -Wall -I$(top_srcdir)/utility/lapos-0.1.2 -I$(top_srcdir)/utility/maxent-3.0
-----------------------------------------------------------------------

synphrase/Makefile.am
-----------------------------------------------------------------------
bin_PROGRAMS = train

train_SOURCES = main.cpp
train_LDADD = libsynphrase.a ../utility/libutility.a ../utility/maxent-3.0/libtsuruoka_maxent.a ../utility/lapos-0.1.2/liblapos.a -lz

noinst_LIBRARIES = libsynphrase.a

libsynphrase_a_SOURCES = \
  collapseverb.h \
  expandverb.h \
  train.h \
  collapseverb.cpp \
  expandverb.cpp \
  train.cpp
 
AM_CPPFLAGS = -W -Wall -I$(top_srcdir) -I$(top_srcdir)/utility
-----------------------------------------------------------------------

Makefile.am
-----------------------------------------------------------------------
SUBDIRS = \
  utility/maxent-3.0 \
  utility/lapos-0.1.2 \
  utility \
  synphrase

AUTOMAKE_OPTIONS = foreign
ACLOCAL_AMFLAGS = -I m4
AM_CPPFLAGS = -D_GLIBCXX_PARALLEL -march=native -mtune=native -O2 -pipe -fomit-frame-pointer -Wall
-----------------------------------------------------------------------

4) Run autoreconf
 Before running it, run mkdir m4 && libtoolize && automake --add-missing
(If automake --add-missing complains that files such as NEWS and README do not exist, first run touch NEWS README AUTHORS ChangeLog)

5) Run ./configure

6) Run make

The executable train is then generated in the synphrase directory.
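
Steps 4 to 6 can be collected into one script. The sketch below is an assumption-laden variant rather than the exact sequence above: it uses autoreconf --install, which runs aclocal, libtoolize, autoconf, autoheader and automake --add-missing as needed, in place of the separate manual calls. The names run, bootstrap_build and DRY_RUN are hypothetical. With DRY_RUN=1 the commands are only printed.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: bootstrap and build from the top-level directory
# (the one containing configure.in and the top-level Makefile.am).
bootstrap_build() {
    # Placeholder files that automake asks for in its default (GNU) mode.
    run touch NEWS README AUTHORS ChangeLog &&
    # Directory referenced by ACLOCAL_AMFLAGS = -I m4.
    run mkdir -p m4 &&
    # Regenerate configure, config.h.in and the Makefile.in files.
    run autoreconf --install &&
    run ./configure &&
    run make
}
```

Usage: DRY_RUN=1 bootstrap_build to preview the commands, then bootstrap_build to run them.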






Wednesday, January 29, 2014

Running Thrax in Joshua on a Hadoop 2.0 installation

The thrax.jar shipped with Joshua 5.0 ($JOSHUA/thrax/bin/thrax.jar) was compiled against a pre-2.0 version of Hadoop. Running it directly on Hadoop 2.0 produces errors like the following:
----------------------------------------------------------------------------------------
2014-01-29 10:52:12,107 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:56)
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:28)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)

2014-01-29 10:52:12,213 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2014-01-29 10:52:12,214 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2014-01-29 10:52:12,214 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
----------------------------------------------------------------------------------------   
 
 
The fix is to download Thrax again, compile it against Hadoop 2.0, and replace the original thrax.jar with the newly built one.

(See https://github.com/jweese/thrax/wiki/Quickstart)
1) Download Thrax
git clone https://github.com/jweese/thrax.git
 
2) Compile
ant
 

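build.xml needs to be modified (the changes are listed below). After these changes the environment variables no longer need to be set, but JAVA_HOME must point to the correct path (/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64)
 
Provide the correct jar files.

 
Remove the checks that the HADOOP and HADOOP_VERSION environment variables are set (these two variables are mainly used to locate the right jar files).


Remove the dependency on init-amazon, so that the AWS_SDK environment variable does not need to be set either.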

3) Fixing the Amazon-related errors that ant reports
Delete the file src/edu/jhu/thrax/util/amazon/AmazonConfigFileLoader.java
Edit the file src/edu/jhu/thrax/util/ConfFileParser.java
(delete import edu.jhu.thrax.util.amazon.AmazonConfigFileLoader;
 and change scanner = new Scanner(AmazonConfigFileLoader.getConfigStream(configURI)); to scanner = new Scanner(DefaultConfigFileLoader.getConfigStream(configURI));)
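
The two source changes in step 3 can be applied with a short script instead of by hand. This is a sketch under assumptions: the function name patch_thrax_sources is hypothetical, the argument is taken to be the root of the Thrax checkout, and the import is assumed to sit on a line of its own starting at column 0. The build.xml changes are not covered.

```shell
# Hypothetical helper: apply the step 3 source edits to a Thrax checkout.
patch_thrax_sources() {
    root=${1:-.}
    parser="$root/src/edu/jhu/thrax/util/ConfFileParser.java"

    # Drop the Amazon-specific loader.
    rm -f "$root/src/edu/jhu/thrax/util/amazon/AmazonConfigFileLoader.java"

    # Remove its import and switch the Scanner to DefaultConfigFileLoader.
    # A temporary file is used instead of sed -i for portability.
    sed -e '/^import edu\.jhu\.thrax\.util\.amazon\.AmazonConfigFileLoader;/d' \
        -e 's/AmazonConfigFileLoader\.getConfigStream/DefaultConfigFileLoader.getConfigStream/' \
        "$parser" > "$parser.tmp" &&
    mv "$parser.tmp" "$parser"
}
```

Usage: run patch_thrax_sources thrax from the directory containing the checkout, then run ant inside it and copy the resulting thrax.jar over $JOSHUA/thrax/bin/thrax.jar.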