Thursday, December 27, 2012

Importing cdec into Eclipse

Building Cdec:

(http://www.cdec-decoder.org/guide/compiling.html)
1) Download cdec

git clone git://github.com/redpony/cdec.git


2) Install third-party software, including:
autoconf/automake
libtool
boost
python

3) Compiling cdec

cd cdec
autoreconf -ifv
./configure [--with-boost=/path/to/boost-install]
make
./tests/run-system-tests.pl

Error running make:

fast_align.cc:216:   instantiated from here
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/hashtable:863: error: ‘Internal::hash_code_base::m_h1’ has incomplete type

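Solution:
In word-aligner/fast_align.cc, line 216, spell out the hash function in the iterator type:

unordered_map<pair<short,short>,unsigned>::iterator it = size_counts.begin();   ==>
unordered_map<pair<short,short>,unsigned,boost::hash<pair<short,short> > >::iterator it = size_counts.begin();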

4) ....

Importing the cdec project into Eclipse
1) Create a C++ project named cdec_eclipse.
2) Copy the folders decoder/, utils/, and klm/, plus config.h, into cdec_eclipse.
3) Compiling cdec_eclipse now fails with errors that "....h" files cannot be found:
    3.1) Fix the #include lines in the source files so that they point to the right .h files.
    3.2) For the error that KENLM_MAX_ORDER is not defined in klm/lm/left.hh:
            Properties --> C/C++ Build --> Settings --> Cross GCC Compiler (Cross G++ Compiler) --> Symbols --> Define symbols (-D), add KENLM_MAX_ORDER=6.
    3.3) Properties --> C/C++ Build --> Settings --> Cross GCC Compiler (Cross G++ Compiler) --> Includes --> Include Paths (-I), add /..../decoder, /.../mteval, /.../utils, /.../klm.
4) Exclude decoder/cfg_test.cc from build

5) Add #include "hg.h" in decoder/tromble_loss.h

6) Add -lz in Properties --> C/C++ Build --> Settings --> Cross G++ linker
    (-lz is needed whenever zlib is used)
7) Add the boost libraries in Properties --> C/C++ Build --> Settings --> Cross G++ linker --> Libraries
    The boost libraries should be in /usr/local/lib. The library files are named libboost***.a (e.g., libboost_date_time.a). To link one, enter boost**** in Eclipse (e.g., boost_date_time, NOT the full file name).
    Here I added boost_iostreams, boost_system, boost_unit_test_framework, boost_context, boost_filesystem, and boost_program_options.

8) For errors like 'duplicate symbol _main in:', edit the source files involved and comment out the extra main functions.

9) For errors like duplicate symbol init_unit_test_suite(int, char**), find the source files involved (most are named **_test.cc) and exclude them from the build.

10) For errors like duplicate symbol InitCommandLine, find the source files involved and give the duplicated function a different name.
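For steps 8 and 9, the candidate files can be listed from a terminal. This is only a sketch: the function names and the directory argument are my own (hypothetical), and the main() pattern is a heuristic that assumes the definition starts a line with "int main(".

```shell
# List unit-test sources (named *_test.cc) under the directory given as $1.
list_test_sources() {
  find "$1" -name '*_test.cc' | sort
}

# List .cc files under $1 that appear to define their own main().
list_main_sources() {
  find "$1" -name '*.cc' \
    -exec grep -lE '^[[:space:]]*int[[:space:]]+main[[:space:]]*\(' {} + | sort
}
```

For example, list_test_sources cdec_eclipse prints the files to exclude from the build, and list_main_sources cdec_eclipse prints the files in which all but one main function has to go.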

*Note* If boost is installed somewhere else, add "/path/to/boost" (for example /home/.../boost_1_53_0/include) in Properties --> C/C++ Build --> Settings --> Cross GCC Compiler (Cross G++ Compiler) --> Includes --> Include Paths (-I), and add /path/to/boost/lib in Properties --> C/C++ Build --> Settings --> Cross G++ linker --> Library search path (-L).
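Outside Eclipse, the settings from steps 3.2, 3.3, 6, and 7 and the note above correspond roughly to the g++ flags printed by the sketch below. The function names and the two directory arguments are hypothetical; substitute your own paths.

```shell
# Print the compiler flags: the KENLM_MAX_ORDER symbol plus the include paths.
# $1: directory holding decoder/, mteval/, utils/, klm/ (hypothetical argument)
# $2: boost install prefix containing include/ (hypothetical argument)
compile_flags() {
  flags="-DKENLM_MAX_ORDER=6"
  for dir in decoder mteval utils klm; do
    flags="$flags -I$1/$dir"
  done
  echo "$flags -I$2/include"
}

# Print the linker flags: library search path, zlib, and the boost libraries.
# $1: boost install prefix containing lib/ (hypothetical argument)
link_flags() {
  flags="-L$1/lib -lz"
  for lib in iostreams system unit_test_framework context filesystem program_options; do
    flags="$flags -lboost_$lib"
  done
  echo "$flags"
}
```

The output of compile_flags is what goes on each g++ -c command line, and the output of link_flags goes on the final link line.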

When running or debugging the program, an error like 'error while loading shared libraries: libboost_context.so.1.53.0' can be fixed by setting LD_LIBRARY_PATH to include the directory that holds the boost shared libraries: in Run --> Debug Configurations... --> Environment --> New, enter LD_LIBRARY_PATH as the name and /home/.../boost_1_53_0/lib:$LD_LIBRARY_PATH as the value.
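The same setting can be made in a terminal when the program is run outside Eclipse. A minimal sketch; the function name is hypothetical and the directory passed in is whatever holds your boost .so files.

```shell
# Prepend a directory ($1) to LD_LIBRARY_PATH for the current shell session.
prepend_ld_library_path() {
  # ${VAR:+...} adds the ":" separator only when the variable is already non-empty
  export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

After calling prepend_ld_library_path /path/to/boost/lib, running ldd on the executable should show whether libboost_context now resolves.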


Debugging cdec in Eclipse
1) The error "Program is not a recognized executable." may be caused by the wrong binary parser being selected. Selecting the correct binary parser under Project --> Properties --> C/C++ Build --> Settings --> Binary Parsers solves the problem. (Dunno which binary parser is the correct one. Try them one by one.)
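Instead of trying every parser, the first bytes of the executable give a hint about its format. The sketch below is an assumption-laden helper (binary_format is a hypothetical name): as far as I recall, ELF files go with the Elf Parser or GNU Elf Parser, Mach-O files with the Mach-O 64 Parser, and PE files with the PE Windows Parser, but the parser names may differ between CDT versions.

```shell
# Print the executable format of the file given as $1, judged by its first 4 bytes.
binary_format() {
  # od prints the bytes as hex; tr strips the whitespace between them
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
  case "$magic" in
    7f454c46)          echo "ELF" ;;      # 0x7f 'E' 'L' 'F' (Linux)
    cffaedfe|cefaedfe) echo "Mach-O" ;;   # little-endian 64-bit / 32-bit (Mac)
    4d5a*)             echo "PE" ;;       # 'M' 'Z' (Windows)
    *)                 echo "unknown" ;;
  esac
}
```

Call it as binary_format followed by the path of the executable that Eclipse built.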

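Monday, September 17, 2012

1) A sentence must not contain the | character
2) A sentence must not contain tokens of the form [word]
3) A sentence must not contain two or more consecutive spaces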
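The three rules above can be applied as a filter, sketched here with grep. The function name filter_corpus is hypothetical, and I am assuming that a "[word] token" means any bracketed run without spaces inside. Lines that break a rule are dropped.

```shell
# Read sentences on stdin, one per line, and print only those that pass all three rules.
filter_corpus() {
  # -v inverts the match: a line is dropped if it matches any of the -e patterns
  #   '|'          rule 1: contains the | character
  #   '\[[^] ]*]'  rule 2: contains a [word]-style token
  #   '  '         rule 3: contains two consecutive spaces
  grep -v -e '|' -e '\[[^] ]*]' -e '  '
}
```

Typical use would be filter_corpus < corpus.txt > corpus.clean.txt, where both file names are placeholders.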

The following sentence causes problems in syntactic parsing, so delete it:

但 正 如 律政司 在 立法会 文件 CB ( 2 ) 8602 -03 ( 02 ) 号 中正 确地 指出 , “  国家 行为 o 邕 点 与 咨询 文件 所 建议 的 禁制 机制 无关镇 , 尽管 法律 政策 专员 在 首 次 联席 会议 上确 曾 说 过 , 建议 的 证明书 做法 与 第19 条 相似 。 (1459342)

Thursday, September 6, 2012

Learning Perl

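1) Command-line arguments
As in C/C++, command-line arguments are stored in a built-in array, @ARGV: $ARGV[0] is the first argument, and so on. Unlike in C/C++, however, @ARGV does not include the program name, so $ARGV[0] is the first argument the user typed. The program name is stored in the variable $0.

The number of command-line arguments is $#ARGV+1 (equivalently, scalar(@ARGV)).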

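2) Function arguments
A function's arguments are stored in the array @_: $_[0] is the first argument, and so on.
The number of arguments is $#_+1.

Arguments can also be taken from the left or the right with shift and pop; each argument taken this way is removed from the array. For example:
my $para1 = shift;
my $paran = pop;
takes the first and the last argument, leaving only the remaining arguments in @_.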

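3) Reading and writing files
open( MYFILE, "/home/...." ); returns a true (nonzero) value on success and a false value otherwise. The file name may be a relative or an absolute path. MYFILE is the file handle. To overwrite (or append to) a file, prefix the name with > (or >>), for example open(MYFILE, ">/home/...");

It is usual to add a die statement when opening a file: open( MYFILE, "/home/.." ) || die( "Can't open the file!\n" );

Close an open file with close( MYFILE );

Reading a file:
$line = <MYFILE>; reads one line from the file into the variable $line and moves the file pointer to the next line. @array = <MYFILE>; reads all lines of the file into the array @array. Note that each array element includes its trailing newline.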

For example:
while( $line = <MYFILE>)
{
    # print, not printf: printf would treat any % in the line as a format directive
    print( $line );
}

Writing a file:
print MYFILE ("Hello.\n");

4) A script that counts words and prints the most frequent ones (thanks to my senior labmate for providing it)

#!/usr/bin/perl
#
#
#
use warnings;
use strict;


(scalar(@ARGV) == 3) or die "perl find-top-k-words.pl input-file output-file k-value\n";

my $Source = shift;
my $Target = shift;
my $Top_K = shift;

my ($FIN, $FOUT);
my @ItemList;
my %HashSet;

open $FIN, "<$Source" or die "open file $Source failed\n";
while(<$FIN>)
{
    chomp;
    # split ' ' skips leading whitespace, so no empty "word" gets counted
    @ItemList = split ' ';
    for(my $i=0; $i<scalar(@ItemList); $i++)
    {
        $HashSet{$ItemList[$i]} += 1;
    }
}
close($FIN);

my $index = 0;
open $FOUT, ">$Target" or die "open file $Target failed\n";
foreach my $key (sort SortByHashValue (keys (%HashSet)))
{
    # stop once the top k words have been written
    last if($index >= $Top_K);
    print $FOUT "$key $HashSet{$key}\n";
    $index += 1;
}

close($FOUT);

sub SortByHashValue
{
    $HashSet{$b} <=> $HashSet{$a};
}

5) Reading and writing gz files

#!/usr/bin/perl -w
use utf8;
use strict;

my $final_file = "combine.data";
my $gzip_fh;

open ($gzip_fh, "| /bin/gzip -c > $final_file.gz") or die "error starting gzip";

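binmode(STDOUT,":utf8");
# the gzip pipe needs the UTF-8 layer too, because the input is decoded on read
binmode($gzip_fh, ":utf8");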
my $lc = 0;
for my $file (@ARGV) {
  my $fh;
  if ($file =~ /\.gz$/) {
    open $fh, "zcat $file|" or die;
  } else {
    open $fh, "<$file" or die;
  }
  binmode $fh, ":utf8";
  while(<$fh>) {
    $lc++;
    #exit 0 if ($lc > 100);
    print $gzip_fh "$_";
  }
  close $fh;
}
close $gzip_fh;


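6) Arrays of file handles
my @f_out_array;

my $f_out;
open $f_out, ">test.txt" or die "Can't open test.txt\n";
push(@f_out_array, $f_out);

print $f_out_array[0] "hello world\n";

## The print statement above fails to compile; change it to:
$f_out = $f_out_array[0];
print $f_out "hello world\n";
## (wrapping the handle in a block also works: print {$f_out_array[0]} "hello world\n";)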



Friday, June 22, 2012

Installing cdec

http://cdec-decoder.org/index.php?title=Building_cdec_from_source

1) Install git
sudo apt-get install git-core

2) git clone git://github.com/redpony/cdec.git
Connection timed out
Use git clone https://github.com/redpony/cdec.git instead

3) Enter the cdec directory
autoreconf -ifv

./configure --with-boost=/usr/local/boost_1_48_0/ --with-cmph=/home/ldd/toolkit/cmph-1.1-cdyer/cmph-1.1 --with-eigen=/usr/local/include/eigen3
(Install boost, libcmph-1.1, and Eigen before running this step.)

Error:

checking for flex... no
checking for lex... no
configure: error: No lex (Flex, lex, etc.) program found
Install flex:  sudo apt-get install flex


make


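On a Mac, autoconf, automake, cmake, and libtool need to be installed.

In the end, the following warning is printed:
configure: WARNING: unrecognized options: --with-eigen
Does this mean Eigen does not need to be installed at all?
With a default boost install, the path to pass is --with-boost=/usr/local/include/, not --with-boost=/usr/local/include/boost
When running ./configure, do not set --enable-mpi; otherwise a new error appears: ld: library not found for -lboost_mpi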
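The workaround from step 2 (falling back to https when the git:// URL times out) can be wrapped in a small helper. This is only a sketch, and both function names are hypothetical.

```shell
# Rewrite a git:// URL ($1) to the equivalent https:// URL; other URLs pass through unchanged.
to_https() {
  printf '%s\n' "$1" | sed 's#^git://#https://#'
}

# Try to clone $1 as given; if that fails, retry over https.
clone_with_fallback() {
  git clone "$1" || git clone "$(to_https "$1")"
}
```

With this, clone_with_fallback git://github.com/redpony/cdec.git retries over https when the first attempt fails.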

Thursday, March 15, 2012

Using LaTeX

1) LaTeX font-size commands, from smallest to largest
\tiny
\scriptsize
\footnotesize
\small
\normalsize
\large
\Large
\LARGE
\huge
\Huge

2) Shrinking the left and right padding of table columns (the >{...} column syntax needs the array package)
\begin{tabular}{>{\rule{-6pt}{-6pt}}l>{\rule{-6pt}{-6pt}}l}


3) How to write the various symbols in formulas
(a space inside a formula can be written as \,)
http://en.wikipedia.org/wiki/Help:Displaying_a_formula
http://zh.wikipedia.org/wiki/Help:数学公式

Monday, February 27, 2012

Installing gcc-3.3.3

Installing gcc-3.3 on openSUSE

1) Download gcc-3.3.3.tar.gz

2) Unpack it
% tar xzvf gcc-3.3.3.tar.gz

3) Install
% mkdir gcc-build # any name will do
% cd gcc-build
% /pathtogcc-3.3.3/configure --prefix=/opt/gcc3.3 # /opt/gcc3.3 is a custom directory; you need write permission for it
% make
make fails with several errors:
Error 1 read-rtl.c:653: error: lvalue required as increment operand
Fix 1 Edit /pathtogcc-3.3.3/include/obstack.h
Line 426: *((void **)__o->next_free)++ = ((void *)datum); \
Change it to
*((void **)__o->next_free) = ((void *)datum); \
__o->next_free += sizeof(void *); \

Error 2 lvalue required as left operand of assignment In file included from ../../gcc/cp/decl.c:15472
Fix 2 patch decl.c < gcc42-patch-gcc-3.3.1_gcc_cp_decl.c

Error 3 ../../gcc/f/com.c:11079: error: conflicting types for ‘ffecom_gfrt_basictype’
../../gcc/f/com.h:236: error: previous declaration of ‘ffecom_gfrt_basictype’ was here
Fix 3 In gcc/f/com.h, line 236, change ffeinfoKindtype --> ffeinfoBasictype

Error 4 ../../gcc/java/gjavah.c:49: error: static declaration of ‘flag_jni’ follows non-static declaration
../../gcc/java/java-tree.h:170: error: previous declaration of ‘flag_jni’ was here
Fix 4 In gjavah.c, line 49, change static int flag_jni = 0; --> int flag_jni = 0;

Error 5 /usr/include/gnu/stubs.h:7:27: gnu/stubs-32.h: No such file or directory
Fix 5 Add options when running configure: /pathtogcc-3.3.3/configure --prefix=/opt/gcc3.3 --enable-threads=posix --disable-checking --disable-multilib --enable-languages=c,c++

% make install




Monday, January 9, 2012

Preprocessing the corpus for moses-chart

1) The corpus must not contain [..] tokens
2) The corpus must not contain | tokens
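Before training, a corpus file can be checked against the two rules above. The sketch below reports offending lines with their line numbers rather than deleting them. report_bad_lines is a hypothetical name, and a "[..] token" is assumed to be any bracketed run without spaces inside.

```shell
# Print every line of the file $1 that breaks a rule, prefixed with its line number.
# The exit status is 0 if offending lines were found and 1 if the file is clean.
report_bad_lines() {
  grep -n -e '|' -e '\[[^] ]*]' "$1"
}
```

An empty result from report_bad_lines on a corpus file means that file passes both rules.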