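1) Create the files ff_feature_name.h and ff_feature_name.cc in the decoder/ directory.
In ff_feature_name.h, define a class that inherits from FeatureFunction and, if needed, a factory class that inherits from FactoryBase.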
2) Register the newly defined feature
Register the new feature in decoder/cdec_ff.cc:
#include "ff_feature_name.h"
In void register_feature_functions(), add:
ff_registry.Register("NewFeatureName", new NewFeatureFactory()); // if a dedicated factory class was defined
or
ff_registry.Register("NewFeatureName", new FFFactory
3) Add the new source file ff_feature_name.cc to decoder/Makefile.am; the existing entry for ff_klm can serve as a model.
4) Rebuild.
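The class/factory/registry wiring in steps 1 and 2 can be illustrated with a small self-contained sketch. FeatureFunction, FactoryBase, FFFactory and FFRegistry below are simplified, hypothetical stand-ins written for this example, not cdec's actual declarations, and MyNewFeature is a hypothetical feature class; the point is only the pattern of mapping a configuration name to a factory that builds the feature.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified stand-ins for cdec's FeatureFunction, FactoryBase,
// FFFactory and ff_registry; they are not the real cdec declarations.
struct FeatureFunction {
  virtual ~FeatureFunction() {}
  std::string name_;  // filled in by the registry when the feature is created
};

// Abstract factory: builds a feature from its parameter string.
struct FactoryBase {
  virtual ~FactoryBase() {}
  virtual std::unique_ptr<FeatureFunction> Create(const std::string& param) const = 0;
};

// Generic factory for features that need no custom construction logic.
template <class FF>
struct FFFactory : FactoryBase {
  std::unique_ptr<FeatureFunction> Create(const std::string& param) const override {
    return std::unique_ptr<FeatureFunction>(new FF(param));
  }
};

// Maps the feature name used in the decoder configuration to its factory.
class FFRegistry {
 public:
  // Takes ownership of the factory, matching the `new ...Factory()` call style.
  void Register(const std::string& name, FactoryBase* factory) {
    factories_[name].reset(factory);
  }
  // Returns nullptr for an unregistered name.
  std::unique_ptr<FeatureFunction> Create(const std::string& name,
                                          const std::string& param) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    std::unique_ptr<FeatureFunction> ff = it->second->Create(param);
    ff->name_ = name;
    return ff;
  }

 private:
  std::map<std::string, std::unique_ptr<FactoryBase>> factories_;
};

// Hypothetical new feature, as it might be declared in ff_feature_name.h.
struct MyNewFeature : FeatureFunction {
  explicit MyNewFeature(const std::string& param) : param_(param) {}
  std::string param_;
};

// Counterpart of register_feature_functions() in cdec_ff.cc.
void RegisterFeatureFunctions(FFRegistry& ff_registry) {
  ff_registry.Register("NewFeatureName", new FFFactory<MyNewFeature>);
}
```

After RegisterFeatureFunctions(reg), reg.Create("NewFeatureName", "some-param") returns a MyNewFeature whose name_ is "NewFeatureName"; an unknown name yields nullptr. The generic template saves writing one factory class per feature, which is why a dedicated factory is only needed when construction requires extra logic.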
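Adding a new context-free feature that uses the state to cache some information for fast computation of its value
In this case, note that in apply_models.cc/IncorporateIntoPlusLMForest(...), the lookup (*s2n)[item->state_] uses the state value to decide whether to create a new node or to merge the item into an existing one. If the new feature's bookkeeping stays in the state, hypotheses that differ only in that information are no longer recombined, even though a context-free feature should not affect recombination. So before this lookup, the new feature's part of the state has to be erased, as follows: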
1) In the ModelSet class in ff_set.h, add:
public:
  FFState GetRealFFState(const FFState& state) const;
  void GetRealFFState(const FFState& state, FFState& real_state) const;
Given the original state, the implementation returns a new real_state that is a copy of it with the new feature's portion of the state cleared (set to zero).
2) In apply_models.cc/IncorporateIntoPlusLMForest(...), change
Candidate*& o_item = (*s2n)[item->state_];
to
FFState real_state = models.GetRealFFState(item->state_);
Candidate*& o_item = (*s2n)[real_state];
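Likewise, change the later sanity-check assertion, which compares o_item->state_ with item->state_ directly, to: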
assert(models.GetRealFFState(o_item->state_) == models.GetRealFFState(item->state_)); //sanity check!
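The effect of these two changes can be seen in a toy model of the recombination step. The assumptions here are illustrative, not cdec's: FFState is a std::string, bytes 0 and 1 of a state belong to a language model and bytes 2 and 3 to the new feature, and the hypothetical Candidate keeps only a state and a Viterbi score. Keying the map by the raw state keeps hypotheses that differ only in the new feature's bytes apart; keying it by the real state merges them, and after merging only the real states can be asserted equal.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy assumptions: a state is a byte string; bytes [0, 2) belong to a language
// model and bytes [2, 4) to the new context-free feature.
typedef std::string FFState;

// Hypothetical, stripped-down Candidate (cdec's real one has many more fields).
struct Candidate {
  FFState state_;
  double vit_prob_;  // log-probability of the best derivation; higher is better
};

// Stand-in for ModelSet::GetRealFFState: zero the new feature's bytes.
FFState GetRealFFState(const FFState& state) {
  FFState real_state = state;
  for (std::size_t j = 2; j < 4 && j < real_state.size(); ++j) real_state[j] = 0;
  return real_state;
}

// Groups candidates into nodes and keeps the best candidate per node.
// key_by_real_state == false mimics (*s2n)[item->state_];
// key_by_real_state == true mimics (*s2n)[real_state].
std::map<FFState, Candidate*> Recombine(std::vector<Candidate>& items,
                                        bool key_by_real_state) {
  std::map<FFState, Candidate*> s2n;
  for (Candidate& item : items) {
    FFState key = key_by_real_state ? GetRealFFState(item.state_) : item.state_;
    Candidate*& o_item = s2n[key];
    if (!o_item) {  // first candidate with this key creates the node
      o_item = &item;
      continue;
    }
    // Raw states may differ in the new feature's bytes, so only the real
    // states are guaranteed to match (the modified sanity check).
    assert(GetRealFFState(o_item->state_) == GetRealFFState(item.state_));
    if (item.vit_prob_ > o_item->vit_prob_) o_item = &item;
  }
  return s2n;
}
```

With candidates "abXY", "abZW" and "cdXY", keying by the raw state yields three nodes, while keying by the real state yields two, and the "ab" node keeps whichever of the first two scores higher. A raw-state assertion (o_item->state_ == item->state_) would fail on that merge, since "abXY" and "abZW" differ in the new feature's bytes.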
Appendix: implementation of GetRealFFState()
void ModelSet::GetRealFFState(const FFState& state, FFState& real_state) const {
  assert(state.size() == state_size_);
  // Start from a full copy of the original state.
  real_state = state;
  // Zero out the slice of the state that belongs to the new feature.
  for (size_t i = 0; i < models_.size(); i++) {
    if (models_[i]->name_ == "NewFeatureName") {
      int start_pos = model_state_pos_[i];
      int end_pos;
      if (i == models_.size() - 1)
        end_pos = state_size_;  // last model: its slice runs to the end
      else
        end_pos = model_state_pos_[i + 1];
      for (int j = start_pos; j < end_pos; j++)
        real_state[j] = 0;
    }
  }
}

// Convenience overload used in apply_models.cc.
FFState ModelSet::GetRealFFState(const FFState& state) const {
  FFState real_state;
  GetRealFFState(state, real_state);
  return real_state;
}
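For experimenting outside cdec, the slicing logic of the appendix can be checked with a standalone sketch. ToyModelSet is hypothetical: it only records each feature's name and the offset of its state slice (playing the role of model_state_pos_), FFState is assumed to be a std::string, and the feature name is passed as a parameter instead of being hard-coded as "NewFeatureName".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::string FFState;

// Hypothetical, stripped-down ModelSet: it only records each feature's name and
// the offset of its slice inside the concatenated state, like model_state_pos_.
class ToyModelSet {
 public:
  ToyModelSet(const std::vector<std::string>& names,
              const std::vector<std::size_t>& state_sizes)
      : names_(names) {
    // Each feature's slice starts where the previous one ended.
    for (std::size_t size : state_sizes) {
      model_state_pos_.push_back(state_size_);
      state_size_ += size;
    }
  }

  // Copy the state and zero the slice owned by the feature called `name`.
  FFState GetRealFFState(const FFState& state, const std::string& name) const {
    assert(state.size() == state_size_);
    FFState real_state = state;
    for (std::size_t i = 0; i < names_.size(); ++i) {
      if (names_[i] != name) continue;
      std::size_t start_pos = model_state_pos_[i];
      std::size_t end_pos =
          (i + 1 == names_.size()) ? state_size_ : model_state_pos_[i + 1];
      for (std::size_t j = start_pos; j < end_pos; ++j) real_state[j] = 0;
    }
    return real_state;
  }

 private:
  std::vector<std::string> names_;
  std::vector<std::size_t> model_state_pos_;
  std::size_t state_size_ = 0;
};
```

For a ToyModelSet built from {"LanguageModel", "NewFeatureName", "WordPenalty"} with state sizes {2, 2, 0}, GetRealFFState("abXY", "NewFeatureName") keeps "ab" and zeroes the last two bytes, while a name that matches no feature returns the state unchanged.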