Bag of Tricks for Efficient Text Classification, fastText
Unofficial PyTorch Implementation of "Bag of Tricks for Efficient Text Classification", 2016, A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov
1. Introduction to fastText
Documentation: https://fasttext.cc/docs/en/support.html
fastText is a library for efficient learning of word representations and text classification.
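Advantages: on a standard multicore CPU, fastText can train word vectors on a corpus of about one billion words in under 10 minutes, and can classify more than 500,000 sentences among more than 300,000 categories in under 1 minute.
Data format: tokenized sentence + \t__label__ + label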
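As a minimal sketch of the data format above, the snippet below builds and parses one line of the form "tokenized sentence + \t__label__ + label". The helper names `format_example` and `parse_example`, and the sample tokens and label, are hypothetical and used only for illustration; they are not part of this repository or of the fastText API.

```python
# Hypothetical helpers illustrating the line format described above.
LABEL_PREFIX = "__label__"


def format_example(tokens, label):
    """Join tokens with spaces, then append a tab and the prefixed label."""
    return " ".join(tokens) + "\t" + LABEL_PREFIX + label


def parse_example(line):
    """Split a formatted line back into (tokens, label)."""
    text, _, tagged = line.rstrip("\n").rpartition("\t")
    return text.split(" "), tagged[len(LABEL_PREFIX):]


# Sample data (made up for this example).
line = format_example(["fasttext", "is", "fast"], "positive")
print(repr(line))  # 'fasttext is fast\t__label__positive'
```

Writing one such line per example produces a file in the layout this section describes.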
fasttext_model.py
from fasttext import FastText
import numpy as np
def get_data_path(by_word=True, train=True):
    # Return the word-level train or test file, depending on the train flag.
    if by_word:
        return "./classify/data_by_word_train.txt" if train else "./classify/data_by_word_test.txt"