fastTextR is an R interface to the fastText library. It can be used to word representation learning (Bojanowski et al., 2016) and supervised text classification (Joulin et al., 2016). Particularly the advantage of fastText to other software is that, it was designed for biggish data.
The following example is based on the examples provided in the fastText library, the example shows how to use fastTextR for word representation. For more informations about word representations can be found at the fastText homepage.
library("fastTextR")
The training of these models can be quite time consuming therefore pre-trained models are a good option.
model <- ft_load("cc.en.300.bin")
ft_word_vectors(model, c("asparagus", "pidgey", "yellow"))[,1:5]
## [,1] [,2] [,3] [,4] [,5]
## asparagus 0.0292057190 -0.0114405714 -0.003201437 0.03087331 0.127229080
## pidgey 0.0452978685 0.0090015158 0.067562237 0.11123407 -0.008441916
## yellow 0.0007776691 -0.0001886144 0.001824494 0.03869999 0.036413591
ft_sentence_vectors(model, c("Poets have been mysteriously silent on the subject of cheese", "Who did not let the gorilla into the ballet"))[,1:5]
???
ft_nearest_neighbors(model, 'asparagus', k = 5L)
## aspargus broccolini artichokes asparagus. asparagas
## 0.7316202 0.6995656 0.6930545 0.6915916 0.6911229
ft_analogies(model, c("berlin", "germany", "france"))
## paris france. avignon montpellier paris.
## 0.6831182 0.6408537 0.6288283 0.6138449 0.6059716
## rennes london Paris. toulon montparnasse
## 0.5884554 0.5832924 0.5743204 0.5727922 0.5715630
[1] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2016enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.04606},
year={2016}
}
[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification
@article{joulin2016bag,
title={Bag of Tricks for Efficient Text Classification},
author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.01759},
year={2016}
}