Guest editors: Cécile Fabre & Alessandro Lenci
The use of distributional information extracted from corpora to compute semantic similarity between words has become a standard method in NLP. Its popularity is easily explained: it provides access to semantic content on the basis of an elementary principle, requiring no knowledge sources other than corpus-derived information about the distribution of words across contexts. In recent years, distributional semantics based on vector space models has benefited from the availability of massive amounts of textual data and increased computational power, allowing these methods to be applied on a large scale.
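To make the principle concrete, the following minimal Python sketch builds distributional vectors from co-occurrence counts in a toy corpus and compares them with cosine similarity. The corpus, window size, and similarity measure are illustrative assumptions, not choices prescribed by this call.

```python
from collections import Counter, defaultdict
import math

# Toy corpus: each sentence is a list of tokens.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

WINDOW = 2  # symmetric context window; an arbitrary illustrative choice
vectors = defaultdict(Counter)

# Build a sparse co-occurrence vector for every target word.
for sentence in corpus:
    for i, target in enumerate(sentence):
        lo = max(0, i - WINDOW)
        hi = min(len(sentence), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[target][sentence[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# 'cat' and 'dog' share contexts such as 'the' and 'drinks',
# so they come out as distributionally similar.
print(cosine(vectors["cat"], vectors["dog"]))  # ~0.87 on this toy corpus
```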
Today, the field has reached maturity: many experiments have been carried out in several languages; several survey articles have recently helped to consolidate the concepts and procedures used for distributional computations; and various distributional models and evaluation datasets are now available. Still, many issues remain open, both to gain better control over how the distributional methodology is applied in computational semantics and to improve our understanding of the types of information that these methods induce.
Much research effort has focused on optimization methods to handle massive corpora and on the adjustment of the many parameters that are likely to have an impact on the quality and nature of the induced semantic relations – such as similarity measures, types of distributional contexts, dimensionality reduction techniques, context weighting schemes, etc. A second important issue relates to the use of distributional semantic information in a large number of applications (information retrieval, summarization, textual entailment, etc.). Distributional features have been incorporated into a wide range of NLP tasks, such as named entity classification and paraphrasing. They are also used to construct lexical networks that make it possible to visualize the semantic relations holding between the words of a corpus. Finally, in the last few years, research has focused on combining distributional representations with other kinds of semantic representations, and on modeling semantic compositionality within a distributional framework, such that not only individual words but also larger phrases can be taken into account.
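As an illustration of two of the parameters mentioned above, the sketch below applies a common context weighting scheme (PPMI) and a dimensionality reduction step (truncated SVD) to a toy co-occurrence matrix, and closes with additive composition as the simplest compositional model. The matrix values and the choice of k are placeholders for illustration, not recommendations.

```python
import numpy as np

# Toy co-occurrence matrix (rows: target words, columns: context words).
# These counts are made up purely for illustration.
counts = np.array([
    [3.0, 1.0, 1.0, 1.0, 0.0],
    [2.0, 1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 2.0, 0.0, 0.0],
])

def ppmi(m):
    """Positive pointwise mutual information, a common weighting scheme."""
    total = m.sum()
    p_ij = m / total
    p_i = m.sum(axis=1, keepdims=True) / total
    p_j = m.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0   # zero counts give -inf; clamp them
    return np.maximum(pmi, 0.0)    # keep only positive associations

weighted = ppmi(counts)

# Truncated SVD: keep the k strongest latent dimensions
# (k = 2 is an arbitrary illustrative value).
k = 2
u, s, vt = np.linalg.svd(weighted, full_matrices=False)
reduced = u[:, :k] * s[:k]         # one dense k-dimensional vector per word

# Additive composition, the simplest compositional model: a phrase
# vector as the element-wise sum of its word vectors.
phrase_vector = reduced[0] + reduced[1]
print(reduced.shape, phrase_vector)
```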
We hope that this special issue of the TAL journal on distributional semantics will reflect the current diversity of the field, with regard to both linguistic and computational issues. We welcome papers that focus on any of the aforementioned topics, and in particular:
- construction of distributional semantic models
- compositionality within a distributional framework
- induction of specific semantic relations
- use of distributional methods within NLP tasks
- optimization techniques for distributional computations
- visualization techniques for word spaces
- role of corpora in distributional semantic models
- “deep learning” and distributional semantics
- integrating distributional and non-distributional semantic information
- evaluation of distributional semantic models