Beekhuizen, B., Milic, S., Armstrong, B. C., & Stevenson, S. (2018). What Company Do Semantically Ambiguous Words Keep? Insights from Distributional Word Vectors. Proceedings of the 40th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.

Download:

Author's self-archived version (.pdf)  (6 pages)

[external link pending]

Abstract

The diversity of a word’s contexts affects its acquisition and processing. Can differences between word types such as monosemes (unambiguous words), polysemes (multiple related senses), and homonyms (multiple unrelated meanings) be related to distributional properties of these words? We tested for traces of number and relatedness of meaning in vector representations by comparing the distance between words of each type and vector representations of various “contexts”: their dictionary definitions (an extreme disambiguating context), their use in film subtitles (a natural context), and their semantic neighbours in vector space (a vector-space-internal context). Whereas dictionary definitions reveal a three-way split between our word types, the other two contexts produced a two-way split between ambiguous and unambiguous words. These inconsistencies align with some discrepancies in behavioural studies and present a paradox regarding how models learn meaning relatedness despite natural contexts seemingly lacking such relatedness. We argue that viewing ambiguity as a continuum could resolve many of these issues.
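
As a rough illustration of the kind of comparison the abstract describes, the sketch below measures how far a word's vector lies from a vector built from one of its contexts. It is not the paper's procedure: representing a context as the average of its word vectors, using cosine distance, the function names, and the toy three-dimensional vectors are all assumptions made for this example.

```python
import math

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def centroid(vectors):
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def word_context_distance(word, context_words, vectors):
    """Distance between a word and the centroid of its context words.

    Assumption: a context (definition, subtitle line, or neighbour set) is
    represented by averaging the vectors of its words, skipping the target
    word itself and any word without a vector.
    """
    context = [vectors[w] for w in context_words if w in vectors and w != word]
    if not context:
        raise ValueError("no context word has a vector")
    return cosine_distance(vectors[word], centroid(context))

# Toy vectors invented for illustration only; real models use hundreds of dimensions.
toy_vectors = {
    "bank":  [0.6, 0.6, 0.1],
    "money": [0.9, 0.1, 0.0],
    "loan":  [0.8, 0.2, 0.1],
    "river": [0.1, 0.9, 0.1],
    "shore": [0.2, 0.8, 0.2],
}

d_finance = word_context_distance("bank", ["money", "loan"], toy_vectors)
d_river = word_context_distance("bank", ["river", "shore"], toy_vectors)
print(f"bank vs. finance context: {d_finance:.3f}")
print(f"bank vs. river context:   {d_river:.3f}")
```

Under these assumptions, comparing such distances across monosemes, polysemes, and homonyms for each type of context would be one way to look for the two-way or three-way splits the abstract reports.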


Keywords: lexical/semantic ambiguity; homonymy; polysemy; vector space models; contextual diversity.


Copyright Notice (borrowed from David Plaut): The documents distributed here have been provided as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.