
    RESEARCH ARTICLE

Human-inspired semantic similarity between sentences

J. Ignacio Serrano a,*, M. Dolores del Castillo a, Jesús Oliva b, Rafael Raya a

a Group of Neural and Cognitive Engineering (gNeC), Centro de Automática y Robótica, Consejo Superior de Investigaciones Científicas (CSIC), Spain
b BBVA Data & Analytics, Spain

    Received 23 March 2015; received in revised form 10 April 2015; accepted 10 April 2015

    KEYWORDS

Cognitive linguistics; Computational linguistics; Semantic similarity

    Abstract

Following the Principle of Compositionality, the meaning of a complex expression is influenced, to some extent, not only by the meanings of its individual words but also by the structural way in which those words are assembled. Compositionality has been a central research issue for linguists and psycholinguists. However, it remains unclear how syntax influences the meaning of a sentence. In this paper, we propose an interdisciplinary approach to better understand that relation. We present an empirical study that seeks to determine the different weights given by humans to different syntactic roles when computing semantic similarity. In order to test the validity of the hypotheses derived from the psychological study, we use a computational paradigm: we incorporate the results of that study into a psychologically plausible computational measure of semantic similarity. The results shown by this measure, in terms of correlation with human judgments on a paraphrase recognition task, confirm the different importance that humans give to different syntactic roles in the computation of semantic similarity. These results contrast with generative grammar theories but are consistent with neurolinguistic evidence.

© 2015 Elsevier B.V. All rights reserved.

    Introduction

We, humans, are continuously assessing the similarity of objects in our daily life. As explained by Goldstone (1994), when humans try to judge the similarity of visual scenes, we take into account the structure of the compared objects and the different relationships between the different parts.


* Corresponding author.
E-mail addresses: [email protected] (J.I. Serrano), [email protected] (M.D. del Castillo), [email protected] (J. Oliva), [email protected] (R. Raya).

Biologically Inspired Cognitive Architectures (2015) 12, 121–133


  • 8/18/2019 j.bica.2015.04.007_759597398647

    2/13

So humans use the structural information in the comparison of general objects, but can this conclusion be generalized to language? To what extent do the different parts of the hierarchical structure of a sentence influence its global meaning? The relations between syntax and semantics have been studied for several years. In particular, compositionality has been largely studied since it was first proposed as the notion that the meaning of an expression is determined by the meaning of its constituents and the way they are combined. This is clearly shown by sentences made up of the same words but with very different semantic interpretations, like "The dog bit the man" and "The man bit the dog".

Despite the great amount of work on the Principle of Compositionality and its different interpretations, the real influence of different syntactic roles on the representation of meaning is still not clear. Consider the following examples:

(a) That movie made me cry quickly.
    That movie made me cry slowly.

(b) That movie made me cry quickly.
    That movie made me laugh quickly.

It seems clear that the two sentences in (a) are more semantically similar than the ones in (b). However, in both cases we have just replaced one word with one of its antonyms. So, how does our brain compute semantic similarity? Are some parts of a sentence more important than others in the computation of semantic similarity? Most studies have centered on the dominant effect of verbs on the meaning of a sentence, but there is a lack of work about the relative influence of different syntactic roles. The only study in this direction is the one presented by Wiemer-Hastings (2004). In that paper, Wiemer-Hastings points out that human judges tend to ignore similarities between segments with different functional roles, denoting the importance of syntactic structure analysis in the computation of semantic similarity, and claims that different syntactic roles have a different level of significance in the calculation of semantic similarity by humans.

In this paper, we propose an interdisciplinary approach to better understand how our mind computes semantic similarity and, in particular, the different importance that humans give to different syntactic roles in the computation of semantic similarity. Following Cambria and White (2014), the work presented here aims at explaining how we can jump from the syntactics and semantics curves to the pragmatics one. To this end, we performed a psychological study about how humans compute semantic similarity between sentences and then used a computational paradigm in order to test the validity of the hypotheses derived from that study. First of all, we present an empirical study that seeks to determine the different weights given by humans to different syntactic roles when computing semantic similarity. In line with the results obtained by Wiemer-Hastings (2004), we start from the hypothesis that different syntactic roles have different importance in the calculation of semantic similarity by humans. Going a step further, we carried out a deeper empirical study, based on two experiments with human judges, that complements the experiments carried out by Wiemer-Hastings (2004) and overcomes some of their limitations. Our experiments are not restricted to specific domains, whereas the work of Wiemer-Hastings (2004) is focused on two specific domains: computer literacy and psychological research methods. Moreover, we give a more accurate quantitative measure of the different weights given by humans to different syntactic roles while computing semantic similarity.

In order to assess the validity of the conclusions obtained with the experiments carried out with humans, we used a computational paradigm. We incorporated the results of the empirical study into a psychologically plausible semantic similarity measure (Oliva, Serrano, Del Castillo, & Iglesias, 2011) that takes into account the influence of different syntactic roles on the overall sentence meaning. The semantic similarity measure was applied to a paraphrase recognition task (Dolan, Quirk, & Brockett, 2004) using two different combinations of weights, obtained from the human judgments for a semantic similarity task and from the human judgments for the same paraphrase recognition task. The results obtained with both versions confirm the different contributions of different syntactic roles to semantic similarity computation. The different variations of the similarity measure tested with the two combinations of weights outperformed their non-weighted counterparts. Moreover, they obtained results similar to the ones reported by Islam and Inkpen (2008) and Mihalcea, Corley, and Strapparava (2006) on the paraphrase recognition task. Furthermore, four of the six approaches tested significantly outperform the method of Mihalcea et al. (2006) and the results of three of them are similar to the ones reported by Islam and Inkpen (2008). Finally, we compared the different weights given by humans to different syntactic roles on different tasks that involve semantic similarity computation. The weights obtained from the human ratings of semantic similarity and the ones obtained from the paraphrase recognition task were very similar, showing that humans tend to use the same weights across different tasks.

The interdisciplinary character of this work lies not only in the combination of experimental techniques derived from psycholinguistics and the computational sciences. The contributions of this paper are also of interest from both a theoretical and a practical point of view. The importance of sentence semantic similarity measures in natural language research is increasing due to the great number of applications arising in many text-related research fields. For example, in text summarization, sentence semantic similarity is used (mainly in sentence-based extractive document summarization) to cluster similar sentences and then find the most representative sentence of each cluster (Aliguliyev, 2009). Also, in web page retrieval, sentence similarity can improve effectiveness by calculating similarities between page titles (Park, Ra, & Jang, 2005). Conversational agents can also benefit from sentence similarity measures, reducing the scripting process by using natural language sentences rather than patterns of sentences (Allen, 1995). These are only a few examples of applications whose effectiveness could improve with sentence semantic similarity calculation. Therefore, our work not only sheds light on the theoretical question of how humans use syntactic roles when computing semantic similarity; it also shows how those results can be straightforwardly used in a psychologically plausible computational system with many practical applications.


  • 8/18/2019 j.bica.2015.04.007_759597398647

    3/13

The structure of this paper is the following: the next section presents a brief review of some studies about the influence of syntax on semantics and some approaches that compute semantic similarity taking syntactic information into account to some extent. The following section discusses the psychological experiments carried out to measure the contributions of different syntactic roles to the overall meaning of a sentence. The section after that presents an adaptation of a psychologically plausible semantic similarity measure that applies the conclusions obtained in the previous experiments to the computation of semantic similarity. Finally, a concluding section sums up the work and points out some conclusions and future lines of research.

    Background

From the early 1970s, the relations between syntax and semantics have been studied to some extent. However, a deep empirical study about the quantitative and qualitative contribution of the different syntactic roles to the overall meaning of a sentence has not been carried out to date.

An early study by Healy and Miller (1970) focused on the influence of verbs and subjects on semantics. In that study, 25 sentences were constructed from 5 subjects (the salesman, the writer, the critic, the student and the publisher), 5 verbs (sold, wrote, criticized, studied and published), and 1 object (the book). Participants were asked to sort the sentences into groups according to similarity in meaning. Results showed that participants tended to group sentences that share the same verb rather than the same subject. The conclusion obtained by Healy and Miller is that the verb is the main determinant of sentence meaning.

Starting from this experiment, most linguistic and psycholinguistic theories assume that verbs play a main role in sentence semantics. However, Bencini and Goldberg (2000) suggest that the contribution of verbs to the overall meaning of the sentence may not be as strong as assumed. In their study, Bencini and Goldberg argue that argument structure constructions (i.e., the different configurations of the complements of a verb) are directly associated with sentence meaning. To reach this conclusion, they carried out an experiment in which participants were asked to group sentences according to similarity in meaning. Sixteen sentences were used, obtained by crossing four verbs (throw, slice, get and take) with four constructions (ditransitive, caused motion, resultative and transitive). The results showed that most of the participants sorted the sentences by construction and not by verb. Thus, argument structure constructions seem to play a crucial role in sentence interpretation, independently of the contribution of the main verb. However, Bencini and Goldberg also acknowledged the importance of verb semantics for the overall meaning of a sentence, stating that in some cases its contribution could be greater than that of argument constructions.

The influence of syntax on semantics has been assessed across different areas. For example, the syntactic bootstrapping theory (Gleitman & Gillete, 1994) posits that children use syntactic knowledge during the lexical acquisition of verbs. Despite the great amount of work studying the relations between syntax and semantics, the most important effort to measure the influence of different syntactic roles on semantic similarity computation is the one made by Wiemer-Hastings (2004). In that work, they tried to capture the relative influence of the main phrases (subject, verb, object and indirect object) within a sentence. A corpus of 50 sentence pairs was constructed for the experiment. The first sentence of each pair was built by randomly selecting a subject, verb, direct object and, optionally, an indirect object from a list of candidates. The second sentence was created by taking some of the candidates selected for the first sentence but changing the roles of some of them. The participants were asked to rate the similarity of each pair on a scale from 1 (totally dissimilar) to 6 (completely similar). The main conclusions obtained by Wiemer-Hastings are that humans tend to ignore similarities between segments with different functional roles and that verbs play a main role in the overall semantics, followed by subjects and objects, between which there is no significant difference. This is also in accordance with recent neurolinguistic findings (Malaia & Newman, 2014).

Regarding computational methods, there do not exist many methods that take into account the different importance of syntactic roles when computing semantic similarity between sentences. There are a few methods that consider pseudo-syntactic information such as word order in the sentence. The ones proposed by Achananuparp, Hu, Zhou, and Zhang (2008), Li, McLean, Bandar, O'Shea, and Crockett (2006) and Islam and Inkpen (2008) use this kind of information and show the best results reported in the literature in terms of correlation with human intuition, thus verifying that syntactic information is of great importance in the computation of sentence semantic similarity. Given the promising results obtained by methods that include shallow syntactic information, some authors have tried to consider deeper syntactic information. In this direction we can find the approach proposed by Achananuparp, Hu, and Yang (2009), which takes into account the verb-argument structure of the sentences to measure their semantic similarity, and the efforts of Wiemer-Hastings (2004), Wiemer-Hastings and Ziprita (2001), Wiemer-Hastings (2000) and Wiemer-Hastings, Wiemer-Hastings, and Graesser (1999) to add syntactic information to LSA (Landauer, Foltz, & Laham, 1998), a corpus-based method for computer modeling and simulation of the meaning of words and passages that allows semantic similarity to be computed easily. Both approaches improve the results obtained by similar methods that do not take into account the contributions of different syntactic roles. However, going beyond the verb-argument structure of the sentence could lead to better performance. To our knowledge, the only effort made in that direction is the one of Oliva et al. (2011). They proposed a semantic similarity measure that computes the semantic similarity between concepts that play the same syntactic role in the two sentences compared. This approach outperformed existing methods in the task of computing semantic similarity between sentences and obtained results similar to the best performing methods on the paraphrase recognition task. Given that this measure takes into account the different syntactic roles, we adapted it in order to test the validity of the conclusions extracted from the psychological study carried out in this paper.


  • 8/18/2019 j.bica.2015.04.007_759597398647

    4/13


    How do humans weigh syntactic roles when judging semantic similarity?

Wiemer-Hastings (2004) pointed out that human judges tend to ignore similarities between segments with different functional roles, denoting the importance of the different syntactic roles in the computation of semantic similarity, and claimed that different syntactic roles have different importance in the calculation of semantic similarity by humans. In order to check this hypothesis we carried out two experiments. These experiments tried to capture the importance given by humans to four basic syntactic roles: subject, verb, direct object and adverbial complement. We designed the first experiment to check whether there exist significant qualitative differences in the weights used by humans during the computation of semantic similarity. The second experiment complements the first one by measuring those different weights quantitatively.

    Experiment I

Experiment I examined whether participants are sensitive to changes in syntactic roles and whether there are qualitative differences in the importance of changes in different syntactic roles. This experiment tried to rank the four basic syntactic roles (verb, subject, direct object and adverbial complement) according to the importance that humans give to each of them while computing semantic similarity between sentences.

Participants

The dataset was assessed by 30 evaluators, all adult native Spanish speakers.

    Materials

Two sets of four sentences were built for use as stimuli in the experiment. Both sets used the frame sentence "El joven animó a su amigo rápidamente" (in English: "The young man encouraged his friend quickly"). We built the first set of four sentences by changing, in each one, one of the four syntactic roles to a synonym. The second set of four sentences was constructed by replacing, in each one, one of the syntactic roles of the frame sentence with an antonym. The synonyms and antonyms used in the experiment were extracted from the online version of the dictionary of the Real Academia Española de la Lengua. The complete set of sentences used in the experiment is shown in Table 1 (see the Appendix for the Spanish versions used in the experiment).

    Method

The evaluators were asked to rank the sentences in each set according to their semantic similarity to the frame sentence (rank 4 indicates the most similar sentence and rank 1 the most dissimilar one). Each participant was presented with a form showing the two 4-sentence lists simultaneously, and there was no time limit to answer. The sentences were presented to the evaluators in Spanish, as shown in the Appendix.

    Results and discussion

Table 1 shows the complete set of sentences and the average scores given by human evaluators. A two-tailed t-test was performed to determine whether there were significant differences in the way humans weight different syntactic roles. In the synonym set, not surprisingly, ratings given to the sentence with a different verb were significantly lower than ratings given to the sentences with a different subject, object and adverbial complement (t(29) = 2.76, p < 0.01; t(29) = 2.21, p < 0.05; t(29) = 4.22, p < 0.001, respectively). Moreover, ratings given to the sentence with a different adverbial complement were significantly higher than the ones given to the sentences with a different subject, object and verb (t(29) = 2.12, p < 0.05; t(29) = 2.48, p < 0.05; t(29) = 4.22, p < 0.001, respectively). However, there was no significant difference between the ratings given to the sentence with a different subject and the one with a different object (t(29) = 0.62, p = 0.541). The results with the antonym set are similar in terms of statistical significance: all the differences in ratings were statistically significant except the one between the sentence with a different subject and the sentence with a different object.

Table 1   Similarity ranks obtained from human evaluators for different syntactic roles and semantic relations of changed words relative to the reference sentence "The young man encouraged his friend quickly". Ranks range from 1 to 4, with 4 being the most similar.

Syntactic role substituted   Sentence                                        Mean   Std.

Synonyms
Subject                      The lad encouraged his friend quickly           2.60   1.08
Direct object                The young man encouraged his buddy quickly      2.40   0.95
Adverb. comp.                The young man encouraged his friend rapidly     3.17   1.07
Verb                         The young man heartened his friend quickly      1.83   0.93

Antonyms
Subject                      The old man encouraged his friend quickly       2.63   0.69
Direct object                The young man encouraged his enemy quickly      2.30   0.83
Adverb. comp.                The young man encouraged his friend slowly      3.80   1.07
Verb                         The young man discouraged his friend quickly    1.20   0.40


  • 8/18/2019 j.bica.2015.04.007_759597398647

    5/13


Given the results shown in Table 1, we can conclude that different syntactic roles have different effects on sentence semantic similarity. Furthermore, we can extract two preliminary conclusions:

1. Humans give great importance to verbs in the computation of sentence semantic similarity. Substituting a verb with one of its antonyms makes the sentence much more different from the frame sentence than substituting any other syntactic role. Also, when making the substitution with a synonym, the resulting sentence is the most different from the frame sentence. This indicates that a small change in the verb produces a greater change in sentence semantics than a small change in any other syntactic role.

2. Humans give low importance to adverbial complements in the computation of sentence semantic similarity. The effects produced by a substitution of the adverbial complement in the frame sentence are exactly the opposite of those produced by substituting the verb. Changing the adverbial complement to one of its antonyms yields a sentence very similar to the frame sentence. Furthermore, a slight change in the adverbial complement keeps the sentence very similar to the original one.

The importance that humans give to subjects and objects when computing semantic similarity seems to be very similar: the differences obtained are not statistically significant. This result is in line with the one reported by Wiemer-Hastings (2004), which also showed no statistically significant difference between these syntactic roles.

    Experiment II

In order to seek more evidence and to capture the relative importance of each syntactic role, we carried out a second experiment. Experiment I showed that participants gave the highest importance to verbs and the lowest importance to adverbial complements. A worthwhile question to ask is what the relative importance of each syntactic role is. Thus, the objective of this second experiment was not only to rank the different syntactic roles according to their importance in computing semantic similarity, but also to obtain a quantitative measure of the different weights humans give to different syntactic roles when computing semantic similarity. This way, the second experiment complements the first one and completes the qualitative and quantitative study presented in this paper.

    Participants

The participants consisted of 27 evaluators, all adult native Spanish speakers.

Table 2   Similarity scores obtained from human evaluators for different syntactic roles and semantic relations of changed words in sentence pairs. Similarity scores range from 0 to 100, with 100 indicating the highest similarity.

Syntactic role substituted   Sentence pair                                                            Mean     Std.

Subject                      That gem surprised the man / That jewel surprised the man                83.667   17.839
                             That crane surprised the man / That implement surprised the man          55.667   21.944
                             That glass surprised the man / That magician surprised the man           23.133   18.747

Direct object                The man saw the gem yesterday / The man saw the jewel yesterday          82.882   15.277
                             The man saw the crane yesterday / The man saw the implement yesterday    45.867   23.562
                             The man saw the glass yesterday / The man saw the magician yesterday     23.882   19.787

Adverbial complement         The man left my bike near those gems / The man left my bike near those jewels         92.000   7.024
                             The man left my bike near that crane / The man left my bike near that implement       60.200   24.026
                             The man left my bike near that glass / The man left my bike near that magician        39.133   23.263

Verb                         The man split the bin / The man divided the bin                          73.667   20.450
                             The man crushed the bin / The man split the bin                          56.600   23.619
                             The man emptied the bin / The man situated the bin                       9.467    11.860


  • 8/18/2019 j.bica.2015.04.007_759597398647

    6/13

    Materials

We collected human ratings of semantic similarity for pairs of sentences, following the existing designs for word similarity measures given by Rubenstein and Goodenough (1965) and Yang and Powers (2006). In order to build the dataset, we took three noun pairs from the experiment of Rubenstein and Goodenough (1965). These three noun pairs were selected according to the semantic similarity given by human evaluators in that experiment: a high similarity pair (gem–jewel, with a similarity of 3.84), a medium similarity pair (crane–implement, with a similarity of 1.68) and a low similarity pair (glass–magician, with a similarity of 0.11). In the same way, we selected three pairs of verbs from the experiment of Yang and Powers (2006): divide–split, with a similarity of 4; split–crush, with a similarity of 2; and empty–situate, with a similarity of 0.17. With this set of words, we built, for each syntactic role, three pairs of sentences with exactly the same words except for the word playing the corresponding syntactic role. For example, for the syntactic role 'subject' we built the pairs of sentences "That gem surprised the man – That jewel surprised the man", "That crane surprised the man – That implement surprised the man" and "That glass surprised the man – That magician surprised the man". The complete set of sentences used in the experiment is shown in Table 2 (see the Appendix for the Spanish versions used in the experiment).

    Method

Each evaluator was given a subset made up of four pairs of sentences, one pair built for each syntactic role, so that the evaluations were not influenced by the presence of all the similar sentences. Every sentence pair was assessed by nine evaluators.

The evaluators were asked to rate the semantic similarity of the sentence pairs on a scale from 0 (minimum similarity) to 100 (maximum similarity). The scale used in the experiments carried out by Rubenstein and Goodenough (1965) and Yang and Powers (2006) ranged from 0 to 4. However, we selected this new scale to make it easier for the evaluators to indicate slight differences in scores.

    Results and discussion

Table 2 shows the complete set of sentences and the average scores given by human evaluators. We calculated Pearson's correlation coefficient (r) between each participant's ratings and the average ratings. The range was r = 0.564 to r = 0.961, with a mean of 0.8597 (SD = 0.107).
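For reference, the consistency statistic used here is the standard Pearson correlation coefficient between a participant's ratings $x$ and the average ratings $y$:

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$$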

These results are consistent with the ones presented in O'Shea, Bandar, Crockett, and McLean (2008) for a similar task. In that experiment, humans measured the similarity of sentence pairs, obtaining an average correlation of 0.825 (SD = 0.072), in a range from 0.594 to 0.921. A two-tailed t-test was also performed to determine whether there were significant differences in the way humans weight different syntactic roles. First of all, we looked at the four pairs of sentences built with the most similar pairs of words. Human ratings given to the sentence pair built with the most similar verb pair were significantly lower than ratings given to the sentences built with the word pair "gem–jewel" playing the role of subject, object and adverbial complement (t(16) = 2.18, p < .05; t(16) = 2.42, p < .05; t(16) = 3.83, p < .01, respectively). Moreover, ratings given to the sentence pair built with the most similar adverbial complement were significantly higher than ratings given to the sentences built with the most similar subjects, objects and verbs (t(16) = 2.13, p < .05; t(16) = 2.60, p < .05; t(16) = 3.83, p < .01, respectively). However, there was no significant difference between the ratings given to the sentence pairs with the most similar subjects and the most similar objects (t(16) = 0.07, p = 0.943). The results with the most dissimilar pairs of words are similar in terms of statistical significance: all the differences in ratings were statistically significant except the one between the sentences with the most similar subjects and the most similar objects. The two conclusions obtained in the first experiment were confirmed by the results shown in Table 2. The pair of sentences built with the most dissimilar verbs (empty–situate) was scored by humans as the most dissimilar pair of sentences: the mean score given to this pair, 9.467, is significantly lower than for the rest of the sentence pairs made with the most dissimilar word pair (glass–magician). Also, the sentence pair constructed with the most similar pair of words (divide–split) obtained its lowest score (73.667) when these words play the role of the verb. The results of the second experiment also confirmed the observation that humans give low importance to adverbial complements when calculating sentence semantic similarity: the sentence pair made with the most dissimilar pair of words (glass–magician) was scored with the highest similarity (39.133) of all the sentence pairs made with this pair of words and, furthermore, the sentence pair constructed with the most similar pair of nouns (gem–jewel) obtained its highest score (92.000) when these words play the role of adverbial complement. These results confirm that a change (either big or small) in the meaning of the adverbial complement leads to a smaller change (scores from 39.133 to 92.000) in the sentence meaning than the one produced by a similar change in other syntactic roles. Conversely, a change (either big or small) in the meaning of the verb leads to a greater change (scores from 9.467 to 73.667) in the sentence meaning than the one produced by a similar change in other syntactic roles.

Observing the results obtained when dealing with subjects and direct objects, we can see that the differences between these two syntactic roles are not as obvious as the ones described above. This second experiment confirmed the observation from the first experiment that the importance given by human judgments to the subject and the object roles is very similar.

Assessment of the importance of weighting syntactic roles

In order to assess the conclusions obtained in the previous section, we adapted a semantic similarity measure described in Oliva et al. (2011) that takes into account the influence of different syntactic roles on the overall sentence meaning. Here, the semantic similarity measure is applied to a paraphrase recognition task using two different combinations of weights, obtained from the human evaluators reported in the previous section and from other human judgments for the same paraphrase recognition task.


  • 8/18/2019 j.bica.2015.04.007_759597398647

    7/13

As pointed out by Corley and Mihalcea (2005) and Islam and Inkpen (2008), sentence semantic similarity computation is not the same task as paraphrase recognition. While in semantic similarity computation a similarity score must be assigned to each pair of sentences, in paraphrase recognition a binary decision must be made for each pair: whether the two sentences mean exactly the same or not. Nevertheless, they are closely related tasks and there are many things that can be learned from one for the other. We used the paraphrase recognition task in our work for two main reasons: first, there exist large datasets judged by humans, so we can test the hypotheses of our experimental study with a significant sample; moreover, the use of the paraphrase recognition task allows us to expand the results obtained in the experimental study by checking whether similar weights are used by humans in similar tasks.

W-SyMSS: Weighted Syntax-based Measure of Semantic Similarity

In this subsection, we describe the W-SyMSS method proposed by Oliva et al. (2011). The method captures the influence of the syntactic structure of the compared sentences in the calculation of semantic similarity. It is based on the notion that the meaning of a sentence is made up not only of the meanings of its individual words, but also of the structural way these words are combined.

W-SyMSS captures and combines syntactic and semantic information to compute the semantic similarity of two sentences. Semantic information is obtained from WordNet (Fellbaum, 1998), whose structure allows different types of semantic similarity measures between concepts to be calculated. Syntactic information is obtained through a parsing process that extracts the phrases, i.e., groups of words that function as a single unit in the syntax of a sentence, which make up the sentence, as well as their syntactic functions. With this information, the method measures the semantic similarity between concepts that have the same syntactic function.

The similarity between two sentences is calculated as a weighted sum of the similarities between the heads of the phrases that have the same syntactic function in the two sentences, according to the following formula:

$$\mathrm{sim}(s_1, s_2) = \frac{w_S\, s_S + w_V\, s_V + w_O\, s_O + w_A\, s_A + \sum_{i=1}^{n} w_R\, \mathrm{sim}(h_{1i}, h_{2i})}{w_S + w_V + w_O + w_A + w_R\, n} - l \cdot PF$$

Let us assume that sentences $s_1$ and $s_2$ are made up of a subject ($S$), a verb ($V$), a direct object ($O$) and an adverbial complement ($A$), whose semantic similarities are $s_S$, $s_V$, $s_O$ and $s_A$. Each sentence may also have $n$ other syntactic roles, whose heads are $h_{11}, \ldots, h_{1n}$ and $h_{21}, \ldots, h_{2n}$, respectively, for sentences $s_1$ and $s_2$, where the phrases $h_{1i}$ and $h_{2i}$ have the same syntactic function. Also, let us assume that the sentences have $l$ syntactic roles that are present in only one of the two sentences. In this case, if one sentence has a phrase not shared by the other, a penalization factor ($PF$) is introduced to reflect the fact that one of the sentences has extra information.
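The following is a minimal sketch of this combination formula, assuming each sentence has already been parsed into a map from syntactic role to phrase head and that a word-level similarity function is available; the function and variable names, and the default values for w_r and pf, are illustrative, not taken from the paper:

```python
# A sketch of the W-SyMSS combination formula under the assumptions above.
CORE_ROLES = ("subject", "verb", "object", "adverbial")

def w_symss(roles1, roles2, weights, concept_sim, w_r=0.4, pf=0.05):
    """roles1, roles2: dicts mapping syntactic role -> phrase head.
    weights: dict mapping the four core roles -> w_S, w_V, w_O, w_A.
    concept_sim: word-level similarity in [0, 1] (e.g., WordNet-based).
    w_r: weight for any other shared syntactic role.
    pf: penalization factor for roles present in only one sentence.
    """
    num, den = 0.0, 0.0
    for role in set(roles1) & set(roles2):      # roles shared by both sentences
        w = weights[role] if role in CORE_ROLES else w_r
        num += w * concept_sim(roles1[role], roles2[role])
        den += w
    l = len(set(roles1) ^ set(roles2))          # roles present in only one sentence
    return num / den - l * pf
```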

W-SyMSS obtains the semantic similarity between concepts (in the formula: $s_S$, $s_V$, $s_O$, $s_A$ and $\mathrm{sim}(h_{1i}, h_{2i})$) from WordNet, using its hierarchical structure and the different glosses associated with each term. The similarity between concepts is the basic unit of similarity used by the sentence similarity method, so using a poor similarity measure at this point could reduce the overall performance of the proposed method.

In this paper we have made a comparative study, following the approach of Oliva et al. (2011), by using six different measures in order to compare their performance. These six measures belong to three categories (for a more detailed explanation of these measures see Pedersen, Banerjee, & Patwardhan (2005)): the path-based category (the Path and the Hirst and St. Onge measures), the information-content-based category (the Resnik, Lin, and Jiang and Conrath measures) and the gloss-based category (the Vector measure). In each experiment we tested six different variations of W-SyMSS, each using one of the measures mentioned. From now on, these variations will be named with a prefix (PATH, HSO, RES, LIN, JCN and VECTOR) indicating the concept similarity measure used.
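As an illustration, several of the measures named above are available in NLTK's WordNet interface (an assumption on our part: the paper does not state which implementation was used, and Pedersen et al.'s original package is the Perl WordNet::Similarity; the HSO and Vector measures are not available in NLTK):

```python
# Word-level similarity between two concepts with NLTK's WordNet measures.
# Requires: nltk.download('wordnet'); nltk.download('wordnet_ic')
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # information content from the Brown corpus

gem = wn.synsets('gem', wn.NOUN)[0]
jewel = wn.synsets('jewel', wn.NOUN)[0]

print(gem.path_similarity(jewel))            # path-based (PATH)
print(gem.res_similarity(jewel, brown_ic))   # information content (RES)
print(gem.lin_similarity(jewel, brown_ic))   # information content (LIN)
print(gem.jcn_similarity(jewel, brown_ic))   # information content (JCN)
```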

The experimental methodology is the same as the one used by Oliva et al. (2011), but in that study the weights assigned to each syntactic role were the ones obtained empirically by Wiemer-Hastings (2004). We will use those results in order to show that our current estimation of the weights is more accurate.

    SW-SyMSS: Similarity task Weighted SyMSS

As evidenced above, words with different syntactic roles in a sentence make different contributions to the sentence semantic similarity computation made by humans. Therefore, an appropriate weighting strategy is needed in order to reflect the contribution that each syntactic component makes to the overall measurement.

Experiment II gives us a quantitative estimation of the different weights given by human evaluators to different syntactic roles. To obtain the combination of weights that best fits these human evaluations we used an evolutionary strategy (De Jong, 2006). The parameters used in the evolutionary strategy were: a population of 30 individuals in each generation, an offspring of 200 individuals and a (μ + λ) schema for individual selection (i.e., the μ individuals of the next generation are selected from among both the parents and the offspring). The fitness function of each combination of weights was calculated as the Pearson correlation coefficient between the similarities obtained using the W-SyMSS method and the similarity values given by human evaluators in Experiment II. To compute the semantic similarity of each of the sentence pairs in Experiment II, we used the similarity value given by humans in Rubenstein and Goodenough (1965) to the word pair around which the sentence pair had been constructed. For example, take the word pair crane–implement, whose similarity is 1.68 in the interval [0–4], rescaled to 0.42 in the interval [0–1]. Given a combination of weights $w_S$, $w_V$, $w_O$ and $w_A$ (weights for subject, verb, object and adverbial complement, respectively), the semantic similarity of the sentence pair:


  • 8/18/2019 j.bica.2015.04.007_759597398647

    8/13

I saw that crane yesterday.
I saw that implement yesterday.

would be:

$$\mathrm{sim}(s_1, s_2) = \frac{w_S + w_V + (w_O \cdot 0.42) + w_A}{w_S + w_V + w_O + w_A}$$
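A minimal sketch of the (μ + λ) evolution strategy described above, fitting the four role weights to the human ratings of Experiment II; the mutation scheme, the generation count and all names are illustrative, since the paper only specifies μ = 30, λ = 200 and Pearson correlation as the fitness function:

```python
import random
from statistics import correlation  # Pearson's r (Python 3.10+)

MU, LAMBDA, GENERATIONS = 30, 200, 100
ROLES = ("subject", "verb", "object", "adverbial")

def fitness(weights, pairs, human_scores):
    # Each Experiment II pair changes exactly one role; per the worked
    # example above, that role contributes w * word_sim and the rest w * 1.
    preds = []
    for changed_role, word_sim in pairs:
        num = sum(w * word_sim if r == changed_role else w
                  for r, w in weights.items())
        preds.append(num / sum(weights.values()))
    return correlation(preds, human_scores)

def evolve(pairs, human_scores):
    population = [{r: random.random() for r in ROLES} for _ in range(MU)]
    for _ in range(GENERATIONS):
        offspring = []
        for _ in range(LAMBDA):
            parent = random.choice(population)
            offspring.append({r: min(1.0, max(0.0, w + random.gauss(0, 0.05)))
                              for r, w in parent.items()})
        # (mu + lambda) selection: survivors drawn from parents and offspring
        population = sorted(population + offspring,
                            key=lambda w: fitness(w, pairs, human_scores),
                            reverse=True)[:MU]
    return population[0]
```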

After running the evolutionary strategy, the best combination of weights is the one shown in Table 3. The similarity scores obtained using this combination of weights lead to a correlation coefficient of 0.974 (p < .01) with the human similarities. Given that there are no human data about other syntactic roles, the weight for the rest of the possible functions present in a sentence was set empirically to $w_R = 0.4$. From now on, the W-SyMSS method using the weights in Table 3 will be called SW-SyMSS.

In order to evaluate our sentence similarity measure SW-SyMSS on a large dataset and in a much more challenging task, we used the Microsoft Paraphrase Corpus (Dolan et al., 2004). This corpus consists of 4076 training and 1725 test pairs collected from thousands of news sources on the web over a period of 18 months, which have been labeled by two human judges who determined whether the two sentences in each pair were semantically equivalent paraphrases or not. The agreement between the human judges was approximately 83%, which can be considered an upper bound for the automatic task. For this paraphrase identification task, we used SW-SyMSS as a supervised method, using the training set to obtain the similarity threshold that yields the best accuracy on the training set, and the test set to check the method with this similarity threshold. In order to determine whether a pair is a paraphrase or not, we tried similarity thresholds ranging from 0 to 1 in steps of 0.05. For each candidate paraphrase pair in the training set, the system obtained the semantic similarity score and then labeled the candidate pair as a paraphrase if the similarity score exceeded each of the thresholds used. After the evaluation on the training set, we selected the best similarity threshold in terms of accuracy for each of the variations (PATH, HSO, RES, LIN, JCN and VECTOR) of SW-SyMSS evaluated. These similarity thresholds were then used in the evaluation process on the test set. Following Mihalcea et al. (2006), two baselines were used: random, which simply makes a random decision for each candidate pair, and vector-based, which uses a cosine similarity measure as traditionally used in information retrieval, with tf-idf weighting (term frequency × inverse document frequency). In order to show the contribution of our weighting strategy, we also computed one more baseline method for each variation, using a value of 1 for all weights in W-SyMSS.
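A minimal sketch of this supervised threshold selection, where sim stands for any of the SW-SyMSS variations and the data layout is an assumption:

```python
# Sweep thresholds 0.00, 0.05, ..., 1.00 on the training pairs and keep the
# one with the best training accuracy.
def best_threshold(train_pairs, sim):
    """train_pairs: list of ((sentence1, sentence2), is_paraphrase)."""
    best = max(
        (sum((sim(s1, s2) > t) == label for (s1, s2), label in train_pairs)
         / len(train_pairs), t)
        for t in (i * 0.05 for i in range(21))
    )
    return best  # (best training accuracy, chosen threshold)
```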

The evaluation metrics used to measure the performance of the different variations of SW-SyMSS are the ones proposed by Achananuparp et al. (2008). Precision is the proportion of correctly predicted paraphrase sentences to all predicted paraphrase sentences. Recall is the proportion of correctly predicted paraphrase sentences to all paraphrase sentences. Rejection is the proportion of correctly predicted non-paraphrase sentences to all non-paraphrase sentences. Accuracy is the proportion of all correctly predicted sentences to all sentences.
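These four metrics reduce to the usual confusion counts; a short sketch (names illustrative):

```python
# tp: correctly predicted paraphrases; fp: non-paraphrases predicted as
# paraphrases; fn: missed paraphrases; tn: correctly predicted non-paraphrases.
def evaluation_metrics(tp, fp, fn, tn):
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "rejection": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```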

Accuracy comparisons between the weighted and non-weighted variations can be seen in Table 4. Complete results are shown in Table 5. Baseline results and the results obtained by similar studies (Islam & Inkpen, 2008; Mihalcea et al., 2006; Oliva et al., 2011) are also shown for the sake of comparison. Concretely, Mihalcea et al. (2006) proposed a combined unsupervised method that uses six WordNet-based measures and two corpus-based measures and combines the results to show how these measures can be used to derive a short-text similarity measure. The main drawback of this method is that it computes the similarity of words using eight different methods, which is not computationally efficient. Islam and Inkpen (2008) proposed a corpus-based similarity method that considers pseudo-syntactic information, such as common word order similarity. It is important to note that Oliva et al. (2011) used two versions of the same similarity method: the first is a version of W-SyMSS with all weights equal to one (here called SyMSS) and the second uses the combination of weights extracted from the work of Wiemer-Hastings (2004) (here called WHW-SyMSS). Therefore, a comparison with that method is of special interest and will show how our approach better fits the weights humans give to different syntactic roles.

    Results and discussion

Table 4 clearly shows the influence of using syntactic information to compute semantic similarity. The six variations that use a weighting strategy (SW-SyMSS) similar to the one used by human evaluators in the similarity task outperform the corresponding variations that do not use this strategy. The improvement of three of these variations (JCN, LIN and VECTOR) was found to be significant (p < .05) using a parametric paired t-test, once we had confirmed that the data were normally distributed by a Chi-square goodness-of-fit test (p < .05). The other three variations have a similar performance with our combination of weights and with the ones used in Oliva et al. (2011).

Moreover, Table 5 shows encouraging results, given that the combination of weights used was obtained from a similarity task and then used for the paraphrase recognition task. Three of the variations of SW-SyMSS (JCN, LIN and VECTOR) significantly outperform (p < .05, using the statistical test mentioned previously) the accuracy results of Mihalcea et al. (2006). Also, the approaches based on the VECTOR measure and the JCN measure (VECTOR-SW-SyMSS and JCN-SW-SyMSS, respectively) obtain results similar to the ones obtained by Islam and Inkpen (2008) in terms of accuracy.

The results obtained from this computational experiment confirm the working hypothesis: humans give different weights to different syntactic roles in semantic-similarity-related tasks.

Table 3   Optimal weights for the semantic similarity task.

Syntactic role           Weight
Subject                  0.65293
Verb                     0.75191
Object                   0.68669
Adverbial complement     0.55155


  • 8/18/2019 j.bica.2015.04.007_759597398647

    9/13

This conclusion was already pointed out by Oliva et al. (2011). As commented before, they used the results of Wiemer-Hastings (2004) to show that a psychologically plausible weighting of syntactic roles leads to a better fit to human evaluations. The approach presented here obtains better results than the WHW-SyMSS version of Oliva et al. (2011) with three of the variations tested. Therefore, the working hypothesis is, once again, supported. Moreover, these results show that our psychological study, carried out in Experiment II, is more accurate than the one of Wiemer-Hastings (2004) for measuring the quantitative contribution of each syntactic role in semantic-similarity-related tasks.

From a computational point of view, Table 5 shows that enhancing the psychological plausibility of an existing method led to an improvement in its overall performance. The proposed semantic similarity measure can compete with state-of-the-art methods of semantic similarity computation. Therefore, the contribution of this paper is relevant not only from a cognitive point of view but also from a computational one. As stated in the introduction, the importance of sentence semantic similarity measures in natural language research is increasing due to the great number of applications arising in many text-related research fields.

    PW-SyMSS: Paraphrase task Weighted SyMSS

The work of Wiemer-Hastings, in combination with this study, shows that humans give different importance to different syntactic roles while computing semantic similarity. A natural question that arises is whether humans use similar weights when facing similar natural language processing tasks. In order to check this hypothesis, we computed the optimal weights for the paraphrase recognition task and compared them to the ones obtained for the semantic similarity computation task. As acknowledged by Corley and Mihalcea (2005) and Islam and Inkpen (2008), sentence semantic similarity measures are a necessary step in the paraphrase recognition task, so it could be expected that humans use similar weights in this task.

The optimal weights for the paraphrasing task were computed using an evolutionary strategy in the same way as for SW-SyMSS. A hundred pairs of sentences (50 of them paraphrases and 50 non-paraphrases) were selected from the Microsoft Paraphrase Corpus. The fitness function of each combination of weights was calculated as the accuracy in the detection of paraphrases among these 100 selected pairs. Table 6 shows the best combination of weights obtained after running the evolutionary strategy. We only show the results obtained with the VECTOR measure given that, according to the previous study, it is the best performing concept similarity measure in terms of accuracy.

Table 4   Accuracy values on the MSR corpus for the SW-SyMSS, WHW-SyMSS and non-weighted SyMSS measures, with different WordNet-based word similarity measures. The best performing variation for each word similarity measure is marked with *.

Word similarity   SW-SyMSS     WHW-SyMSS    SyMSS
PATH              69.80        69.81 *(a)   69.16
JCN               71.83 *(a)   70.87        70.42
RES               69.62 *      69.32        69.48
LIN               71.63 *(a)   70.63        70.10
HSO               69.27 *      68.72        68.48
VECTOR            72.08 *(a)   70.82        70.52

(a) p < .05.

Table 5   Results on the MSR corpus for the SW-SyMSS variations, similar methods and baselines. F1 and f1 are the uniform harmonic means of precision-recall and rejection-recall, respectively. The best performing method for each evaluation measure is marked with *.

Measure                        Best similarity threshold   Pr.      Rec.     Rej.     F1       f1       Acc.
PATH-SW-SyMSS                  0.4                         73.03    88.25    30.41    79.92    45.23    69.80
JCN-SW-SyMSS                   0.4                         75.29 *  85.02    42.25    79.86    53.02    71.83
RES-SW-SyMSS                   0.35                        71.91    91.13    25.39    80.38    39.71    69.92
LIN-SW-SyMSS                   0.4                         73.76    88.92    30.89    80.63    45.85    71.63
HSO-SW-SyMSS                   0.45                        71.47    78.20    43.16 *  74.68    55.62    69.27
VECTOR-SW-SyMSS                0.5                         74.5     88.71    38.15    80.99    53.35    72.08
Islam and Inkpen               0.6                         74.65    89.13    39.97    81.25    55.19    72.64 *
Mihalcea et al.                0.5                         69.60    97.70 *  –        81.30 *  –        70.30
Oliva et al. (JCN-WHW-SyMSS)   0.45                        74.47    84.17    41.61    79.02    55.68 *  70.87
Random                         –                           68.30    50.00    –        57.80    –        51.30
Vector-based                   0.5                         71.60    79.50    –        75.30    –        65.40
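Following the caption's definitions, the two harmonic means in Table 5 are:

$$F1 = \frac{2 \cdot Pr \cdot Rec}{Pr + Rec}, \qquad f1 = \frac{2 \cdot Rej \cdot Rec}{Rej + Rec}$$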


  • 8/18/2019 j.bica.2015.04.007_759597398647

    10/13


The weight for the rest of the possible functions present in a sentence was set empirically to $w_R = 0.07348$. Using this combination of weights in the same way as for SW-SyMSS, we evaluated this new version of the proposed system on the Microsoft Paraphrase Corpus, obtaining the results shown in Table 7. Baseline results and the results obtained by Mihalcea et al. (2006) and Islam and Inkpen (2008) are also shown for the sake of comparison.

The optimal weights obtained for the paraphrase recognition task show that humans tend to use similar weights even though the two tasks are different. Despite the fact that the weights obtained for the paraphrase recognition task are lower than the ones obtained for the semantic similarity computation task, the relative influence of each syntactic role is almost the same for both tasks (see Table 8). In the paraphrase recognition task, humans give the highest influence to the verb, while the adverbial complement is, by far, the least important syntactic role. The results obtained for the subject and object roles show, once again, that the differences between subject and object are not significant in the calculation of sentence semantic similarity. As can be seen, for the semantic similarity computation task the object had a slightly higher weight than the subject, while for the paraphrase recognition task the subject has a slightly higher weight. However, the differences are very small, so we can conclude that humans do not draw a significant distinction between these two syntactic roles. This result matches the experiment carried out by Wiemer-Hastings (2004) and Experiments I and II presented in this paper, which also show very slight differences between the effects of subject and object changes on semantic similarity.

The results obtained for both approaches on the paraphrase recognition task show, as expected, that the weights computed from the Microsoft Paraphrase Corpus are more suitable for the paraphrase recognition task. However, the results obtained using the weights computed from the semantic similarity task are only slightly different, showing again that the weights used by humans in different tasks are closely related. If we measure the performance of the paraphrase weights on the corpus used to compute the semantic similarity weights, we obtain a correlation coefficient of 0.924, which is only slightly lower than the 0.974 obtained with the weights computed from that corpus. These results show that both combinations of weights are suitable for both tasks.

Once again, the results of this experiment are relevant not only from a cognitive point of view but also from a computational one. Four of the six approaches tested (PATH, JCN, LIN and VECTOR) significantly outperform the method of Mihalcea et al. (2006). This improvement was found to be significant (p < .05) using a parametric paired t-test, once we had confirmed that the data were normally distributed by a Chi-square goodness-of-fit test (p < .05). Moreover, the results of two of them (JCN and VECTOR) are similar to the ones reported by Islam and Inkpen (2008) in terms of paraphrase recognition accuracy, showing the importance of taking into account the different importance of syntactic roles in the computation of semantic similarity.

    Conclusions

This paper proposes an interdisciplinary approach to better understand how our mind computes semantic similarity and, in particular, the different importance that humans give to different syntactic roles in the computation of semantic similarity. We made a psychological study about how humans compute semantic similarity between sentences and then used a computational paradigm in order to test the validity of the hypotheses derived from that study. First of all, we presented an empirical study that seeks to determine the different weights given by humans to different syntactic roles when computing semantic similarity. Two experiments were carried out to check the hypotheses that human beings tend to ignore similarities between segments with different functional roles and that different syntactic roles have different importance in their calculation of semantic similarity. The qualitative and quantitative results show that humans give great importance to verbs and low importance to adverbial complements in the computation of sentence semantic similarity. Furthermore, in our experiments we found no significant difference between the importance given by humans to the subject and the object roles, indicating that humans assign very similar weights to these syntactic roles.

In order to assess the validity of the conclusions obtained with the experiments carried out with humans, we used a computational paradigm. We incorporated the results of the empirical study into a psychologically plausible semantic similarity method described in Oliva et al. (2011) that takes into account the influence of different syntactic roles on the overall sentence meaning.

The semantic similarity method was applied to a paraphrase recognition task using two different combinations of weights, obtained from twenty-seven human evaluators for a semantic similarity task and from two human judgments for the same paraphrase recognition task. The results obtained with both versions confirm the different contributions of different syntactic roles to semantic similarity computation. The different variations tested with the two combinations of weights outperformed their non-weighted counterparts. Furthermore, they obtained results similar to the ones reported by Islam and Inkpen (2008) and Mihalcea et al. (2006) on the paraphrase recognition task. Moreover, four of the six approaches tested significantly outperform the method of Mihalcea et al. (2006) and the results of three of them are similar to the ones reported by Islam and Inkpen (2008). Finally, we compared the different weights given by humans to different syntactic roles on different tasks that involve semantic similarity computation.

Table 6   Optimal weights for the paraphrase recognition task.

Syntactic role           Weight
Subject                  0.52791
Verb                     0.59672
Object                   0.47315
Adverbial complement     0.22383


  • 8/18/2019 j.bica.2015.04.007_759597398647

    11/13

  • 8/18/2019 j.bica.2015.04.007_759597398647

    12/13

Moreover, from a computational point of view, it would be interesting to merge the conclusions of those two studies in order to enhance the psychological plausibility of the proposed semantic similarity measure. Other lines of future work concern the application of the proposed method to different natural language processing tasks that involve semantic similarity computation to some extent. In this way, it could be observed whether humans keep using similar weights when facing different tasks.

    Acknowledgment

    This work has been funded by project PIE-201350E070.

    Appendix. Spanish test sentences

    Tables 9 and 10.

Table 9   Spanish test sentences used in Experiment I.

Syntactic role substituted    Sentence

Synonyms
  Subject                     El muchacho animó a su amigo rápidamente
  Direct object               El joven animó a su colega rápidamente
  Adverb. comp.               El joven animó a su amigo velozmente
  Verb                        El joven alentó a su amigo rápidamente

Antonyms
  Subject                     El viejo animó a su amigo rápidamente
  Direct object               El joven animó a su enemigo rápidamente
  Adverb. comp.               El joven animó a su amigo lentamente
  Verb                        El joven desanimó a su amigo rápidamente

Table 10   Spanish test sentences used in Experiment II.

Syntactic role substituted    Sentence pair

Subject
  Aquella gema sorprendió al hombre
  Aquella joya sorprendió al hombre
  Aquella grúa sorprendió al hombre
  Aquella herramienta sorprendió al hombre
  Aquel cristal sorprendió al hombre
  Aquel mago sorprendió al hombre

Direct object
  El hombre vió la gema ayer
  El hombre vió la joya ayer
  El hombre vió la grúa ayer
  El hombre vió la herramienta ayer
  El hombre vió el cristal ayer
  El hombre vió al mago ayer

Adverbial complement
  El hombre dejó mi bicicleta cerca de aquellas gemas
  El hombre dejó mi bicicleta cerca de aquellas joyas
  El hombre dejó mi bicicleta cerca de aquella grúa
  El hombre dejó mi bicicleta cerca de aquellas herramientas
  El hombre dejó mi bicicleta cerca de aquel cristal
  El hombre dejó mi bicicleta cerca de aquel mago

Verb
  El hombre partió el contenedor
  El hombre dividió el contenedor
  El hombre aplastó el contenedor
  El hombre partió el contenedor
  El hombre vació el contenedor
  El hombre situó el contenedor


    References

Achananuparp, P., Hu, X., Zhou, X., & Zhang, X. (2008). Utilizing sentence similarity and question type similarity to response to similar questions in knowledge-sharing community. In Proceedings of the QAWeb 2008 workshop.

Achananuparp, P., Hu, X., & Yang, C. C. (2009). Addressing the variability of natural language expressions in sentence similarity with semantic structure of the sentences. In Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining.

Aliguliyev, R. M. (2009). A new sentence similarity measure and sentence based extractive technique for automatic text summarization. Expert Systems with Applications, 36(4), 7764–7772.

Allen, J. (1995). Natural language understanding. The Benjamin/Cummings Publishing Company, Inc.

Bencini, G. M. L., & Goldberg, A. E. (2000). The contribution of argument structure constructions to sentence meaning. Journal of Memory and Language, 43(4), 640–651.

Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, May, 48–57.

Corley, C., & Mihalcea, R. (2005). Measures of text semantic similarity. In Proceedings of the ACL workshop on empirical modeling of semantic equivalence.

De Jong, K. A. (Ed.). (2006). Evolutionary computation: A unified approach. MIT Press.

Dolan, W., Quirk, C., & Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of the 20th international conference on computational linguistics.

Fellbaum, C. (1998). WordNet: An electronic lexical database. MIT Press.

Gleitman, L., & Gillette, J. (1994). The role of syntax in verb learning. In P. Fletcher & B. MacWhinney (Eds.), The handbook of child language. Blackwell.

Goldstone, R. (1994). Similarity, interactive activation, and mapping. Journal of Experimental Psychology, 20(1), 3–28.

Healy, A., & Miller, G. (1970). The verb as the main determinant of sentence meaning. Psychonomic Science, 20.

Islam, A., & Inkpen, D. (2008). Semantic text similarity using corpus-based word similarity and string similarity. ACM Transactions on Knowledge Discovery from Data, 2(2), 1–25.

Jackendoff, R. (2007). A parallel architecture perspective on language processing. Brain Research, 1146, 2–22.

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis. Discourse Processes, 25, 259–284.

Li, Y., McLean, D., Bandar, Z., O'Shea, J., & Crockett, K. A. (2006). Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1138–1150.

Malaia, E., & Newman, S. (2014). Neural bases of event knowledge and syntax integration in comprehension of complex sentences. Neurocase, 20, 1–14.

Mihalcea, R., Corley, C., & Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the American association for artificial intelligence (AAAI 2006).

Oliva, J., Serrano, J. I., Del Castillo, M. D., & Iglesias, A. (2011). SyMSS: A syntax-based measure for short-text semantic similarity. Data and Knowledge Engineering, 70(4), 390–405.

O'Shea, J., Bandar, Z., Crockett, K. A., & McLean, D. (2008). Agent and multi-agent systems: Technologies and applications (Vol. 4953, pp. 172–181). Springer.

Park, E. K., Ra, D. Y., & Jang, M. G. (2005). Techniques for improving web retrieval effectiveness. Information Processing and Management, 41(5), 1207–1223.

Pedersen, T., Banerjee, S., & Patwardhan, S. (2005). Maximizing semantic relatedness to perform word sense disambiguation (Research Report No. UMSI 2005/25). University of Minnesota Supercomputing Institute.

Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.

Wiemer-Hastings, P. (2000). Adding syntactic information to LSA. In Proceedings of the 22nd annual conference of the cognitive science society. Erlbaum.

Wiemer-Hastings, P. (2004). All parts are not created equal: SIAM-LSA. In Proceedings of the 26th annual conference of the cognitive science society. Erlbaum.

Wiemer-Hastings, P., Wiemer-Hastings, K., & Graesser, A. (1999). How latent is latent semantic analysis? In Proceedings of the sixteenth international joint congress on artificial intelligence (pp. 932–937). Morgan Kaufman.

Wiemer-Hastings, P., & Zipitria, I. (2001). Rules for syntax, vectors for semantics. In Proceedings of the 23rd annual conference of the cognitive science society. Erlbaum.

Yang, D., & Powers, D. M. W. (2006). Verb similarity on the taxonomy of WordNet. In Proceedings of the third international WordNet conference (pp. 121–128). Masaryk University.

