
Universidade Estadual de Campinas
Instituto de Computação


Anderson Rossanez

Semi-Automatic Checklist-Based Quality Assessment of

Natural Language Requirements

Avaliação Semi-Automática de Qualidade de Requisitos

em Língua Natural baseada em Checklist

CAMPINAS

2017


Anderson Rossanez

Semi-Automatic Checklist-Based Quality Assessment of Natural

Language Requirements

Avaliação Semi-Automática de Qualidade de Requisitos em

Língua Natural baseada em Checklist

Dissertação apresentada ao Instituto de Computação da Universidade Estadual de Campinas como parte dos requisitos para a obtenção do título de Mestre em Ciência da Computação, na área de Engenharia da Informação.

Thesis presented to the Institute of Computing of the University of Campinas in partial fulfillment of the requirements for the degree of Master in Computer Science, in the Information Engineering area.

Supervisor/Orientadora: Prof. Dr. Ariadne Maria Brito Rizzoni Carvalho
Co-supervisor/Coorientador: Prof. Dr. Marco Paulo Amorim Vieira

Este exemplar corresponde à versão final da Dissertação defendida por Anderson Rossanez e orientada pela Prof. Dr. Ariadne Maria Brito Rizzoni Carvalho.

CAMPINAS

2017


Agência(s) de fomento e nº(s) de processo(s): Não se aplica.

Ficha catalográfica
Universidade Estadual de Campinas
Biblioteca do Instituto de Matemática, Estatística e Computação Científica
Ana Regina Machado - CRB 8/5467

Rossanez, Anderson, 1981-
R733s  Semi-automatic checklist-based quality assessment of natural language requirements / Anderson Rossanez. – Campinas, SP : [s.n.], 2017.

Orientador: Ariadne Maria Brito Rizzoni Carvalho.
Coorientador: Marco Paulo Amorim Vieira.
Dissertação (mestrado) – Universidade Estadual de Campinas, Instituto de Computação.

1. Software - Controle de qualidade. 2. Lista de checagem. 3. Processamento de linguagem natural (Computação). 4. Software de sistemas. I. Carvalho, Ariadne Maria Brito Rizzoni, 1958-. II. Vieira, Marco Paulo Amorim. III. Universidade Estadual de Campinas. Instituto de Computação. IV. Título.

Informações para Biblioteca Digital

Título em outro idioma: Avaliação semi-automática de qualidade de requisitos em língua natural baseada em checklist
Palavras-chave em inglês:
Software - Quality control
Checklist
Natural language processing (Computer science)
Systems software
Área de concentração: Ciência da Computação
Titulação: Mestre em Ciência da Computação
Banca examinadora:
Ariadne Maria Brito Rizzoni Carvalho [Orientador]
Nuno Manuel dos Santos Antunes
Leonardo Montecchi
Data de defesa: 27-07-2017
Programa de Pós-Graduação: Ciência da Computação



Universidade Estadual de Campinas
Instituto de Computação


Anderson Rossanez

Semi-Automatic Checklist-Based Quality Assessment of Natural

Language Requirements

Avaliação Semi-Automática de Qualidade de Requisitos em

Língua Natural baseada em Checklist

Banca Examinadora:

• Prof. Dr. Ariadne Maria Brito Rizzoni Carvalho (Orientadora)
  Instituto de Computação - UNICAMP

• Prof. Dr. Nuno Manuel dos Santos Antunes
  Departamento de Engenharia Informática - Universidade de Coimbra

• Prof. Dr. Leonardo Montecchi
  Instituto de Computação - UNICAMP

A ata da defesa com as respectivas assinaturas dos membros da banca encontra-se no processo de vida acadêmica do aluno.

Campinas, 27 de julho de 2017


Dedicated to my parents,
José Roberto Rossanez
and Léa Silvia Sian Rossanez.


Acknowledgments

First of all, I would like to thank my advisors, Professor Ariadne Carvalho and Professor Marco Vieira, for their guidance, attention, and dedication in helping me with the development of this work.

I would also like to thank everyone involved in the DEVASSES project, for the support, exchange of knowledge, and also for the funding that made my secondment in Coimbra, Portugal, possible. It was both a pleasure and a privilege taking part in this project.

I am very grateful to the Institute of Computing of the University of Campinas (IC/UNICAMP) for allowing me to develop this work, and to all of its faculty, staff, and students for the exchange of knowledge, the support, and for helping me out whenever I needed. I am also grateful to the Department of Informatics Engineering of the University of Coimbra (DEI/UC) for receiving me and allowing me to develop part of this work there.

A special thanks goes to Paulo C. Véras and Nuno Pedro Silva, for their time, exchange of knowledge, and for providing documents that were extremely valuable for the development of this work.

I am also grateful for the funding granted by the Institute of Computing of the University of Campinas (IC/UNICAMP) and by the Fundo de Apoio ao Ensino, à Pesquisa e Extensão (FAEPEX), which allowed me to participate and present this work at the VI DEVASSES ToK Workshop and at the 7th Latin-American Symposium on Dependable Computing (LADC/2016).

I would like to thank Professor Eliane Martins and Professor Leandro Villas for their invaluable help in the validation of this work. Also, thanks to the statistician José Ruy Porto de Carvalho, who kindly advised me on the data analysis of the performed experiments. A very special thanks goes to all the students and professionals who kindly volunteered and helped with the experiments. Without their help, it would not be possible to validate this work.

Finally, I would also like to thank my family, friends, and coworkers for encouraging me and keeping me motivated during all this time.

Thank you very much for making this work possible.


"The more we reduce ourselves to machines in the lower things, the more force we shall set free to use in the higher."

Anna C. Brackett


Resumo

Problemas com a especificação de requisitos são uma causa comum de defeitos de software. Em domínios como o das aplicações espaciais, tais defeitos acarretam um custo muito alto, especialmente quando detectados em campo. É imperativo garantir que os requisitos de software estejam bem escritos, de modo a evitar a introdução desses defeitos no final. A qualidade de documentos de requisitos de software é comumente verificada em revisões baseadas em checklists, gerados a partir de padrões e de problemas encontrados em projetos anteriores. Dada a importância dessa verificação, e do fato de que ela é normalmente efetuada manualmente, nós propusemos um framework para auxiliar nesse processo, utilizando técnicas de processamento de língua natural. Uma ferramenta foi implementada de acordo com esse framework, visando diminuir o esforço dos revisores e reduzir a quantidade de erros não encontrados durante o processo de revisão. Foram conduzidos experimentos comparando a análise baseada em checklists conduzida manualmente com a análise semi-automática guiada pela ferramenta desenvolvida, considerando o tempo de análise e a quantidade de erros detectados no documento de requisitos analisado. Os resultados indicam que o método de análise semi-automático, guiado pelo framework proposto, apresenta uma melhoria em relação ao método existente, que é conduzido manualmente.


Abstract

Problems with the specification of software requirements are a common cause of software defects. In domains such as space applications, those defects are very costly, especially when detected after software deployment, when the product is already in the field. It is imperative to ensure that software requirement documents are well written to avoid the introduction of these defects. The quality of software requirements is frequently assessed via reviews guided by checklists, based on standards and on problems found in previous projects. Given the importance of quality assessment, and the fact that the reviews are performed manually, we propose a framework for assisting a checklist-based review of software requirements, using natural language processing techniques. A tool was developed under this framework, whose objective is to diminish the reviewer's effort and to reduce the amount of uncaught errors during the reviewing process. Experiments were conducted to compare the checklist-based analysis performed manually against the semi-automatic analysis guided by the developed tool, considering the analysis time and the amount of errors detected in the requirements under analysis. The results indicate that the semi-automatic analysis method, guided by the proposed framework, brings improvements to the existing manually conducted method.


List of Figures

1.1 PUS-based checklist question.
1.2 CoFI-based checklist questions.
1.3 Review Item Discrepancy.
2.1 Search string.
2.2 Modified search string.
3.1 Proposed framework.
3.2 NLP Module.
3.3 Parse tree from a requirement sentence.
3.4 A simple checklist.
3.5 A sample SRS document.
3.6 A preprocessed file.
3.7 A CSV output file.
4.1 Framework instantiated in the AutoChecklist implementation.
5.1 Number of errors found by the subjects.
5.2 Analysis times by the subjects.
5.3 Test hypotheses for experiment I.
5.4 Test hypotheses for experiment II.
B.1 Packages structure.
B.2 Execution stages modules' classes.
B.3 Analysis modules' classes.
B.4 Preprocessed file.
B.5 NLP module's classes.
B.6 Error-based checklist resource.
B.7 Checklist module's classes.
B.8 GUI classes.
B.9 Initial GUI screen.
B.10 Preprocessing screen.
B.11 Analysis screen.
B.12 Results screen.
B.13 Reports screen.
B.14 Spreadsheet screen.
C.1 CLI usage.
C.2 CLI output files.
C.3 GUI input types.
C.4 Failsafe text representation of a SRS document.
C.5 Requirements Traceability Matrix as a text chunk.
C.6 Requirements Traceability Matrix in CSV format.
D.1 Checklist resource structure.


List of Tables

2.1 NLP tools used in the identification of the quality indicators.
2.2 Quality indicators detected by the quality assessment tools.
2.3 NLP tools used by the quality assessment tools.
5.1 Number of errors caught in the manual and semi-automatic analysis.
5.2 Times (in minutes) of the manual and semi-automatic analysis.
5.3 P-Values for the Shapiro-Wilk normal distribution test.
5.4 Sample sizes, means, standard deviations, and variances.
5.5 Differences between semi-automatic and manual values.
5.6 Mean of differences, standard deviation, and t-values.


Contents

1 Introduction
   1.1 Problem
   1.2 Hypothesis
   1.3 Objectives
   1.4 Dissertation Structure

2 Literature Review
   2.1 Search String
   2.2 Search Engines and Databases
      2.2.1 Web of Science
      2.2.2 Scopus
      2.2.3 IEEExplore
      2.2.4 ACM
      2.2.5 Springer Link
      2.2.6 Google Scholar
   2.3 Critical Analysis
   2.4 NLP use on SRS Documents
      2.4.1 Quality Indicators and NLP Tools
      2.4.2 Quality Assessment Tools and Methodologies
   2.5 Applicability to our Work

3 Methodology
   3.1 Analysis Layer
      3.1.1 Preprocessing Module
      3.1.2 Analysis Modules
      3.1.3 Output Generator Module
   3.2 Utility Layer
      3.2.1 Text Module
      3.2.2 File Module
      3.2.3 Checklist Module
      3.2.4 NLP Module
   3.3 Working Example
      3.3.1 Preprocessing
      3.3.2 Analysis
      3.3.3 Output
   3.4 Considerations on the Methodology

4 Implementation
   4.1 Preprocessing
   4.2 Traceability
   4.3 Incompleteness
   4.4 Incorrectness
   4.5 Inconsistency
   4.6 Output Generator
   4.7 Text Module
   4.8 NLP Module
   4.9 Checklists Module
   4.10 File Module
   4.11 Considerations on the Implementation

5 Validation and Results
   5.1 Expert Consultation
   5.2 Experiments
      5.2.1 Experiment I: Number of Errors
      5.2.2 Experiment II: Analysis Time
      5.2.3 Data Analysis and Results
   5.3 Considerations on the Experiments

6 Conclusions and Future Work
   6.1 Plain-text Conversion
   6.2 Requirements Extraction
   6.3 Tool Configuration
   6.4 Referencing External Documents
   6.5 Reviewing Unanswered Questions
   6.6 Analysis Metrics

Bibliography

A Error-Based Checklist
   A.1 Questions about Lack of Traceability
   A.2 Questions about Requirement Incompleteness
   A.3 Question about Incorrectness
   A.4 Questions about Internal Conflict/Inconsistency

B Implementation Aspects for AutoChecklist
   B.1 Preprocessing
   B.2 Traceability
   B.3 Incompleteness
   B.4 Incorrectness
   B.5 Inconsistency
   B.6 Output Generator
   B.7 Text Module
   B.8 NLP Module
      B.8.1 RequirementsInfoExtractor
      B.8.2 DocumentSectionsExtractor
      B.8.3 RequirementsTraceabilityMatrixSectionExtractor
      B.8.4 ExpressionExtractor
      B.8.5 EventActionDetector
      B.8.6 MissingNumericValueIndicativesDetector
   B.9 Checklists Module
   B.10 File Module
   B.11 User Interfaces
      B.11.1 Command Line Interface
      B.11.2 Graphical User Interface

C Running AutoChecklist
   C.1 Command Line Interface
      C.1.1 Available Parameters
      C.1.2 Output
   C.2 Graphical User Interface
      C.2.1 SRS Documents
      C.2.2 Preprocessed Files
      C.2.3 Analyzed Files
   C.3 Iterative Preprocessing
   C.4 Requirements Traceability Matrix in CSV format

D Adding new Checklists to AutoChecklist
   D.1 Changes to perform
      D.1.1 Checklist Resource
      D.1.2 Checklist Class
      D.1.3 Question Actions
      D.1.4 Analysis Modules
      D.1.5 Analysis Modules Factory
      D.1.6 Orchestrator
      D.1.7 Preprocessing
      D.1.8 NLP Utility Classes
      D.1.9 User Interfaces
      D.1.10 Output


Chapter 1

Introduction

Software defects, when detected at later stages of software development, can be very expensive. If they are detected when the software product is already deployed in the field, the cost is even higher and, depending on the application, the consequences of such defects can be catastrophic. A considerable number of software defects are related to problems in the software requirements that could have been avoided if the Software Requirements Specification (SRS) had been well written and properly verified. In areas such as space applications, SRS documents are written almost entirely in Natural Language (NL), which makes them understandable by a larger number of people, not only by the domain specialists. Unfortunately, this may result in issues inherent to NL, such as ambiguity, incompleteness, and misinterpretation, among others.

In order to detect problems in the software requirements, and to avoid finding software defects in the later development phases, researchers have proposed methods to assess the quality of SRS documents. Sommerville [30] states that the SRS document validation must consider the following:

1. Validity: A user may believe that a system is needed for certain functions; however, further studies and analysis may determine that additional functions are necessary.

2. Consistency: The requirements must not be conflicting.

3. Completeness: The SRS document must contain all functions and restrictions that a user may desire for a system.

4. Realism: The requirements' implementation must be feasible.

5. Verifiability: The requirements must be verifiable.

Pressman [25] says that every requirement statement must be declared in a SRS document in a non-ambiguous format. The inconsistencies, missing statements, and errors must be detected and fixed, and the final outcome must be in accordance with the established standards, given the project and the product. Pressman proposes a small checklist, containing questions like the following:

• Have the requirements been clearly established? Can they be misinterpreted?


• Has the source (for instance, person, document) been identified? Has the final requirement statement been examined by the source or in his/her presence?

• Is the requirement limited in quantitative terms?

• Which other requirements are related to this requirement?

Pressman [25] also argues that the primary mechanism for SRS validation is a formal technical review. Sommerville [30] suggests other techniques to deal with problems in the requirements, such as prototyping, in which an executable model of the system is presented to end-users, and test case generation, where tests are created from the SRS document.

In the space applications domain, Véras et al. [36] propose an approach for SRS review, performed by domain specialists. Checklists, consisting of several yes/no questions, are used to measure the quality of the software requirements. They present three types of checklists: the first is based on the Packet Utilization Standard (PUS) [8]; the second, on the Conformance and Fault Injection (CoFI) methodology [1]; and the third, on problems found in previous projects.

The PUS-based checklist is used to assure that the SRS document complies with the standard, defined by the European Cooperation for Space Standardization (ECSS). The checklist contains questions elaborated from two services present in the standard: 91 questions for the telecommand verification service, and 260 questions for the on-board operations scheduling service. An example of a question from the PUS-based checklist is shown in Figure 1.1.

If applicable to the mission, does the telecommand verification service specification state that this service shall check whether the Source ID field corresponds to a valid source of the telecommand packet?

Figure 1.1: PUS-based checklist question.

The purpose of the CoFI-based checklist is to verify if the SRS handles situations related to software failures. In the CoFI methodology, Finite State Machines (FSMs) are generated from the SRS document to produce test cases. In the work by Véras et al. [36], the questions for the checklist are generated from the FSMs which, in turn, are built from the same two PUS services used in the PUS-based checklist: 35 questions for the telecommand verification service, and 23 questions for the on-board operations scheduling service. Examples of the questions available in the CoFI-based checklist can be seen in Figure 1.2.


Q1: Does the requirement specification state that the telecommand verification service shall send the telecommands received from the ground to its destination process after its checking?

Q2: Does the requirement specification state that the telecommand verification service shall send a report of success acceptance to the ground station if this is requested through the first bit set?

Q3: Does the requirement specification state that the verification of the TC execution starting shall occur after the acceptance confirmation by the destination application process?

Figure 1.2: CoFI-based checklist questions.

The objective of the error-based checklist is to detect typical specification mistakes. The checklist comes from seven SRS problem reports of three real past projects of aerospace systems. Review Item Discrepancies (RIDs) were found and categorized as follows: external conflict/inconsistency, lack of traceability, external incompleteness, incorrectness, internal conflict/consistency, application knowledge, readability, domain knowledge, and non-usage of standard. A RID from the incorrectness category, and the checklist question generated from it, can be seen in Figure 1.3.

Title of the RID: Requirement not properly captured

Problem description: Req-15900 requirement is not possible to validate. Expressions like "in preference to" should not be used.

Questions elaborated from this RID: Are all the requirements without the expression "in preference" indicating a non-mandatory feature of the system?

Figure 1.3: Review Item Discrepancy.

From the RIDs, 22 questions were generated, and divided into four categories:

1. Lack of Traceability: 5 questions for detecting problems related to missing or wrong traceability, such as whether all requirements are traced to at least a system or an interface requirement, or to the correct system or interface requirement.

2. Requirement Incompleteness: 4 questions for detecting if the requirements do not contain terms like "TBD" (To Be Defined) or "TBC" (To Be Confirmed), and other indications of incompleteness.

3. Incorrectness: 10 questions aiming to detect words or expressions indicating problems in the requirements, such as "should", "might", or "in preference" (a simple keyword-matching sketch of this kind of check is shown after this list).

4. Internal Conflict/Consistency: 3 questions for checking if there are conflicts or other inconsistencies, like a reference to a figure or table which is not related to the requirement, or which shows information conflicting with the requirement.
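To make the Incorrectness category more concrete, the sketch below shows how a tool could flag requirement sentences containing suspect words or expressions such as "should", "might", or "in preference". It is only an illustration: the word list is a hypothetical sample, not the actual question set or matching logic of the error-based checklist.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: flags requirement sentences that contain words or
// expressions associated with the Incorrectness questions of the
// error-based checklist (the word list below is a hypothetical sample).
public class IncorrectnessKeywordCheck {

    private static final Pattern SUSPECT_TERMS = Pattern.compile(
            "\\b(should|might|could|as appropriate|in preference)\\b",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        List<String> requirements = Arrays.asList(
                "REQ-001: The service shall verify the Source ID field of each telecommand packet.",
                "REQ-002: The scheduler should reject commands in preference to queuing them.");

        for (String requirement : requirements) {
            Matcher matcher = SUSPECT_TERMS.matcher(requirement);
            while (matcher.find()) {
                // A real tool would turn this into a finding presented to the reviewer.
                System.out.printf("Possible issue in \"%s\": suspect term \"%s\"%n",
                        requirement, matcher.group(1));
            }
        }
    }
}
```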


All questions from the error-based checklist are shown in Appendix A. The PUS, CoFI, and error-based checklists are fully described in Véras' PhD thesis [35].

1.1 Problem

As stated in the work by Véras et al. [36], the checklists were manually applied to three different projects, by six different domain specialists. The process demands great effort and takes a long time to complete. Moreover, it is an error-prone procedure, because reviewers might fail to detect some errors in large SRS documents, due to fatigue and stress. We believe that, if the process were semi-automated, this problem would be mitigated.

1.2 Hypothesis

Our working hypothesis is that it is possible to provide a method, based on previously built checklists and Natural Language Processing (NLP) techniques, to guide the review of SRS documents. A set of relevant NLP tools and techniques is necessary, depending on the question or question category within the checklists. We also need to identify and extract relevant information from the SRS document, to be used in the semi-automated analysis.

1.3 Objectives

The aim of our work is to mitigate the problem described in Section 1.1. Specifically, we provide a framework for applying checklists to SRS documents, using NLP techniques. We have implemented a tool under this framework, focusing on the error-based checklist described in the work by Véras et al. [36].

1.4 Dissertation Structure

The next chapters are structured in the following manner: Chapter 2 contains a literature review on NLP techniques in SRS documents; Chapter 3 presents the methodology and proposed framework; Chapter 4 describes the framework's implementation; Chapter 5 shows the validation and results; finally, Chapter 6 presents our conclusions and suggestions for future work.


Chapter 2

Literature Review

In this chapter, we present the literature review, whose objective was to search for methods that use NLP techniques in the quality assessment of software requirements. The review was conducted following a series of steps in order to find relevant publications, understand them in detail, and summarize information that could be used in our work.

Our literature review was performed from July 2015 to October 2015, taking a snapshot of the state of the art [38] at that time. Nevertheless, we kept referring to the scientific literature throughout the development of this work. We describe here how the review was conducted and the outcome obtained from it.

The first step in the literature review was to determine the most appropriate search string to be used with the search engines and databases, which is detailed later. An initial filtering took place with the results from each search engine, considering only the publication title and abstract. Afterwards, a more restrictive filtering was applied after reading each publication, which resulted in a smaller set of related works. In the next sections, we describe these steps in more detail.

2.1 Search String

Considering the objective of this literature review, that is, searching for works dealing with NLP techniques for SRS document analysis, we used the search string shown in Figure 2.1 to query search engines and databases. The search considered the following fields: publication title, abstract, and keywords.

("Natural language processing" OR "NLP") AND ("software requirements specification" OR "SRS" OR "software requirements" OR "requirements engineering") AND ("quality" OR "faults" OR "errors") AND ("checklist" OR "benchmark") AND ("evaluation" OR "assessment" OR "analysis")

Figure 2.1: Search string.

Although the checklists from the work by Véras et al. [36] used software requirements from aerospace systems, we decided not to include this constraint in the search string, broadening the scope to every possible application or system. In this way, we found relevant works from several areas.

2.2 Search Engines and Databases

We conducted the search on several search engines and scientific publication databases. In some cases we had to modify the search string slightly, due to specific query formats on the search engine, but those changes did not alter the search context. The search was performed considering the publication title, abstract, and keywords, across all databases available in each search engine.

It is also worth noting that, due to access constraints on some of the search engines, we performed the search either at a physical location inside Unicamp or from outside, by connecting through Unicamp's Virtual Private Network (VPN). This allowed us full access to all publications.

In the next sections, we show the results for each individual search engine.

2.2.1 Web of Science

The search on Web of Science returned a single publication as a result, the work by Lash et al. [18]. It presents an NLP approach to identify problems such as ambiguity, incompleteness, and specification issues in NL requirements. We could not find the full-text version of this publication free of charge, so we contacted one of the authors, who kindly provided us not only the referred paper, but also another related work, by Lamar and Mocko [17], presenting a linguistic approach for decomposing requirement statements, with the purpose of detecting issues such as ambiguity and incompleteness.

2.2.2 Scopus

The initial search on Scopus resulted in 23 publications. After checking their abstracts, we narrowed it down to 5 publications. Carlson and Laplante [2] describe the reverse-engineering of a discontinued tool built by NASA, which automatically analyzes requirement documents and generates quality indicators for those documents; Carvalho et al. [4] present a strategy for generating test cases from NL requirements. Another work by the same authors [3] takes as input requirements written in NL, parses them, and generates a domain-specific language for specifying requirements, which is later used for model-based testing. Tichy and Körner [31] propose an adaptation of recent NLP techniques to generate UML models, test cases, and even source code from NL requirements. Finally, Ormandjieva et al. [23] propose a quality model for software requirements, and a system to automate the process of applying their model to SRS documents written in NL.

2.2.3 IEEExplore

This search returned no results initially. Therefore, we decided to search only for the subparts related to NLP and software requirements, i.e., the first two sets of variations in the original search string. The modified search string can be seen in Figure 2.2.

("Natural language processing" OR "NLP") AND ("software requirements specification" OR "SRS" OR "software requirements" OR "requirements engineering")

Figure 2.2: Modified search string.

The outcome was only 6 publications, and after reading their abstracts, two were considered for further reading: Hussain et al. [14], addressing the problem of providing automated assistance in the quality assessment of software requirements by means of a decision-tree-based text classifier; and Greghi et al. [10], presenting a method to semi-automatically generate extended finite state machine models from a SRS document, using NLP techniques.

2.2.4 ACM

The search returned 39 results. After reading their abstracts, we narrowed it down to 5 relevant publications: Lee and Rine [19], addressing the problem of missing requirements on requirement specifications written in NL; Wagner et al. [37], dealing with the quality management of requirements using an approach of quality models; Huertas and Juarez-Ramires [12], presenting a tool for automatic evaluation of software requirements; Femmer et al. [7], proposing a light-weight technique for assessing a requirement as soon as it is written down, using NLP techniques; and Falessi et al. [6], presenting a characterization of NLP techniques that can be used to detect equivalent requirements.

2.2.5 Springer Link

The search on this engine initially returned 91 results. After reading their abstracts, and removing the ones previously found, we narrowed it down to 5 publications: Popescu et al. [24], proposing a semi-automatic method for identifying inconsistencies and ambiguities in NL requirements documents, and a prototype tool; Kof [15], also proposing an approach for identifying ambiguities in NL requirements documents; Sardinha et al. [27], presenting a tool for automatically identifying conflicts in aspect-oriented requirements written in NL; Kof [16], presenting an approach for generating automata from NL descriptions; and Genova et al. [9], presenting some indicators of quality in textual requirements, and a tool that assesses such quality in an automated way.

2.2.6 Google Scholar

This engine returned a very large number of results, but many had already been found by the other search engines, and most of the remaining ones were not related to software requirements or NLP. Reading the abstracts of the results not previously encountered, we found: Tichy and Körner [32], providing a review of works on reducing ambiguity in requirements documents written in NL, and Husain and Beg [13], presenting a benchmark for tools which process textual software requirements automatically.


2.3 Critical Analysis

Considering all the publications selected from the search engines, as described in the previous sections, we moved on to a more rigorous analysis. At this point we performed a detailed reading of each publication in order to obtain both systematic knowledge of the research subject and a critical view of the field of study. The detailed reading was performed in three steps, as recommended in [11, 34]:

In breadth: with the objective of grasping the text structure, we first read the title, abstract, and introduction; then, we moved to the section headers without reading their content, and finally, to the conclusions and known references. After that, we were able to tell whether the publication presents a system, a theory, or a methodology, as well as the related publications, the validity of the hypothesis, and the new contributions.

Integrity check: we looked more carefully at the pictures, diagrams, graphics, and definitions, and also at the unknown references. After this step we were able to summarize the publication.

In depth: with the objective of questioning the hypothesis, and also challenging the proofs, experiments, and simulation scenarios. After this step we were able to reconstruct the publication in detail.

After reading all the publications, we were able to better categorize each of them. In the next section, we present this categorization.

2.4 NLP use on SRS Documents

The publications we have selected from the search engines, as described in the previous sections, have different objectives. Some implement tools for performing quality assessment using specific methodologies, others describe quality indicators to be considered in those assessments, some provide NLP techniques for this purpose, and some describe how to generate models, such as UML, from the requirement documents. Some works describe how to automatically generate test cases and source code from the SRS document.

We can summarize the publications dealing with NLP on software requirements in three areas: Quality Indicators and NLP Tools, Quality Assessment Tools and Methodologies, and Model Generation. As we do not consider Model Generation in our work, we present in the next subsections the publications dealing with Quality Indicators and NLP Tools, and with Quality Assessment Tools and Methodologies.

2.4.1 Quality Indicators and NLP Tools

Several quality indicators can be used to measure the quality of NL SRS documents, and some works implement methods to detect those quality indicators by means of NLP tools.


Lamar and Mocko [17] consider ambiguity, lack of understandability, incompleteness, and incorrect use of specificity indicators, according to an analysis methodology that follows a three-tiered linguistic approach. In the first tier, a Part-of-Speech (PoS) tagger identifies the different types of words in the statement; the second tier is based on the sentence structure, showing how the words relate to each other; in the third tier, the verb type is analyzed, as a direct result of the previous tiers. The result of the analysis is a controlled NL using a syntax specialized for both functional and non-functional requirements.

A quality model is presented in [23], aiming to determine whether a requirement is ambiguous. It states that the comprehension of the text can be divided into two levels: surface and conceptual understanding. Surface understanding is how easy or hard it is to understand the SRS document without judging the design or implementation context. Conceptual understanding, on the other hand, is how much gain a developer would have in implementing a system by carefully reading the SRS document; it involves the interpretation of the document. Difficulty indicators in this level were divided into seven categories: noise, silence, over-specification, contradiction, ambiguity, forward reference, and wishful thinking. In the surface understanding level, the difficulty indicators are subdivided into two groups: sentence level and discourse level. At the sentence level, the indicators are ambiguous keywords (adjectives, adverbs, determiners, and modals) and syntactic features (such as word frequency, passive verbs, parentheses, and fragments). At the discourse level, the indicators are words per sentence, unique words, and frequency of ambiguous sentences.
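The discourse-level indicators mentioned above (words per sentence and unique words) can be approximated with very simple text processing. The sketch below computes rough values for a pair of sentences using naive whitespace tokenization; it only illustrates the idea and does not reproduce the metric definitions of the quality model in [23].

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: approximates two discourse-level indicators
// (average words per sentence and number of unique words) using naive
// whitespace tokenization. Not the actual metric definitions from [23].
public class DiscourseIndicators {

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "The system shall log every telecommand it receives.",
                "The system shall reject a telecommand whose checksum is invalid.");

        int totalWords = 0;
        Set<String> uniqueWords = new HashSet<>();

        for (String sentence : sentences) {
            // Strip punctuation and lowercase before splitting into words.
            String[] words = sentence.toLowerCase()
                    .replaceAll("[^a-z0-9 ]", " ")
                    .trim()
                    .split("\\s+");
            totalWords += words.length;
            uniqueWords.addAll(Arrays.asList(words));
        }

        double wordsPerSentence = (double) totalWords / sentences.size();
        System.out.printf("Words per sentence: %.1f%n", wordsPerSentence);
        System.out.println("Unique words: " + uniqueWords.size());
    }
}
```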

A review presented in [13] points to several approaches used to detect and minimize ambiguity in NL SRS documents.

Some authors focused on NLP tools and techniques to deal with SRS documents. In [18], NLP tools such as taggers, parsers, classifiers, and libraries are described. They are explained through examples of excerpts from requirement documents for military vehicles. Referenced examples of each tool are provided, for instance the Penn Treebank Part-of-Speech (PoS) tagger, the Stanford Parser, the Naïve Bayes, Decision Tree, Conditional Exponential, and WEKA classifiers, and also the WordNet [22] database. The authors also present a linguistic approach for requirements analysis to judge their quality, identifying factors such as level of ambiguity, specificity, completeness, and vagueness as quality indicators.
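As a concrete illustration of the PoS tagging mentioned above, the sketch below tags a single requirement sentence with Penn Treebank labels using the Stanford CoreNLP pipeline. It assumes the CoreNLP library and its English models are on the classpath; the pipeline configuration shown is an assumption for the example, not necessarily the setup used by the works cited.

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

// Illustrative sketch: tags a requirement sentence with Penn Treebank
// part-of-speech labels using Stanford CoreNLP (assumed to be available on
// the classpath together with its English models).
public class PosTaggingExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation(
                "The telecommand verification service shall check the Source ID field.");
        pipeline.annotate(document);

        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                String tag = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                System.out.println(word + "/" + tag);   // e.g. "shall/MD", "check/VB"
            }
        }
    }
}
```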


Table 2.1: NLP tools used in the identification of the quality indicators.
[Table rows (quality indicators): Ambiguity; Incompleteness; Lack of Understandability; Lack of Atomicity; Vagueness; Equivalency. Table columns (NLP tools): Parser; PoS Tagger; Classifier; Dictionary; Thesaurus.]

As we can see in this subsection, several NLP techniques can be used to identify issues related to quality indicators in software requirements. Those techniques are implemented using a variety of NLP tools. We further describe the NLP tools in Chapter 3. Table 2.1 presents a summary of the NLP tools used to detect the quality indicators.

2.4.2 Quality Assessment Tools and Methodologies

Some publications describe tools that implement specific methodologies to assess the quality of SRS documents written in NL, in an automatic or semi-automatic manner.

NASA's ARM tool [2] uses a quality model, developed internally at NASA, from historical manual evaluation of several SRS documents. The quality model comprises five quality indicators: ambiguity, completeness, understanding, volatility, and traceability. The ARM tool does not measure volatility and traceability, but all the other indicators are measured employing statistical and document structure analysis.

In [14], a decision-tree text classifier is used to decide if a requirement is ambiguous, according to the authors' quality model presented in [23]. The classifier searches for the indicators from that model by considering a public-domain corpus of SRS documents containing annotations from human reviewers, who classified the requirements as ambiguous or not.

The NLARE tool [12] performs quality analysis of requirements following guidelines to detect ambiguity, incompleteness, and lack of atomicity. A requirement is ambiguous if the text can be interpreted differently by different readers; incomplete, if it does not provide all the required information; and non-atomic, if it has statements that could be divided into two or more different requirements. NLARE's architecture is divided into modules, which initially prepare the requirement statement to be evaluated, first splitting its sentences and then tokenizing each sentence. After that, a spell checker is used, and then the evaluations following the guidelines are applied. In the requirements evaluator module, three sub-modules search for each of the problems: atomicity, checking if the sentence can be split into smaller sentences; ambiguity, checking if there are types of words, like adjectives, superlatives, and others, which can inject ambiguity; and completeness, checking if the key elements (actor, function, and detail) are present in the requirement statement. The NLP tools used to perform those evaluations are the PoS Tagger [33], the Tag Detector (which uses a Tokenizer [20]), and Tokens Regex [5].
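The preparation steps that NLARE applies (sentence splitting followed by tokenization) can be mimicked with the JDK's BreakIterator, as in the rough sketch below. NLARE itself relies on the dedicated NLP components cited above; this example only shows the general idea of that preprocessing.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch: splits a requirement text into sentences and then into
// word tokens with java.text.BreakIterator, mimicking the first preparation
// steps (sentence splitting and tokenization) described for NLARE.
public class SplitAndTokenize {

    public static void main(String[] args) {
        String text = "The system shall store all telemetry frames. "
                + "It shall discard frames with an invalid CRC.";

        for (String sentence : split(text)) {
            System.out.println("Sentence: " + sentence.trim());
            System.out.println("Tokens:   " + tokenize(sentence));
        }
    }

    private static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    private static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(sentence);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = sentence.substring(start, end).trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```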

A tool that identifies "smells" in software requirements is presented in [7]. Such "smells" are possible indicators of ambiguity and incompleteness. They are: ambiguous adverbs, vague pronouns, subjective language, comparative phrases, superlatives, negative statements, non-verifiable terms, loopholes, and incomplete references. They are detected on individual requirement statements, with NLP techniques. At first, the requirements are tagged using a PoS Tagger, helping to identify pronouns; then, a morphological analysis of the tagged sentence checks if adjectives and adverbs are in the comparative or superlative form; and, finally, with the help of dictionaries, the remaining "smells" are identified.

Considering ambiguity alone as a quality indicator, in [15] the authors propose two tools to help humans identify ambiguity in NL SRS documents: the first tool analyzes a requirement statement and tells whether it is potentially ambiguous, and the second tool presents the reasons why the statement has been classified as potentially ambiguous. The final decision on the actual ambiguity of the requirement statement is the responsibility of a human reviewer. The procedure is, therefore, semi-automatic. In [15], the authors provide an extensive definition of ambiguity, as well as very extensive definitions for requirements, design descriptions, prototyping, and experiments for both tools.

EA-Analyzer [27] identifies conflicts in aspect-oriented NL requirements. Aspect-oriented requirements engineering deals with the composability of concerns. A concern is an entity which encapsulates one or more functional or non-functional requirements for a specific matter of interest. An aspect is a concern that intersects with other concerns. A composition of concerns is used to represent the interdependencies between the concerns, and also to detect potential conflicts. Before using the EA-Analyzer tool, another one, called EA-Miner, generates annotations from the NL requirements, and a human reviewer creates compositions using those annotations. From the generated compositions, EA-Analyzer uses the bag-of-words approach to identify terms as potentially conflicting. There are two bags of words: one with potentially conflicting terms, and another with terms found in non-conflicting compositions.
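As a rough sketch of the bag-of-words idea behind EA-Analyzer, a composition could be flagged when it mentions more terms from the bag of potentially conflicting terms than from the bag of non-conflicting ones. Both word bags below are invented for the example; the real tool derives them from previously annotated compositions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of a bag-of-words check in the spirit of EA-Analyzer:
// a composition is flagged when it mentions more terms from the bag of
// potentially conflicting terms than from the bag of non-conflicting terms.
// Both word bags are invented for this example.
public class BagOfWordsConflictCheck {

    private static final Set<String> CONFLICTING =
            new HashSet<>(Arrays.asList("exclusive", "override", "simultaneously", "priority"));
    private static final Set<String> NON_CONFLICTING =
            new HashSet<>(Arrays.asList("log", "display", "report"));

    public static void main(String[] args) {
        String composition = "The encryption concern shall override the response time "
                + "concern when both apply simultaneously.";
        System.out.println("Potential conflict: " + isPotentialConflict(composition));
    }

    static boolean isPotentialConflict(String composition) {
        int conflicting = 0;
        int nonConflicting = 0;
        for (String word : composition.toLowerCase().split("\\W+")) {
            if (CONFLICTING.contains(word)) {
                conflicting++;
            } else if (NON_CONFLICTING.contains(word)) {
                nonConflicting++;
            }
        }
        return conflicting > nonConflicting;
    }
}
```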

The Requirements Quality Analyzer (RQA) [9] is built over a framework for measuring the quality of SRS documents, assigning grades to each requirement and also suggesting how the requirement could be improved to receive a better grade. The methodology considers properties that requirements should have, in a qualitative and quantitative manner, and the authors discuss how such properties could be measured. The properties are: lack of ambiguity, understandability, completeness, lack of conflicts, precision, consistency, traceability, validity, verifiability, and modifiability. The authors provide a detailed analysis of the measurement of those indicators, and they discuss the attribution of grades based on those indicators, both at the requirement level and at a global level, attributing a quality grade to the document as a whole.

An interesting way of evaluating such tools is through nlrpBench [32]. It is a collection of NL requirements specifications, intended to serve as a comparison and challenge to both existing and future tools that process NL requirements documents. The collection is organized in two types of tasks: model extraction and text correction. Each task is performed manually by specialists, and the tasks are available for all the existing SRS documents in their database. The documents are separated by categories, like teaching examples, industrial applications, standards, among others.

As we can see, there are several tools to detect quality issues in SRS documents. Some of them detect one specific type of issue. Others, such as the RQA tool, identify several issues, offering a more complete quality assessment. Ambiguity is the quality indicator considered by the majority of the tools. We summarize the quality indicators considered by the presented tools in Table 2.2. In Table 2.3 we show the NLP tools used by each of them. For NASA's ARM tool, we consider only the parser, because [2] gives no further description of the NLP tools used. The same applies to the RQA tool, which seems to be the most complete tool, but [9] does not describe the tool itself, only the methodology used.


Table 2.2: Quality indicators detected by the quality assessment tools.
[Table rows (quality indicators): Ambiguity; Incompleteness; Lack of Understandability; Lack of Atomicity; Conflicts; Lack of Precision; Lack of Consistency; Lack of Traceability; Lack of Validability; Lack of Verifiability; Lack of Modifiability. Table columns (tools): ARM [2]; Decision-tree [14]; NLARE [12]; Smells detector [7]; Tools from [15]; EA-Analyzer [27]; RQA [9].]


Table 2.3: NLP tools used by the quality assessment tools.
[Table rows (NLP tools): Parser; PoS Tagger; Tag Detector; Classifier; TokensRegex; Dictionary; Thesaurus; Bag of Words. Table columns (tools): ARM [2]; Decision-tree [14]; NLARE [12]; Smells detector [7]; Tools from [15]; EA-Analyzer [27]; RQA [9].]


2.5 Applicability to our Work

We propose a framework for applying checklist-based quality reviews to SRS documents, using NLP techniques, as described in Chapter 3. Unlike the methods presented here, our framework does not consider a specific quality model or quality indicator in its analysis. The quality model changes according to the checklist used in the analysis, and the quality indicators are implicit in the checklist's questions.

We have implemented our framework in a tool called AutoChecklist, which performs the quality assessment similarly to the ones just described in Section 2.4.2. The AutoChecklist tool, which is further described in Chapter 4, points out the issues in each requirement, like the Requirements Quality Analyzer (RQA) [9]. The main difference between them is that AutoChecklist performs the analysis considering checklists (more specifically, the error-based checklist developed by Véras et al. [36]).

Our framework is modular, a design inspired by tools like NLARE [12]. Its modules deal with specific checklist questions, and they rely on NLP techniques to do their jobs. The NLP techniques are implemented considering specific NLP tools, such as Sentence Splitters, Tokenizers, Parsers, PoS Taggers, and Tokens Regexes (please refer to Chapter 3 for further details on the NLP tools).

The NLP techniques are not deployed specifically to detect quality indicators like the ones described in this Chapter (e.g. ambiguity). The techniques are directed at determining whether the requirement text may contain the issues predicted by the questions. For instance, some questions may check if there are specific terms or expressions in the requirements text, which can be achieved using simple NLP tools (e.g. Tokens Regex). Some questions may require more complex techniques, such as detecting events in text, or even in the document preprocessing, where we need to extract the requirement statements. Those techniques may combine the use of different NLP tools, such as Sentence Splitters, Tokenizers, PoS Taggers, Parsers, and Dictionaries. Those techniques and the NLP tools used are described in more detail in Chapter 3.


Chapter 3

Methodology

We propose a methodology for helping a reviewer to perform a checklist-based quality assessment of SRS documents [26]. A framework for applying checklists to SRS documents is provided. A tool, called AutoChecklist, was implemented under the framework, applying the error-based checklist provided by Véras et al. [36]. Details on the tool's implementation are presented in Chapter 4.

In the proposed framework, some of the questions from the checklists are automatically answered, and some are presented to the reviewer, who must give the final answer. The following types of answers are provided during the requirements' evaluation:

Yes: No issues were found, and the question could be answered automatically.

Possible Yes: No issues were found, but the question could not be answered automatically. The system points out that the reviewer's answer could possibly be "Yes".

Warning: Possible issues were found, but the question could not be answered automatically. The reviewer must provide an answer manually.

Possible No: Issues were found, but the question could not be answered automatically. The system points out to the reviewer that the answer could possibly be "No".

No: Issues were found, and the question could be answered automatically.

Except for the "Yes" answer, all the others contain findings that are presented to the reviewer, with instructions on how to proceed and what to consider in the cases where he/she must provide the final answer. In the case of the "No" answer, the reason is also explained in the findings presented to the reviewer. The framework's architecture is illustrated in Figure 3.1. It is divided into two layers, the analysis layer and the utility layer, which, in turn, are organized in modules.


Figure 3.1: Proposed framework.


In the following sections, we present the two layers in detail, as well as a working example.

3.1 Analysis Layer

In the analysis layer, the checklist-based analysis of a SRS document is conducted. This layer is responsible for three main activities: (1) document preprocessing, when relevant information is extracted from the document; (2) document analysis, followed by question answering; and (3) report generation, when the results of the analysis are presented to the reviewer. Each of these activities is performed by specific modules in the analysis layer. These modules are invoked from left to right, according to the framework presented in Figure 3.1.

In the preprocessing and analysis modules, NLP techniques are used when applicable (e.g. requirements are extracted from the text, sentences are parsed and tokenized, words are tagged by a part-of-speech tagger, and so on). These processes are conducted by the utility classes from the NLP module, described in more detail in Section 3.2.4.

When the analysis is finished, the output generator module generates reports for the reviewer. The modules from the analysis layer are described in detail in the following subsections.

3.1.1 Preprocessing Module

The preprocessing module deals with the input, that is, the SRS document. It performs three operations: (1) converts the SRS document to plain-text; (2) extracts the requirements and other relevant information from the text (e.g. document sections and traceability matrix); and (3) saves a file containing the extracted data in an internal representation (e.g. XML), so that it serves as input to the following modules.

3.1.2 Analysis Modules

The analysis modules are responsible for analyzing the requirements extracted from the text, and for providing answers to the questions from the checklist. As previously explained, the possible answers are: "Yes", "Possible Yes", "Warning", "Possible No", and "No". NLP techniques are used in this analysis.

The number of analysis modules may vary, depending on the checklist that is being used. Each analysis module deals with a subset of questions from the checklist. For instance, we may consider the error-based checklist from Véras et al. [36], in which the questions are split into four categories. We could then have four analysis modules to deal with the questions from each category (Traceability, Incompleteness, Incorrectness, and Inconsistency modules), as can be seen in Appendix A.

The analysis modules are responsible for providing answers to the checklist questions, after having evaluated each requirement. An answer is assigned to each possible question/requirement pair. The answers may provide textual findings stating the reason for that answer, or instructions to be considered by the reviewer when providing his/her final answer.

3.1.3 Output Generator Module

The output generator module gathers all the answers and findings from the previous phase, and presents them to the reviewer. The output can be presented in several ways: as tables, which, in turn, could be ordered by type of answer, requirement, or question; or in textual format, organized in views showing all the requirements with their answers and findings, and all the questions with all the answers per requirement.

3.2 Utility Layer

The modules from this layer support the work of the upper layer's modules. Each module in the utility layer provides a number of utility classes, applying techniques or procedures that are necessary in the preprocessing, analysis, and output generation modules from the analysis layer. The utility classes' implementation may use third-party components or libraries, if necessary.

We can see how the utility modules are organized in Figure 3.1. In these modules, the utility classes are shown at the top, and third-party components at the bottom of each utility module. The utility layer's modules are further described in the next subsections.

3.2.1 Text Module

The text module provides a single utility class that converts a SRS document file into plain-text format, to be used by the preprocessing module from the analysis layer. The utility class handles a variety of document formats, such as Portable Document Format (PDF), Rich Text Format (RTF), and Microsoft Word (DOC/DOCX), among others.

3.2.2 File Module

The file module provides utility classes that generate different file formats, which are used to produce the preprocessing module's output, and also by the output generator module. Examples of such formats are: eXtensible Markup Language (XML), Comma-Separated Values (CSV), and HyperText Markup Language (HTML).

3.2.3 Checklist Module

This module is responsible for providing one of the inputs to be used by the analysis modules contained in the analysis layer. It generates a set of questions for the corresponding analysis module. The questions are obtained from a static checklist representation, such as XML. Every question is provided with its number and text contents, along with other relevant parameters that indicate how the analysis module should proceed when dealing with this question.
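As an illustration, a hypothetical snippet of such a static XML representation is sketched below; the element and attribute names are invented for this example and are not the format actually used by any specific tool:

<checklist name="error-based">
  <category name="Incorrectness">
    <question id="12" action="detect-expression" expression="can">
      Are all requirements without the word "can"?
    </question>
  </category>
</checklist>

Each question carries the parameters (here, the expression to be searched for) that tell the corresponding analysis module how to proceed.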


3.2.4 NLP Module

The NLP module implements the NLP techniques that are needed at the preprocessing and analysis stages (e.g. requirements extraction, and actions/events detection). The techniques are implemented in this module's utility classes, represented at the top part of the module shown in Figure 3.2.

Figure 3.2: NLP Module.

In order to implement the NLP techniques, the utility classes must use NLP tools, such as a Sentence Splitter, Tokenizer, PoS Tagger, Parser, Lemmatizer, and Tokens Regex. These tools are provided by a third-party NLP toolkit, represented in the bottom left portion of the module in Figure 3.2. The bottom right portion of the figure also contains the WordNet [22] database and its interface, used by the utility classes. In the following, we describe these NLP tools.

Sentence Splitter: The Sentence Splitter, as the name suggests, splits a large text into sentences for later processing. Typically, it finds sentence-ending characters (e.g. ".", "?", or "!") to determine the end of a sentence. It handles cases in which such characters do not mark the end of a sentence (for instance, in section numbers such as 7.3.1). A sentence splitter may also be customized to determine the end of a sentence when a paragraph, or a specific number of line breaks, is found.

We present an example showing a chunk of text as input to a sentence splitter, and its output containing the extracted sentences, as follows:

• Input: `The data handling software shall support the device command distribution service sub-type 1 distribute on-off commands, as defined in section 7.3.1 of AD1. All fields shall be supported. Notice that address field type is enumerated.'

• Output: [`The data handling software shall support the device command distribution service sub-type 1 distribute on-off commands, as defined in section 7.3.1 of AD1.', `All fields shall be supported.', `Notice that address field type is enumerated.']

Tokenizer: A Tokenizer is a tool used to split text into meaningful elements, called tokens. The tokens may be defined according to the context, and may be words, sentences, or phrases. The Sentence Splitter itself uses a form of tokenization.

An example of a sentence tokenized into words is presented as follows:

• Input: `Notice that address field type is enumerated.'

• Output: [`Notice', `that', `address', `field', `type', `is', `enumerated', `.']

PoS Tagger: A Part-of-Speech (PoS) Tagger is a tool that takes tokenized sentences in some language, and assigns part-of-speech tags (e.g. noun, verb, adjective, etc.) to such tokens.

Here is an example of a PoS-tagged sentence, using the Penn-Treebank [21] notation (e.g. NN: noun, VBN: past-participle verb, MD: modal verb, etc.):

• Input: [`Notice', `that', `address', `field', `type', `is', `enumerated', `.']

• Output: [`Notice/NNP', `that/WDT', `address/VBP', `field/NN', `type/NN', `is/VBZ', `enumerated/VBN', `./.']1
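For illustration, output such as the one above could be obtained, for instance, with the Stanford CoreNLP toolkit (one possible third-party toolkit, adopted later in Chapter 4). The minimal Java sketch below assumes the standard CoreNLP annotators and is illustrative only:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class PosTagSketch {
    public static void main(String[] args) {
        // Sentence splitting, tokenization, and PoS tagging in one pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("Notice that address field type is enumerated.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                // Prints word/tag pairs such as "type/NN" and "is/VBZ"
                System.out.println(token.word() + "/" + token.tag());
            }
        }
    }
}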

Lemmatizer: An NLP tool that groups together the inflected forms of a word, so that they can later be analyzed as a single item. This is performed with the use of vocabulary and morphological analysis of the words, removing the inflectional endings and returning the base or dictionary form of a word. The following are examples of lemmatized words:

am, are, is → be

car, cars, car's, cars' → car

Tokens Regex: A tool that defines patterns over sequences of tokens. It describes text as a sequence of tokens, which may be words, numbers, punctuation marks, among others. Suppose, as an example, that we need to detect the expressions "in less than" or "in more than", followed by a number. They can be detected by the following Tokens Regex rule:

/in/ /less|more/ /than/ {word::IS_NUM}

1According to the Penn-Treebank notation:
NNP: Proper noun, singular
WDT: Wh-determiner
VBP: Verb, non-3rd person singular present
NN: Noun
VBZ: Verb, 3rd person singular present
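Such a rule could be applied, for instance, with Stanford CoreNLP's TokenSequencePattern API. The sketch below is illustrative only: it uses a simplified variant of the rule above, matching the numeric token by its part-of-speech tag (CD, cardinal number) instead of the IS_NUM attribute:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.List;
import java.util.Properties;

public class TokensRegexSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("The response shall be sent in less than 5 seconds.");
        pipeline.annotate(doc);
        List<CoreLabel> tokens = doc.get(CoreAnnotations.TokensAnnotation.class);

        // "CD" is the Penn-Treebank tag for cardinal numbers
        TokenSequencePattern pattern =
                TokenSequencePattern.compile("/in/ /less|more/ /than/ [{tag:\"CD\"}]");
        TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
        while (matcher.find()) {
            System.out.println("Matched: " + matcher.group()); // e.g. "in less than 5"
        }
    }
}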


Parser: An NLP tool that determines the grammatical structure of sentences: which groups of words go together as phrases, and which words are the subject or object of a verb. The most common output of this NLP tool is a parse tree. An example of a parse tree is presented in Figure 3.3, where the root of the tree represents the sentence, and the nodes below it represent phrases (e.g. noun and verbal phrases). The leaves represent the parts of speech, that is, the terminals of the language. The whole tree is annotated using Penn-Treebank tags.

WordNet: A large lexical database for the English language. It groups nouns, verbs, adjectives, and adverbs into sets of synonyms, each expressing a distinct concept. It records a number of relations among those sets. Examples of such relations are hypernyms (e.g. canine is a hypernym of dog) and hyponyms (e.g. dog is a hyponym of canine). The WordNet database can be seen as a combination of a dictionary and a thesaurus.

Next, we present two examples of utility classes implementing NLP techniques that use those NLP tools.

NLP Utility Class Example I – Requirements Extraction: One example of a procedure that may be performed by a NLP utility class is the requirements extraction. Considering that a SRS document does not contain only requirement statements, this utility class must take the document's plain-text representation and identify the parts of the text that are requirement statements.

A typical requirement statement must have a modal verb (e.g. "shall") at the beginning of a verbal phrase, like the one shown in the parse tree in Figure 3.3.

Figure 3.3: Parse tree from a requirement sentence.

The utility class executing this procedure checks for a verbal phrase starting with a modal verb, in order to identify requirement statements. The entire procedure is described in Algorithm 1, where the whole text is split into sentences in line 1, and each sentence is tokenized and part-of-speech tagged, respectively, in lines 4 and 5. In line 7, we check for modal verbs, and if they are found, we generate a parse tree in line 8. In line 9 we traverse the parse tree, getting all the verbal subtrees. Finally, in lines 11 and 12 we respectively get the leftmost child of each verbal subtree and check if it is a modal verb, thus determining if such modal verbs are at the beginning of a verbal subtree.

Algorithm 1 Requirement Extraction

Require: text

1: Sentence[] sentences ← NLP.splitSentences(text)
2: Sentence[] requirements {To store the found requirements}
3: for all sentence ∈ sentences do
4:   Token[] tokens ← NLP.tokenize(sentence)
5:   PoSTagged[] taggedTokens ← NLP.posTag(tokens)
6:   for all tagged ∈ taggedTokens do
7:     if tagged.posTag = ModalVerb then
8:       ParseTree root ← NLP.parse(sentence)
9:       ParseTree[] verbalPhraseTrees ← traverseAndGetVerbalPhrases(root)
10:      for all subTree ∈ verbalPhraseTrees do
11:        PoSTagged leftmostChild ← getLeftmostChild(subTree)
12:        if leftmostChild.posTag = ModalVerb then
13:          requirements.add(sentence) {Found a requirement sentence}
14:        end if
15:      end for
16:    end if
17:  end for
18: end for
19: return requirements
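For illustration, the test performed in lines 9 to 12 could be realized over a Stanford CoreNLP parse tree as sketched below; the method name is ours, and the sketch assumes a Penn-Treebank style tree, where verb phrases are labeled VP and modal verbs MD:

import edu.stanford.nlp.trees.Tree;

public class RequirementHeuristic {
    // Returns true if the parsed sentence contains a verb phrase (VP)
    // whose leftmost child is a modal verb (MD), as in Algorithm 1.
    public static boolean hasModalVerbPhrase(Tree root) {
        for (Tree node : root) {                     // pre-order traversal of all subtrees
            if (node.isPhrasal() && "VP".equals(node.value())) {
                Tree leftmost = node.firstChild();   // leftmost child of the verb phrase
                if (leftmost != null && "MD".equals(leftmost.value())) {
                    return true;
                }
            }
        }
        return false;
    }
}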

NLP Utility Class Example II – Events Detection: Another example of a procedure performed by a NLP utility class is the extraction of events from a text. Let us consider the following question that could be available in a checklist:

• In the requirements describing actions taken by the software in the occurrence of some events, are all the possible events taken into account?

To answer this question, we need to determine if there are events in the text. This can be done according to the procedure described in the work by Saurí et al. [28]. The procedure considers as events:

1. Verbs at the beginning of verbal phrases.

2. Nouns that have "event" or "phenomenon" as a hypernym (i.e. hyponyms of these concepts).

3. Adjectives that are in a precompiled list of event adjectives.


To perform such verifications, we may use a combination of NLP tools. The procedure may be executed in four steps:

1. Tokenize the sentence, and then run a PoS tagger on the resulting tokens, to determine their parts of speech.

2. If verbs are found among the tagged tokens, and they are not weak stative verbs (e.g. "be", "have", etc.), we run a parser. From the generated parse tree, we can check if the verbs are at the beginning of verbal phrases, i.e., if they are the leftmost child of a verbal phrase subtree.

3. If there are nouns among the tagged tokens, consult WordNet to verify if they have either "event" or "phenomenon" as a hypernym.

4. If there are adjectives among the tagged tokens, lemmatize them, and determine if they match a precompiled list of adjectives.

The entire procedure is described in Algorithm 2, where the first step is performed in lines 4 and 5. The second step is performed from lines 17 to 26. The third, from lines 27 to 32. Finally, the fourth step is performed from lines 33 to 38.


Algorithm 2 Events Detection

Require: sentence

1: Word[] verbCandidates, nounCandidates, adjCandidates
2: Word[] adjList {Contains a precompiled list of event adjectives}
3: Word[] events {To store the detected events}
4: Token[] tokens ← NLP.tokenize(sentence)
5: PoSTagged[] taggedTokens ← NLP.posTag(tokens)
6: for all tagged ∈ taggedTokens do
7:   if tagged.posTag = Verb and not NLP.isWeakStativeVerb(tagged.value) then
8:     verbCandidates.add(tagged)
9:   end if
10:  if tagged.posTag = Noun then
11:    nounCandidates.add(tagged)
12:  end if
13:  if tagged.posTag = Adjective then
14:    adjCandidates.add(tagged)
15:  end if
16: end for
17: if verbCandidates not Empty then
18:   ParseTree root ← NLP.parse(sentence)
19:   ParseTree[] verbalPhraseTrees ← traverseAndGetVerbalPhrases(root)
20:   for all subTree ∈ verbalPhraseTrees do
21:     PoSTagged leftmostChild ← getLeftmostChild(subTree)
22:     if leftmostChild.posTag = Verb and verbCandidates.contains(leftmostChild.value) then
23:       events.add(leftmostChild.value)
24:     end if
25:   end for
26: end if
27: for all nounCandidate ∈ nounCandidates do
28:   Word[] hypernyms ← WordNet.getHypernyms(nounCandidate)
29:   if hypernyms.contains("event") or hypernyms.contains("phenomenon") then
30:     events.add(nounCandidate)
31:   end if
32: end for
33: for all adjCandidate ∈ adjCandidates do
34:   Lemma adjLemma ← NLP.lemma(adjCandidate)
35:   if adjList.contains(adjLemma) then
36:     events.add(adjCandidate)
37:   end if
38: end for
39: return events
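For illustration, the WordNet lookup in lines 27 to 32 could be implemented with the MIT Java WordNet Interface (JWI), the interface adopted in Chapter 4. The sketch below is illustrative: the dictionary path and the breadth-first walk over the hypernym chain are assumptions of this example:

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.*;
import java.io.File;
import java.util.*;

public class EventNounCheck {
    // Returns true if "event" or "phenomenon" appears among the (transitive)
    // hypernyms of any noun sense of the given word.
    public static boolean isEventNoun(IDictionary dict, String noun) {
        IIndexWord idx = dict.getIndexWord(noun, POS.NOUN);
        if (idx == null) return false;
        Deque<ISynsetID> queue = new ArrayDeque<>();
        Set<ISynsetID> seen = new HashSet<>();
        for (IWordID wid : idx.getWordIDs()) {               // start from every noun sense
            ISynsetID sid = dict.getWord(wid).getSynset().getID();
            if (seen.add(sid)) queue.add(sid);
        }
        while (!queue.isEmpty()) {
            ISynset synset = dict.getSynset(queue.poll());
            for (IWord w : synset.getWords()) {
                String lemma = w.getLemma();
                if (lemma.equals("event") || lemma.equals("phenomenon")) return true;
            }
            for (ISynsetID parent : synset.getRelatedSynsets(Pointer.HYPERNYM)) {
                if (seen.add(parent)) queue.add(parent);      // walk up the hypernym chain
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        IDictionary dict = new Dictionary(new File("/usr/local/WordNet-3.0/dict")); // assumed path
        dict.open();
        System.out.println(isEventNoun(dict, "failure"));     // "failure" is a hyponym of "event"
    }
}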


3.3 Working Example

We present an example of the application of the proposed framework using a simple checklist, containing only four questions, as shown in Figure 3.4.

1. Are all the requirements in the requirements traceability matrix?

2. Are all requirements without "TBD" (To Be Defined)?

3. Are all requirements without the word "can"?

4. In the requirements describing actions taken by the software in the occurrence of some events, are all the possible events taken into account?

Figure 3.4: A simple checklist.

We will apply the simple checklist from Figure 3.4 to a sample SRS document, containing only three requirements and a traceability matrix, as shown in Figure 3.5.

Req-1: The data handling software shall support the memory management service sub-type 2 load memory using absolute addresses, as defined in section 11.3.2 of [AD1]. It can be assumed that SMALLEST-ADDRESSABLE-UNIT is 1 octet.

Req-2: The acceptance testing shall be based on a demonstrator test case that is still to be defined.

Req-3: The data handling software shall support the on-board storage and retrieval service sub-type 10 delete packet stores contents up to specified packets, as defined in section 18.3.6 of [AD1]. If the field deletion set is zero, the entire packet store is deleted.

Requirements Traceability Matrix

ReqID   Drawing   Object
Req-1   DRWG1     External OBJ
Req-2   DRWG2     External OBJ

Figure 3.5: A sample SRS document.

In the next subsections, we describe how to instantiate the proposed framework to conduct the analysis using the simple checklist. We show how many analysis modules should be considered, and also how the simple SRS document gets processed along the framework's three main stages. The procedure could be implemented in a SRS quality assessment tool using any programming language, NLP toolkit, and file converting/building utilities.


3.3.1 Preprocessing

Considering the questions from the checklist, we notice that, besides the requirements, we will need the traceability matrix to perform the analysis. So, in the preprocessing stage, we need to extract this matrix from the SRS document.

The first step is to convert the SRS document from its original format (e.g. PDF, DOC, etc.) to plain-text format, using utility classes from the text module. After that, we can proceed to the requirements extraction. This may be achieved by using a NLP utility class implementing Algorithm 1.

Once the requirements are extracted, we need to extract the traceability matrix. This can be achieved by another NLP utility class, which detects the name of the section in the SRS document containing the traceability matrix, and then extracts its contents.

Finally, the preprocessing output file containing the extracted data must be created. It must be built using utility classes from the file module, and we consider an XML file in this working example. This file is illustrated in Figure 3.6.

<SRS>
  <requirements>
    <requirement id="Req-1">The data handling software shall support the memory management service sub-type 2 load memory using absolute addresses, as defined in section 11.3.2 of [AD1]. It can be assumed that SMALLEST-ADDRESSABLE-UNIT is 1 octet.</requirement>
    <requirement id="Req-2">The acceptance testing shall be based on a demonstrator test case, that is still to be defined.</requirement>
    <requirement id="Req-3">The data handling software shall support the on-board storage and retrieval service sub-type 10 delete packet stores contents up to specified packets, as defined in section 18.3.6 of [AD1]. If the field deletion set is zero, the entire packet store is deleted.</requirement>
  </requirements>
  <RTM>
    Requirement Drawing Object
    Req-1 DRWG1 External OBJ
    Req-2 DRWG2 External OBJ
  </RTM>
</SRS>

Figure 3.6: A preprocessed file.

3.3.2 Analysis

The number of analysis modules will depend on the number of similar questions in the checklist. In Figure 3.4, we can see one question that checks the traceability matrix; two questions that check for specific terms ("can" and "TBD"); and, finally, one question that checks if all the possible events are described in the requirement text. Thus, we may consider three analysis modules, and we will call them traceability, expressions, and events.

The traceability module will simply check if each requirement under analysis is present in the traceability matrix. If that is the case, the answer to question 1 for that specific requirement will be "Yes", and "No" otherwise.

The expressions module will check if the terms are present in the text of each requirement under analysis. It will use a NLP utility class to detect them. This way, questions 2 and 3 will be answered as "Yes" if their terms are not found in the requirement text, and "No" otherwise.

The events module needs to check if the text of each requirement under analysis contains events. To achieve this, it must use a NLP utility class implementing Algorithm 2. If no events are detected in the text, the answer to question 4 will be "Yes". If events are detected, the answer will be a "Warning", and the reviewer must check if all the expected events are described, giving his/her final answer as either "Yes" or "No".

3.3.3 Output

When the analysis is finished, the output generator module must produce the final output. A great variety of outputs may be generated, but for this example we will consider a CSV file containing all the answers. The output generator must gather the answers from all the questions and requirements, and use a utility class from the file module to generate the CSV file. This output file is illustrated in Figure 3.7.

"Req. ID","Question ID","Finding","Automatic Answer"
"Req-1","1","","Yes"
"Req-1","2","","Yes"
"Req-1","3","Contains can","No"
"Req-1","4","","Yes"
"Req-2","1","","Yes"
"Req-2","2","Contains TBD","No"
"Req-2","3","","Yes"
"Req-2","4","","Yes"
"Req-3","1","Not in traceability matrix","No"
"Req-3","2","","Yes"
"Req-3","3","","Yes"
"Req-3","4","Check if it contains all expected events","Warning"

Figure 3.7: A CSV output file.

3.4 Considerations on the Methodology

The methodology presented in this Chapter was initially conceived to solve a problem described in the work by Véras et al. [36], which is specific to the domain of space systems. Despite that constraint, the proposed methodology is not domain-specific. It is generic enough to be used in the semi-automatic application of checklists to SRS documents from different domains.


The framework can be implemented in a tool using different programming languages, such as Java and Python. There are several open source, third-party libraries and off-the-shelf software packages that may be used in the implementation of the utility modules. For instance, Stanford CoreNLP and NLTK are alternatives to be used in the NLP Module.

We have implemented a tool, the AutoChecklist tool, under this framework, in order to validate the methodology, as will be described in Chapters 4 and 5. This tool uses a single checklist to perform the quality assessment of SRS documents; however, as shown in this chapter, the framework supports any number of checklists in the analysis.


Chapter 4

Implementation

In order to evaluate the framework described in Chapter 3, we developed a checklist-based SRS analysis assisting tool, the AutoChecklist tool. The AutoChecklist tool applies the error-based checklist from the work by Véras et al. [36], described in Appendix A, to SRS documents. Although it has been implemented to use a single checklist, AutoChecklist can be easily modified to use other checklists. We show how this can be done in Appendix D.

Given the four categories of questions available in the error-based checklist, we considered four analysis modules: Traceability, Incompleteness, Incorrectness, and Inconsistency. The other modules from the top layer of the framework, the preprocessing and the output generator, along with the modules from the utility layer, were also implemented. Figure 4.1 illustrates the framework instantiated in the AutoChecklist implementation.

AutoChecklist produces intermediary results that can be saved in files after each processing stage. These files can be used as input to resume the processing at the desired stage. We describe all the inputs accepted by AutoChecklist, and all the possible ways to run it, in Appendix C.

The tool was implemented using the Java programming language. In the next sections, we describe high-level aspects of the implementation, more specifically those regarding the framework. For a lower-level description of the implementation, please refer to Appendix B.


Figure 4.1: Framework instantiated in the AutoChecklist implementation.


4.1 Preprocessing

The preprocessing module from AutoChecklist performs the same three actions described in Chapter 3: it converts the SRS document to plain-text, extracts the requirements and other necessary information, and generates an intermediary file containing the extracted data. All actions are performed by calling utility classes from the applicable modules.

In the AutoChecklist implementation, the module extracts from the SRS document the information necessary to answer the questions from the error-based checklist. Besides the requirements, it extracts the document sections' names and the requirements traceability matrix. The first three utility classes (from left to right) in the NLP module representation, shown in Figure 4.1, are called in the process.

Once extracted, the data is stored in an XML file, using the XML-building utility class available in the file module.

4.2 Traceability

This is the first of the analysis modules implemented by AutoChecklist. It handles the questions concerning the lack-of-traceability category of the error-based checklist. This information is retrieved from the Traceability utility class in the checklist module.

The traceability questions check if all the requirements are present in the requirements traceability matrix, and if their references are correct. Also, some questions verify if the requirements contain internal or external (i.e. to other documents) references, or references to functions.

To answer the questions that check if the requirements are present in the traceability matrix, the module checks if the requirement under analysis is available in the matrix extracted from the document in the preprocessing stage. As for the questions on references, if references are detected and are not found in the list of document sections (also extracted from the document), they are considered external; otherwise, they are considered internal. Functions are also detected, using a NLP utility class that contains a list of words denoting a function.

4.3 Incompleteness

The four questions from the incompleteness category check for terms indicating either "To Be Confirmed" (TBC) or "To Be Defined" (TBD) in the requirements text, as well as missing numeric values and cases of actions taken by the software in response to some events, in which case the reviewer must check if all the possible events are covered. The term detection, missing numeric value detection, and action detection are implemented by NLP utility classes. In case TBD or TBC is found, the answer to the question will be "No". In the other cases, warnings will be generated, explaining to the reviewer what he/she should check in each occurrence.


4.4 Incorrectness

The incorrectness category contains the largest number of questions in the error-based checklist, checking for terms and expressions that are not desired in a requirement statement. It also checks if there are numbers followed by units, so that the reviewer may verify whether their units and order of magnitude are correct. The terms, expressions, and numbers followed by units are detected using NLP utility classes. When terms or expressions are detected, the answer to the related question will be "No". In the case of numbers followed by units, a warning will be generated asking the reviewer to check the unit and order of magnitude.

4.5 Inconsistency

The inconsistency module is the final analysis module implemented by AutoChecklist. The three questions from this category check if the information regarding the description of actions and functions, their period or frequency, and watchdog descriptions is consistent in every occurrence. The questions also consider references to tables and images within the SRS document. Unfortunately, it is not currently possible to extract reliable information from tables and images. Therefore, the answers in this category will be warnings, informing the reviewer that he/she should check whether the information found in the requirements text is also found in images and tables. The information is extracted from the requirements in the same way as in the Traceability and Incompleteness modules.

4.6 Output Generator

As described in Chapter 3, the output generator module gathers all the answers and findings from the analysis modules to generate the output. In AutoChecklist, two outputs are generated: one in HTML format, and another in a spreadsheet (CSV) format. All files are generated by calling utility classes from the file module. The output generator also provides an interface to the reviewer, for reviewing the unanswered questions and giving his/her final answers.

4.7 Text Module

The text module provides a single utility class that performs the plain-text conversion. It takes a file in a variety of formats, such as PDF, DOC, DOCX, or RTF, and extracts its content in plain-text format. The utility class uses APIs from the Apache Tika library, which is integrated into the module and performs the actual conversion.
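For illustration, such a conversion could be done with the Apache Tika facade API, as in the minimal sketch below (the file name is illustrative):

import org.apache.tika.Tika;
import java.io.File;

public class PlainTextSketch {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // Detects the file type (PDF, DOC, DOCX, RTF, ...) and extracts its text content
        String text = tika.parseToString(new File("srs-document.pdf"));
        System.out.println(text);
    }
}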


4.8 NLP Module

The NLP module provides six utility classes, implementing different NLP techniques used in the preprocessing and analysis modules. The NLP techniques' implementation uses NLP tools, such as Sentence Splitters, Tokenizers, PoS Taggers, Parsers, Lemmatizers, and TokensRegexes. Those tools are provided by the Stanford CoreNLP toolkit, integrated in the module. Some utility classes also use WordNet resources, accessed through the MIT Java WordNet Interface (JWI). Both the resources and the interfaces are also integrated in the module.

The six utility classes provided by the NLP module are:

• RequirementsInfoExtractor: Implements the extraction of the requirements from the plain-text representation of the SRS document, according to Algorithm 1, presented in Chapter 3. It uses a Sentence Splitter, a Tokenizer, a PoS Tagger, and a Parser.

• DocumentSectionsExtractor: Implements the extraction of the document sections from the plain-text representation of the SRS document, using a TokensRegex.

• RequirementsTraceabilityMatrixSectionExtractor: Implements the extraction of the contents of the traceability matrix, using Sentence Splitters and a TokensRegex.

• ExpressionExtractor: Implements the extraction of terms and expressions, using TokensRegex.

• EventActionDetector: Implements Algorithm 2, presented in Chapter 3. It uses a Sentence Splitter, a Tokenizer, a PoS Tagger, a Parser, a Lemmatizer, and the WordNet dictionary.

• MissingNumericValueIndicativesDetector: Identifies if there are terms indicating possible missing numeric values. It uses a Sentence Splitter, a TokensRegex, and a list of indicative terms obtained from WordNet.

4.9 Checklists Module

The checklists module provides utility classes that are used by the analysis modules to retrieve the checklist questions, categories, and actions to be considered in the analysis. This information is retrieved by the utility classes by parsing an XML file containing the error-based checklist.

4.10 File Module

This module provides three utility classes to generate distinct file types:

• CSVBuilder: Generates CSV files using OpenCSV APIs, integrated in the module (see the usage sketch after this list).

• HTMLBuilder: Generates HTML files using JSoup utilities.

• XMLBuilder: Generates XML files using JavaX utilities.
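As mentioned for the CSVBuilder, a minimal usage sketch of the OpenCSV CSVWriter API is shown below; the file name and row values are illustrative, and, depending on the OpenCSV version, the class may live in the com.opencsv or au.com.bytecode.opencsv package:

import com.opencsv.CSVWriter;
import java.io.FileWriter;
import java.io.IOException;

public class CsvOutputSketch {
    public static void main(String[] args) throws IOException {
        try (CSVWriter writer = new CSVWriter(new FileWriter("answers.csv"))) {
            // Header row followed by one answer row per requirement/question pair
            writer.writeNext(new String[]{"Req. ID", "Question ID", "Finding", "Automatic Answer"});
            writer.writeNext(new String[]{"Req-1", "3", "Contains can", "No"});
        }
    }
}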


4.11 Considerations on the Implementation

The complete AutoChecklist implementation considers more than the framework aspects described in this Chapter. The implementation also covers two user interfaces (command line and graphical), and the different input types that may be handled by the tool, such as SRS documents, plain-text files, preprocessed files, and analyzed files. It also covers how the outputs are formatted (both the final and intermediary outputs). Appendix B describes, at a lower level, all the aspects considered in the implementation. It also describes the source code organization, the classes and methods involved in each processing stage, UI flows, worker threads, and use cases.


Chapter 5

Validation and Results

The validation of this work was carried out in two different ways: (1) consultation with an expert, and (2) experiments comparing the manual checklist-based analysis of SRS documents against the proposed semi-automatic method. In both validation cases, the AutoChecklist tool, presented in Chapter 4, was used.

In the following sections we describe, in detail, the two ways in which our proposed method was validated.

5.1 Expert Consultation

During the development of this work, Nuno Pedro Silva, who is a manager at Critical Software S.A., an international information systems and software company based in Coimbra, Portugal, was consulted. He is an experienced professional in the analysis of SRS documents for critical systems, and very familiar with the checklists from the work by Véras et al. [36].

In the first contact with Nuno, at the beginning of our work, he showed us how a reviewer conducts the checklist-based analysis of a SRS document. He gave us valuable insights, which helped us propose our method and also guided the AutoChecklist development.

In the final stages of our work, we presented him with a fully functional version of the AutoChecklist tool, performing a semi-automatic analysis of a SRS document. Nuno again provided us with feedback, and also suggested some additional functionalities to improve the semi-automatic analysis.

Nuno also provided us with a few SRS documents, in different formats, from some of the company's past projects. These documents were used to improve AutoChecklist's plain-text conversion and the algorithms for requirement extraction.

We implemented some of Nuno's suggestions, and sent him the AutoChecklist tool's final version. The remaining suggestions will be considered in future work, as discussed in Chapter 6. The final version of the AutoChecklist tool is the one that was used in the experiments described in the next section.


5.2 Experiments

In order to determine the validity of our working hypothesis, described in Section 1.2, we needed to check if, by using the proposed method, we would reduce the problems described in Section 1.1. More specifically, we had to check if using the proposed method would result in more errors being caught by the reviewer in the analysis, and if the time taken to perform the analysis would be reduced, resulting in less effort required from the reviewer.

We conducted two different experiments to measure and compare the number of errors caught by the manual and semi-automatic methods, and also the time spent in the checklist-based analysis of SRS documents using the error-based checklist (available in Appendix A) from the work by Véras et al. [36]. The AutoChecklist tool was used in the semi-automatic analysis cases.

We had 22 subjects randomly distributed between the two experiments (11 subjects in each experiment). The subjects were 22 volunteers: 5 undergraduate students, 9 graduate students, and 8 software engineers. Before the experiments, we explained to the subjects how to apply the error-based checklist to a SRS document. We also explained how to run the AutoChecklist tool, and how to conduct the checklist-based analysis with it.

Further details on each experiment are described in the next two subsections. In the final subsection, we present the data analysis and results.

5.2.1 Experiment I: Number of Errors

This experiment was performed in two stages for each subject: a manual and a semi-automatic analysis of the same SRS document. The document was a reduced version of the OBOSS-III SRS document [29], containing 10 of the original requirements and a traceability matrix. Besides the SRS document, the subjects had access to the PUS standard document [8], whose sections were referenced by the SRS document.

In the first stage, each subject conducted a manual analysis of the SRS document. They were asked to record every error caught in a spreadsheet containing the requirement ID, question number, and an optional comment. After finishing the manual task, they had to submit the spreadsheet file by e-mail.

In the second stage, the subjects conducted the semi-automatic analysis of the same SRS document, using the AutoChecklist tool. At the end of the analysis, they saved their reviewed findings from the tool, and submitted them by e-mail.

Once the experiment was finished, we were able to compare the number of errors caught by the subjects in the manual analysis against the errors caught in the semi-automatic analysis. Table 5.1 shows the number of errors caught by each subject.

Table 5.1: Number of errors caught in the manual and semi-automatic analysis.

Manual     09 08 11 07 02 03 05 18 08 13 17
Semi-Aut.  15 15 27 15 10 15 19 23 18 22 21


Figure 5.1 shows that all subjects caught more errors in the semi-automatic analysis than in the manual one.

Figure 5.1: Number of errors found by the subjects.

5.2.2 Experiment II: Analysis Time

The second experiment was also conducted in two stages, similarly to the first experiment. The difference is that each subject conducted the manual analysis on one document, and the semi-automatic analysis on a different document. The first document, called "Document A", was the same one already described in the previous experiment. The second document, called "Document B", was also a reduced version of the OBOSS-III SRS document [29], containing 10 requirements and a traceability matrix. The subjects also had access to the PUS document [8], referenced by both SRS documents.

The two documents, although having the same number of pages and the same number of requirements, contained different requirements. This was done to avoid biased analysis times. If the same document had been used, the time of the second stage could have been significantly smaller than that of the first stage, because the document would already have been known by the subjects.

In the first stage of the experiment, the subjects conducted the manual analysis of Document A. Similarly to the first experiment, they were asked to write down the errors found in a spreadsheet, and submit them by e-mail at the end of the procedure. The times at the start and at the end of the analysis were also written down by the subjects.

In the second stage, the subjects conducted the semi-automatic analysis of Document B, using the AutoChecklist tool. At the end of the analysis, they also saved their findings in the tool, and submitted them by e-mail. The starting time and the finishing time were written down. It is important to mention that the starting time was the time the subject's actual analysis started; we did not consider the time taken by the tool to preprocess the document and generate the output.

At the end of the experiment, we were able to compare the amount of time taken by each subject in the manual analysis against the time taken in the semi-automatic analysis. Table 5.2 shows the results.

Table 5.2: Times (in minutes) of the manual and semi-automatic analysis.

Manual     55 51 68 60 69 47 69 72 81 50 54
Semi-Aut.  41 20 31 30 23 33 18 29 34 23 27

Figure 5.2 shows that all the subjects spent less time in the semi-automatic analysis than in the manual one.

Figure 5.2: Analysis times by the subjects.

5.2.3 Data Analysis and Results

In both experiments we have a single variable to analyze: the number of errors in experiment I, and the analysis time in experiment II. We needed to check if there was a significant difference between the means of the manual and semi-automatic data extracted in both experiments.

Therefore, we applied a paired one-tailed t-test to our data, since in each experiment the same subjects performed both the manual and the semi-automatic analysis. Another reason is that we want to determine if the mean of the semi-automatic data is significantly higher than that of the manual data, considering the number of errors. Regarding the analysis time, we want to determine if the mean of the semi-automatic data is significantly smaller than that of the manual data. To that end, we formulate the null (H0) and alternative (H1) hypotheses for both experiments, as described in Figures 5.3 and 5.4.

H0: µSAe − µMe ≤ 0

H1: µSAe − µMe > 0

Where:
µSAe: Mean of errors caught in the semi-automatic analysis

µMe: Mean of errors caught in the manual analysis

Figure 5.3: Test hypotheses for experiment I.

H0: µSAt − µMt ≥ 0

H1: µSAt − µMt < 0

Where:
µSAt: Mean of semi-automatic analysis times

µMt: Mean of manual analysis times

Figure 5.4: Test hypotheses for experiment II.

Before applying the t-test, we determined whether the data distributions are normal. We applied the Shapiro-Wilk test to all data sets, and since their p-values are higher than 0.05, we may consider all data sets normally distributed. Table 5.3 shows the test results.

Table 5.3: P-Values for the Shapiro-Wilk normal distribution test.

              Experiment I              Experiment II
Method        Manual      Semi-Aut.     Manual      Semi-Aut.
p-value       0.626       0.741         0.451       0.920

The sample size (n), mean (µ), standard deviation (σ), and variance (σ²) are shown in Table 5.4, for the values observed in both experiments.

Table 5.4: Sample sizes, means, standard deviations, and variances.

          Experiment I              Experiment II
Method    Manual      Semi-Aut.     Manual      Semi-Aut.
n         11          11            11          11
µ         9.1818      18.1818       61.4545     28.0909
σ         5.2119      4.8129        10.9486     6.7743
σ²        27.1636     23.1636       119.8727    45.8909


In order to apply the paired t-test, we determined the differences between the semi-automatic and the manual analysis values for both experiments, as shown in Table 5.5.

Table 5.5: Differences between semi-automatic and manual values.

Experiment I: Error differences (Semi-Aut. - Manual)
06 07 16 08 08 12 14 05 10 09 04

Experiment II: Time differences (Semi-Aut. - Manual)
-14 -31 -37 -30 -46 -14 -51 -43 -47 -27 -27

Then we calculated the mean of the differences (µDif) and their standard deviation (σDif). Finally, we were able to calculate the t-values for both experiments, using the following paired t-test formula:

t-value = µDif / (σDif / √n)

We present the calculated values in Table 5.6.

Table 5.6: Mean of differences, standard deviation, and t-values.

           Experiment I   Experiment II
µDif       9              -33.3636
σDif       3.7417         12.6907
t-value    7.9776         -8.7194
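As a check, plugging the experiment I values into the formula above gives t-value = 9 / (3.7417 / √11) ≈ 9 / 1.1282 ≈ 7.98, which matches the value reported in Table 5.6; analogously, −33.3636 / (12.6907 / √11) ≈ −8.72 for experiment II.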

In both experiments, the number of degrees of freedom is d.o.f. = n − 1 = 10, and considering a significance level of 0.05 for one-tailed t-tests, we have a critical t-value of t0.05 = 1.812. With all these values, we could evaluate the test hypotheses for both experiments.

Considering experiment I, we must have tExpI > t0.05 (right tail) to reject H0. Since 7.9776 > 1.812, we can reject H0. For experiment II, we must have tExpII < −t0.05 (left tail) to reject H0. Therefore, since −8.7194 < −1.812, we can also reject H0 in experiment II.

In summary, we can say that the number of errors caught in the semi-automatic analysis is significantly higher than the number of errors caught in the manual analysis. Similarly, we can say that the time taken to perform the semi-automatic analysis is significantly smaller than the time taken to perform the manual analysis.

5.3 Considerations on the Experiments

The experiments performed are a preliminary validation of the method. In order to perform a more reliable validation, it would be ideal to use complete SRS documents, along with their error reports, allowing us to perform a deeper analysis, considering false positives and false negatives among the errors caught in the analyses performed by the subjects. We did not have access to such error reports, as they were mostly related to confidential projects, thus preventing us from performing such an analysis. As for using complete SRS documents, that would, unfortunately, demand a huge amount of time from each subject to perform the experiments, and we could not find volunteers with that much free time available to help us.

Another aspect that could allow a more reliable validation would be using real domain specialists as subjects. We used undergraduate and graduate students, who were inexperienced and not familiar with this kind of quality assessment. We also used a few software engineering professionals with experience in dealing with software requirements documents and in performing critical analysis of such documents, but they were not familiar with the space applications domain. If we had used real domain specialists, who are familiar with the checklists, we could have had a more reliable and definitive validation of the proposed semi-automatic analysis method.

Nevertheless, when we look at the data collected in the experiments, we can see that the results indicate that the semi-automatic method is better than the manual method, in the sense that the analysis time, and therefore the effort, was reduced in the semi-automatic analysis in all cases. The number of errors caught also increased in the semi-automatic analysis for all the subjects. Despite the large variation in the number of errors per subject, caused by their lack of experience or even unfamiliarity with the process, the semi-automatic analysis enabled the subjects to detect more errors in the documents than the manual analysis. This happened because the AutoChecklist tool, besides detecting several errors automatically, also points out directions to the reviewers so that they are able to detect more errors in their final analysis, thus doing a good job of assisting the analysis process.


Chapter 6

Conclusions and Future Work

The checklist-based analysis of SRS documents is a task that demands great effort to be accomplished. It is also an error-prone procedure, which gets even worse when large checklists are applied to equally large SRS documents.

In this dissertation, we proposed a semi-automatic method to conduct the checklist-based analysis of SRS documents. We developed a framework to assist the analysis of SRS documents using natural language processing. In this framework, some of the checklist's questions are answered automatically, and warnings are generated for the others, containing information that guides the reviewer in providing the final answers.

We implemented a tool called AutoChecklist, based on the framework. The tool applies the error-based checklist from the work by Véras et al. [36] to SRS documents. Although it uses a single checklist, the tool supports the addition of others (please refer to Appendix D for the detailed procedure).

The AutoChecklist tool was used to validate the proposed method. It was evaluated by an expert in the SRS analysis of critical systems, and also in two experiments comparing the manual analysis against the proposed semi-automatic analysis. The first experiment compared the number of errors caught with both methods, and the second compared the analysis times.

The results show that, by using the proposed method, the number of errors caught in the SRS document was significantly higher than with the manual method, and that the analysis time was significantly smaller than with the manual method. This indicates that the proposed semi-automatic method demands less effort from the reviewer, and is less error-prone than the manual method.

The proposed method certainly represents an advance over the manual application of checklist-based analysis to SRS documents. It also opens several improvement opportunities that may be considered in future work. We discuss some of these opportunities in the next sections.

6.1 Plain-text Conversion

The plain-text conversion, which is the first task conducted at the preprocessing stage, is not 100% accurate. In fact, it does not behave well when dealing with tables, images, headers, and footers in a document file. In some cases, when these structures are converted to plain-text, the resulting file may contain misplaced or missing line breaks, mingled text, or even missing sentences. In the current version of the AutoChecklist tool, there is a workaround procedure for dealing with such cases, described in Section C.3 of Appendix C.

This problem in the plain-text conversion constitutes a real challenge to be exploredin future work. It could be solved either by implementing a new plain-text converter,customizing an existing one, or even by postprocessing the converted plain-text, wherethe possible issues are detected and the proper actions are taken towards having them�xed.

Another problem in the plain-text conversion is the original �le format. Document�les, such as PDF, DOC, and RTF are more common, but in some cases, SRS documentsmay be in spreadsheet format. The plain-text conversion should be able to handle a widerange of �le formats.

6.2 Requirements Extraction

Algorithm 1, described in Chapter 3, works well in identifying sentences structured as requirements. But, in a few cases, some requirements are not extracted due to a failure in identifying a requirement ID before those sentences.

This could be avoided if the reviewers were given the possibility to inform the expected format of the requirement IDs in the SRS document under analysis. The format could be provided as a regular expression. Another possibility would be to use a classifier to recognize such IDs, trained on a large set of SRS documents containing manually annotated IDs.
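As a rough illustration of the regular-expression idea, the sketch below (not part of AutoChecklist) compiles a hypothetical reviewer-supplied pattern with java.util.regex; the ID format shown is only an example, matching IDs such as Req-01 or Req-4.1.1-2.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequirementIdMatcherSketch {

    public static void main(String[] args) {
        // Hypothetical reviewer-supplied format: "Req-" followed by dot-separated numbers
        // and an optional "-<number>" suffix.
        Pattern idPattern = Pattern.compile("Req-\\d+(\\.\\d+)*(-\\d+)?");

        String line = "Req-4.1.1-2 The sequence flags field shall always contain \"11\".";
        Matcher matcher = idPattern.matcher(line);

        // If an ID is found at the beginning of the line, the sentence that follows
        // would be treated as a requirement statement.
        if (matcher.lookingAt()) {
            System.out.println("Requirement ID found: " + matcher.group());
        }
    }
}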

Another problem is that the sentence structure analysis is time-consuming. In very large documents, the preprocessing stage might take a considerable amount of time. This could be mitigated if the reviewers could point out the document sections where the requirements are located, avoiding the analysis of unnecessary sentences.

The sentence structure analysis could also be used to identify possible missing requirements in the document. In some cases, sentences with the specified structure that appear in non-requirement statements should in fact be explicit requirements. This could be used to generate basic warnings, not tied to the checklists, indicating a basic problem in the SRS document.

Finally, some SRS documents do not contain IDs in the requirements. In this case, internal IDs could be assigned to such requirements in order to conduct the analysis. This strategy could also be used for the possible missing requirements identified within the document contents: aside from generating the basic warnings indicating that they are not explicit requirements, they could also be considered in the analysis.


6.3 Tool Configuration

Specifically regarding the AutoChecklist tool, it could allow some tweaks to be performed before the analysis. We could specify, for instance, only a few questions from the checklists to be evaluated, instead of the whole checklist.

This could also be useful in cases where some checklist questions aim to identify specific terms/expressions. We could customize the terms/expressions, broadening or tightening the search. Regular expressions denoting units and their prefixes could also be added or removed from within the tool configuration.

The possibility of the reviewer informing a regular expression defining the format of the requirement IDs, or the sections within the document that contain the requirements, could also be offered in specific settings within the tool.

6.4 Referencing External Documents

Some checklist questions need to check whether references to sections or requirements from external documents are correct. Currently, this generates a warning, asking the reviewer to check that specific section/requirement in the external document.

Instead of providing as input only the SRS document to be analyzed, the referenced documents could also be provided. They would need to be preprocessed, and the information extracted from them used in the SRS analysis. This change in the framework could turn a warning into a fully automated answer.

In another common case, the requirements traceability matrix is provided as an external document, spreadsheet, or even database file. This could also be considered as input, and new utility modules could be created to handle and extract information from these file types.

6.5 Reviewing Unanswered Questions

When warnings are generated, they need to be checked by the reviewer, who gives the final answer. This effort could be mitigated by using machine learning techniques, learning from the final answers provided to the most common warning occurrences.

This could further reduce the number of warnings that need to be manually reviewed, as more answers could be learnt over time.

6.6 Analysis Metrics

Finally, the AutoChecklist tool could generate metrics when the analysis is finished, such as the number of extracted requirements, the analysis time, the number of questions answered automatically, and the percentage of traceability in the SRS document, among other useful metrics that might add value to the semi-automatic analysis.


Bibliography

[1] Ana Maria Ambrosio, Eliane Martins, Nandamudi L. Vijaykumar, and Solon V. Carvalho. A conformance testing process for space applications software services. Journal of Aerospace Computing, Information, and Communication, 3(4):146–158, 2006.

[2] Nathan Carlson and Phil Laplante. The NASA automated requirements measurement tool: A reconstruction. Innov. Syst. Softw. Eng., 10(2):77–91, June 2014.

[3] Gustavo Carvalho, Flávia Barros, Florian Lapschies, Uwe Schulze, and Jan Peleska. Model-Based Testing from Controlled Natural Language Requirements, pages 19–35. Springer International Publishing, Cham, 2014.

[4] Gustavo Carvalho, Diogo Falcão, Flávia Barros, Augusto Sampaio, Alexandre Mota, Leonardo Motta, and Mark Blackburn. Nat2testscr: Test case generation from natural language requirements based on SCR specifications. Science of Computer Programming, 95, Part 3:275–297, 2014.

[5] Angel Chang. Stanford TokensRegex. http://nlp.stanford.edu/software/tokensregex.shtml, 2012. [Accessed on 11/23/2015].

[6] Davide Falessi, Giovanni Cantone, and Gerardo Canfora. A comprehensive characterization of NLP techniques for identifying equivalent requirements. In Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM '10, pages 18:1–18:10, New York, NY, USA, 2010. ACM.

[7] Henning Femmer, Daniel Méndez Fernández, Elmar Juergens, Michael Klose, Ilona Zimmer, and Jörg Zimmer. Rapid requirements checks with requirements smells: Two case studies. In Proceedings of the 1st International Workshop on Rapid Continuous Software Engineering, RCoSE 2014, pages 10–19, New York, NY, USA, 2014. ACM.

[8] European Cooperation for Space Standardization. Space engineering: Ground systems and operations - telemetry and telecommand packet utilization, 2003. Doc. No. ECSS-E-70-41A.

[9] Gonzalo Genova, José M. Fuentes, Juan Llorens, Omar Hurtado, and Valentín Moreno. A framework to measure and improve the quality of textual requirements. Requirements Engineering, 18(1):25–41, 2013.

[10] Juliana Galvani Greghi, Eliane Martins, and Ariadne Maria Brito Rizzoni Carvalho. Semi-automatic generation of extended finite state machines from natural language standard documents. In Dependable Systems and Networks Workshops (DSN-W), 2015 IEEE International Conference on, pages 45–50, June 2015.

[11] William G. Griswold. How to read an engineering research paper. http://cseweb.ucsd.edu/~wgg/CSE210/howtoread.html. [Accessed on 10/14/2015].

[12] Carlos Huertas and Reyes Juárez-Ramírez. NLARE, a natural language processing tool for automatic requirements evaluation. In Proceedings of the CUBE International Information Technology Conference, CUBE '12, pages 371–378, New York, NY, USA, 2012. ACM.

[13] Mohd Sahid Husain and Mohd Rizwan Beg. Advances in ambiguity less NL SRS: A review. In IEEE International Conference on Engineering and Technology (ICETECH), pages 221–225. IEEE, 2015.

[14] Ishrar Hussain, Olga Ormandjieva, and Leila Kosseim. Automatic quality assessment of SRS text by means of a decision-tree-based text classifier. In Proceedings of the Seventh International Conference on Quality Software, QSIC '07, pages 209–218, Washington, DC, USA, 2007. IEEE Computer Society.

[15] Nadzeya Kiyavitskaya, Nicola Zeni, Luisa Mich, and Daniel M. Berry. Requirements for tools for ambiguity identification and measurement in natural language requirements specifications. Requirements Engineering, 13(3):207–239, 2008.

[16] Leonid Kof. Translation of textual specifications to automata by means of discourse context modeling. In Martin Glinz and Patrick Heymans, editors, Requirements Engineering: Foundation for Software Quality, volume 5512 of Lecture Notes in Computer Science, pages 197–211. Springer Berlin Heidelberg, 2009.

[17] Carl Lamar and Gregory M. Mocko. Linguistic analysis of natural language engineering requirement statements. In Proceedings of the 8th International Symposium on Tools and Methods of Competitive Engineering, TMCE 2010, volume 1, pages 97–111, 2010.

[18] Alex Lash, Kevin Murray, and Gregory Mocko. Natural language processing applications in requirements engineering. Proceedings of the ASME Design Engineering Technical Conference, 2(PARTS A AND B):541–549, 2012.

[19] Seok Won Lee and David C. Rine. Missing requirements and relationship discovery through proxy viewpoints model. In Proceedings of the 2004 ACM Symposium on Applied Computing, SAC '04, pages 1513–1518, New York, NY, USA, 2004. ACM.

[20] Christopher Manning, Tim Grow, Teg Grenager, Jenny Finkel, and John Bauer. Stanford tokenizer. http://nlp.stanford.edu/software/tokenizer.shtml, 2015. [Accessed on 11/23/2015].

[21] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist., 19(2):313–330, June 1993.


[22] George A. Miller. WordNet: A lexical database for English. Commun. ACM, 38(11):39–41, November 1995.

[23] Olga Ormandjieva, Ishrar Hussain, and Leila Kosseim. Toward a text classification system for the quality assessment of software requirements written in natural language. SOQUA'07: Fourth International Workshop on Software Quality Assurance - In conjunction with the 6th ESEC/FSE Joint Meeting, pages 39–45, 2007.

[24] Daniel Popescu, Spencer Rugaber, Nenad Medvidovic, and Daniel M. Berry. Reducing ambiguities in requirements specifications via automatically created object-oriented models. In Barbara Paech and Craig Martell, editors, Innovations for Requirement Analysis. From Stakeholders' Needs to Formal Designs, volume 5320 of Lecture Notes in Computer Science, pages 103–124. Springer Berlin Heidelberg, 2008.

[25] Roger Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill, Inc., New York, NY, USA, 6 edition, 2005.

[26] Anderson Rossanez and Ariadne M. B. R. Carvalho. Semi-automatic checklist quality assessment of natural language requirements for space applications. In Seventh Latin-American Symposium on Dependable Computing (LADC), pages 123–126, Oct 2016.

[27] Alberto Sardinha, Ruzanna Chitchyan, Nathan Weston, Phil Greenwood, and Awais Rashid. EA-Analyzer: automating conflict detection in a large set of textual aspect-oriented requirements. Automated Software Engineering, 20(1):111–135, 2013.

[28] Roser Saurí, Robert Knippen, Marc Verhagen, and James Pustejovsky. Evita: A robust event recognizer for QA systems. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05, pages 700–707, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics.

[29] Keld Schultz. Onboard operations support software requirements specification. http://spd-web.terma.com/Projects/OBOSS/Home_Page/index.htm, 2003. [Accessed on 05/31/2016].

[30] Ian Sommerville. Software Engineering: (Update) (8th Edition) (International Computer Science). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.

[31] Walter F. Tichy and Sven J. Körner. Text to software: Developing tools to close the gaps in software engineering. Proceedings of the FSE/SDP Workshop on the Future of Software Engineering Research, FoSER 2010, pages 379–383, 2010.

[32] Walter F. Tichy, Mathias Landhäußer, and Sven J. Körner. nlrpBench: A Benchmark for Natural Language Requirements Processing. KIT, Fakultät für Informatik, 2014.

[33] Kristina Toutanova. Stanford POS tagger. http://nlp.stanford.edu/software/tagger.shtml, 2015. [Accessed on 11/23/2015].

[34] Jon Turner. How to write a great research paper. https://www.arl.wustl.edu/~pcrowley/cse/591/writingResearchPapers.pdf. [Accessed on 10/14/2015].

[35] Paulo C. Véras, Emilia Villani, Ana Maria Ambrósio, and Henrique Madeira. Benchmarking software requirements documentation for space applications. Instituto Tecnológico de Aeronáutica, 2011.

[36] Paulo C. Véras, Emilia Villani, Ana Maria Ambrosio, Marco Vieira, and Henrique Madeira. A benchmarking process to assess software requirements documentation for space applications. J. Syst. Softw., 100(C):103–116, February 2015.

[37] Stefan Wagner, Florian Deissenboeck, and Sebastian Winter. Managing quality requirements using activity-based quality models. In Proceedings of the 6th International Workshop on Software Quality, WoSQ '08, pages 29–34, New York, NY, USA, 2008. ACM.

[38] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.


Appendix A

Error-Based Checklist

This Appendix uses the following abbreviations:

• TBC: To Be Confirmed

• TBD: To Be Defined

A.1 Questions about Lack of Traceability

1. In the traceability matrix of the software requirements document, are all the software requirements traced to at least a system or an interface requirement?

2. In the traceability matrix of the software requirements document, are all the software requirements traced to the correct system or interface requirement?

3. In the text of the software requirements in which there is a reference to a requirement or section of other document, is this reference traced to the correct requirement or section?

4. In the text of the software requirements in which there is a reference to some function, is this reference traced to the correct function?

5. In the text of the software requirements in which there is a reference to another requirement or a section of the same document, is this reference traced to the correct requirement or section?

A.2 Questions about Requirement Incompleteness

6. In the text of the software requirements in which there should be the specification of a numeric value, is this numeric value defined?

7. Are all the requirements without "TBC"?

8. Are all the requirements without "TBD"?


9. In the requirements in which there is a description of the actions of the software with the occurrence of some events, does this (these) requirement(s) consider all the possible events in the situation considered in the text?

A.3 Questions about Incorrectness

10. Are all the requirements without the word "should" indicating a non-mandatory feature of the system?

11. Are all the requirements without the word "might" indicating a non-mandatory feature of the system?

12. Are all the requirements without the word "can" indicating a non-mandatory feature of the system?

13. Are all the variables or numeric values with the correct unit? (when applicable)

14. Are all the variables or numeric values that have a unit with the correct order of magnitude?

15. Are all the requirements without the expression "in preference" indicating a non-mandatory feature of the system?

16. Are all the requirements without the expression "a few" near a numeric value causing an imprecise idea?

17. Are all the requirements without the expression "in some" near a numeric value causing an imprecise idea?

18. Are all the requirements without the expression "in less than" near a numeric value causing an imprecise idea?

19. Are all the requirements without the expression "in more than" near a numeric value causing an imprecise idea?

A.4 Questions about Internal Conflict/Inconsistency

20. Are all the actions/functions of the software of a specific operation mode consistent with the description of these actions/functions in other places of the same document (such as other requirement, table or figure)?

21. If there is more than one place (requirement, figure, table or list) in the same document defining the frequency or period of some action/function of the software, are they consistent with each other?

22. If there is more than one place (requirement, figure, table or list) in the same document describing the functioning of the watch dog, is its behaviour consistent in these different places? (Its execution timing base, for example)


Appendix B

Implementation Aspects for

AutoChecklist

The AutoChecklist tool has been implemented in the Java Standard Edition (SE), version 8, programming language, using the Eclipse Integrated Development Environment (IDE), version Neon 2. It is packaged in an executable Java ARchive (JAR) file. In order to organize the source code in a manner that reflects the framework architecture, the classes implementing the modules from the framework's analysis layer are under the com.autochecklist.modules package, and the ones implementing the modules from the framework's utility layer are under the com.autochecklist.utils package. Other packages at the same level are com.autochecklist.base, containing classes representing entities used across the layers, like Requirement, Finding, Question and QuestionCategory; and com.autochecklist.ui, containing user interface classes. Figure B.1 shows the package explorer from the Eclipse IDE, containing the package structure.

Aside from the source code, there are resource packages containing the checklist XML files, scripts used by the output generator, regular expression rules, and WordNet database files. There is also a "libs" directory containing the external JAR libraries: Stanford's CoreNLP toolkit, the MIT Java WordNet Interface, OpenCSV, and Apache Tika. These resources are also shown in Figure B.1.

The execution starts in a class called Main, and the entry point is Java's static void main method. If no parameters are provided, AutoChecklist starts up its Graphical User Interface (GUI). In case parameters are provided, they will be evaluated and, if correct, the tool will run its Command Line Interface (CLI). If the parameters are incorrect, usage instructions are shown, and the execution stops. Appendix C describes all the possible parameters accepted by AutoChecklist.
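The dispatch logic can be pictured roughly as in the sketch below. This is a simplified illustration of the behaviour just described, not the actual Main class; the helper names (startGui, runCli, printUsage, argsAreValid) are made up for the example, and the flag spellings follow Appendix C.

public class MainSketch {

    public static void main(String[] args) {
        if (args.length == 0) {
            startGui();            // no parameters: start the Graphical User Interface
        } else if (argsAreValid(args)) {
            runCli(args);          // valid parameters: run in Command Line Interface mode
        } else {
            printUsage();          // invalid parameters: show usage instructions and stop
        }
    }

    private static boolean argsAreValid(String[] args) {
        // Illustrative check only: accepts "--srs <file>" or "--pre <file>".
        return args.length >= 2 && (args[0].equals("--srs") || args[0].equals("--pre"));
    }

    private static void startGui() { /* would launch the JavaFX wizard */ }

    private static void runCli(String[] args) { /* would invoke the Orchestrator */ }

    private static void printUsage() {
        System.out.println("Usage: java -jar autochecklist.jar --srs <SRS document> [--pre-only]");
    }
}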

Once it is determined whether the GUI or the CLI should be used, a class called Orchestrator is invoked. This class guides the execution flow, invoking the proper modules described in the framework's analysis layer, depending on the provided input. Once each module finishes its work, the Orchestrator invokes the next one, until the output generator, the final module from the three execution stages predicted in the framework, is invoked.

The modules from the analysis layer are implemented in classes that extend the Module abstract class. They are PreProcessor, AnalysisModule, and OutputGenerator. Such classes must implement the start method, where they coordinate their work. This inheritance is illustrated in Figure B.2.


Figure B.1: Packages structure.


Figure B.2: Execution stages modules' classes.

There are four analysis modules, one for each question category of the error-based checklist. They are implemented in Java classes extending the AnalysisModule abstract class. In their constructor, they receive as parameter a QuestionCategory object, containing a set of questions from the checklist. The classes extending AnalysisModule must implement the startAnalysis method, which contains the analysis code. The startAnalysis method has one parameter, a RequirementList object, containing the list of requirements to be analyzed.

Analysis classes are created by a factory class called AnalysisModuleFactory, which provides a queue structure containing the instances of all the analysis classes. The queue is used by the Orchestrator object, which invokes the analysis classes in the given order. This is illustrated by Figure B.3.

In the analysis process, all the Requirement objects are iterated over for each Question object available in the current analysis module class. The default value for a question answer is "Yes". When processing a question, specific NLP techniques are used depending on a QuestionAction parameter from the Question class. The QuestionAction has a type and a sub-type, indicating the NLP technique that should be used on the requirement text for answering that question. If an issue is detected in the requirement, a Finding object is created, and a (Requirement, Question) pair is associated to it. The Finding constructor takes as parameters a question unique identifier, a requirement unique identifier, a text explaining the issue encountered, and the answer considered for that question/requirement pair. The possible answers are assigned in the situations described in this chapter's introductory section.
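A minimal sketch of this loop is shown below. The entities are simplified stand-ins for the tool's Requirement, Question and Finding classes, and the NLP check itself is left as a placeholder; it only illustrates the default-answer and finding-creation logic described above.

import java.util.ArrayList;
import java.util.List;

public class AnalysisLoopSketch {

    // Simplified stand-in for the tool's Finding entity.
    static class Finding {
        final String questionId;
        final String requirementId;
        final String issueDescription;
        final String answer;

        Finding(String questionId, String requirementId, String issueDescription, String answer) {
            this.questionId = questionId;
            this.requirementId = requirementId;
            this.issueDescription = issueDescription;
            this.answer = answer;
        }
    }

    static List<Finding> analyze(List<String> requirementIds, List<String> questionIds) {
        List<Finding> findings = new ArrayList<Finding>();
        for (String requirementId : requirementIds) {
            for (String questionId : questionIds) {
                String answer = "Yes"; // default answer for every question
                // Here the NLP check selected by the question's QuestionAction would run.
                boolean issueDetected = false;
                if (issueDetected) {
                    answer = "No"; // or a warning, depending on the question
                    findings.add(new Finding(questionId, requirementId, "issue description", answer));
                }
            }
        }
        return findings;
    }
}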


Figure B.3: Analysis modules' classes.

Further details on the implementation of the framework's modules are described in the upcoming sections.

B.1 Preprocessing

The preprocessing is implemented in the PreProcessor class. It is instantiated by the Orchestrator, and it takes the SRS document file name in its constructor. It implements the start method, where it initially calls the convertFile method from the PlainTextConverter utility class, returning the plain-text string representation of the SRS document. This is saved in a text (TXT) file, in the same path as the SRS document file.

The text file is saved because converting an SRS document from a specific format to plain text is not a completely fail-proof operation. The operation may miss information from tables, place line breaks where they should not be, or omit them where they are expected. By providing such a text file to the user, he/she may perform manual corrections to it and later provide it as input to AutoChecklist, so that it gets preprocessed.

Once the plain text is available, the requirements, the document sections, and the requirements traceability matrix are extracted from it. This is performed by calling methods from the NLPTools utility class, which will be detailed later. Once all the data is extracted, it is saved in an XML file, by calling methods from the XMLPreProcBuilder utility class. An example of such files is shown in Figure B.4.
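A minimal sketch of building such a file with the org.w3c.dom API is shown below; the element names ("SRS", "requirement", "RTM") and the output file name are illustrative, not necessarily the exact ones produced by XMLPreProcBuilder.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PreprocXmlSketch {

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element root = doc.createElement("SRS");
        doc.appendChild(root);

        // One element per extracted requirement.
        Element requirement = doc.createElement("requirement");
        requirement.setAttribute("id", "Req-01");
        requirement.setTextContent("All telecommand packet headers shall have a packet sequence control field.");
        root.appendChild(requirement);

        // The extracted traceability matrix is stored as a text (or CSV) chunk.
        Element rtm = doc.createElement("RTM");
        rtm.setTextContent("Requirement,Drawing,Object ...");
        root.appendChild(rtm);

        // Serialize the DOM tree to a file next to the original SRS document.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(doc), new StreamResult(new File("Requirements-preproc.xml")));
    }
}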

The PreProcessor class provides the result of its work through the getPreprocessedFile method, returning the saved XML file. If the user desires, he/she may also review the generated XML file and perform manual changes in it before moving on to the analysis stage.


Figure B.4: Preprocessed file.


B.2 Traceability

The traceability questions check whether all the requirements are present in the requirements traceability matrix, and whether their references are correct. Also, some questions check whether the requirements contain internal or external (other documents) references, or references to functions.

In order to answer the traceability questions, data extracted at the preprocessing stage is used: the requirements traceability matrix, and a list of sections from the document under analysis. Regarding the requirements traceability matrix, the extracted data may be in CSV format, or a simple chunk of text. If the data is in CSV format and the requirement identification is found in the correct column, the answers will be either "Yes" or "No". If the data is not in CSV format and any instance of the requirement identification under analysis is found, the answer will be "Possible Yes", since we cannot be sure whether that is really a traceability reference. If it is not found, the answer is "No".
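The decision just described can be sketched as below; the parsing is deliberately simplified (it assumes the requirement ID sits in the first CSV column), but it shows how the CSV form allows a definite answer while a plain text chunk only supports a "Possible Yes".

public class RtmLookupSketch {

    static String traceabilityAnswer(String rtmContents, String requirementId, boolean isCsv) {
        if (isCsv) {
            // In CSV form, check whether the ID appears in the expected column of some row.
            for (String row : rtmContents.split("\n")) {
                String[] columns = row.split(",");
                if (columns.length > 0 && columns[0].trim().equals(requirementId)) {
                    return "Yes";
                }
            }
            return "No";
        }
        // In a plain text chunk we can only tell whether the ID occurs somewhere.
        return rtmContents.contains(requirementId) ? "Possible Yes" : "No";
    }

    public static void main(String[] args) {
        String csvRtm = "Requirement,Drawing,Object\nReq-4.1.1-1,HRT HOOD PUS,External PUS";
        System.out.println(traceabilityAnswer(csvRtm, "Req-4.1.1-1", true)); // prints "Yes"
    }
}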

Regarding the reference questions, the analysis is conducted using NLP TokensRegex, provided by the NLP module. If references are detected and not found in the list of extracted sections, they are considered external, or internal otherwise. Functions are also detected, by a variation of words indicating a function. If any of these are found, a warning answer is attributed, so that the reviewer may check whether the references are correct.

B.3 Incompleteness

The four questions from the incompleteness category check for terms indicating either "To Be Confirmed" (TBC) or "To Be Defined" (TBD) in the requirements text, for missing numeric values, and for cases of actions taken by the software in response to some events, where the reviewer should check whether all the possible events are covered.

For analysing the TBC and TBD cases, NLP TokensRegex is used, searching for variations of "To Be Confirmed/Defined" and their acronyms (TBC/TBD). If any of these are detected, the corresponding question is answered with "No"; otherwise it keeps its default "Yes". The rules are saved in a resource file named res/RegexRules/incompleteness.rules.
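For illustration only, the check can be approximated with a plain java.util.regex pattern, as in the sketch below; the actual tool expresses these rules as Stanford TokensRegex rules in the resource file mentioned above.

import java.util.regex.Pattern;

public class TbdDetectorSketch {

    // Matches "TBC", "TBD", "to be confirmed" and "to be defined", case-insensitively.
    private static final Pattern TBX = Pattern.compile(
            "\\b(TBC|TBD|to\\s+be\\s+(confirmed|defined))\\b", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String requirement = "The maximum delay shall be TBD milliseconds.";
        // A match means the requirement is incomplete, so the corresponding
        // checklist question cannot keep its default "Yes" answer.
        System.out.println(TBX.matcher(requirement).find() ? "No" : "Yes");
    }
}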

For detecting actions and events described in the requirement text, an NLP utility class has been implemented. The utility class detects actions, which are denoted by verbs, and events, by using an NLP technique that implements Algorithm 2, described in Chapter 3. If actions and events are detected, the question is answered with a warning, telling the reviewer to check whether all the possible events are covered in the description.

Finally, for the question that checks for missing numeric values, words denoting that a numeric value could be missing are searched for. If any of them are found, the answer is a warning, telling the reviewer to check whether it is indeed the case of a missing numeric value. The check is also implemented in an NLP utility class: it lemmatizes every token in a sentence and checks it against a list of terms related to add/subtract/multiply/divide.


B.4 Incorrectness

The incorrectness category contains the largest number of questions in the error-based checklist, but its implementation only uses the NLP TokensRegex utility class in its analysis. That is because the questions only check whether there are terms and expressions that are not desired in a requirement statement. Aside from that, the category contains two questions which aim to identify whether there are numbers followed by units, and whether they have the proper unit and the proper order of magnitude. The incorrectness TokensRegex rules are saved in the res/RegexRules/incorrectness.rules resource file.

In the analysis of the questions that only seek to find terms and expressions, if they are found, the answer is "No", and "Yes" if they are not found. Numbers followed by units are detected by rules passed to the TokensRegex utility, considering a number followed by a unit, which in turn considers a combination of prefixes (e.g. k, mega, nano) and units (e.g. m, meters, secs) and their variations. If anything is detected, a warning is generated so the reviewer may check whether the unit and the order of magnitude are correct. The findings involving numbers and units are stored in a list that is used by the output generator module to generate a sorted list indicating all of their occurrences, so the reviewer may check how they are related.
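A much-simplified regular expression for the number-followed-by-unit case is sketched below; the real rules live in the TokensRegex resource file and cover many more prefixes, units and spelling variations than this illustrative pattern.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberUnitSketch {

    // Illustrative pattern: a number, an optional SI prefix, and a small sample of units.
    private static final Pattern NUMBER_WITH_UNIT = Pattern.compile(
            "(\\d+(?:\\.\\d+)?)\\s*(k|M|G|m|n|mega|kilo|nano)?(m|s|secs?|meters?|Hz|V|A)\\b");

    public static void main(String[] args) {
        String requirement = "The telemetry packet shall be sent every 500 ms with a 2 MHz clock.";
        Matcher matcher = NUMBER_WITH_UNIT.matcher(requirement);
        while (matcher.find()) {
            // Each occurrence becomes a warning so the reviewer can check unit and magnitude.
            System.out.println("Check unit/order of magnitude: " + matcher.group());
        }
    }
}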

B.5 Inconsistency

The inconsistency module is the final analysis module implemented in AutoChecklist. The three questions from the inconsistency category check whether the information regarding the description of actions and functions, their period or frequency, and watch dog descriptions is consistent in every occurrence.

The questions also consider references in tables and images within the SRS document. Unfortunately, it is not possible at this time to extract reliable information from tables and images. Therefore, the answers in this category will be warnings, informing the reviewer that he/she should check whether the information found in the requirements text is also found in images and tables.

The detection of actions and functions is performed the same way as described for the Traceability and Incompleteness modules, and the inconsistency resource file with TokensRegex rules is res/RegexRules/inconsistency.rules. The period/frequency and watch dog references are evaluated by searching for lemmatized terms denoting them.

B.6 Output Generator

The output generator is implemented in the OutputGenerator class, which takes as parameters in its constructor the list of requirements, the questions, the answers, and the findings generated in all the analysis modules. It generates a report in HTML format, composed of three views:

1. Checklist View: shows the entire checklist, with the findings encountered in the requirements for each question;

2. Requirements View: shows the list of requirements analyzed, with all the findings encountered for each specific requirement;

3. Numeric Occurrences: shows a sorted list of all the numbers and units found, and the requirements where they occur. This helps the reviewer in finding issues related to inconsistencies among such values in the SRS document as a whole (e.g. different orders of magnitude or units representing the same value).

Aside from the HTML report, a CSV report is also generated, showing the same information in a spreadsheet format. The contents are generated using the utility classes HtmlBuilder and CSVBuilder, available in the file builder package.

B.7 Text Module

The text module is the first of the modules from the framework's utility layer to be used in the SRS document analysis. It provides a single utility class, PlainTextConverter, with a static method called convertFile, taking the SRS document file name and returning a string object with its plain-text contents. The module uses Apache Tika, version 1.13, to perform the text conversion.

If a text file is passed to the method, no operation is performed, and the file contents are returned. For the other file types, the PlainTextConverter class uses an API from the Apache Tika library to perform its work. This library accepts as input a variety of file formats, such as PDF, DOC, and DOCX, among others, and returns an XHTML object. PlainTextConverter creates the string that will be returned by the convertFile method by traversing the XHTML object in a depth-first manner, adding the leaf text nodes to a StringBuilder object as they are reached.
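For reference, the simplest way to obtain plain text with Apache Tika is through its facade class, as sketched below. AutoChecklist itself walks the XHTML output and collects the leaf text nodes, as described above; the facade only illustrates the same conversion in its most compact form.

import java.io.File;
import org.apache.tika.Tika;

public class PlainTextConversionSketch {

    public static void main(String[] args) throws Exception {
        // Tika auto-detects the format (PDF, DOC, DOCX, ...) and extracts its text.
        Tika tika = new Tika();
        String plainText = tika.parseToString(new File("Requirements.pdf"));
        System.out.println(plainText);
    }
}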

B.8 NLP Module

The NLP module implements all the NLP techniques that are needed by the preprocessing and the analysis modules. It implements the techniques in utility classes, and it uses NLP tools provided by the Stanford CoreNLP toolkit (tokenizer, parser, PoS tagger, sentence splitter, lemmatizer, among others), and also the WordNet database, accessed through the MIT Java WordNet Interface (JWI). We used CoreNLP version 3.6.0 and JWI version 2.4.0, along with WordNet database files version 3.0.

The CoreNLP toolkit needs to be initialized and loaded so that its tools can be used. The initialization process is time-consuming; therefore, in order to avoid initializing it whenever any of its tools is needed, a class called NLPTools is responsible for loading it. This class implements the singleton design pattern, to ensure a single instance during the whole time AutoChecklist is running. NLPTools also loads the WordNet dictionary when it is created. Therefore, this class is the main point of access to the NLP tools.
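A minimal sketch of such a lazily-initialized singleton is shown below; the annotator list is illustrative, and the WordNet/JWI loading is only indicated by a comment.

import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public final class NlpToolsSketch {

    private static NlpToolsSketch instance;

    private final StanfordCoreNLP pipeline;

    private NlpToolsSketch() {
        Properties props = new Properties();
        // Illustrative annotator set; loading the models is the expensive step.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        this.pipeline = new StanfordCoreNLP(props);
        // The WordNet dictionary (via the MIT JWI library) would also be loaded here.
    }

    public static synchronized NlpToolsSketch getInstance() {
        if (instance == null) {
            instance = new NlpToolsSketch();
        }
        return instance;
    }

    public StanfordCoreNLP getPipeline() {
        return pipeline;
    }
}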

The NLPTools class also instantiates the implemented utility classes, providing access to them by means of interfaces. This design is meant to avoid direct instantiation of those classes in every source code entity within AutoChecklist. The reason for that is that, if we preferred to change the NLP toolkit used, or to add a different implementation of the utility classes, the changes would be restricted to this portion of the code, not spread over the entire source code.

There are six utility classes implemented in the NLP package: RequirementsInfoExtractor, DocumentSectionsExtractor, RequirementsTraceabilityMatrixSectionExtractor, ExpressionExtractor, EventActionDetector, and MissingNumericValueIndicativesDetector. The classes from the NLP module are shown in Figure B.5.

B.8.1 RequirementsInfoExtractor

This utility class implements the extraction of the requirements from the plain-text representation of the SRS document, implementing Algorithm 1 from Chapter 3. Beyond the algorithm itself, this utility class also identifies the requirement identifier, which should be located prior to the detected requirement statement sentence, using the ExpressionExtractor class.

B.8.2 DocumentSectionsExtractor

This utility class extracts the document sections from the plain-text representation of the SRS document. It iterates over the text lines and checks whether a given line has the structure of a document section (e.g. a numeric representation followed by a brief description). It is implemented by calling the ExpressionExtractor class with a resource containing regular expression rules for the extraction. It also calls the next described utility class, RequirementsTraceabilityMatrixSectionExtractor, for identifying the Requirements Traceability Matrix (RTM) section.

B.8.3 RequirementsTraceabilityMatrixSectionExtractor

The RequirementsTraceabilityMatrixSectionExtractor identifies the RTM section and extracts its entire contents. It is called from the DocumentSectionsExtractor whenever a section is identified. The RTM section is also identified using the ExpressionExtractor and a set of regular expression rules.

B.8.4 ExpressionExtractor

This utility class identifies and returns expressions from a set of rules passed to it. It calls Stanford CoreNLP's TokensRegex tool, which splits a text into tokens and uses this list of tokens to evaluate regular expressions, not only at the single-token level but also over sequences of tokens. The resource files containing the rules used by the ExpressionExtractor are placed under the res/RegexRules/ directory.
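The basic TokensRegex usage looks roughly like the sketch below; the pattern string is a made-up example (the word "section" followed by a numeric reference), while the rules the tool actually applies are read from the resource files.

import java.io.StringReader;
import java.util.List;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

public class TokensRegexSketch {

    public static void main(String[] args) {
        // Tokenize the sentence into CoreLabel tokens.
        List<CoreLabel> tokens = new PTBTokenizer<CoreLabel>(
                new StringReader("See section 5.3.2 of the interface document."),
                new CoreLabelTokenFactory(), "").tokenize();

        // Hypothetical pattern: the token "section" followed by a dotted numeric reference.
        TokenSequencePattern pattern =
                TokenSequencePattern.compile("/section/ /[0-9]+(\\.[0-9]+)*/");

        TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
        while (matcher.find()) {
            System.out.println("Reference found: " + matcher.group());
        }
    }
}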

Figure B.5: NLP module's classes.

B.8.5 EventActionDetector

The EventActionDetector utility class detects whether there are actions and events in a given text. It first detects actions, by tokenizing and PoS tagging the text. It then looks for verbs, excluding those in a list of weak verbs (e.g. "be" and "have"). If actions are found, it proceeds to the event detection.

The event detection is implemented according to Algorithm 2, described in Chapter 3.
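A rough sketch of the action-detection step, using the CoreNLP PoS tags and a small weak-verb list, is given below; the annotator set and the weak-verb list are illustrative.

import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class ActionDetectionSketch {

    private static final List<String> WEAK_VERBS = Arrays.asList("be", "have");

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document =
                new Annotation("The software shall send a telemetry packet when the timer expires.");
        pipeline.annotate(document);

        boolean actionFound = false;
        for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) {
            // Penn Treebank verb tags all start with "VB"; weak verbs are skipped.
            if (token.tag().startsWith("VB") && !WEAK_VERBS.contains(token.lemma())) {
                actionFound = true;
                System.out.println("Action verb: " + token.word());
            }
        }
        // If an action is found, the event detection (Algorithm 2) would run next.
        System.out.println("Proceed to event detection: " + actionFound);
    }
}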

B.8.6 MissingNumericValueIndicativesDetector

This utility class detects, in a given text, whether there are terms indicating possibly missing numeric values. It uses CoreNLP's tokenizer to split the text into tokens, performs the lemmatization of these tokens using CoreNLP's lemmatizer, and checks them against a list of lemmatized terms indicating missing numeric values, all of them related to the add/subtract/multiply/divide terms. All the terms were obtained from WordNet.

B.9 Checklists Module

The checklists module provides a single utility class, ErrorBasedChecklist, given that this implementation only considers the error-based checklist. Although the module representation in Chapter 4 describes four utility classes, they are merged into a single class in the actual implementation. This class is a specialization of the base class Checklist, which contains methods for parsing an XML file containing the checklist questions, categories, and actions to be used in the analysis. The checklist file for the error-based checklist is illustrated in Figure B.6.


Figure B.6: Error-based checklist resource.

The base class is responsible for parsing the XML resource file, extracting the questions along with the actions associated with them, and creating Question objects to be used in the analysis modules. The ErrorBasedChecklist class provides methods for retrieving the questions from each of the categories of the checklist. The inheritance between the classes is shown in Figure B.7.
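Parsing such a resource with the standard Java DOM API can be sketched as below; the tag and attribute names follow Figure D.1, while the file name and the printing stand in for the creation of the Question objects.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ChecklistParserSketch {

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("error-based-checklist.xml"));

        NodeList questions = doc.getElementsByTagName("question");
        for (int i = 0; i < questions.getLength(); i++) {
            Element question = (Element) questions.item(i);
            String id = question.getAttribute("id");
            String action = question.getAttribute("action"); // e.g. "extract, should"
            String text = question.getTextContent().trim();
            // Here a Question object would be created for the analysis modules.
            System.out.println(id + " [" + action + "]: " + text);
        }
    }
}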

Figure B.7: Checklist module's classes.


B.10 File Module

This module provides three distinct utility classes: XMLPreProcBuilder, HtmlBuilder, and CSVBuilder. The XMLPreProcBuilder is used by the PreProcessor class for storing the data extracted from the SRS document in an XML file. XMLPreProcBuilder provides methods to create a new requirement, to add text to an existing requirement, and to add document sections and traceability matrix contents. It uses classes and methods from the org.w3c.dom package, integrated in Java SE 8, for generating XML contents.

The HtmlBuilder is used by the output generator to generate reports for the reviewer. It uses methods and classes from the org.jsoup.Jsoup package (the external jsoup library) for generating HTML contents.

The CSVBuilder is also used by the output generator, to generate a CSV table summarizing all the answers provided to the analyzed requirements. It uses the OpenCSV library, version 3.8, to either read or write a CSV file.
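Writing such a summary with OpenCSV boils down to something like the sketch below; the column layout is illustrative.

import java.io.FileWriter;
import com.opencsv.CSVWriter;

public class CsvReportSketch {

    public static void main(String[] args) throws Exception {
        try (CSVWriter writer = new CSVWriter(new FileWriter("analysis-results.csv"))) {
            // Illustrative column layout for the answers summary.
            writer.writeNext(new String[] {"Question", "Requirement", "Answer", "Finding"});
            writer.writeNext(new String[] {"7", "Req-01", "No", "Requirement contains TBC"});
        }
    }
}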

B.11 User Interfaces

As stated in the introduction of this appendix, AutoChecklist runs either in a Command Line Interface (CLI) or in a Graphical User Interface (GUI). In this section we describe some of the implementation differences between the two interface types. The possible ways to run AutoChecklist are described in Appendix C.

B.11.1 Command Line Interface

The Main class is the first class instantiated in AutoChecklist's execution. If parameters are provided to the static main method, they are evaluated, and if they are correct, AutoChecklist runs in CLI mode. The Orchestrator class is instantiated by the Main class, starting from the preprocessing or from the analysis, depending on the parameters provided on the command line.

In the CLI mode, the final output is provided in a directory that is created in the same path as the input file. The directory name is "AnalysisOutput", followed by the current timestamp. Inside this directory, there are three HTML files representing the three views of the HTML report, and a CSV file containing the answers in a spreadsheet format. The intermediary outputs of the preprocessing module (the plain-text file and the preprocessed XML file) are saved in the same directory as the provided input file, and this is done independently of the UI mode.

A major difference between the UI modes is that the CLI performs the whole execution in a single thread, whereas the GUI performs the heavier processing in separate threads, as described in the next subsection.

B.11.2 Graphical User Interface

If no parameters are provided to the main method of the Main class, AutoChecklist runs in GUI mode. The Main class starts up the GUI by calling initUI from the StartupUI class. The GUI is implemented using the JavaFX UI framework, available in Java SE version 8. The GUI runs as a wizard, where the three processing stages predicted in the proposed framework are shown in three different screens: PreprocUI, AnalysisUI, and ResultsUI. The first two have a simple screen displaying a log viewer that shows messages printed out as the preprocessing/analysis runs. The third shows the possible reports that can be generated and provides the possibility of generating all of them, or only the ones the reviewer chooses.

The ui package is organized in screens and widgets. The screens are shown while AutoChecklist is running, and the widgets present alert dialogs, search dialogs, and others. The GUI classes are illustrated in Figure B.8.

Figure B.8: GUI classes.

The startup class, StartupUI, is responsible for calling JavaFX's launch method, which loads the GUI and instantiates the initial screen, implemented in the InitialUI class. This is the screen where the user loads the SRS document to be analyzed. When the user clicks the "Next" button, the PreprocUI is instantiated. Figure B.9 shows the initial screen.

Figure B.9: Initial GUI screen.

The preprocessing screen calls methods from the Orchestrator class, passing the File the user has chosen in the initial screen. The PreprocUI uses a worker thread to perform this call. This way the GUI is not locked, and the user can still interact with it. While the worker thread is running, the user can neither access some menus nor click the "Next" button. When the worker thread finishes, meaning that the preprocessing is complete, the user can open the directory where the preprocessed file and the text file were saved, restart the preprocessing, return to the initial screen, or move on to the next screen, the AnalysisUI. The preprocessing UI is shown in Figure B.10.

Figure B.10: Preprocessing screen.

The analysis screen, similarly to the preprocessing one, calls methods from the Orchestrator class to perform the analysis in a worker thread. It runs all the available analysis modules, and when the worker thread finishes, the user may go to the next screen, presenting the results. The user may also restart the analysis or return to the initial screen. Figure B.11 shows the analysis screen.


Figure B.11: Analysis screen.

The results screen shows the reviewer the types of report that can be generated. It provides checkboxes for each possible view, and if at least one is checked, the "Generate" button is enabled, so that when it is clicked, the report with the selected views is generated. The reports are provided in a tabbed HTML viewer, and the reviewer may save them through the viewer menus. The spreadsheet view is generated from the results screen's "Findings" menu, showing a table view that can be reordered. The reviewer may also change the answers and provide comments to the presented findings. The reviewer may save the spreadsheet contents in a CSV file from the screen menus. The results screen is shown in Figure B.12, the HTML viewer is shown in Figure B.13, and the spreadsheet viewer is shown in Figure B.14.


Figure B.12: Results screen.


Figure B.13: Reports screen.


Figure B.14: Spreadsheet screen.


Appendix C

Running AutoChecklist

We describe here the possible ways to run AutoChecklist. It can be run from its Command Line Interface (CLI) or from its Graphical User Interface (GUI). We also describe some tweaks that may be applied to the intermediary generated files in case of problems, and also to allow a more accurate analysis.

C.1 Command Line Interface

In order to run AutoChecklist using the CLI, parameters must be provided when running the executable JAR file (e.g. java -jar autochecklist.jar --srs Requirements.pdf). If no parameters are provided, it will run in its GUI.

C.1.1 Available Parameters

If any parameter is provided, it will be evaluated, and in case of wrong parameters, the parameter usage will be displayed, as shown in Figure C.1.

Figure C.1: CLI usage.

The next subsections describe how to use the set of parameters.

SRS Documents

In case we are providing an SRS document as input, it is possible to either preprocess and analyze it, running the three stages described in Chapter 3, or only preprocess it, without analyzing it. The parameters used for the two cases are:

Full run: java -jar autochecklist.jar --srs <SRS document file>. Considering that we have an SRS document file named Requirements.pdf, the usage would be:

java -jar autochecklist.jar --srs Requirements.pdf

Preprocess only: java -jar autochecklist.jar --srs <SRS document file> --pre-only

Considering the same SRS document file, Requirements.pdf, the usage would be:

java -jar autochecklist.jar --srs Requirements.pdf --pre-only

Preprocessed Files

If we already have a preprocessed XML file, we can skip the preprocessing stage and move directly to the analysis stage. For that, we must provide the preprocessed file in the input parameters.

Preprocessed file: java -jar autochecklist.jar --pre <XML file>

Considering a preprocessed XML file named Requirements-preproc.xml, the usage would be:

java -jar autochecklist.jar --pre Requirements-preproc.xml

C.1.2 Output

The output is saved in a directory created in the same path as the provided input file, as shown in Figure C.2. The directory name is "AnalysisResults", followed by the current timestamp. The intermediary outputs from the preprocessing stage, the converted text file and the preprocessed XML file, are also saved in the same path as the input file.

Figure C.2: CLI output files.


C.2 Graphical User Interface

If no parameters are provided when running the tool (e.g. java -jar autochecklist.jar), the GUI will be started. The initial screen shows in its menu the possible types of files that can be used as input: SRS documents, preprocessed files, and analyzed files. The input type can be changed by selecting the other possible types in the menu. When one of the options is chosen, the screen changes its state and will start the respective execution stage (preprocessing, analysis, or results) when the user clicks the "Next" button. The initial screen is shown in Figure C.3, with its menu in the initial state (SRS document).

Figure C.3: GUI input types.

C.2.1 SRS Documents

In the case of SRS documents, AutoChecklist accepts PDF, DOC, DOCX, RTF and TXT files. If the user already has a text file representation of the SRS document, or if he/she has already run AutoChecklist and wishes to use the text file generated in the preprocessing stage, after performing manual corrections, that is also possible. The user must select the correct file type in the operating system dialog filter, displayed when the user chooses to load the file.

C.2.2 Preprocessed Files

If the user has a preprocessed XML file available, either because he/she already ran the tool and wishes to repeat the process, or because manual changes were performed to it, the preprocessing stage can be skipped, starting directly in the analysis screen. To do so, in the initial screen, we must select the menu option called "Switch to a preprocessed file". Upon selecting it, the initial screen will change its state, allowing an XML file as input. When the user clicks the "Next" button, the analysis screen will be shown, starting up the analysis.

C.2.3 Analyzed Files

In the case of the GUI, it is possible to use as input a CSV file that contains the analysis results, created when the user saves the spreadsheet containing the questions' answers and findings, or when running the CLI. To do so, at the initial screen the user must select the menu option called "Switch to an analyzed file". The initial screen changes its state to allow a CSV file as input, and when the user selects it and clicks the "Next" button, the results screen will show up, just as if the analysis had just been finished. The user may generate HTML reports, or even see the results in the spreadsheet viewer.

C.3 Iterative Preprocessing

SRS documents may be written using different formats, and the requirements may be written in different ways (e.g. document sections, text chunks, tables, etc.). This can result in problems when converting the document to plain text, like missing or misplaced line breaks, and mingled or even missing sentences. Also, when dealing with PDF documents with headers and footers, these can be mixed with the actual document content at page breaks. This happens because the library used to convert the document to plain text is not robust enough to handle those differences.

To deal with these problems, AutoChecklist saves the plain-text representation of the PDF document in a text file located at the same path as the original SRS document file. If such issues are detected, the reviewer may edit this text file and perform some adjustments to it. The reviewer may use this text file as input when running AutoChecklist again, in an iterative manner, until all the issues are fixed. A failsafe format for representing the requirements in the text file, in case the tool fails to recognize them due to the issues described above, is illustrated in Figure C.4. In the failsafe format, the requirement IDs are always separated from the requirement text by a blank line.

(...)

Req-01

All telecommand packet headers shall have a packet sequence control field as defined in section 5.3.2 of [AD1]. All fields shall be supported.

Req-02

The sequence flags field in the packet sequence control field of the telecommand packet header shall always contain "11", indicating a stand-alone packet.

Req-03

(...)

Figure C.4: Failsafe text representation of a SRS document.

Another way in which the reviewer may fix possible errors in the preprocessing stage is to edit the XML file that is generated after the preprocessing stage is completed. This file is also saved in the same path as the original SRS document that was used as input.

Therefore, if any issue related to bad text conversion, or to failures in identifying some requirement from the SRS document, should occur, we suggest that the reviewer iteratively check and edit the generated text file, using it as input to AutoChecklist from the second time onwards, until no more issues related to text conversion are found. After that, the reviewer may check and edit the preprocessed file, where he/she can edit the acquired requirement texts, or add requirements that failed to be recognized, probably due to a failure in the preprocessing. This procedure can be performed using either the CLI or the GUI mode.

C.4 Requirements Traceability Matrix in CSV format

One of the steps of the preprocessing stage is to identify the requirements traceability matrix in the SRS document, so that it can be checked by the Traceability analysis module. The matrix is usually represented in a table, so, when it gets converted to plain-text format, it becomes a single text chunk. This is saved in the preprocessed XML file inside a pair of <RTM> tags, as illustrated in Figure C.5.

(...)</sections>
<RTM>Requirement Drawing Object Req-4.1.1-1 HRT HOOD PUS External PUS Req-4.1.1-2 HRT HOOD PUS External PUS Req-4.1.1-3 HRT HOOD PUS External PUS (...) </RTM>
</SRS>

Figure C.5: Requirements Traceability Matrix as a text chunk.

In this format, the Traceability module will only be able to tell if instances of the IDof the requirement under analysis are found within the text chunk. For a more accurateanalysis, the traceability matrix may be stored in a Comma Separated Value (CSV)format, by editing the preprocessed XML �le. An example of a traceability matrix in thisformat is shown in Figure C.6.

(...)</sections><RTM>Requirement,Drawing,Object
Req-4.1.1-1,HRT HOOD PUS,External PUS
Req-4.1.1-2,HRT HOOD PUS,External PUS
Req-4.1.1-3,HRT HOOD PUS,External PUS
(...)</RTM></SRS>

Figure C.6: Requirements Traceability Matrix in CSV format.


The Traceability module is able to identify whether the information is represented in CSV format and, if it is, the analysis will be more accurate, since the module can determine whether an occurrence of a requirement ID is, in fact, an entry in the matrix.
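
To make this difference concrete, the sketch below illustrates how such a check could be performed. It is not the actual module code; the method names and the heuristic used to recognize CSV content are assumptions made for illustration only.

// Illustrative sketch only; these helpers are hypothetical, not the actual
// Traceability module code.
private static boolean looksLikeCsv(String rtmContent) {
    // Heuristic: a CSV matrix has one comma-separated record per line.
    String[] lines = rtmContent.split("\\r?\\n");
    if (lines.length < 2) return false;
    for (String line : lines) {
        if (!line.trim().isEmpty() && !line.contains(",")) return false;
    }
    return true;
}

private static boolean hasCsvEntryFor(String rtmContent, String requirementId) {
    // Accurate check: the ID must appear as the first field of some CSV record.
    for (String line : rtmContent.split("\\r?\\n")) {
        String[] fields = line.split(",");
        if (fields.length > 0 && fields[0].trim().equals(requirementId)) {
            return true;
        }
    }
    return false;
}

private static boolean hasTextChunkMentionOf(String rtmContent, String requirementId) {
    // Fallback for the plain text chunk case: only a substring match is possible.
    return rtmContent.contains(requirementId);
}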


Appendix D

Adding new Checklists to AutoChecklist

This appendix describes how to add new checklists to AutoChecklist. In its current implementation, a single checklist has been considered, but its design allows the addition of other checklists. The following sections describe the changes that need to be performed in order to achieve this.

D.1 Changes to perform

Adding a new checklist to AutoChecklist requires not only source code changes, but also new resources and changes to the user interface. The new checklist needs to be evaluated to determine what kind of NLP techniques could be deployed in the task of answering its questions. The checklist questions may or may not be divided into categories; if they are not, it is useful to determine which questions are similar from the analysis point of view (e.g., identifying terms or expressions, actions, specific parts of speech, among others). This determines which and how many new analysis modules and NLP utility classes should be implemented. The next subsections walk through the entities that should be changed or created.

D.1.1 Checklist Resource

The first task to be performed when adding a new checklist to AutoChecklist is creating a new XML resource with a static representation of the checklist. It should be placed under the res/Checklists/ package. This file's contents must be defined inside the parent <checklist> tag.

The checklist questions must be defined inside <category> tags. If the new checklist's questions are not divided into categories, a single category must be specified, so that the file can be parsed properly when read. The categories may also be defined based on question similarity, and may thus reflect the number of analysis modules to be implemented. The category tag must take a "type" string parameter identifying it (e.g. <category type="name">).


<checklist>
<category type="incompleteness">
<question id="1" action="extract, should">Are all the requirements without the word "should"?</question>
(...)
</category>
(...)
</checklist>

Figure D.1: Checklist resource structure.

Each <category> tag must contain a number of <question> tags. The question text must be defined between the opening and closing tags. The question's unique identifier (a number) is defined in the "id" parameter. There is also an "action" parameter, which takes two arguments: a type and a sub-type. The type indicates an operation that should be performed when analyzing the question, and the sub-type complements this information. For instance, if we need to extract the term "can" from the requirement text when analyzing it, the action parameter must be set to "extract, can". An example of the whole structure is shown in Figure D.1.

D.1.2 Checklist Class

Once the checklist resource is defined, a new class representing it must be created in the com.autochecklist.utils.checklist package. The new class must extend the Checklist class, so that it inherits the XML resource parsing procedure.

The new checklist class must define QuestionCategory attributes reflecting the categories defined in the XML file. Once the category attributes are initialized, the parseChecklistResource method should be called, passing the path to the XML file and the category attributes, so that the methods defined in the parent class can parse the file and add Question objects to the defined attributes.

The class should also provide getter methods returning the QuestionCategory objects, so that they can be used by the analysis classes.
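
The sketch below illustrates what such a class could look like. The class name, the category names, the getter names, and the exact signatures of the QuestionCategory constructor and of parseChecklistResource are assumptions made for illustration; the actual code may differ.

package com.autochecklist.utils.checklist;

// Hypothetical new checklist class, following the description above.
public class MyNewChecklist extends Checklist {

    // One QuestionCategory attribute per category defined in the XML resource.
    private QuestionCategory mIncompleteness;
    private QuestionCategory mAmbiguity;

    public MyNewChecklist() {
        mIncompleteness = new QuestionCategory("incompleteness"); // assumed constructor
        mAmbiguity = new QuestionCategory("ambiguity");
        // Let the parent class parse the resource and fill the categories
        // with Question objects (assumed signature).
        parseChecklistResource("res/Checklists/MyNewChecklist.xml",
                               mIncompleteness, mAmbiguity);
    }

    // Getters used by the analysis modules.
    public QuestionCategory getIncompletenessQuestions() {
        return mIncompleteness;
    }

    public QuestionCategory getAmbiguityQuestions() {
        return mAmbiguity;
    }
}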

D.1.3 Question Actions

If the checklist resource only uses action types that already exist, this procedure can be skipped. Otherwise, code changes are necessary to support the new action types.

The pair (type, sub-type) is resolved into a QuestionAction object. The type is translated into an integer value, and the sub-type into a string value. When adding a new action type, a new constant must be added to the QuestionAction class (e.g. ACTION_TYPE_DETECT). Changes should also be performed in the resolveAction method of the Checklist class, so that, when it parses the checklist resource, it resolves the type attribute to the correct constant.

There are no changes needed regarding the sub-type attributes, since they are set as a string value. They are stored in uppercase, and any space characters that may exist are changed to the underscore character (e.g. "a few" is stored as "A_FEW").
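
As an illustration, the sketch below shows how a new constant and the corresponding branch in resolveAction could look. The existing constant ACTION_TYPE_EXTRACT, the QuestionAction constructor, and the method signature shown here are assumptions, not the actual declarations.

// 1) New constant in QuestionAction (the placeholder value must not clash
//    with the existing constants):
public static final int ACTION_TYPE_DETECT = 10;

// 2) Corresponding branch in Checklist.resolveAction, mapping the "type" string
//    read from the checklist resource to the new constant:
private QuestionAction resolveAction(String type, String subType) {
    int actionType;
    if ("extract".equalsIgnoreCase(type)) {
        actionType = QuestionAction.ACTION_TYPE_EXTRACT; // assumed existing constant
    } else if ("detect".equalsIgnoreCase(type)) {
        actionType = QuestionAction.ACTION_TYPE_DETECT;  // new action type
    } else {
        throw new IllegalArgumentException("Unknown action type: " + type);
    }
    // Sub-types are normalized as described above: uppercase, spaces become underscores.
    String normalizedSubType = subType.trim().toUpperCase().replace(' ', '_');
    return new QuestionAction(actionType, normalizedSubType); // assumed constructor
}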

D.1.4 Analysis Modules

Once the checklist is evaluated and the analysis modules to be created are defined, a new class representing each module should be created under the com.autochecklist.modules.analysis package. Each analysis class must extend the AnalysisModule abstract class.

Each new class must implement the performQuestionAction abstract method. This method's parameters are two objects representing the requirement and the question being analyzed at the time of the call, as well as the type and sub-type of the action to be performed in the analysis.

Optionally, the classes may override the preProcessRequirement method, which is called whenever the analysis of a new requirement is about to start. This method is called once for each requirement, before all the calls to performQuestionAction for that specific requirement. It can be used, for instance, to perform work whose results are shared by all the questions.
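
A possible skeleton for a new analysis module is sketched below. The parameter types (Requirement, Question), the constructor, and the exact method signatures are assumptions derived from the description above, not the actual declarations.

package com.autochecklist.modules.analysis;

// Hypothetical analysis module for the new checklist.
public class AmbiguityAnalysis extends AnalysisModule {

    public AmbiguityAnalysis(QuestionCategory questions) {
        super(questions); // assumed parent constructor
    }

    @Override
    protected void preProcessRequirement(Requirement requirement) {
        // Called once per requirement, before performQuestionAction is called
        // for each of its questions; a good place for work shared by all questions.
    }

    @Override
    protected void performQuestionAction(Requirement requirement, Question question,
                                         int actionType, String actionSubType) {
        if (actionType == QuestionAction.ACTION_TYPE_DETECT) { // hypothetical new type
            // Inspect the requirement text for the term encoded in actionSubType
            // and record the finding on the question object.
        }
    }
}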

D.1.5 Analysis Modules Factory

A new static method should be added to the AnalysisModuleFactory class. This method must return a queue containing an object of each new analysis class. The created objects must be upcast to the parent class, AnalysisModule. This new static method should be called by the Orchestrator class when initiating the analysis stage.
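
A hypothetical factory method could look like the sketch below; the method name, the analysis classes, and the checklist getters are the illustrative names introduced in the previous subsections.

// In AnalysisModuleFactory (requires java.util.Queue and java.util.LinkedList):
public static Queue<AnalysisModule> createMyNewChecklistModules(MyNewChecklist checklist) {
    Queue<AnalysisModule> modules = new LinkedList<AnalysisModule>();
    // Each new analysis class is instantiated once and upcast to AnalysisModule.
    modules.add(new IncompletenessAnalysis(checklist.getIncompletenessQuestions()));
    modules.add(new AmbiguityAnalysis(checklist.getAmbiguityQuestions()));
    return modules;
}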

D.1.6 Orchestrator

The Orchestrator must be aware of which checklist the analysis is being conducted for, since at this point there will be more than one. An internal private attribute could be defined and set when the Orchestrator object is created. When instantiating the analysis modules, this attribute should be checked and the proper method of the AnalysisModuleFactory called for the specific checklist.
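
The sketch below illustrates one possible way to do this; the constant names, the constructor change, and the createDefaultModules method are assumptions, shown only to make the idea concrete.

// Relevant additions to the Orchestrator (sketch only):
public static final int CHECKLIST_DEFAULT = 0; // hypothetical checklist identifiers
public static final int CHECKLIST_MY_NEW  = 1;

private final int mChecklistType;              // set when the Orchestrator is created

public Orchestrator(int checklistType /* plus the existing constructor arguments */) {
    mChecklistType = checklistType;
}

private Queue<AnalysisModule> createAnalysisModules() {
    // Pick the factory method that matches the checklist in use.
    if (mChecklistType == CHECKLIST_MY_NEW) {
        return AnalysisModuleFactory.createMyNewChecklistModules(new MyNewChecklist());
    }
    return AnalysisModuleFactory.createDefaultModules(); // assumed existing method
}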

D.1.7 Preprocessing

The preprocessing stage should also consider extracting different information from the SRS document, depending on the new checklist's questions. If this is the case, code should be added to the PreProcessor class, and NLP utility classes should be changed or added. The PreProcessor class could also rely on an internal attribute indicating which checklist the preprocessing is being performed for, so that it does not extract unnecessary information.


D.1.8 NLP Utility Classes

Depending on the new information that should be extracted in the preprocessing stage, or the analysis that should be performed in the new analysis modules, the available NLP utility classes may need to be changed, or new utility classes may need to be added. New NLP utility classes should be created under the com.autochecklist.utils.nlp package. If they are to implement an interface, following the specified design, the interface should be created under the com.autochecklist.utils.nlp.interfaces package. The new utility class should implement the new interface, and a new method, returning the new interface type, should be added to the NLPTools singleton class. The new analysis classes should then call this new method on the NLPTools object.
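
As an illustration, a new interface following this layout could look like the sketch below; all names in it are hypothetical.

package com.autochecklist.utils.nlp.interfaces;

import java.util.List;

// Hypothetical interface for a new NLP utility.
public interface IMyNewExtractor {
    List<String> extractItems(String requirementText);
}

The concrete class implementing it would then be created under com.autochecklist.utils.nlp, and a method such as getMyNewExtractor(), returning the interface type, would be added to the NLPTools singleton for the new analysis classes to call.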

D.1.9 User Interfaces

The primary change to the user interfaces when adding a new checklist is that the user must be able to specify which checklist should be used in the analysis of the desired SRS document. This change affects both AutoChecklist's Command Line Interface (CLI) and its Graphical User Interface (GUI).

Command Line Interface

The checklist chosen to guide the SRS document analysis should be specified by a new parameter, to be provided by the user on the command line. The Main class should be changed in order to handle this new parameter. The parameter should be evaluated and parsed in this class, defining which of the checklists should be used, and this information should be passed to the Orchestrator object that is created, ideally in its constructor.

In case of any errors with the expected syntax of the new parameter, the usage message should be printed, as is already done when evaluating the existing parameters. The usage message itself should also be updated to include the new parameter.
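
A possible way to handle such a parameter is sketched below; the flag name "-checklist", the printUsage helper, and the checklist constants reuse the hypothetical names introduced in the Orchestrator sketch and are not part of the current implementation.

// Sketch of a possible addition to the Main class:
private static int parseChecklistArgument(String[] args) {
    for (int i = 0; i < args.length - 1; i++) {
        if ("-checklist".equals(args[i])) {
            String value = args[i + 1];
            if ("default".equalsIgnoreCase(value)) return Orchestrator.CHECKLIST_DEFAULT;
            if ("mynew".equalsIgnoreCase(value))   return Orchestrator.CHECKLIST_MY_NEW;
            printUsage();    // unknown value: print the updated usage message
            System.exit(1);
        }
    }
    // Keep the current behavior when the parameter is absent.
    return Orchestrator.CHECKLIST_DEFAULT;
}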

Graphical User Interface

As in the CLI case, the GUI must also be aware of which checklist should be used in the analysis. This could be achieved by adding a new UI element (e.g., a dropdown) to the initial screen, so that the user may choose the desired checklist for the operation. This new UI element should be displayed in two states of the initial screen: when loading a SRS document, and when loading a preprocessed file. It should provide the parameter that is passed to the Orchestrator object.

The preprocessing and analysis screens will likely not need to be changed, but the results UI might: it should present new types of reports if needed. The existing viewers should be updated if the CSV and HTML reports change their formats, and new viewer classes might also be created to support those changes.
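
Purely as an illustration, and assuming a JavaFX-based initial screen (which may not correspond to the toolkit actually used by AutoChecklist), the dropdown could be wired as follows; the labels and constants are hypothetical.

// Requires javafx.scene.control.ComboBox.
ComboBox<String> checklistSelector = new ComboBox<>();
checklistSelector.getItems().addAll("Default checklist", "My new checklist");
checklistSelector.getSelectionModel().selectFirst();

// When the analysis is triggered, the selection becomes the value handed to the Orchestrator:
int checklistType = checklistSelector.getSelectionModel().getSelectedIndex() == 1
        ? Orchestrator.CHECKLIST_MY_NEW
        : Orchestrator.CHECKLIST_DEFAULT;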


D.1.10 Output

The outputs might change in type and format. They must be added in the OutputGenerator class, and the new methods must be added to its start method. Logic should be added in this method to generate the proper outputs given the checklist being applied. If new types of contents should be created, they should be added to the com.autochecklist.utils.filebuilder package.
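
The sketch below illustrates how this selection logic could be organized inside the start method; the report-generation method names and the mChecklistType attribute are hypothetical, mirroring the attribute suggested for the Orchestrator.

// In OutputGenerator (sketch only):
public void start() {
    if (mChecklistType == Orchestrator.CHECKLIST_MY_NEW) {
        generateMyNewChecklistReports(); // new report types/formats for the new checklist
    } else {
        generateDefaultReports();        // existing outputs remain unchanged
    }
}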