similarity texter is a web application that measures and reports the unique (non-duplicate) lexical similarities between two input files or texts. Its implementation is based on the sim_text algorithm, developed by Dick Grune.
- Supports different types of input: DOCX, ODT, TXT files, and HTML/plain text.
- Provides options for fine-tuning both the reading of files and the comparison process.
- Supports auto-scrolling to the target match when a source match is clicked.
- Generates a PDF report from the comparison output.
Supported web browsers
The web application has been tested on Google Chrome (v48.0), Mozilla Firefox (v44.0), and Internet Explorer (v11.0).
Proper functionality on earlier versions or on other web browsers cannot be guaranteed.
How to use
The use of this web tool can be summarized in the following steps:
- From the Settings panel, select the options of your choice.
- From the Input panel, set the value of the Minimum match length, and provide two input files or texts.
- Start the comparison process by clicking on the COMPARE button.
- Examine the results, displayed in the Output panel.
- Generate a PDF report by clicking on the button in the Output panel.
Graphical User Interface
This panel contains a set of buttons, which show/hide different parts of the GUI.
- The button toggles the Settings panel.
- The button toggles the Input panel.
- The button opens the documentation page of similarity texter in a new tab.
This panel provides a set of options related to the reading of files and to the comparison process.
When you select or change an option, its new value is saved in your web browser's local storage, so you do not have to set it again after refreshing the page or restarting the browser.
These options determine the way in which comparison is performed.
Ignore letter case
When checked, the letter case of the input is not taken into account during comparison, i.e. all input is converted to lowercase.
When checked, numbers are not taken into account during comparison, i.e. they are deleted from input.
When checked, the following punctuation characters/symbols are not taken into account during comparison, i.e. they are deleted from input.
Replace umlaut & ligatures
When checked, the following umlauted characters and ligatures are replaced by their equivalent expanded versions during comparison.
| Character | Replaced by |
|-----------|-------------|
| ä, æ      | ae          |
| ö, œ      | oe          |
| ü         | ue          |
| ß         | ss          |
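The combined effect of the comparison options above can be illustrated with a minimal sketch. It is not the tool's actual code: the function name `normalize`, its parameters, and the order in which the steps run are assumptions, and Python's `string.punctuation` (ASCII only) stands in for the tool's own punctuation list, which may differ.

```python
import re
import string

# Replacement table taken from the "Replace umlaut & ligatures" option.
REPLACEMENTS = {"ä": "ae", "æ": "ae", "ö": "oe", "œ": "oe", "ü": "ue", "ß": "ss"}


def normalize(text, ignore_case=True, ignore_numbers=True,
              ignore_punctuation=True, replace_umlauts=True):
    """Return the words that would take part in the comparison (hypothetical helper)."""
    if ignore_case:
        # All input is converted to lowercase.
        text = text.lower()
    if replace_umlauts:
        # Only the lowercase characters listed in the table are handled here.
        for char, expanded in REPLACEMENTS.items():
            text = text.replace(char, expanded)
    if ignore_numbers:
        # Digits are deleted from the input.
        text = re.sub(r"\d+", "", text)
    if ignore_punctuation:
        # Assumption: ASCII punctuation only; the tool's real list may differ.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()
```

With all options enabled, `normalize("Die Straße, Nr. 12, ist schön!")` yields the words `die`, `strasse`, `nr`, `ist`, `schoen`.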
Input reading options
These options determine the way in which reading of files is performed.
Ignore footnotes in DOCX/ODT files
When checked, the footnotes and endnotes of a DOCX or ODT document are not parsed, i.e. they are excluded from input.
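For DOCX files, one way such an option could work is sketched below. A DOCX file is a ZIP archive in which the body text is stored in `word/document.xml`, while footnotes and endnotes are stored in the separate parts `word/footnotes.xml` and `word/endnotes.xml`, so skipping the notes amounts to not reading those parts. The function name `read_docx_text` and its parameter are assumptions for illustration; how similarity texter itself reads files is not described here, and ODT files (which keep notes inline) are not covered.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the text elements (<w:t>).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
BODY_PART = "word/document.xml"
NOTE_PARTS = ("word/footnotes.xml", "word/endnotes.xml")


def read_docx_text(source, ignore_footnotes=True):
    """Collect the text of a DOCX file, optionally skipping notes (hypothetical helper)."""
    parts = [BODY_PART] if ignore_footnotes else [BODY_PART, *NOTE_PARTS]
    chunks = []
    with zipfile.ZipFile(source) as package:
        available = set(package.namelist())
        for part in parts:
            if part not in available:
                continue  # a document without notes has no notes part
            root = ET.fromstring(package.read(part))
            # Every run of text lives in a <w:t> element.
            chunks.extend(node.text or "" for node in root.iter(W + "t"))
    return " ".join(chunks)
```

`source` may be a file path or a binary file-like object, since `zipfile.ZipFile` accepts both.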
This panel provides two input panes, one for each input.
Users may provide a different type of input in the source pane and in the target pane.
For file input, click on the FILE tab and select a file from your local directory by pressing the Browse file button.
The currently supported file formats are DOCX, ODT, and TXT.
For text input, click on the tab TEXT, and type or paste some plain/HTML text.
If the provided input is HTML, please check the HTML checkbox. Otherwise, leave it unchecked.
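The reason for the HTML checkbox is that markup has to be stripped before the words can be compared. The sketch below shows one possible way to do this with Python's standard `html.parser` module; the names `TextExtractor` and `html_to_text` are hypothetical, and the tool's own HTML handling may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    # Joining with a space keeps adjacent block elements apart, at the cost
    # of splitting a word that is interrupted by an inline tag.
    return " ".join(" ".join(parser.chunks).split())
```

For plain text input this step would simply be skipped, which corresponds to leaving the checkbox unchecked.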
Minimum match length spinner
Sets the minimum number of words that constitute a match.
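To make the role of this threshold concrete, the following naive sketch reports every maximal run of identical consecutive words shared by two word lists, keeping only runs of at least `min_length` words. It is a brute-force illustration, not the sim_text algorithm, and it does not model the tool's handling of duplicate matches; the name `common_runs` and the tuple layout are assumptions.

```python
def common_runs(source_words, target_words, min_length):
    """Return (source_index, target_index, length) for each shared word run
    that is at least min_length words long (hypothetical helper)."""
    matches = []
    for i in range(len(source_words)):
        for j in range(len(target_words)):
            # Skip positions that only continue a run started one word earlier.
            if i > 0 and j > 0 and source_words[i - 1] == target_words[j - 1]:
                continue
            length = 0
            while (i + length < len(source_words)
                   and j + length < len(target_words)
                   and source_words[i + length] == target_words[j + length]):
                length += 1
            # Runs shorter than the minimum match length are not reported.
            if length >= min_length:
                matches.append((i, j, length))
    return matches
```

For the word lists of "the quick brown fox jumps" and "a quick brown fox sleeps", a minimum match length of 3 reports the single run "quick brown fox", whereas a minimum of 4 reports nothing.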
Starts the comparison process.
This panel displays the output of the comparison process.
The left-side and right-side panes show the contents of the source and the target input respectively, together with any matches found.
Matches are highlighted in different background colors, but a given longest common substring is highlighted in the same background color in both the source and the target input.
Overlapping matches are surrounded by a dashed border.
Auto-scrolling to target match
Click on a highlighted match (in either output pane) to trigger auto-scrolling to the corresponding match in the other pane.
Both matches are aligned at the same level in order to provide a better overview of their content.
To display statistical data on the input, click on the button.
Generate PDF report
To generate a PDF report from the contents of the comparison output, follow the steps below:
- Click on the button.
The PRINT OUTPUT dialog is displayed, where you can provide a comment for each input to be included in the generated PDF report.
- Press the PRINT button to proceed to printing.
The system's print dialog is displayed. Please enable the following options for the proper generation of the PDF report:
Chrome's native print dialog
System's print dialog
- Option Print to File, to direct printing to the PDF printer, and
- Option Print Background Colors.
NOTE: The precise names of the options may vary depending on the operating system.
- Press Save (in Chrome) or Print (in System) to generate the PDF report.