Having trouble?
Try the newer version at the University of Alberta:
http://taporware.ualberta.ca

Text Summarizer
Summary

The tool extracts information about the document provided by the user. This information can include the author, the title, the total number of words, the number of words in a specific element, and so on. It also lets users select the topics they are interested in, such as a list of the highest frequency words, a list of sentences that contain more than a selected number of words, and the distribution of elements against text.

Note: The input text format should be plain text. If you submit an XML or HTML text, the tool will strip all the tags, and then process it as plain text. For best results with XML or HTML text, it is suggested to use XML-specific or HTML-specific tools.

For more details, see here.

Walkthrough

Example: fetch text from http://www.gutenberg.org/dirs/etext91/peter16.txt; list the top ten high frequency words from all words in the document; list the sentences that contain at least two high frequency words.
  1. Source text
    1. Enter `http://www.gutenberg.org/dirs/etext91/peter16.txt' in the Text source URL field;
  2. Summary limited to
    1. Check the List top checkbox and enter `10' in the text field found on the same line;
    2. Select the from all words option;
    3. Check the List sentences that have checkbox and enter `2' in the text field on the same line.
  3. Results
    No help written for this yet.
*
» Source text
  Example: http://taporware.mcmaster.ca/sampleDocs/plainText.txt


Summary

Determines the text source. Text can be obtained from a URL or by uploading a file.

Fields

Source URL
Text from the entered URL will be used as the data source for the analysis.

Local file
Use this field to upload a local file for analysis.

Treat XML/HTML as plain text
Enabling this option will strip tags from an HTML or XML document. <p> and <br /> in HTML documents and all tags in XML documents are converted to new lines (i.e. \n).
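
The tag handling described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the function name strip_tags and the regular expressions are hypothetical, and it assumes that closing </p> tags are also turned into new lines and that all other HTML tags are simply dropped.

```python
import re

# Assumption: <p ...>, </p>, <br> and <br /> count as the HTML "break" tags.
HTML_BREAK = re.compile(r"<\s*(?:p\b[^>]*|/\s*p|br\s*/?)\s*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")

def strip_tags(markup, kind="html"):
    """Return plain text from HTML or XML markup (hypothetical helper)."""
    if kind == "xml":
        # Every XML tag becomes a new line.
        return ANY_TAG.sub("\n", markup)
    # HTML: break tags become new lines, remaining tags are removed.
    text = HTML_BREAK.sub("\n", markup)
    return ANY_TAG.sub("", text)
```

For instance, strip_tags("<p>Hello<br />world</p>") gives the two words on separate lines, while strip_tags("<a>x</a>", kind="xml") surrounds x with new lines.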
» Summary limited to





(separate words by `,')


Summary

Determines which types of information will be presented in the results.

Fields

List top n frequency words
Selecting this will include a list of the top n words in the results.

From all words
Takes into account all words in the document when searching for high frequency words.

Matching pattern
Allows the user to define a pattern; only words matching this pattern are counted when searching for high frequency words.

From word list
Counts only the words present in a user-defined word list when searching for high frequency words.

Not from word list
Counts only the words not present in a user-defined word list when searching for high frequency words.

Type in
Allows the user to manually enter a list of words that will be referred to when From word list or Not from word list is selected in the above options.

From local file
Allows the user to upload a local file containing a list of words which will be referred to when From word list or Not from word list is selected in the above options.
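
As a rough sketch of how the word selection options above could combine, the following counts the top n words with optional filters. It is not the tool's own implementation: the function name top_words, its parameter names, and the definition of a word (a lowercase run of letters with an optional apostrophe part) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally with one apostrophe part.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")

def top_words(text, n, pattern=None, word_list=None, exclude_list=None):
    """Return the n most frequent words as (word, count) pairs."""
    words = WORD.findall(text.lower())
    if pattern is not None:
        # "Matching pattern": keep only words that fully match the pattern.
        rx = re.compile(pattern)
        words = [w for w in words if rx.fullmatch(w)]
    if word_list is not None:
        # "From word list": keep only words present in the list.
        allowed = {w.lower() for w in word_list}
        words = [w for w in words if w in allowed]
    if exclude_list is not None:
        # "Not from word list": drop words present in the list.
        blocked = {w.lower() for w in exclude_list}
        words = [w for w in words if w not in blocked]
    return Counter(words).most_common(n)
```

Calling top_words(text, 10) corresponds to the From all words option; passing exclude_list with common function words behaves like a stop list.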

List sentences that have n or more high frequency words
Selecting this option will display sentences containing n or more high frequency words.
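
One possible reading of this option is sketched below. The help text does not say whether n counts distinct words or total occurrences, nor how sentences are delimited, so this sketch assumes distinct words and a naive split after '.', '!' or '?'. The function name sentences_with_hits is hypothetical.

```python
import re

def sentences_with_hits(text, high_freq_words, n):
    """Return sentences containing at least n distinct high frequency words."""
    targets = {w.lower() for w in high_freq_words}
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower()))
        if len(words & targets) >= n:
            hits.append(sentence)
    return hits
```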

For each high frequency word, list (first|first three|all) context(s) with context length of n words before and after
Displays high frequency words within their found context (first, first three or all occurrences) with n words before and after each match.
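
A keyword-in-context listing of this kind might look like the sketch below. The function name contexts and its limit parameter (1 for first, 3 for first three, None for all) are hypothetical, and the sketch assumes that words are separated by whitespace and that surrounding punctuation is ignored when matching.

```python
import re

def contexts(text, word, width, limit=None):
    """Return up to `limit` contexts of `word`, each with `width` words either side."""
    tokens = re.findall(r"\S+", text)
    target = word.lower()
    found = []
    for i, token in enumerate(tokens):
        # Compare without surrounding punctuation, ignoring case.
        if token.lower().strip(".,;:!?\"'()") == target:
            start = max(0, i - width)
            found.append(" ".join(tokens[start:i + width + 1]))
            if limit is not None and len(found) >= limit:
                break
    return found
```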

List collocation within n words of the high frequency words
Lists collocates of high frequency words that appear no more than n words apart.
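
Collocate counting could be sketched as follows, under the assumption that a collocate is any other word found within n positions of the high frequency word on either side. The function name collocates and the word definition are illustrative only and are not taken from the tool.

```python
import re
from collections import Counter

def collocates(text, word, window):
    """Count words appearing within `window` positions of `word`."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    target = word.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            # Words before and after the match, excluding the match itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + window + 1]
            counts.update(w for w in neighbours if w != target)
    return counts
```

The returned Counter can be passed through most_common() to rank collocates by how often they occur near the word.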

» Results
Summary

Allows the user to choose how the results will be formatted and whether they should be displayed in a new browser window.

Fields

Sort
Allows you to sort the results in one of several ways.

Display as
Determines the format in which results will be delivered.

Open results in new window
Checking this box will display the results in a new window. This option is selected by default. Pop-up blockers may prevent the new window from opening; if that happens, deselect this option.
`*' indicates a required field

 

 

TAPoRware Project, McMaster University,