Interface Specification For The HTML Widget

This is a draft interface specification for the Tk HTML widget currently under development. Since it is still a draft, it is subject to change. Eventually, the interface will stabilize and this interface specification will morph into a manual page.

Configuration Options

-appletcommand

This option specifies the name of the Tcl procedure to invoke when the <applet>...</applet> tag sequence is seen. The html widget will append two arguments to the procedure before calling it. The first argument is the name of a widget that the callback should create to hold the applet. The second argument is a list of name/value pairs which are the arguments to the <applet> tag.

The text between <applet> and </applet> is normally suppressed. However, if the -appletcommand option is set to the empty string, the <applet> tag is ignored and all text between <applet> and </applet> is displayed normally.

"<embed>" is treated as an alias for "<applet></applet>".

-background

The background color for the widget.

Note that the <body bgcolor=...> HTML tag does not automatically cause the widget to change its background color. If you want the background color to change in response to this HTML tag, then your Tcl script should intercept the <body> tag using the ``token handler'' widget command (described below) and change the background color manually.

-base

The base URI for the current document. This should be set to the URI that was used to retrieve the document before parsing begins.

-bd An alias for -borderwidth.
-bg An alias for -background.
-borderwidth

The width of the 3-D border drawn around the perimeter of the widget, in pixels.

-cursor

The cursor displayed when the pointer is positioned over the HTML widget. If {}, the cursor reverts to its default shape.

-exportselection
-fontcommand

The name of a TCL procedure that is used to convert HTML font names into TCL font names. A default built-in procedure is used if the value of this option is {}.

When the HTML widget needs a new font, it calls this procedure with two arguments. The first argument is the font size expressed as an integer between 1 and 7. The standard size is 4. The second argument is a set of between 0 and 3 keywords drawn from the following set: "bold", "italic", and "fixed". If the "bold" keyword is present in the second argument, the font returned should be bold. If the "italic" keyword is present, the font should be italic. If the "fixed" keyword is present, the font should be fixed-width. The TCL procedure should return the name of the TCL font that the HTML widget will use to render the given HTML font. If the TCL procedure returns an empty string, then the built-in default procedure is used to determine the font.

Examples of the two arguments passed for different HTML fonts: {4 {}}, {4 fixed}, {3 {}}, and {5 {fixed bold}}.
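
The calling convention above can be sketched as a small Tcl procedure. This is only an illustration: the procedure name FontCmd, the family names and the point sizes are assumptions, not part of the specification.

```tcl
# Hypothetical -fontcommand callback. size is 1..7; attrs is a list of
# zero to three keywords drawn from: bold italic fixed.
proc FontCmd {size attrs} {
    # Assumed point sizes for HTML sizes 1 through 7 (4 is the standard size).
    set points {8 9 10 12 14 18 24}
    if {![string is integer -strict $size] || $size < 1 || $size > 7} {
        # An empty result lets the built-in default procedure pick the font.
        return ""
    }
    # Assumed family names; any installed families would do.
    set family [expr {"fixed" in $attrs ? "Courier" : "Helvetica"}]
    set font [list $family [lindex $points [expr {$size - 1}]]]
    if {"bold" in $attrs}   { lappend font bold }
    if {"italic" in $attrs} { lappend font italic }
    return $font
}

# Registration, assuming an HTML widget named .h already exists:
#   .h configure -fontcommand FontCmd
```

The result is a Tk font description in list form (family, size, styles), which avoids creating named fonts up front.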

-fg An alias for -foreground.
-foreground The default foreground color in which HTML text is rendered. The HTML can override this using the color=... attribute on various HTML tags.
-formcommand string Declares a handler for everything to do with forms within a document. Arguments will be appended to string and the result evaluated during parsing (for form creation) and when the widget is cleared (for form cleanup). The first argument is a token for identifying a form. The second argument selects the action to perform. The remaining arguments depend on the action, as follows.
string token form URL method attrs
The handler should begin taking notes for form token, especially the (resolved) URL of the action and the method to be applied. The raw attributes of the FORM element are in the pairlist attrs.
string token flush
When the document is cleared, the widget will destroy all the windows it requested. This handler should clean up anything else it created for that form.
string token input path attrs
The handler should create a window named path appropriate for the element described by the attrs. The widget will map the window into its rendering appropriately.

It is not an error for the handler to return without creating such a window (it's natural in the case of type=hidden); the widget simply ignores the element in that case. The attributes are the raw values in the HTML, with one exception; a src will be resolved before the handler is called.

string token textarea path attrs initial
The handler should create a window (a single Text, or a Frame with Text and Scrollbars, or whatever) appropriate for a <textarea> and initialise it to the initial string.
string token select path attrs choices initial
<select> is quite a complicated case... The handler should create a window appropriate for a <select> of the given attributes and present the list of choices. Each choice is a pair, the value and its label. initial is a list of values initially selected. This approach is somewhat questionable but should do most of the time.
Caution: Be very careful to avoid confusing HTML variables with TCL variables. It may be tempting to use the name attribute fairly directly to link together related widgets, but it will likely cause incorrect behaviours. Also be careful to observe the order in which the elements are created; this determines the order in which they must be submitted. A default form handler with the correct behaviour written in TCL will be bundled with the widget.

The attribute names will be downcased within attrs.
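
The dispatch on the action argument might look like the following sketch. The procedure name FormCmd and the global array forms are hypothetical, and window creation is left as a comment so the example stays independent of Tk.

```tcl
# Hypothetical -formcommand handler. State is kept in the global array
# "forms", keyed by the form token supplied by the widget.
proc FormCmd {token action args} {
    global forms
    switch -- $action {
        form {
            # Start taking notes: resolved action URL, method, raw attributes.
            lassign $args url method attrs
            set forms($token) [dict create url $url method $method fields {}]
        }
        input - textarea - select {
            # The remaining arguments start with the window path and the
            # attribute list for every element type.
            lassign $args path attrs
            # Record fields in creation order, which is the submission order.
            # Keying by window path avoids using the HTML name attribute.
            dict lappend forms($token) fields $path
            # A real handler would create the window $path here.
        }
        flush {
            # The widget destroys the windows; drop our own notes.
            unset -nocomplain forms($token)
        }
        default {
            error "unknown form action \"$action\""
        }
    }
    return
}

# Registration, assuming an HTML widget named .h already exists:
#   .h configure -formcommand FormCmd
```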

-framecommand The script specified by this option is invoked when the HTML parser encounters a <frameset>...</frameset> tag sequence. The arguments to the script are TBD. If the value of the option is the empty string, then the text within the <noframe>...</noframe> tag sequence is displayed.
-height Specifies the height of the area into which HTML is rendered. This value plus twice the -pady, -borderwidth and -highlightthickness values is the total height of the widget.
-highlightbackground
-highlightcolor
-highlightthickness
-hyperlinkcommand The script specified by this option is invoked whenever the user clicks on a hyperlink on the HTML page. Before invoking this script, the URI for the hyperlink is appended.
-imagecommand When a ``<img src=...>'' tag is encountered, the HTML widget invokes the script specified by this option in order to get the name of a Tk image object to display the HTML image. Before invoking the script, the following arguments are appended:
  1. The value of the src=... parameter after it has been processed by the resolver.
  2. The value of the width=... parameter.
  3. The value of the height=... parameter.
  4. A list containing the names and values of all parameters.
If the name returned by this script is the empty string, or if the script is an empty string, then the HTML widget displays the alt=... text of the <img> tag instead of an image.
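
A minimal sketch of such a callback is shown here, assuming the images are loaded elsewhere and recorded in a cache. The procedure name ImageCmd and the global array imageCache are hypothetical.

```tcl
# Hypothetical -imagecommand callback backed by a cache that maps
# resolved src URLs to the names of Tk images created elsewhere.
proc ImageCmd {src width height attrs} {
    global imageCache
    if {[info exists imageCache($src)]} {
        return $imageCache($src)
    }
    # Not loaded: an empty result makes the widget show the alt text.
    return ""
}

# Registration, assuming an HTML widget named .h already exists:
#   .h configure -imagecommand ImageCmd
```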
-bgimagecommand When a ``<table background=...>'' tag is encountered, the HTML widget invokes the script specified by this option in order to get the name of a Tk image object to display as the background for the table. Similarly for TD, TH, and TR. Before invoking the script, the following arguments are appended:
  1. The value of the background=... parameter after it has been processed by the resolver.
  2. The token id (TID) of the table, row or col markup tag.
The script may return the name of the image to use for the background. If the return value is empty, the background image can be set later, once it is available, using the bgimage widget command with the token id.
-isvisitedcommand When the HTML widget encounters a hyperlink (``<a href=...>'') it invokes the script specified by this option in order to determine whether or not the hyperlink has been visited. This information is needed to determine what color to use to display the hyperlink.
-padx The amount of extra space to insert between the 3-D border and the left and right sides of the document text.
-pady The amount of extra space to insert between the 3-D border and the top and bottom of the document text.
-relief The relief used to draw the 3-D border.
-resolvercommand

The name of a TCL command used to resolve URIs. If blank, a built-in resolver is used. The built-in resolver is also used if a TCL command is specified but returns an empty string. The built-in resolver is based on the algorithm in section 5.2 of RFC 2396.

Multiple URIs are appended to the TCL command before it is executed. The first URI is the BASE URI of the document (the URL specified by the -base configuration option and updated according to any prior <BASE> markup). Zero or more additional URIs are appended to this base. The result of the script should be the resolution of the whole series of URIs.
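
The way a resolver folds a series of URIs onto the base can be sketched as follows. This is a deliberately simplified illustration, not the RFC 2396 algorithm: it does not remove "." or ".." segments, and the procedure names ResolveOne and ResolverCmd are hypothetical.

```tcl
# Resolve one reference against a base URI (simplified, see above).
proc ResolveOne {base ref} {
    if {$ref eq ""} { return $base }
    # A reference with its own scheme is already absolute.
    if {[regexp {^[A-Za-z][A-Za-z0-9+.-]*:} $ref]} { return $ref }
    # Split the base into scheme://authority, scheme: and path;
    # any query or fragment on the base is dropped.
    if {![regexp {^(([A-Za-z][A-Za-z0-9+.-]*:)//[^/?#]*)([^?#]*)} $base \
            -> root scheme path]} {
        return $ref
    }
    if {[string index $ref 0] eq "#"} {
        # Fragment only: keep the base, replace its fragment.
        regsub {#.*$} $base "" stripped
        return $stripped$ref
    }
    if {[string match //* $ref]} { return $scheme$ref }
    if {[string index $ref 0] eq "/"} { return $root$ref }
    # Relative path: replace everything after the last "/" of the base path.
    set dir [string range $path 0 [string last / $path]]
    if {$dir eq ""} { set dir / }
    return $root$dir$ref
}

# Hypothetical -resolvercommand callback: the first argument is the
# document base, the rest are resolved against it from left to right.
proc ResolverCmd {base args} {
    set uri $base
    foreach ref $args { set uri [ResolveOne $uri $ref] }
    return $uri
}
```

Returning an empty string from such a callback hands the work back to the built-in resolver, which is a reasonable fallback for inputs the sketch does not handle.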

-rulerelief

Determines the appearance of the Horizontal Rule (<HR>) markup. The default is "sunken". This can also be "raised" or "flat". If "flat", then the <HR> is drawn using a solid line in the current foreground color. "groove" and "ridge" are the same as "flat".

-scriptcommand

Whenever <SCRIPT>...</SCRIPT> markup is encountered in the input HTML, the line number, the attributes of the <SCRIPT> markup and the body of the script are appended to this string and the result is executed as a TCL command. If this option is the empty string, then the script is ignored.

-selectioncolor The background color used when drawing the selection. The foreground color for the selection is the same as the regular foreground color.
-tablerelief

Determines the appearance of the borders around tables. The default is "raised". This can also be "sunken" or "flat". If "flat", then the borders are drawn using solid lines in the current foreground color. "groove" and "ridge" are the same as "flat".

-takefocus
-unvisitedcolor The foreground color used to draw hyperlinks that have not been visited.
-underlinehyperlinks Set to TRUE to cause hyperlinks to be drawn using an underlined font.
-visitedcolor The foreground color used to draw hyperlinks that have been visited.
-width The width of the document text. This value does not include space allocated for -highlightthickness, -borderwidth or -padx.
-xscrollcommand
-yscrollcommand

Indices

Internally, the HTML widget stores the HTML document as a list of tokens. Each token is either

  • a contiguous sequence of non-space characters (Text),
  • a contiguous sequence of spaces, tabs or newlines (Space),
  • or an HTML markup tag (such as ``<em>'').

Tokens are identified by number. The first token is ``1'', the second is ``2'' and so forth. So in its simplest form, an index is just an integer greater than 0.

Within a single Text or Space token, individual characters are also identified by number, though the counting starts with 0 instead of 1. The character number is connected to the token number by a period. So, for example, the 4th character in the 9th token would be ``9.3''.
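
The token.character notation can be taken apart with a few lines of Tcl. The sketch below uses a hypothetical helper name, SplitIndex, and assumes that a bare token number refers to character 0 of that token; the keyword forms of an index are not handled.

```tcl
# Split an index such as "9.3" into {token character}.
proc SplitIndex {index} {
    if {[regexp {^([1-9][0-9]*)(?:\.([0-9]+))?$} $index -> token char]} {
        if {$char eq ""} {
            # Assumption: a bare token number means its first character.
            set char 0
        } else {
            # Normalise forms such as "03" to a plain decimal integer.
            scan $char %d char
        }
        return [list $token $char]
    }
    error "not a numeric index: \"$index\""
}
```

For example, SplitIndex 9.3 yields the token number 9 and the character number 3, that is, the 4th character of the 9th token.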

Two integers separated by a period are called the canonical form of an index. Other index forms are available, including:

end The keyword ``end'' means one character past the last character of the last token.
last The keyword ``last'' means the last character of the last token.
@X,Y The character located at screen coordinates X,Y.
&DOM The element matching the given DOM address. eg. ``tables(1).rows(3)''.
*.last The second integer can be replaced by the keyword ``last'' to mean the last character in the token.
sel.first This is the first character that is part of the selection.
sel.last This is the last character that is part of the selection.
insert The character immediately following the insertion cursor.

Commands

html window ?options ...?

Create a new HTML widget instance named window.

html reformat from to text

Convert text from one encoding to another. The text is given in the text argument. The current encoding of the text is specified by the from argument. This command returns the same text in the to encoding.

From and to may be any of the following values:

plain Ordinary text with no characters escaped.
http The text is encoded in a form suitable for use with the HTTP protocol. Spaces are converted to "+". Special characters are escaped as "%aa" where "a" is a hexadecimal digit. A special character is anything other than an alphanumeric or one of these: ".", "$", "-", or "_".
url The text is encoded in a form suitable for use as a URI. Spaces are converted to "+". Special characters are escaped as "%aa" where "a" is a hexadecimal digit. A special character is anything other than an alphanumeric or one of these: ".", "$", "-", "_", or "/".
html The text is encoded in a form suitable for use within HTML. "&" is encoded as "&amp;", "<" is encoded as "&lt;" and so forth.

This command is intended to be useful to the TCL procedures that implement callbacks for the HTML widget.
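
To make the encodings concrete, the following pure-Tcl sketch approximates the plain-to-html and plain-to-http conversions described above. It is not the widget's implementation: the procedure names are hypothetical, only four HTML entities are handled, and the use of UTF-8 bytes and uppercase hexadecimal digits are assumptions.

```tcl
# plain -> html: escape the characters that are special in HTML.
proc PlainToHtml {text} {
    return [string map {& &amp; < &lt; > &gt; \" &quot;} $text]
}

# plain -> http: spaces become "+", anything other than an alphanumeric
# or one of . $ - _ becomes %XX.
proc PlainToHttp {text} {
    set out ""
    # Assumption: non-ASCII characters are escaped byte by byte as UTF-8.
    foreach byte [split [encoding convertto utf-8 $text] ""] {
        if {$byte eq " "} {
            append out +
        } elseif {[regexp {^[A-Za-z0-9.$_-]$} $byte]} {
            append out $byte
        } else {
            append out [format %%%02X [scan $byte %c]]
        }
    }
    return $out
}
```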

html urljoin scheme authority path query fragment

This command takes the five main components of a URI and joins them together into a complete URI. Special characters in any component are escaped.

html urlsplit uri

This command takes a single URI and splits it into its five major components: scheme, authority, path, query and fragment. The command returns a list where each component is an element of the list. Components missing from the URI are represented as empty elements in the list.
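
The five-way split can be illustrated in pure Tcl with the regular expression given in appendix B of RFC 2396. The procedure name UrlSplit is hypothetical, and the sketch is not claimed to match the widget's command in every corner case.

```tcl
# Split a URI into {scheme authority path query fragment}.
# Components missing from the URI come back as empty strings.
proc UrlSplit {uri} {
    regexp {^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?} $uri \
        -> _ scheme _ authority path _ query _ fragment
    return [list $scheme $authority $path $query $fragment]
}
```

Every part of the pattern is optional, so the expression matches any input and the five variables are always set.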

html gzip file FILE DATA

This command gzips data to a file.

html gunzip file FILE

This command gunzips data from a file.

html gunzip data DATA

This command gunzips data from a string.

html gzip data DATA

This command gzips data from a string.

html base64 encode DATA

This command base64 encodes data.

html base64 decode DATA

This command base64 decodes data.

html text format DATA LEN

This command formats text limiting line length to that specified.

html xor CMD DATA ...

This command returns an xor encryption of data. CMD is one of xor, encrypt or decrypt. The last two take a password argument.

html stdchan CMD CHANNEL

This command sets the channel to work around MS Windows exec problems. CMD is one of stdin, stdout, stderr.

html crc32 DATA

This command produces a 32 bit checksum (but not a real CRC).

Widget Commands

WIDGET  bgimage  IMAGE ?TID?

Set IMAGE to be the background image. TID, if supplied, is the token id of a TABLE, TD, TH or TR. If TID is omitted, it is the background image for the whole page.

WIDGET  cget config-option

Return the value of a configuration option. Works just like any other Tk widget.

WIDGET  clear

Remove all tokens and text from the HTML widget. The parser is reset to its initial state. This routine should be called to change pages.

WIDGET  configure ?args...?

The standard Tk configuration command.

WIDGET  coords  ?INDEX ?percent??

Return the screen coordinates of INDEX.

WIDGET  forminfo  INDEX

Return forminfo for given INDEX.

WIDGET  href  X  Y

If the coordinates X Y define a point above a hyperlink, then this command will return the target URL for that hyperlink. The URL will be resolved using the -resolvercommand before it is returned.

WIDGET  imageadd  ID  IMAGE

Add a single image to the animated image list.

WIDGET  imageat  X  Y

If the coordinates X Y define a point above an image, then this command will return the Token Id for that image.

WIDGET  images

Return the list of animated images.

WIDGET  imageset  ID  NUM

For animated gifs, set image number NUM to be the current image. This is only used for buffered animations.

WIDGET  imageupdate  ID  IMAGES

When an Animated gif comes in, this allows changing the current image into multiple images.

WIDGET  index  INDEX  ?COUNT  UNITS?

Translates INDEX into its canonical form. The canonical form of an index is two integers separated by a period.

The optional 3rd and 4th arguments specify a displacement from INDEX to the value of the index returned. COUNT can be any integer value, including a negative number. UNITS must be either ``char'' or ``line''.

WIDGET  insert  INDEX

Causes the insertion cursor (a flashing vertical bar) to be positioned immediately before the character specified by INDEX.

WIDGET  names

This command causes the widget to scan the entire text of the document looking for tags of the form ``<a name=...>''. It returns a list of values of the name=... fields.

The vertical position of the document can be moved to any of these names using the ``WIDGET yview NAME'' command described below.

WIDGET  onscreen  ID X  Y

Return 1 if ID is onscreen (visible).

WIDGET  over  X  Y ?-muponly?

Return a list of TIDS where the coordinates X Y define a point above objects. If -muponly, give only markup elements.

WIDGET  overattr  X  Y ATTRS

Like over, but returns only markup containing one or more of the attributes in the list ATTRS.

WIDGET  parse  HTML-TEXT

Adds the given HTML text to the end of any text previously received through the parse command and parses as much of the text as possible into tokens. Afterwards, the display is updated to show the new tokens, if they are visible.

WIDGET  resolver  ?uri ...?

The resolver specified by the -resolvercommand option is called with the base URI of the document followed by the remaining arguments to this command. The result of this command is the result of the -resolvercommand script.

WIDGET  selection  subcommand args...

The selection widget command is used to control the selection.

WIDGET  selection clear

Clear the current selection. No text will be selected after this command executes.

WIDGET  selection set  START  END

Change the selection to be all text contained within the given indices.

WIDGET  refresh options

Cause a relayout and redraw. Useful after a token insert or update. Valid options are zero or more of: images, resize, focus, text, border, extend, clipwin, styler, animate, vscroll, hscroll, gotfocus, layout. The default is layout. You may abbreviate options with the first letter.

WIDGET  source

Return the html source for the current page.

WIDGET  text  subcommand args...

There are several text commands. They all have the common property that they directly manipulate the text that is displayed. These commands can be used to build a WYSIWYG editor for HTML.

WIDGET  text ascii  INDEX-1  INDEX-2

Returns plain ASCII text that represents all characters between INDEX-1 and INDEX-2. Formatting tags are omitted. The INDEX-1 character is included but the INDEX-2 character is omitted.

WIDGET  text delete  INDEX-1  INDEX-2

All text from INDEX-1 up to, but not including INDEX-2 is removed and the display is updated accordingly.

WIDGET  text html  INDEX-1  INDEX-2

Returns HTML text that represents all characters and formatting tags between INDEX-1 and INDEX-2. The INDEX-1 character is included but the INDEX-2 character is omitted.

WIDGET  text insert  INDEX  TEXT

Inserts one or more characters immediately before the character whose index is given. The insertion cursor is updated.

WIDGET  text break  INDEX 

Break the text token at index into two text tokens.

WIDGET  text find TEXT  ?nocase? ?before|after INDEX? 

Find text. If index is given, start search from there. If before, search backwards. nocase will ignore case.

WIDGET  text table INDEX  ?images? ?attrs? 

Return text (and optionally attributes and images) from a table as lists. The first list is a list of rows (each a list of cells). The next optional list is the list of attributes, like above, but the element 0 contains the table attrs, and element 0 of each row contains the row attrs. Another optional list is the list of images, each as a set of values: row col charoffset tokenid. charoffset is the character offset within the text that the image appears at. tokenid is the index to use to lookup the attributes such as src.

WIDGET  token  subcommand args...

There are several token commands. They all have the common property that they involve the list of tokens into which the HTML is parsed.

Some of the following subcommands make use of indices. The character number of these indices is ignored since these commands deal only with whole tokens.

WIDGET  token append  TAG  ARGUMENTS

The command causes a token to be appended to the current list of tokens in the HTML widget. This command is typically used within a token handler.

WIDGET  token delete  INDEX  ?INDEX-2?

Deletes the single token identified by the index. If a second index is given, the range of tokens from the first to the second index inclusive is deleted.

WIDGET  token find  TAG ?before|after|near INDEX?

Locates all tokens with the given TAG and returns them all as a list. Each element of the returned list is a sublist containing the index for the token and the arguments for the token.

WIDGET  token get  INDEX  ?INDEX-2?

Returns a list of tokens in the range of INDEX through INDEX-2. Each element of the list consists of the token tag followed by the token arguments.

WIDGET  token list  INDEX  ?INDEX-2?

The same as token get, but has the token id as the first item in each list element.

WIDGET  token markup  INDEX  ?INDEX-2?

The same as token list, but ignores space and text.

WIDGET  token domtokens  INDEX  ?INDEX-2?

The same as token markup, but ignores all non-DOM tokens.

WIDGET  token getend  INDEX

Given a start token, find the matching end token.

WIDGET  token offset  START NUM1 NUM2

This command is best explained by its use: when you extract text and run a regexp on it with -indices, the resulting offsets must be converted back into indices. This command returns the indices of those begin and end anchors.

WIDGET  token attr  INDEX ?NAME ?VALUE??

Get or set the attribute(s) of a token. Getting a non-existent attribute returns an empty string.

WIDGET  token handler  TAG  ?SCRIPT?

This command allows special processing to occur for selected tokens in the HTML input stream. The TAG argument is either ``Text'' or ``Space'' or the name of an HTML tag (ex: ``H3'' or ``/A''). If a non-empty script is specified for a particular tag, then when instances of that tag are encountered by the parser, the parser calls the corresponding script instead of appending the token to the end of the token list. Before calling the script, three arguments are appended:
  1. The token number.
  2. The tag. (ex: H3)
  3. A list of name/value pairs describing all arguments to the tag.
An empty handler script causes the default processing to occur for the tag. If the script argument is omitted altogether, then the current value of the token handler for the given tag is returned.

Only one handler may be defined for each token type. If a new handler is specified for a token type that previously had a different handler defined, then the old handler is overwritten by the new.
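
As an illustration, the -background option suggests intercepting the <body> tag to apply its bgcolor attribute. A handler along those lines might look like the sketch below. The procedure name BodyHandler and the widget name .h are hypothetical, and the attribute list is assumed to be a flat list of alternating names and values.

```tcl
# Hypothetical token handler for <body>. The widget appends three
# arguments (token number, tag, attribute list); the widget path is
# supplied up front when the handler is registered.
proc BodyHandler {w tokenNumber tag attrs} {
    foreach {name value} $attrs {
        if {[string tolower $name] eq "bgcolor" && $value ne ""} {
            $w configure -background $value
        }
    }
    # The handler runs instead of the default processing, so the token
    # is not added to the token list unless the handler adds it itself,
    # for example with the "token append" widget command.
    return
}

# Registration, assuming an HTML widget named .h already exists:
#   .h token handler body [list BodyHandler .h]
```

Building the script with list keeps the widget path intact as a single argument when the widget appends its own three arguments.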

WIDGET  token insert  INDEX  TAG  ARGUMENTS

Inserts a single token given by TAG and ARGUMENTS into the token list immediately before INDEX. If INDEX is after the end of a text token, the new token is inserted after that token. The insertion cursor is updated.

WIDGET  token attrs  ATTRLIST ?INDEX ?INDEX??

Find all tags that contain an attribute named in the input list ATTRLIST. Returns TIDs.

WIDGET  token onEvents  ?INDEX ?INDEX??

Look for all the onSubmit, onMouseover, etc. attributes. Returns a list of the form: Event TID Event TID...

WIDGET  token unique  TAG ?INDEX ?INDEX??

For the given tag, return all known unique attribute names for the tag.

WIDGET  dom  subcommand args...

There are several dom commands. In all the following, DOMSPEC is a DOM style address. eg. TABLE(1).ROW(2). The Token Id is returned for that element in the page.

WIDGET  dom nameidx  TAG NAME

Convert a named markup to its array position. ie. TABLE foo might translate to TABLE[2] returning the integer index 2.

WIDGET  dom radioidx  TAG NAME

Translate a radio input items array index to a form item index.

WIDGET  dom id  DOMSPEC

Given a DOMSPEC, return the TID. Obsolete, use ``index &DOMSPEC''.

WIDGET  dom ids  DOMSPEC

Like above, but returns both begin and end TID.

WIDGET  dom value  DOMSPEC

Like dom id, but returns the attributes rather than the TID. Obsolete. Should now use: token attr &DOMSPEC.

WIDGET  dom addr  INDEX

Given an index, return the best guess of the DOM address. eg. TABLES(2).ROWS(1)

WIDGET  dom formel  N NAME

For forms(N), return the form element with name NAME.

WIDGET  dom tree  INDEX VALUE

Return the HTML Doc as one big DOM tree list. Not fully implemented.

WIDGET  xview  args...

Used to control horizontal scrolling.

WIDGET  xview

Returns a list containing two elements. The elements are fractions between 0.0 and 1.0 that define the position of the left and right edges of the visible part of the document as a fraction of the whole.
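
The meaning of the two fractions can be shown with a small calculation. The helper name ViewFractions and its three pixel arguments are hypothetical; the widget computes these values internally.

```tcl
# Compute the two view fractions from the total document extent, the
# offset of the first visible pixel, and the extent of the visible area.
proc ViewFractions {total offset visible} {
    # An empty document is treated as entirely visible.
    if {$total <= 0} { return [list 0.0 1.0] }
    set first [expr {double($offset) / $total}]
    set last  [expr {double($offset + $visible) / $total}]
    if {$last > 1.0} { set last 1.0 }
    return [list $first $last]
}
```

For a document 1000 pixels wide shown in a 500 pixel window scrolled 250 pixels to the right, the fractions are 0.25 and 0.75. A pair in this form is what a Tk scrollbar's set command expects.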

WIDGET  xview moveto  FRACTION

Adjusts the horizontal position of the document so that FRACTION of the horizontal span of the document is off-screen to the left.

WIDGET  xview scroll  NUMBER  WHAT

Shifts the view in the window left or right according to NUMBER and WHAT.   NUMBER is an integer and WHAT is either units or pages.

WIDGET  yview  args...

Used to control the vertical position of the document.

WIDGET  yview

Returns a list containing two elements. The elements are fractions between 0.0 and 1.0 that define the position of the top and bottom edges of the visible part of the document as a fraction of the whole.

WIDGET  yview  NAME

Adjusts the vertical position of the document so that the tag ``<a name=NAME>'' is on screen, and preferably near the top of the screen.

WIDGET  yview moveto  FRACTION

Adjusts the vertical position of the document so that FRACTION of the vertical span of the document is off-screen above the visible region.

WIDGET  yview scroll  NUMBER  WHAT

Shifts the view in the window up or down according to NUMBER and WHAT.   NUMBER is an integer and WHAT is either units or pages.