Interface Specification For The HTML Widget
This is a draft interface specification for the Tk HTML widget currently under development. Since it is still a draft, it is subject to change. Eventually, the interface will stabilize and this interface specification will morph into a manual page.
Configuration Options
-appletcommand |
This option specifies the name of the Tcl procedure to invoke when the <applet>...</applet> tag sequence is seen. The html widget will append two arguments to the procedure before calling it. The first argument is the name of a widget that the callback should create to hold the applet. The second argument is a list of name/value pairs which are the arguments to the <applet> tag. The text between <applet> and </applet> is normally suppressed. However, if the -appletcommand option is set to the empty string, the <applet> tag is ignored and all text between <applet> and </applet> is displayed normally. "<embed>" is treated as an alias for "<applet></applet>". |
-background |
The background color for the widget. Note that the <body bgcolor=...> HTML tag does not automatically cause the widget to change its background color. If you want the background color to change in response to this HTML tag, then your Tcl script should intercept the <body> tag using the ``token handler'' widget command (described below) and change the background color manually. |
-base |
The base URI for the current document. This should be set to the URI that was used to retrieve the document before parsing begins. |
-bd | An alias for -borderwidth |
-bg | An alias for -background |
-borderwidth |
The width of the 3-D border drawn around the perimeter of the widget, in pixels. |
-cursor |
The cursor displayed when the pointer is positioned over the HTML widget. If {}, the cursor reverts to its default shape. |
-exportselection | |
-fontcommand |
The name of a TCL procedure that is used to convert HTML font names into TCL font names. A default built-in procedure is used if the value of this option is {}. When the HTML widget needs a new font, it calls this procedure with two arguments. The first argument is the font size expressed as an integer between 1 and 7. The standard size is 4. The second argument is a set of between 0 and 3 keywords drawn from the following set: "bold", "italic", and "fixed". If the "bold" keyword is present in the second argument, the font returned should be bold. If the "italic" keyword is present, the font should be italic. If the "fixed" keyword is present, the font should be fixed-width. The TCL procedure should return the name of the TCL font that the HTML widget will use to render the given HTML font. If the TCL procedure returns an empty string, then the built-in default procedure is used to determine the font. Examples: This is {4 {}}. This is {4 fixed}.
This is {3 {}}. |
-fg | An alias for -foreground. |
-foreground | The default foreground color in which HTML text is rendered. The HTML can override this using the color=... attribute on various HTML tags. |
-formcommand string
|
Declares a handler for everything to do with forms within a document.
Arguments will be appended to string and the result evaluated
during parsing (for form creation) and when the widget is cleared (for
form cleanup). The first argument is a token for
identifying a form. The second argument selects the action to perform.
The remaining arguments depend on the action, as follows.
name attribute
fairly directly to link
together related widgets, but it will likely cause incorrect
behaviours. Also be careful to observe the order in which the elements are
created; this determines the order in which they must be submitted.
A default form handler with the correct behaviour written in TCL will be
bundled with the widget.
The attribute names will be downcased within attrs. |
-framecommand | The script specified by this option is invoked when the HTML parser encounters a <frameset>...</frameset> tag sequence. The arguments to the script are TBD. If the value of the option is the empty string, then the text within the <noframe>...</noframe> tag sequence is displayed. |
-height | Specifies the height of the area into which HTML is rendered. This value plus twice the -pady, -borderwidth and -highlightthickness values is the total height of the widget. |
-highlightbackground | |
-highlightcolor | |
-highlightthickness | |
-hyperlinkcommand | The script specified by this option is invoked whenever the user clicks on a hyperlink on the HTML page. Before invoking this script, the URI for the hyperlink is appended. |
-imagecommand |
When a ``<img src=...>'' tag is encountered, the
HTML widget invokes the script specified by this option in order to
get the name of a Tk image object to display the HTML image.
Before invoking the script, the following arguments are appended:
|
-bgimagecommand |
When a ``<table background=...>'' tag is encountered, the
HTML widget invokes the script specified by this option in order to
get the name of a Tk image object to display as the background for
the table. Similarly for TD, TH, and TR.
Before invoking the script, the following arguments are appended:
|
-isvisitedcommand | When the HTML widget encounters a hyperlink (``<a href=...>'') it invokes the script specified by this option in order to determine whether or not the hyperlink has been visited. This information is needed to determine what color to use to display the hyperlink. |
-padx | The amount of extra space to insert between the 3-D border and the left and right sides of the document text. |
-pady | The amount of extra space to insert between the 3-D border and the top and bottom of the document text. |
-relief | The relief used to draw the 3-D border. |
-resolvercommand |
The name of a TCL command used to resolve URIs. If blank, a built-in resolver is used. If a TCL command is specified but it returns an empty string, the built-in resolver is also used. The built-in resolver is based on the algorithm in section 5.2 of RFC 2396. Multiple URIs are appended to the TCL command before it is executed. The first URI is the BASE URI of the document (the URL specified by the -base configuration option and updated according to any prior <BASE> markup). Zero or more additional URIs are appended to this base. The result of the script should be the resolution of the whole series of URIs. |
-rulerelief |
Determines the appearance of the Horizontal Rule (<HR>) markup. The default is "sunken". This can also be "raised" or "flat". If "flat", then the <HR> is drawn using a solid line in the current foreground color. "groove" and "ridge" are the same as "flat". |
-scriptcommand |
Whenever <SCRIPT>...</SCRIPT> markup is encountered in the input HTML, the line number, the attributes of the <SCRIPT> markup and the body of the script are appended to this string and the result is executed as a TCL command. If this option is the empty string, then the script is ignored. |
-selectioncolor | The background color used when drawing the selection. The foreground color for the selection is the same as the regular foreground color. |
-tablerelief |
Determines the appearance of the borders around tables. The default is "raised". This can also be "sunken" or "flat". If "flat", then the borders are drawn using solid lines in the current foreground color. "groove" and "ridge" are the same as "flat". |
-takefocus | |
-unvisitedcolor | The foreground color used to draw hyperlinks that have not been visited. |
-underlinehyperlinks | Set to TRUE to cause hyperlinks to be drawn using an underlined font. |
-visitedcolor | The foreground color used to draw hyperlinks that have been visited. |
-width | The width of the document text. This value does not include space allocated for -highlightthickness, -borderwidth or -padx. |
-xscrollcommand | |
-yscrollcommand |
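The calling convention described for -fontcommand above (a size from 1 to 7 plus a keyword set) can be sketched as a small callback. This is only an illustration: the procedure name `exampleFontCommand`, the font families, and the point-size table are assumptions, not values taken from the widget.

```tcl
# Hypothetical -fontcommand callback. The family names and the
# point-size table are illustrative assumptions, not widget defaults.
proc exampleFontCommand {size attrs} {
    # Assumed mapping from HTML sizes 1..7 to point sizes.
    set points {8 9 10 12 14 18 24}
    if {![string is integer -strict $size] || $size < 1 || $size > 7} {
        # An empty result asks the widget to fall back to its
        # built-in default procedure.
        return ""
    }
    set family Helvetica
    if {[lsearch -exact $attrs fixed] >= 0} {
        set family Courier
    }
    # Build a Tk font description: family, size, then style keywords.
    set font [list $family [lindex $points [expr {$size - 1}]]]
    foreach style {bold italic} {
        if {[lsearch -exact $attrs $style] >= 0} {
            lappend font $style
        }
    }
    return $font
}
```

Such a procedure would presumably be installed with something like `.h configure -fontcommand exampleFontCommand`, where `.h` is a hypothetical widget path.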
Indices
Internally, the HTML widget stores the HTML document as a list of tokens. Each token is either
- a contiguous sequence of non-space characters (Text),
- a contiguous sequence of spaces, tabs or newlines (Space),
- or an HTML markup tag (such as ``<em>'').
Tokens are identified by number. The first token is ``1'', the second is ``2'' and so forth. So in its simplest form, an index is just an integer greater than 0.
Within a single Text or Space token, individual characters are also identified by number, though the counting starts with 0 instead of 1. The character number is connected to the token number by a period. So, for example, the 4th character in the 9th token would be ``9.3''.
Two integers separated by a dot are called the canonical form of an index. Other index forms are available, including:
end | The keyword ``end'' means one character past the last character of the last token. |
last | The keyword ``last'' means the last character of the last token. |
@X,Y | The character located at screen coordinates X,Y. |
&DOM | The element matching the given DOM address. eg. ``tables(1).rows(3)''. |
*.last | The second integer can be replaced by the keyword ``last'' to mean the last character in the token. |
sel.first | This is the first character that is part of the selection. |
sel.last | This is the last character that is part of the selection. |
insert | The character immediately following the insertion cursor. |
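As a small illustration of the two-integer index form, the following pure-Tcl sketch splits an index such as ``9.3'' into its token number and character number. The procedure name `splitCanonicalIndex` is hypothetical and not part of the widget; the other index forms (``end'', ``@X,Y'' and so on) are deliberately not handled here.

```tcl
# Split a two-integer index such as "9.3" into its token number and
# character number. Returns an empty list for any other index form.
proc splitCanonicalIndex {index} {
    # Token numbers start at 1; character numbers start at 0.
    if {![regexp {^([1-9][0-9]*)\.([0-9]+)$} $index -> token char]} {
        return {}
    }
    # Force decimal interpretation so that "08" is not read as octal.
    scan $char %d char
    return [list $token $char]
}
```

For example, `splitCanonicalIndex 9.3` yields the token number 9 and the character number 3, which identifies the 4th character of the 9th token.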
Commands
- html window ?options ...?
- Create a new HTML widget instance named window.
- html reformat from to text
-
Convert text from one encoding to another. The text is given
in the text argument. The current encoding of the text
is specified by the from argument. This command returns
the same text in the to encoding.
From and to may be any of the following values:
plain | Ordinary text with no characters escaped. |
http | The text is encoded in a form suitable for use with the HTTP protocol. Spaces are converted to "+". Special characters are escaped as "%aa" where each "a" is a hexadecimal digit. A special character is anything other than an alphanumeric or one of these: ".", "$", "-", or "_". |
url | The text is encoded in a form suitable for use as a URI. Spaces are converted to "+". Special characters are escaped as "%aa" where each "a" is a hexadecimal digit. A special character is anything other than an alphanumeric or one of these: ".", "$", "-", "_", or "/". |
html | The text is encoded in a form suitable for use within HTML. "&" is encoded as "&amp;", "<" is encoded as "&lt;" and so forth. |
This command is intended to be useful to the TCL procedures that implement callbacks for the HTML widget.
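The ``http'' encoding rules above can be sketched in pure Tcl. This is not the widget's implementation, only a reading of the stated rules; the name `httpEncodeSketch` is hypothetical, the use of upper-case hexadecimal digits is an assumption, and the sketch assumes single-byte (ASCII or Latin-1) input.

```tcl
# Sketch of the "http" encoding rules: spaces become "+", and any
# character other than an alphanumeric or one of . $ - _ is escaped
# as "%" followed by two hexadecimal digits.
# Assumes single-byte input; upper-case hex digits are an assumption.
proc httpEncodeSketch {text} {
    set out ""
    foreach ch [split $text ""] {
        if {$ch eq " "} {
            append out +
        } elseif {[regexp {^[A-Za-z0-9.$_-]$} $ch]} {
            append out $ch
        } else {
            # Convert the character to its numeric code, then to hex.
            scan $ch %c code
            append out [format %%%02X $code]
        }
    }
    return $out
}
```

With the widget loaded, the equivalent call would presumably be `html reformat plain http $text`.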
- html urljoin scheme authority path query fragment
- This command takes the five main components of a URI and joins them together into a complete URI. Special characters in any component are escaped.
- html urlsplit uri
- This command takes a single URI and splits it into its five major components: scheme, authority, path, query and fragment. The command returns a list where each component is an element of the list. Components missing from the URI are represented as empty elements in the list.
- html gzip file FILE DATA
- This command gzips data to a file.
- html gunzip file FILE
- This command gunzips data from a file.
- html gunzip data DATA
- This command gunzips data from a string.
- html gzip data DATA
- This command gzips data from a string.
- html base64 encode DATA
- This command base64 encodes data.
- html base64 decode DATA
- This command base64 decodes data.
- html text format DATA LEN
- This command formats text limiting line length to that specified.
- html xor CMD DATA ...
- This command returns an xor encryption of data. CMD is one of xor, encrypt or decrypt. The last two take a password argument.
- html stdchan CMD CHANNEL
- This command sets the channel to work around MS Windows exec problems. CMD is one of stdin, stdout, stderr.
- html crc32 DATA
- This command produces a 32 bit checksum (but not a real CRC).
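To make the five-component split performed by the urlsplit command above concrete, here is a pure-Tcl sketch based on the regular expression given in Appendix B of RFC 2396. It illustrates the expected shape of the result and is not the widget's own code; the name `urlSplitSketch` is hypothetical.

```tcl
# Split a URI into scheme, authority, path, query and fragment using
# the regular expression from RFC 2396, Appendix B. Components that
# are missing from the URI come back as empty list elements.
proc urlSplitSketch {uri} {
    # Every group is optional, so this pattern matches any string.
    regexp {^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?} \
        $uri -> s1 scheme a1 authority path q1 query f1 fragment
    return [list $scheme $authority $path $query $fragment]
}
```

A relative reference such as `../c.html` produces a list whose scheme, authority, query and fragment elements are all empty, matching the behaviour described for missing components.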
Widget Commands
- WIDGET  bgimage  IMAGE ?TID?
- Set IMAGE to be the background image. TID, if supplied,
is the token id of a TABLE, TD, TH or TR. If TID is omitted,
it is the background image for the whole page.
- WIDGET  cget config-option
-
Return the value of a configuration option. Works just like any
other Tk widget.
- WIDGET  clear
-
Remove all tokens and text from the HTML widget.
The parser is reset to its initial state.
This routine should be called to change pages.
- WIDGET  configure ?args...?
-
The standard Tk configuration command.
- WIDGET  coords  ?INDEX ?percent??
- Return the screen coordinates of INDEX.
- WIDGET  forminfo  INDEX
- Return forminfo for given INDEX.
- WIDGET  href  X  Y
- If the coordinates X Y define a point above a hyperlink,
then this command will return the target URL for that hyperlink.
The URL will be resolved using the -resolvercommand before it
is returned.
- WIDGET  imageadd  ID  IMAGE
-
Add a single image to the animated image list.
- WIDGET  imageat  X  Y
- If the coordinates X Y define a point above an image,
then this command will return the Token Id for that image.
- WIDGET  images
-
Return the list of animated images.
- WIDGET  imageset  ID  NUM
- For animated gifs, set image number NUM to be the current image.
This is only used for buffered animations.
- WIDGET  imageupdate  ID  IMAGES
- When an Animated gif comes in, this allows changing the
current image into multiple images.
- WIDGET  index  INDEX  ?COUNT  UNITS?
-
Translates INDEX into its canonical form.
The canonical form of an index is two integers separated by a period.
The optional 3rd and 4th arguments specify a displacement from INDEX to the value of the index returned. COUNT can be any integer value, including a negative number. UNITS must be either ``char'' or ``line''.
- WIDGET  insert  INDEX
-
Causes the insertion cursor (a flashing vertical bar) to be positioned
immediately before the character specified by INDEX.
- WIDGET  names
-
This command causes the widget to scan the entire text of the document
looking for tags of the form ``<a name=...>''.
It returns a list of values of the name=... fields.
The vertical position of the document can be moved to any of these names using the ``WIDGET yview NAME'' command described below.
- WIDGET  onscreen  ID X  Y
- Return 1 if ID is onscreen (visible).
- WIDGET  over  X  Y ?-muponly?
- Return a list of TIDS where the coordinates X Y
define a point above objects. If -muponly, give only markup elements.
- WIDGET  overattr  X  Y ATTRS
- Like over but returns markup containing one or more
of the attributes in the list ATTRS.
- WIDGET  parse  HTML-TEXT
- Adds the given HTML text to the end of any text previously received
through the parse command and parses as much of the text as
possible into tokens.
Afterwards, the display is updated to show the new tokens, if they are
visible.
- WIDGET resolver ?uri ...?
- The resolver specified by the -resolvercommand option
is called with the
base URI of the document followed
by the remaining arguments to this command. The result of this
command is the result of the -resolvercommand script.
- WIDGET  selection  subcommand args...
- The selection widget command is used to control the selection.
- WIDGET  selection clear
- Clear the current selection. No text will be selected after this
command executes.
- WIDGET  selection set  START  END
- Change the selection to be all text contained within the given
indices.
- WIDGET  refresh options
- Cause a relayout and redraw. Useful after a token insert or update.
Valid options are zero or more of: images, resize, focus, text, border, extend,
clipwin, styler, animate, vscroll, hscroll, gotfocus, layout.
The default is layout. You may abbreviate options with the first letter.
- WIDGET  source
- Return the html source for the current page.
- WIDGET  text  subcommand args...
- There are several text commands. They all have the common
property that they directly manipulate the text that is displayed.
These commands can be used
to build a WYSIWYG editor for HTML.
- WIDGET text ascii  INDEX-1  INDEX-2
Returns plain ASCII text that represents all characters between INDEX-1 and INDEX-2. Formatting tags are omitted. The INDEX-1 character is included but INDEX-2 is omitted.
- WIDGET text delete  INDEX-1  INDEX-2
All text from INDEX-1 up to, but not including INDEX-2 is removed and the display is updated accordingly.
- WIDGET text html  INDEX-1  INDEX-2
Returns HTML text that represents all characters and formatting tags between INDEX-1 and INDEX-2. The INDEX-1 character is included but INDEX-2 is omitted.
- WIDGET text insert  INDEX  TEXT
Inserts one or more characters immediately before the character whose index is given. The insertion cursor is updated.
- WIDGET text break  INDEX 
Break the text token at index into two text tokens.
- WIDGET text find TEXT  ?nocase? ?before|after INDEX? 
Find text. If index is given, start search from there. If before, search backwards. nocase will ignore case.
- WIDGET text table INDEX  ?images? ?attrs? 
Return text (and optionally attributes and images) from a table as lists. The first list is a list of rows (each a list of cells). The next optional list is the list of attributes, like above, but the element 0 contains the table attrs, and element 0 of each row contains the row attrs. Another optional list is the list of images, each as a set of values: row col charoffset tokenid. charoffset is the character offset within the text that the image appears at. tokenid is the index to use to lookup the attributes such as src.
Some of the following subcommands make use of indices. The character number of these indices is ignored since these commands deal only with whole tokens.
- WIDGET token append
TAG ARGUMENTS
-
The command causes a token to be appended to the current list of
tokens in the HTML widget. This command is typically used within
a token handler.
- WIDGET token delete
INDEX  ?INDEX-2?
-
Deletes the single token identified by the index. If a second index is
given, the range of tokens from the first to the second index inclusive
is deleted.
- WIDGET token find
TAG ?before|after|near INDEX?
-
Locates all tokens with the given TAG and returns them all
as a list.
Each element of the returned list is a sublist containing the index
for the token and the arguments for the token.
- WIDGET token get
INDEX  ?INDEX-2?
-
Returns a list of tokens in the range of INDEX through
INDEX-2.
Each element of the list consists of the token tag followed by
the token arguments.
- WIDGET token list
INDEX  ?INDEX-2?
-
The same as token get, but has the token id as the first
item in each list element.
- WIDGET token markup
INDEX  ?INDEX-2?
-
The same as token list, but ignores space and text.
- WIDGET token domtokens
INDEX  ?INDEX-2?
-
The same as token markup, but ignores all non-DOM tokens.
- WIDGET token getend
INDEX
-
Given a start token, find the matching end token.
- WIDGET token offset
START NUM1 NUM2
-
This command is used as follows: when you extract text and run a
regexp on it with -indices, the resulting character offsets need to be
converted back into indices. This command returns the corresponding
begin and end indices.
- WIDGET token attr
INDEX ?NAME ?VALUE??
-
Gets or sets a token's attribute(s). Getting a non-existent
attribute returns an empty string.
- WIDGET token handler
TAG ?SCRIPT?
-
This command allows special processing to occur for selected tokens
in the HTML input stream.
The TAG argument is either ``Text'' or ``Space'' or the name
of an HTML tag (ex: ``H3'' or ``/A'').
If a non-empty script is specified for a particular tag, then when
instances of that tag are encountered by the parser, the parser calls the
corresponding script instead of appending the token to the end of the
token list. Before calling the script, three arguments are appended:
- The token number.
- The tag. (ex: H3)
- A list of name/value pairs describing all arguments to the tag.
Only one handler may be defined for each token type. If a new handler is specified for a token type that previously had a different handler defined, then the old handler is overwritten by the new.
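As an illustration of the handler calling convention, the sketch below reacts to the <body> tag by copying its bgcolor attribute into the widget's -background option, which is the use case mentioned under -background. The procedure name `bodyTagHandler` and the widget path `.h` are hypothetical. Re-appending the token with ``token append'' is an assumption about what an application would want, since a handled token is otherwise not added to the token list.

```tcl
# Hypothetical handler for the <body> tag. The widget command is
# passed as a leading argument; the last three arguments are the ones
# the widget appends (token number, tag, name/value list).
proc bodyTagHandler {widget tokenNumber tag arglist} {
    foreach {name value} $arglist {
        # Compare case-insensitively in case names are not downcased.
        if {[string equal -nocase $name bgcolor]} {
            $widget configure -background $value
        }
    }
    # Assumption: keep the token in the document by appending it
    # explicitly, because the handler replaces the default append.
    $widget token append $tag $arglist
}
```

The handler would presumably be registered with `.h token handler body [list bodyTagHandler .h]`, so that the widget path is already part of the script when the three arguments are appended.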
- WIDGET token insert
INDEX  TAG  ARGUMENTS
-
Inserts a single token given by TAG and ARGUMENTS into
the token list immediately before INDEX.
If INDEX is after the end of a text token, the new token is inserted after that token.
The insertion cursor is updated.
- WIDGET token attrs
ATTRLIST ?INDEX ?INDEX? ?
-
Find all tags that contain an attr named in input list. Return TIDs.
- WIDGET token onEvents
?INDEX ?INDEX? ?
-
Looks for all the onSubmit, onMouseover, etc. attributes. Returns a list of the form: Event TID Event TID...
- WIDGET token unique
TAG
?INDEX ?INDEX? ?
-
For the given tag, return all known unique attribute names for the tag.
- WIDGET dom nameidx
TAG NAME
-
Convert a named markup to its array position, i.e. TABLE foo
might translate to TABLE[2] returning the integer index 2.
- WIDGET dom radioidx
TAG NAME
-
Translate a radio input items array index to a form item index.
- WIDGET dom id
DOMSPEC
-
Given a DOMSPEC, return the TID. Obsolete, use ``index &DOMSPEC''.
- WIDGET dom ids
DOMSPEC
-
Like above, but returns both begin and end TID.
- WIDGET dom value
DOMSPEC
-
Like dom id, but returns the attributes rather than the TID.
Obsolete. Should now use: token attr &DOMSPEC.
- WIDGET dom addr
INDEX
-
Given an index, return the best guess of the DOM address.
eg. TABLES(2).ROWS(1)
- WIDGET dom formel
N NAME
-
For forms(N), return the form element with name NAME.
- WIDGET dom tree
INDEX VALUE
-
Return the HTML Doc as one big DOM tree list. Not fully implemented.
- WIDGET  xview
- Returns a list containing two elements. The elements are fractions
between 0.0 and 1.0 that define the position of the left and right
edges of
the visible part of the document as a fraction of the whole.
- WIDGET  xview moveto  FRACTION
- Adjusts the horizontal position of the document so that
FRACTION of the horizontal span of the document is off-screen
to the left.
- WIDGET  xview scroll  NUMBER  WHAT
-
Shifts the view in the window left or right according to
NUMBER and WHAT.   NUMBER is an integer
and WHAT is either units or pages.
- WIDGET  yview
- Returns a list containing two elements. The elements are fractions
between 0.0 and 1.0 that define the position of the top and bottom
edges of
the visible part of the document as a fraction of the whole.
- WIDGET  yview  NAME
- Adjusts the vertical position of the document so that the tag
``<a name=NAME>'' is on screen,
and preferably near the top of the screen.
- WIDGET  yview moveto  FRACTION
- Adjusts the vertical position of the document so that
FRACTION of the vertical span of the document is off-screen
above the visible region.
- WIDGET  yview scroll NUMBER WHAT
-
Shifts the view in the window up or down according to
NUMBER and WHAT.   NUMBER is an integer
and WHAT is either units or pages.
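The two-element result of xview and yview can be illustrated with a little arithmetic. The sketch below computes the pair of fractions from a document size, the offset of the visible region, and the size of the visible region, all in pixels. The procedure name `viewFractions` and its parameters are hypothetical; the widget computes these values internally.

```tcl
# Compute the two fractions that describe the visible part of a
# document along one axis.
#   docSize   - total size of the document, in pixels
#   viewStart - offset of the first visible pixel
#   viewSize  - size of the visible region, in pixels
proc viewFractions {docSize viewStart viewSize} {
    if {$docSize <= 0} {
        # An empty document is treated as entirely visible.
        return [list 0.0 1.0]
    }
    set first [expr {double($viewStart) / $docSize}]
    set last  [expr {double($viewStart + $viewSize) / $docSize}]
    if {$last > 1.0} {
        set last 1.0
    }
    return [list $first $last]
}
```

For a 1000-pixel document with a 500-pixel window scrolled down by 250 pixels, the fractions are 0.25 and 0.75. Passing the first of these to ``yview moveto'' would leave a quarter of the document off-screen above the visible region.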