Abstract

PEP:284
Title:Integer for-loops
Version:$Revision$
Last-Modified:$Date$
Author:David Eppstein <eppstein at ics.uci.edu>, Greg Ewing <greg.ewing at canterbury.ac.nz>
Status:Rejected
Type:Standards Track
Content-Type:text/x-rst
Created:1-Mar-2002
Python-Version:2.3
Post-History:

This PEP proposes to simplify iteration over intervals of integers, by extending the range of expressions allowed after a "for" keyword to allow three-way comparisons such as:

for lower <= var < upper:

in place of the current:

for item in list:

syntax. The resulting loop or list iteration will loop over all values of var that make the comparison true, starting from the left endpoint of the given interval.

Pronouncement

This PEP is rejected. There were a number of fixable issues with the proposal (see the fixups listed in Raymond Hettinger's python-dev post on 18 June 2005 [5]). However, even with the fixups the proposal did not garner support. Specifically, Guido did not buy the premise that the range() format needed fixing, "The whole point (15 years ago) of range() was to avoid needing syntax to specify a loop over numbers. I think it's worked out well and there's nothing that needs to be fixed (except range() needs to become an iterator, which it will in Python 3.0)."

Rationale

One of the most common uses of for-loops in Python is to iterate over an interval of integers. Python provides functions range() and xrange() to generate lists and iterators for such intervals, which work best for the most frequent case: half-open intervals increasing from zero. However, the range() syntax is more awkward for open or closed intervals, and lacks symmetry when reversing the order of iteration. In addition, the call to an unfamiliar function makes it difficult for newcomers to Python to understand code that uses range() or xrange().

The perceived lack of a natural, intuitive integer iteration syntax has led to heated debate on python-list, and spawned at least four PEPs before this one. PEP 204 [1] (rejected) proposed to re-use Python's slice syntax for integer ranges, leading to a terser syntax but not solving the readability problem of multi-argument range(). PEP 212 [2] (deferred) proposed several syntaxes for directly converting a list to a sequence of integer indices, in place of the current idiom:

range(len(list))

for such conversion, and PEP 281 [3] proposes to simplify the same idiom by allowing it to be written as:

range(list).

PEP 276 [4] proposes to allow automatic conversion of integers to iterators, simplifying the most common half-open case but not addressing the complexities of other types of interval. Additional alternatives have been discussed on python-list.

The solution described here is to allow a three-way comparison after a "for" keyword, both in the context of a for-loop and of a list comprehension:

for lower <= var < upper:

This would cause iteration over an interval of consecutive integers, beginning at the left bound in the comparison and ending at the right bound. The exact comparison operations used would determine whether the interval is open or closed at either end and whether the integers are considered in ascending or descending order.

This syntax closely matches standard mathematical notation, so is likely to be more familiar to Python novices than the current range() syntax. Open and closed interval endpoints are equally easy to express, and the reversal of an integer interval can be formed simply by swapping the two endpoints and reversing the comparisons. In addition, the semantics of such a loop would closely resemble one way of interpreting the existing Python for-loops:

for item in list

iterates over exactly those values of item that cause the expression:

item in list

to be true. Similarly, the new format:

for lower <= var < upper:

would iterate over exactly those integer values of var that cause the expression:

lower <= var < upper

to be true.

Specification

We propose to extend the syntax of a for statement, currently:

for_stmt: "for" target_list "in" expression_list ":" suite
      ["else" ":" suite]

as described below:

for_stmt: "for" for_test ":" suite ["else" ":" suite]
for_test: target_list "in" expression_list |
        or_expr less_comp or_expr less_comp or_expr |
        or_expr greater_comp or_expr greater_comp or_expr
less_comp: "<" | "<="
greater_comp: ">" | ">="

Similarly, we propose to extend the syntax of list comprehensions, currently:

list_for: "for" expression_list "in" testlist [list_iter]

by replacing it with:

list_for: "for" for_test [list_iter]

In all cases the expression formed by for_test would be subject to the same precedence rules as comparisons in expressions. The two comp_operators in a for_test must be required to be both of similar types, unlike chained comparisons in expressions which do not have such a restriction.

We refer to the two or_expr's occurring on the left and right sides of the for-loop syntax as the bounds of the loop, and the middle or_expr as the variable of the loop. When a for-loop using the new syntax is executed, the expressions for both bounds will be evaluated, and an iterator object created that iterates through all integers between the two bounds according to the comparison operations used. The iterator will begin with an integer equal or near to the left bound, and then step through the remaining integers with a step size of +1 or -1 if the comparison operation is in the set described by less_comp or greater_comp respectively. The execution will then proceed as if the expression had been:

for variable in iterator

where "variable" refers to the variable of the loop and "iterator" refers to the iterator created for the given integer interval.

The values taken by the loop variable in an integer for-loop may be either plain integers or long integers, according to the magnitude of the bounds. Both bounds of an integer for-loop must evaluate to a real numeric type (integer, long, or float). Any other value will cause the for-loop statement to raise a TypeError exception.

Issues

The following issues were raised in discussion of this and related proposals on the Python list.

  • Should the right bound be evaluated once, or every time through the loop? Clearly, it only makes sense to evaluate the left bound once. For reasons of consistency and efficiency, we have chosen the same convention for the right bound.

  • Although the new syntax considerably simplifies integer for-loops, list comprehensions using the new syntax are not as simple. We feel that this is appropriate since for-loops are more frequent than comprehensions.

  • The proposal does not allow access to integer iterator objects such as would be created by xrange. True, but we see this as a shortcoming in the general list-comprehension syntax, beyond the scope of this proposal. In addition, xrange() will still be available.

  • The proposal does not allow increments other than 1 and -1. More general arithmetic progressions would need to be created by range() or xrange(), or by a list comprehension syntax such as:

    [2*x for 0 <= x <= 100]
    
  • The position of the loop variable in the middle of a three-way comparison is not as apparent as the variable in the present:

    for item in list
    

syntax, leading to a possible loss of readability. We feel that this loss is outweighed by the increase in readability from a natural integer iteration syntax.

  • To some extent, this PEP addresses the same issues as PEP 276 [4]. We feel that the two PEPs are not in conflict since PEP 276 is primarily concerned with half-open ranges starting in 0 (the easy case of range()) while this PEP is primarily concerned with simplifying all other cases. However, if this PEP is approved, its new simpler syntax for integer loops could to some extent reduce the motivation for PEP 276.
  • It is not clear whether it makes sense to allow floating point bounds for an integer loop: if a float represents an inexact value, how can it be used to determine an exact sequence of integers? On the other hand, disallowing float bounds would make it difficult to use floor() and ceiling() in integer for-loops, as it is difficult to use them now with range(). We have erred on the side of flexibility, but this may lead to some implementation difficulties in determining the smallest and largest integer values that would cause a given comparison to be true.
  • Should types other than int, long, and float be allowed as bounds? Another choice would be to convert all bounds to integers by int(), and allow as bounds anything that can be so converted instead of just floats. However, this would change the semantics: 0.3 <= x is not the same as int(0.3) <= x, and it would be confusing for a loop with 0.3 as lower bound to start at zero. Also, in general int(f) can be very far from f.

Implementation

An implementation is not available at this time. Implementation is not expected to pose any great difficulties: the new syntax could, if necessary, be recognized by parsing a general expression after each "for" keyword and testing whether the top level operation of the expression is "in" or a three-way comparison. The Python compiler would convert any instance of the new syntax into a loop over the items in a special iterator object.