Using Generator functions with tokenize

The tokenize module breaks a stream of text into tokens, returning a generator that yields them as it scans each line.

>>> import tokenize

>>> file = open('').next # note no ()... tokenize.generate_tokens needs a callable as its argument, which it calls repeatedly until StopIteration is raised

# tokenize.generate_tokens(readline) is a generator that requires one argument, readline, which must be a callable object providing the same interface as the readline() method of built-in file objects.

# Each call to the callable must return one line of input as a string.

>>> tokens = tokenize.generate_tokens(file)
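To see why any line-returning callable works, not just a bound readline method, here is a sketch (the source line is made up) that feeds generate_tokens from a plain iterator; in Python 3 the iterator method is __next__ (the .next of Python 2):

```python
import tokenize

# Hypothetical input: a list of source lines instead of an open file
lines = iter(["total = price * qty\n"])

# Any callable returning one line per call satisfies the interface;
# generate_tokens keeps calling it until StopIteration is raised.
for tok in tokenize.generate_tokens(lines.__next__):
    print(tok)
```

The same pattern works for lines arriving from a socket, a list, or any other source, since tokenize never touches the file object itself, only the callable.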

# The generator produces 5-tuples with the following members:

# 1. Token type

# 2. Token string

# 3. Tuple (srow, scol) specifying the row and column where the token begins in the file

# 4. Tuple (erow, ecol) specifying the row and column where the token ends in the file

# 5. The line on which the token was found
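A minimal sketch of consuming the generator (the source snippet is made up); each yielded 5-tuple unpacks into exactly the fields listed above:

```python
import io
import tokenize

src = "x = 1 + 2\n"  # hypothetical source text

# io.StringIO.readline provides the same interface as file.readline
for tok_type, tok_string, (srow, scol), (erow, ecol), line in \
        tokenize.generate_tokens(io.StringIO(src).readline):
    # tok_name maps the numeric token type to a readable name
    print(tokenize.tok_name[tok_type], repr(tok_string), (srow, scol), (erow, ecol))
```

Rows are 1-based and columns 0-based, so the first token of the snippet starts at (1, 0).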

We get the tokens as:

1. Individual words (identifiers and keywords) are separate tokens

2. Alphanumeric runs separated by special characters become separate tokens

3. Special characters (operators and punctuation) are themselves separate tokens

4. Escape sequences such as newlines appear as their own tokens
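The splitting rules above can be checked on a made-up line that mixes alphanumeric runs with punctuation:

```python
import io
import tokenize

# Hypothetical line: alphanumeric runs split apart at special characters
src = "foo1+bar2(42)\n"

strings = [t.string for t in tokenize.generate_tokens(io.StringIO(src).readline)]
print(strings)
```

Note that digits inside an identifier (foo1) stay attached to it, while the `+`, `(`, and `)` each come out as a separate token, and the trailing newline is a token of its own.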
