Session Five: Files, Streams & String IO

Announcements

Review & Questions

Homework

Code review – let’s take a look.

Lightning talks

Today’s lightning talks will be from:

Strings

Quick review: a string literal creates a string type:

"this is a string"

'So is this'

"And maybe y'all need something like this!"

"""and this also"""

You can also use str()

In [256]: str(34)
Out[256]: '34'

String Manipulation

split and join:

In [167]: csv = "comma, separated, values"
In [168]: csv.split(', ')
Out[168]: ['comma', 'separated', 'values']
In [169]: psv = '|'.join(csv.split(', '))
In [170]: psv
Out[170]: 'comma|separated|values'

Case Switching

In [171]: sample = 'A long string of words'
In [172]: sample.upper()
Out[172]: 'A LONG STRING OF WORDS'
In [173]: sample.lower()
Out[173]: 'a long string of words'
In [174]: sample.swapcase()
Out[174]: 'a LONG STRING OF WORDS'
In [175]: sample.title()
Out[175]: 'A Long String Of Words'

Testing

In [181]: number = "12345"
In [182]: number.isnumeric()
Out[182]: True
In [183]: number.isalnum()
Out[183]: True
In [184]: number.isalpha()
Out[184]: False
In [185]: fancy = "Th!$ $tr!ng h@$ $ymb0l$"
In [186]: fancy.isalnum()
Out[186]: False

String Literals

Common Escape Sequences:

\\  Backslash (\)
\a  ASCII Bell (BEL)
\b  ASCII Backspace (BS)
\n  ASCII Linefeed (LF)
\r  ASCII Carriage Return (CR)
\t  ASCII Horizontal Tab (TAB)
\ooo  Character with octal value ooo
\xhh  Character with hex value hh

For example, for tab-separated values:

In [25]: s = "these\tare\tseparated\tby\ttabs"

In [26]: print(s)
these   are separated    by  tabs

https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

https://docs.python.org/3/library/stdtypes.html#string-methods

Raw Strings

Add an r in front of the string literal:

Escape Sequences Ignored

In [408]: print("this\nthat")
this
that
In [409]: print(r"this\nthat")
this\nthat

Gotcha

In [415]: r"\"
SyntaxError: EOL while scanning string literal

(handy for regex, Windows paths...)
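
A small sketch of where raw strings help, and one way around the trailing-backslash gotcha. The path and the sample text are made up for illustration:

```python
import re

# A Windows-style path: every backslash must be doubled in a normal literal...
normal_path = "C:\\new_folder\\table.txt"
# ...but not in a raw literal (otherwise \n and \t would be escape sequences).
raw_path = r"C:\new_folder\table.txt"
same_path = normal_path == raw_path          # True

# In a regex, \d means "a digit"; the raw string passes it to re untouched.
digits = re.findall(r"\d+", "room 101, floor 3")   # ['101', '3']

# The gotcha: a raw string cannot end in a single backslash,
# so add the trailing backslash as a separate, normal literal.
folder = r"C:\new_folder" + "\\"
```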

Ordinal values

Characters in strings are stored as numeric values:

  • “ASCII” values: 0-127
  • Unicode values: 0 - 1,114,111 (!!!)

To get the value:

In [109]: for i in 'Chris':
   .....:     print(ord(i), end=' ')
67 104 114 105 115
In [110]: for i in (67,104,114,105,115):
   .....:     print(chr(i), end='')
Chris

(these days, stick with ASCII, or use full Unicode: more on that in a few weeks)
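
The same round trip can be written without an explicit loop. This is only a sketch; the list comprehension and the euro-sign example are additions, not part of the class material:

```python
name = "Chris"

# ord() maps each character to its numeric value...
codes = [ord(ch) for ch in name]             # [67, 104, 114, 105, 115]
# ...and chr() maps each value back to a character.
rebuilt = "".join(chr(c) for c in codes)     # 'Chris'

# Values above 127 are outside ASCII but are still valid Unicode.
euro = chr(8364)                             # the euro sign
euro_code = ord(euro)                        # 8364
```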

Building Strings

You can, but please don’t do this:

'Hello ' + name + '!'

(I know, we did that in the grid_printing exercise)

Do this instead:

'Hello {}!'.format(name)

It’s safer (values that are not strings get converted for you) and easier to modify as code gets complicated.

https://docs.python.org/3/library/string.html#string-formatting
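
A quick illustration of why concatenation gets awkward once a value is not a string. The names name and count are made up for this example:

```python
name = "Chris"
count = 3

# Concatenation fails as soon as one piece is not a string...
problem = None
try:
    "Hello " + name + ", you have " + count + " messages"
except TypeError as err:
    problem = str(err)

# ...so every non-string value needs an explicit str():
greeting_concat = "Hello " + name + ", you have " + str(count) + " messages"

# format() does the conversion for you:
greeting_format = "Hello {}, you have {} messages".format(name, count)
```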

Old and New string formatting

Back in the early Python days, there was the string formatting operator: %

" a string: %s and a number: %i "%("text", 45)

This is very similar to C-style string formatting (sprintf).

It’s still around, and handy — but ...

The “new” format() method is more powerful and flexible, so we’ll focus on that in this class.

The string format() method:

In [62]: "A decimal integer is: {:d}".format(34)
Out[62]: 'A decimal integer is: 34'

In [63]: "a floating point is: {:f}".format(34.5)
Out[63]: 'a floating point is: 34.500000'

In [64]: "a string is the default: {}".format("anything")
Out[64]: 'a string is the default: anything'

Multiple placeholders

In [65]: "the number {} is {}".format('five', 5)
Out[65]: 'the number five is 5'

In [66]: "the first 3 numbers are {}, {}, {}".format(1,2,3)
Out[66]: 'the first 3 numbers are 1, 2, 3'

The counts must agree:

In [67]: "string with {} formatting {}".format(1)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-67-a079bc472aca> in <module>()
----> 1 "string with {} formatting {}".format(1)

IndexError: tuple index out of range

Named placeholders

In [69]: "Hello, {name}, whaddaya know?".format(name="Joe")
Out[69]: 'Hello, Joe, whaddaya know?'

You can use values more than once, and skip values:

In [73]: "Hi, {name}. Howzit, {name}?".format(name='Bob')
Out[73]: 'Hi, Bob. Howzit, Bob?'

The format() method works with string variables, too:

In [80]: s = "{:d} / {:d} = {:f}"

In [81]: a, b = 12, 3

In [82]: s.format(a, b, a/b)
Out[82]: '12 / 3 = 4.000000'

So you can dynamically build a format string
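
One possible way to build a format string at run time: make one placeholder per value, then fill them all in. The function name format_row is hypothetical, and the *values call syntax may not have come up in class yet:

```python
def format_row(values):
    """Build a format string with one {:d} per value, then fill it in."""
    template = ", ".join(["{:d}"] * len(values))   # e.g. '{:d}, {:d}, {:d}'
    return template.format(*values)                # pass each value separately

row = format_row((1, 2, 3))    # '1, 2, 3'
```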

Complex Formatting

There is a complete syntax for specifying all sorts of options.

It’s well worth your while to spend some time getting to know this formatting language. You can accomplish a great deal just with this.
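
A few of the more commonly used options, as a small sample rather than a complete tour of the formatting language:

```python
price = "{:.2f}".format(3.14159)         # two decimal places: '3.14'
padded = "{:>8}".format("right")         # right-align in 8 columns: '   right'
centered = "{:^9}".format("mid")         # center in 9 columns: '   mid   '
zero_filled = "{:05d}".format(42)        # pad with zeros to 5 digits: '00042'
thousands = "{:,}".format(1234567)       # thousands separator: '1,234,567'
```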

input

For some of the exercises, you’ll need to interact with a user at the command line.

There’s a nice built-in function to do this: input

In [85]: fred = input('type something-->')
type something-->I've typed something

In [86]: print(fred)
I've typed something

This will display a prompt to the user, allowing them to input text and allowing you to bind that input to a symbol.
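
Note that input always returns a string, so numbers have to be converted. A minimal sketch of one way to handle that; the function name ask_for_number and its read parameter are made up for this example (read exists only so the function can be tried without a keyboard):

```python
def ask_for_number(prompt, read=input):
    """Keep prompting until the user types something int() can convert."""
    while True:
        text = read(prompt)
        try:
            return int(text)
        except ValueError:
            print("{!r} is not a whole number, try again".format(text))
```

At the command line you would call it as ask_for_number('how many? '), which prompts with the real input function.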

Lab: String Formatting

Let’s play with these a bit:

String Formatting Lab

Files

Text Files

f = open('secrets.txt')
secret_data = f.read()
f.close()

secret_data is a string

NOTE: these days, you probably need to use Unicode for text – we’ll get to that next week

Binary Files

f = open('secrets.bin', 'rb')
secret_data = f.read()
f.close()

secret_data is a byte string

(with arbitrary bytes in it – well, not arbitrary – whatever is in the file.)

(See the struct module to unpack binary data )
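
A tiny taste of the struct module, assuming a made-up record layout of one unsigned 16-bit integer followed by one 32-bit float:

```python
import struct

# '<' = little-endian with no padding, 'H' = unsigned 16-bit int, 'f' = 32-bit float
record = struct.pack("<Hf", 2015, 1.5)     # a bytes object, 6 bytes long

# unpack() needs the same format string to turn the bytes back into values
year, ratio = struct.unpack("<Hf", record)
```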

File Opening Modes

f = open('secrets.txt', [mode])
'r', 'w', 'a'
'rb', 'wb', 'ab'
r+, w+, a+
r+b, w+b, a+b

These follow the Unix conventions, and aren’t all that well documented in the Python docs. But these BSD docs make it pretty clear:

http://www.manpagez.com/man/3/fopen/

Gotcha – ‘w’ modes always clear the file
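
The difference between 'w' and 'a' is easy to check on a scratch file. The file name here is made up, and tempfile is used only so the demo does not touch your real files:

```python
import os
import tempfile

# A scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")     # creates the file
f.write("first\n")
f.close()

f = open(path, "w")     # 'w' again: the old contents are thrown away
f.write("second\n")
f.close()

f = open(path, "a")     # 'a' keeps what is there and adds to the end
f.write("third\n")
f.close()

f = open(path)
contents = f.read()     # 'second\nthird\n'
f.close()
```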

Text is default

  • Newlines are translated: \r\n -> \n
  • – reading and writing!
  • Use *nix-style in your code: \n

Gotcha:

  • on *nix, newline translation changes nothing for files that already use \n, so newline bugs can go unnoticed
  • the same code can break on Windows

File Reading

Reading part of a file

header_size = 4096
f = open('secrets.txt')
secret_header = f.read(header_size)
secret_rest = f.read()
f.close()

Common Idioms

for line in open('secrets.txt'):
    print(line)

(the file object is an iterator!)

f = open('secrets.txt')
while True:
    line = f.readline()
    if not line:
        break
    do_something_with_line(line)

We will learn more about the keyword with later, but for now, just understand the syntax and the advantage over the try-finally block:

with open('workfile', 'r') as f:
    read_data = f.read()
f.closed
True
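
For comparison, here is the try-finally version next to the with version. This is a sketch using a scratch file created just for the demo:

```python
import os
import tempfile

# Make a small scratch file to read back.
path = os.path.join(tempfile.mkdtemp(), "workfile")
with open(path, "w") as f:
    f.write("some data\n")

# Spelled out with try/finally:
f = open(path)
try:
    data_long = f.read()
finally:
    f.close()               # runs even if read() raises an exception

# The same thing using with: the file is closed when the block ends.
with open(path) as f:
    data_short = f.read()
closed_after_with = f.closed
```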

File Writing

outfile = open('output.txt', 'w')
for i in range(10):
    outfile.write("this is line: %i\n"%i)
outfile.close()

with open('output.txt', 'w') as f:
    for i in range(10):
        f.write("this is line: %i\n"%i)

File Methods

Commonly Used Methods

f.read() f.readline()  f.readlines()

f.write(str) f.writelines(seq)

f.seek(offset)   f.tell() # for binary files, mostly

f.close()
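
A short sketch of how these methods behave together, again on a scratch file made up for the demo:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "methods_demo.txt")

with open(path, "w") as f:
    # writelines() does not add newlines: they must already be in the strings
    f.writelines(["one\n", "two\n", "three\n"])

with open(path) as f:
    first = f.readline()        # one line, newline included: 'one\n'
    rest = f.readlines()        # the remaining lines as a list
    f.seek(0)                   # back to the start of the file
    everything = f.read()       # the whole file as one string

with open(path, "rb") as f:
    f.read(3)                   # read three bytes...
    position = f.tell()         # ...so the current position is 3
```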

Stream IO

In [417]: import io
In [420]: f = io.StringIO()
In [421]: f.write("somestuff")
In [422]: f.seek(0)
In [423]: f.read()
Out[423]: 'somestuff'
In [424]: stuff = f.getvalue()
In [425]: f.close()

(handy for testing file handling code...)
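
For example, a function that takes an open file can be tried out without creating a file on disk. The function count_lines is made up for illustration:

```python
import io

def count_lines(infile):
    """Count the lines in any open file or file-like object."""
    return sum(1 for line in infile)

# A StringIO object stands in for a real file.
fake_file = io.StringIO("first\nsecond\nthird\n")
n = count_lines(fake_file)      # 3
```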

Python 2 also had cStringIO, a bit faster. It does not exist in Python 3: use io.StringIO for text (and io.BytesIO for bytes).

Paths

Paths are generally handled with simple strings (or Unicode strings)

Relative paths:

'secret.txt'
'./secret.txt'

Absolute paths:

'/home/chris/secret.txt'

Either works with open(), etc.

(working directory only makes sense with command-line programs...)

os module

os.getcwd()
os.chdir(path)
os.path.abspath()
os.path.relpath()
os.path.split()
os.path.splitext()
os.path.basename()
os.path.dirname()
os.path.join()

(all platform independent)

os.listdir()
os.mkdir()
os.walk()

(higher level stuff in shutil module)
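
A few of the os.path functions in action. The path is built with os.path.join, so the separator is right on any platform; the file does not need to exist for these calls:

```python
import os

path = os.path.join("Examples", "students.txt")

folder, filename = os.path.split(path)      # ('Examples', 'students.txt')
root, ext = os.path.splitext(filename)      # ('students', '.txt')

# abspath() resolves the path against the current working directory.
full = os.path.abspath(path)
```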

pathlib

pathlib is a package for handling paths in an OO way:

http://pathlib.readthedocs.org/en/pep428/

All the stuff in os.path and more:

In [64]: import pathlib
In [65]: pth = pathlib.Path('./')
In [66]: pth.is_dir()
Out[66]: True
In [67]: pth.absolute()
Out[67]: PosixPath('/Users/Chris/PythonStuff/UWPCE/IntroPython2015')
In [68]: for f in pth.iterdir():
   ....:     print(f)
junk2.txt
junkfile.txt
...
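
A few more pathlib features, shown as a sketch. The path is only an example and does not have to exist:

```python
import pathlib

# The / operator joins path components.
pth = pathlib.Path("Examples") / "students.txt"

name = pth.name          # 'students.txt'
suffix = pth.suffix      # '.txt'
stem = pth.stem          # 'students'
parent = pth.parent      # Path('Examples')

# Names of the Python files in the current directory, if there are any.
here = pathlib.Path(".")
python_files = sorted(p.name for p in here.iterdir() if p.suffix == ".py")
```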

Lab: Files

In the class repo, in:

Examples\students.txt

You will find the list I generated of all the students in the class, and what programming languages they have used in the past.

Write a little script that reads that file, and generates a list of all the languages that have been used.

Extra credit: keep track of how many students specified each language.

Homework

Catch up!

Paths and File Processing

  • write a program which prints the full path to all files in the current directory, one per line
  • write a program which copies a file from a source, to a destination (without using shutil, or the OS copy command)
    • advanced: make it work for any size file: i.e. don’t read the entire contents of the file into memory at once.
    • Note that if you want it to do any kind of file, you need to open the files in binary mode: open(filename, 'rb') (or 'wb' for writing.)
  • update mailroom from last week to:
    • Use dicts where appropriate
    • Write a full set of letters to everyone to individual files on disk
    • See if you can use a dict to switch between the user’s selections
    • Try to use a dict and the .format() method to do the letter as one big template – rather than building up a big string in parts.

Material to review before next week

[Note that in py3 you don’t need to inherit from object]

Talk by Raymond Hettinger:

https://youtu.be/HTLu2DFOdTg

https://speakerdeck.com/pyconslides/pythons-class-development-toolkit-by-raymond-hettinger