Session Five: Files, Streams & String IO¶

Announcements¶

Review & Questions¶

Homework¶

Code review – let’s take a look.

Lightening talks¶

Today’s lightening talks will be from:

Strings¶

Quick review: a string literal creates a string type:

"this is a string"

'So is this'

"And maybe y'all need something like this!"

"""and this also"""

You can also use str()

In [256]: str(34)
Out[256]: '34'

String Manipulation¶

split and join:

In [167]: csv = "comma, separated, values"
In [168]: csv.split(', ')
Out[168]: ['comma', 'separated', 'values']
In [169]: psv = '|'.join(csv.split(', '))
In [170]: psv
Out[170]: 'comma|separated|values'

Case Switching¶

In [171]: sample = 'A long string of words'
In [172]: sample.upper()
Out[172]: 'A LONG STRING OF WORDS'
In [173]: sample.lower()
Out[173]: 'a long string of words'
In [174]: sample.swapcase()
Out[174]: 'a LONG STRING OF WORDS'
In [175]: sample.title()
Out[175]: 'A Long String Of Words'

Testing¶

In [181]: number = "12345"
In [182]: number.isnumeric()
Out[182]: True
In [183]: number.isalnum()
Out[183]: True
In [184]: number.isalpha()
Out[184]: False
In [185]: fancy = "Th!$ $tr!ng h@$ $ymb0l$"
In [186]: fancy.isalnum()
Out[186]: False

String Literals¶

Common Escape Sequences:

\\  Backslash (\)
\a  ASCII Bell (BEL)
\b  ASCII Backspace (BS)
\n  ASCII Linefeed (LF)
\r  ASCII Carriage Return (CR)
\t  ASCII Horizontal Tab (TAB)
\ooo  Character with octal value ooo
\xhh  Character with hex value hh

for example – for tab-separted values:

In [25]: s = "these\tare\tseparated\tby\ttabs"

In [26]: print(s)
these   are separated    by  tabs

https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals https://docs.python.org/3/library/stdtypes.html#string-methods

Raw Strings¶

Add an r in front of the string literal:

Escape Sequences Ignored

In [408]: print("this\nthat")
this
that
In [409]: print(r"this\nthat")
this\nthat

Gotcha

In [415]: r"\"
SyntaxError: EOL while scanning string literal

(handy for regex, windows paths...)

Ordinal values¶

Characters in strings are stored as numeric values:

“ASCII” values: 1-127
Unicode values – 1 - 1,114,111 (!!!)

To get the value:

In [109]: for i in 'Chris':
   .....:     print(ord(i), end=' ')
67 104 114 105 115
In [110]: for i in (67,104,114,105,115):
   .....:     print(chr(i), end='')
Chris

(these days, stick with ASCII, or use full Unicode: more on that in a few weeks)

Building Strings¶

You can, but please don’t do this:

'Hello ' + name + '!'

(I know – we did that in the grid_printing excercise)

Do this instead:

'Hello {}!'.format(name)

It’s much faster and safer, and easier to modify as code gets complicated.

https://docs.python.org/3/library/string.html#string-formatting

Old and New string formatting¶

back in early python days, there was the string formatting operator: %

" a string: %s and a number: %i "%("text", 45)

This is very similar to C-style string formatting (sprintf).

It’s still around, and handy — but ...

The “new” format() method is more powerful and flexible, so we’ll focus on that in this class.

The string format() method:

In [62]: "A decimal integer is: {:d}".format(34)
Out[62]: 'A decimal integer is: 34'

In [63]: "a floating point is: {:f}".format(34.5)
Out[63]: 'a floating point is: 34.500000'

In [64]: "a string is the default: {}".format("anything")
Out[64]: 'a string is the default: anything'

Multiple placeholders¶

In [65]: "the number is {} is {}".format('five', 5)
Out[65]: 'the number is five is 5'

In [66]: "the first 3 numbers are {}, {}, {}".format(1,2,3)
Out[66]: 'the first 3 numbers are 1, 2, 3'

The counts must agree:

In [67]: "string with {} formatting {}".format(1)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-67-a079bc472aca> in <module>()
----> 1 "string with {} formatting {}".format(1)

IndexError: tuple index out of range

Named placeholders¶

In [69]: "Hello, {name}, whaddaya know?".format(name="Joe")
Out[69]: 'Hello, Joe, whaddaya know?'

You can use values more than once, and skip values:

In [73]: "Hi, {name}. Howzit, {name}?".format(name='Bob')
Out[73]: 'Hi, Bob. Howzit, Bob?'

The format operator works with string variables, too:

In [80]: s = "{:d} / {:d} = {:f}"

In [81]: a, b = 12, 3

In [82]: s.format(a, b, a/b)
Out[82]: '12 / 3 = 4.000000'

So you can dynamically build a format string

Complex Formatting¶

There is a complete syntax for specifying all sorts of options.

It’s well worth your while to spend some time getting to know this formatting language. You can accomplish a great deal just with this.

`input`¶

For some of the exercises, you’ll need to interact with a user at the command line.

There’s a nice built in function to do this - input:

In [85]: fred = input('type something-->')
type something-->I've typed something

In [86]: print(fred)
I've typed something

This will display a prompt to the user, allowing them to input text and allowing you to bind that input to a symbol.

Lab: String Formatting¶

Let’s play with these a bit:

String Formatting Lab

Files¶

Text Files

f = open('secrets.txt')
secret_data = f.read()
f.close()

secret_data is a string

NOTE: these days, you probably need to use Unicode for text – we’ll get to that next week

Binary Files

f = open('secrets.bin', 'rb')
secret_data = f.read()
f.close()

secret_data is a byte string

(with arbitrary bytes in it – well, not arbitrary – whatever is in the file.)

(See the struct module to unpack binary data )

File Opening Modes

f = open('secrets.txt', [mode])
'r', 'w', 'a'
'rb', 'wb', 'ab'
r+, w+, a+
r+b, w+b, a+b

These follow the Unix conventions, and aren’t all that well documented in the Python docs. But these BSD docs make it pretty clear:

http://www.manpagez.com/man/3/fopen/

Gotcha – ‘w’ modes always clear the file

Text is default

Newlines are translated: \r\n -> \n

– reading and writing!

Use *nix-style in your code: \n

Gotcha:

no difference between text and binary on *nix

breaks on Windows

File Reading¶

Reading part of a file

header_size = 4096
f = open('secrets.txt')
secret_header = f.read(header_size)
secret_rest = f.read()
f.close()

Common Idioms

for line in open('secrets.txt'):
    print(line)

(the file object is an iterator!)

f = open('secrets.txt')
while True:
    line = f.readline()
    if not line:
        break
    do_something_with_line()

We will learn more about the keyword with later, but for now, just understand the syntax and the advantage over the try-finally block:

with open('workfile', 'r') as f:
    read_data = f.read()
f.closed
True

File Writing¶

outfile = open('output.txt', 'w')
for i in range(10):
    outfile.write("this is line: %i\n"%i)
outfile.close()

with open('output.txt', 'w') as f:
    for i in range(10):
       f.write("this is line: %i\n"%i)

File Methods¶

Commonly Used Methods

f.read() f.readline()  f.readlines()

f.write(str) f.writelines(seq)

f.seek(offset)   f.tell() # for binary files, mostly

f.close()

Stream IO¶

In [417]: import io
In [420]: f = io.StringIO()
In [421]: f.write("somestuff")
In [422]: f.seek(0)
In [423]: f.read()
Out[423]: 'somestuff'
Out[424]: stuff = f.getvalue()
Out[425]: f.close()

(handy for testing file handling code...)

There is also cStringIO – a bit faster.

from cStringIO import StringIO

Paths¶

Paths are generally handled with simple strings (or Unicode strings)

Relative paths:

'secret.txt'
'./secret.txt'

Absolute paths:

'/home/chris/secret.txt'

Either work with open() , etc.

(working directory only makes sense with command-line programs...)

os module¶

os.getcwd()
os.chdir(path)
os.path.abspath()
os.path.relpath()

os.path.split()
os.path.splitext()
os.path.basename()
os.path.dirname()
os.path.join()

(all platform independent)

os.listdir()
os.mkdir()
os.walk()

(higher level stuff in shutil module)

pathlib¶

pathlib is a package for handling paths in an OO way:

http://pathlib.readthedocs.org/en/pep428/

All the stuff in os.path and more:

In [64]: import pathlib
In [65]: pth = pathlib.Path('./')
In [66]: pth.is_dir()
Out[66]: True
In [67]: pth.absolute()
Out[67]: PosixPath('/Users/Chris/PythonStuff/UWPCE/IntroPython2015')
In [68]: for f in pth.iterdir():
             print(f)
junk2.txt
junkfile.txt
...

Lab: Files¶

In the class repo, in:

Examples\students.txt

You will find the list I generated of all the students in the class, and what programming languages they have used in the past.

Write a little script that reads that file, and generates a list of all the languages that have been used.

Extra credit: keep track of how many students specified each language.

Homework¶

Catch up!¶

Finish the LABs from today
Catch up from last week.
- Add Exception handling to mailroom
- and add some tests
- and list (and dict, and set) comprehensions...
If you’ve done all that – check out the collections module:
- https://docs.python.org/3.5/library/collections.html
- here’s a good overview: https://pymotw.com/3/collections/

Paths and File Processing¶

write a program which prints the full path to all files in the current directory, one per line
write a program which copies a file from a source, to a destination (without using shutil, or the OS copy command)
- advanced: make it work for any size file: i.e. don’t read the entire contents of the file into memory at once.
- Note that if you want it to do any kind of file, you need to open the files in binary mode: open(filename, 'rb') (or 'wb' for writing.)
update mailroom from last week to:
- Use dicts where appropriate
- Write a full set of letters to everyone to individual files on disk
- See if you can use a dict to switch between the users selections
- Try to use a dict and the .format() method to do the letter as one big template – rather than building up a big string in parts.

Material to review before next week¶

Dive into Python3: 7.2 – 7.3 http://www.diveintopython3.net/iterators.html#defining-classes

Think Pyhton: 15 – 18 http://www.greenteapress.com/thinkpython/html/thinkpython016.html

LPTHW: 40 – 44 http://learnpythonthehardway.org/book/ex40.html

[Note that in py3 you don’t need to inherit from object]

Talk by Raymond Hettinger:

https://youtu.be/HTLu2DFOdTg

https://speakerdeck.com/pyconslides/pythons-class-development-toolkit-by-raymond-hettinger

Session Five: Files, Streams & String IO¶

Announcements¶

Review & Questions¶

Homework¶

Lightening talks¶

Strings¶

String Manipulation¶

Case Switching¶

Testing¶

String Literals¶

Raw Strings¶

Ordinal values¶

Building Strings¶

Old and New string formatting¶

Multiple placeholders¶

Named placeholders¶

Complex Formatting¶

input¶

Lab: String Formatting¶

Files¶

File Reading¶

File Writing¶

File Methods¶

Stream IO¶

Paths¶

os module¶

pathlib¶

Lab: Files¶

Homework¶

Catch up!¶

Paths and File Processing¶

Material to review before next week¶

`input`¶