Session Five: Files, Streams & String IO¶
Announcements¶
Review & Questions¶
Homework¶
Code review – let’s take a look.
Lightning talks¶
Today’s lightning talks will be from:
Strings¶
Quick review: a string literal creates a string type:
"this is a string"
'So is this'
"And maybe y'all need something like this!"
"""and this also"""
You can also use str()
In [256]: str(34)
Out[256]: '34'
String Manipulation¶
split and join:
In [167]: csv = "comma, separated, values"
In [168]: csv.split(', ')
Out[168]: ['comma', 'separated', 'values']
In [169]: psv = '|'.join(csv.split(', '))
In [170]: psv
Out[170]: 'comma|separated|values'
Case Switching¶
In [171]: sample = 'A long string of words'
In [172]: sample.upper()
Out[172]: 'A LONG STRING OF WORDS'
In [173]: sample.lower()
Out[173]: 'a long string of words'
In [174]: sample.swapcase()
Out[174]: 'a LONG STRING OF WORDS'
In [175]: sample.title()
Out[175]: 'A Long String Of Words'
Testing¶
In [181]: number = "12345"
In [182]: number.isnumeric()
Out[182]: True
In [183]: number.isalnum()
Out[183]: True
In [184]: number.isalpha()
Out[184]: False
In [185]: fancy = "Th!$ $tr!ng h@$ $ymb0l$"
In [186]: fancy.isalnum()
Out[186]: False
String Literals¶
Common Escape Sequences:
\\ Backslash (\)
\a ASCII Bell (BEL)
\b ASCII Backspace (BS)
\n ASCII Linefeed (LF)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\ooo Character with octal value ooo
\xhh Character with hex value hh
For example, for tab-separated values:
In [25]: s = "these\tare\tseparated\tby\ttabs"
In [26]: print(s)
these are separated by tabs
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
https://docs.python.org/3/library/stdtypes.html#string-methods
Raw Strings¶
Add an r in front of the string literal:
Escape Sequences Ignored
In [408]: print("this\nthat")
this
that
In [409]: print(r"this\nthat")
this\nthat
Gotcha
In [415]: r"\"
SyntaxError: EOL while scanning string literal
(handy for regex, windows paths...)
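Here is a small sketch of why raw strings matter for regex. The sample text and variable names are made up for illustration; the point is that "\b" means backspace in a normal literal but "word boundary" once it reaches the re module intact:

```python
import re

# Without the r prefix, "\b" is the single backspace character.
plain = "\bcat\b"
# With the r prefix, the backslashes survive and re sees a word boundary.
raw = r"\bcat\b"

text = "the cat sat on the catalog"

raw_matches = re.findall(raw, text)      # whole word "cat" only
plain_matches = re.findall(plain, text)  # looks for backspace + "cat" + backspace

print(len(plain), len(raw))  # 5 7
print(raw_matches)           # ['cat']
print(plain_matches)         # []
```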
Ordinal values¶
Characters in strings are stored as numeric values:
- “ASCII” values: 0-127
- Unicode values – 0 - 1,114,111 (!!!)
To get the value:
In [109]: for i in 'Chris':
.....:     print(ord(i), end=' ')
67 104 114 105 115
In [110]: for i in (67,104,114,105,115):
.....:     print(chr(i), end='')
Chris
(these days, stick with ASCII, or use full Unicode: more on that in a few weeks)
Building Strings¶
You can, but please don’t do this:
'Hello ' + name + '!'
(I know – we did that in the grid_printing exercise)
Do this instead:
'Hello {}!'.format(name)
It’s safer (format() converts non-string values for you), and easier to modify as code gets complicated.
https://docs.python.org/3/library/string.html#string-formatting
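One way to see the "safer" part, as a quick sketch (the names name and count are just made-up sample values): concatenation falls over as soon as one of the pieces is not a string, while format() converts it for you.

```python
name = "Chris"
count = 3

# Concatenation only works if every piece is already a string.
try:
    message = "Hello " + name + ", you have " + count + " new messages"
except TypeError as err:
    message = None
    print("concatenation failed:", err)

# format() calls the right conversion on each value.
formatted = "Hello {}, you have {} new messages".format(name, count)
print(formatted)
```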
Old and New string formatting¶
Back in the early Python days, there was the string formatting operator: %
" a string: %s and a number: %i "%("text", 45)
This is very similar to C-style string formatting (sprintf).
It’s still around, and handy — but ...
The “new” format() method is more powerful and flexible, so we’ll focus on that in this class.
The string format() method:
In [62]: "A decimal integer is: {:d}".format(34)
Out[62]: 'A decimal integer is: 34'
In [63]: "a floating point is: {:f}".format(34.5)
Out[63]: 'a floating point is: 34.500000'
In [64]: "a string is the default: {}".format("anything")
Out[64]: 'a string is the default: anything'
Multiple placeholders¶
In [65]: "the number is {} is {}".format('five', 5)
Out[65]: 'the number is five is 5'
In [66]: "the first 3 numbers are {}, {}, {}".format(1,2,3)
Out[66]: 'the first 3 numbers are 1, 2, 3'
The counts must agree:
In [67]: "string with {} formatting {}".format(1)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-67-a079bc472aca> in <module>()
----> 1 "string with {} formatting {}".format(1)
IndexError: tuple index out of range
Named placeholders¶
In [69]: "Hello, {name}, whaddaya know?".format(name="Joe")
Out[69]: 'Hello, Joe, whaddaya know?'
You can use values more than once, and skip values:
In [73]: "Hi, {name}. Howzit, {name}?".format(name='Bob')
Out[73]: 'Hi, Bob. Howzit, Bob?'
The format() method works on strings stored in variables, too:
In [80]: s = "{:d} / {:d} = {:f}"
In [81]: a, b = 12, 3
In [82]: s.format(a, b, a/b)
Out[82]: '12 / 3 = 4.000000'
So you can dynamically build a format string
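As a rough sketch of what "dynamically" can look like: build one {} per value, then fill them in. The list of values is made up, and the *values syntax (which passes each item of the list as a separate argument) may be new to you.

```python
values = [3, 1, 4, 1, 5]

# One "{}" placeholder per value, separated by commas.
placeholders = ", ".join(["{}"] * len(values))

# First format() call builds the template itself...
template = "the numbers are: {}".format(placeholders)
# ...second call fills the template in.
result = template.format(*values)

print(template)
print(result)
```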
Complex Formatting¶
There is a complete syntax for specifying all sorts of options.
It’s well worth your while to spend some time getting to know this formatting language. You can accomplish a great deal just with this.
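A few of the more useful options, as a minimal sketch (the table rows are invented sample data). Inside the braces, everything after the colon is the format spec: alignment, width, precision, and type.

```python
rows = [("apples", 3, 0.5), ("watermelon", 12, 3.25)]

lines = []
for item, qty, price in rows:
    # <12  : left-align in a 12-character column
    # >4d  : right-align an integer in a 4-character column
    # 8.2f : float in an 8-character column, 2 decimal places
    lines.append("{:<12}{:>4d}{:8.2f}".format(item, qty, price))

for line in lines:
    print(line)

big = "{:,}".format(1234567)     # thousands separator -> '1,234,567'
pct = "{:.1%}".format(0.256)     # percentage          -> '25.6%'
padded = "{:05d}".format(42)     # zero padding        -> '00042'
print(big, pct, padded)
```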
input¶
For some of the exercises, you’ll need to interact with a user at the command line.
There’s a nice built-in function for this, input:
In [85]: fred = input('type something-->')
type something-->I've typed something
In [86]: print(fred)
I've typed something
This will display a prompt to the user, allowing them to input text and allowing you to bind that input to a symbol.
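Keep in mind that input always gives you back a string, so numbers need converting. Here is one possible sketch of a "keep asking until it's a number" helper. The function name ask_for_int and its reader parameter are made up for this example; reader exists only so the function can be tried out without a real keyboard.

```python
def ask_for_int(prompt, reader=input):
    """Keep asking until the user types something int() accepts.

    reader is a hypothetical hook (not part of input itself) that lets
    us swap in a fake "user" for testing.
    """
    while True:
        answer = reader(prompt)
        try:
            return int(answer)
        except ValueError:
            print("{!r} is not a whole number, try again".format(answer))


# Simulate a user who types "abc" first and then "42".
fake_answers = iter(["abc", "42"])
result = ask_for_int("how many? ", reader=lambda prompt: next(fake_answers))
print(result, type(result))
```

In a real script you would just call ask_for_int("how many? ") and let it use input.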
Files¶
Text Files
f = open('secrets.txt')
secret_data = f.read()
f.close()
secret_data is a string
NOTE: these days, you probably need to use Unicode for text – we’ll get to that next week
Binary Files
f = open('secrets.bin', 'rb')
secret_data = f.read()
f.close()
secret_data is a byte string
(with arbitrary bytes in it – well, not arbitrary – whatever is in the file.)
(See the struct module to unpack binary data.)
File Opening Modes
f = open('secrets.txt', [mode])
'r', 'w', 'a'
'rb', 'wb', 'ab'
r+, w+, a+
r+b, w+b, a+b
These follow the Unix conventions, and aren’t all that well documented in the Python docs. But these BSD docs make it pretty clear:
http://www.manpagez.com/man/3/fopen/
Gotcha – ‘w’ modes always clear the file
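A small sketch of the 'w' gotcha next to 'a'. It works in a throwaway temp directory (an assumption for the demo, so nothing real gets clobbered) and uses the with form of open, which is covered a little further on.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes.txt")

with open(path, "w") as f:      # 'w' creates the file (or empties an existing one)
    f.write("first\n")

with open(path, "a") as f:      # 'a' keeps what is there and adds to the end
    f.write("second\n")

with open(path) as f:           # no mode given: defaults to 'r'
    after_append = f.read()

with open(path, "w") as f:      # 'w' again: the old contents are gone
    f.write("third\n")

with open(path) as f:
    after_rewrite = f.read()

print(repr(after_append))
print(repr(after_rewrite))
```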
Text is default
- Newlines are translated: \r\n -> \n (when reading and writing!)
- Use *nix-style newlines in your code: \n
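Gotcha:
- in Python 2 there was no difference between text and binary on *nix, so sloppy code only broke on Windows
- in Python 3, text mode gives you str and binary mode gives you bytes on every platform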
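A quick way to see what each mode hands back in Python 3. This is only a sketch; the file name and temp directory are made up for the demo.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "sample.txt")

with open(path, "w") as f:
    f.write("one\ntwo\n")

with open(path) as f:          # text mode: decoded, newlines translated
    as_text = f.read()

with open(path, "rb") as f:    # binary mode: the raw bytes on disk
    as_bytes = f.read()

print(type(as_text), repr(as_text))
# On Windows the bytes will contain \r\n; on *nix just \n.
print(type(as_bytes), repr(as_bytes))
```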
File Reading¶
Reading part of a file
header_size = 4096
f = open('secrets.txt')
secret_header = f.read(header_size)
secret_rest = f.read()
f.close()
Common Idioms
for line in open('secrets.txt'):
    print(line)
(the file object is an iterator!)
f = open('secrets.txt')
while True:
    line = f.readline()
    if not line:
        break
    do_something_with_line(line)
We will learn more about the keyword with later, but for now, just understand the syntax and the advantage over the try-finally block:
with open('workfile', 'r') as f:
    read_data = f.read()
f.closed
True
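To make the comparison with try-finally concrete, here is a sketch of both side by side. The temp directory and the deliberately raised RuntimeError are just there for the demo.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "workfile")

with open(path, "w") as f:
    f.write("some data\n")

# The try-finally version: close() has to be spelled out.
f1 = open(path)
try:
    data1 = f1.read()
finally:
    f1.close()

# The with version: the file is closed when the block ends,
# even if the block is left because of an exception.
try:
    with open(path) as f2:
        data2 = f2.read()
        raise RuntimeError("something went wrong mid-block")
except RuntimeError:
    pass

print(f1.closed, f2.closed)
```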
File Writing¶
outfile = open('output.txt', 'w')
for i in range(10):
    outfile.write("this is line: %i\n"%i)
outfile.close()
with open('output.txt', 'w') as f:
    for i in range(10):
        f.write("this is line: %i\n"%i)
File Methods¶
Commonly Used Methods
f.read() f.readline() f.readlines()
f.write(str) f.writelines(seq)
f.seek(offset) f.tell() # for binary files, mostly
f.close()
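A short sketch that exercises most of these in one go. It uses a binary file in a temp directory (made up for the demo) so the positions reported by tell() are the same on every platform.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "lines.bin")

with open(path, "wb") as f:
    # writelines() does NOT add newlines for you
    f.writelines([b"alpha\n", b"beta\n", b"gamma\n"])

with open(path, "rb") as f:
    first = f.readline()     # one line, including its newline
    position = f.tell()      # how many bytes we have read so far
    rest = f.readlines()     # everything that is left, as a list of lines
    f.seek(0)                # jump back to the start
    everything = f.read()    # the whole file as one bytes object

print(first, position)
print(rest)
print(everything)
```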
Stream IO¶
In [417]: import io
In [420]: f = io.StringIO()
In [421]: f.write("somestuff")
In [422]: f.seek(0)
In [423]: f.read()
Out[423]: 'somestuff'
In [424]: stuff = f.getvalue()
In [425]: f.close()
(handy for testing file handling code...)
In Python 2 there was also cStringIO, which was a bit faster:
from cStringIO import StringIO
It is gone in Python 3, where io.StringIO is already implemented in C.
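Here is roughly what "handy for testing" means. count_lines is a made-up function for this example; because it only needs something it can loop over line by line, a StringIO can stand in for a real file.

```python
import io


def count_lines(infile):
    """Count the non-blank lines in any open, file-like object."""
    count = 0
    for line in infile:
        if line.strip():
            count += 1
    return count


# No file on disk needed: the "file" lives in memory.
fake_file = io.StringIO("first line\n\nsecond line\nthird line\n")
n = count_lines(fake_file)
print(n)  # 3
```

The same function works unchanged with a real file: count_lines(open('secrets.txt')).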
Paths¶
Paths are generally handled with simple strings (or Unicode strings)
Relative paths:
'secret.txt'
'./secret.txt'
Absolute paths:
'/home/chris/secret.txt'
Either works with open(), etc.
(working directory only makes sense with command-line programs...)
os module¶
os.getcwd()
os.chdir(path)
os.path.abspath()
os.path.relpath()
os.path.split()
os.path.splitext()
os.path.basename()
os.path.dirname()
os.path.join()
(all platform independent)
os.listdir()
os.mkdir()
os.walk()
(higher level stuff in the shutil module)
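A minimal sketch of the os.path functions working together. The path pieces ("data", "reports", "summary.txt") are invented; nothing here touches the disk, it is all string manipulation done the platform's way.

```python
import os

# join() uses the right separator for whatever OS you are on.
path = os.path.join("data", "reports", "summary.txt")

directory, filename = os.path.split(path)   # (everything before, last piece)
root, ext = os.path.splitext(filename)      # ("summary", ".txt")

print(path)
print(os.path.dirname(path))    # same as directory
print(os.path.basename(path))   # same as filename
print(root, ext)

# abspath() resolves a relative path against the current working directory.
full = os.path.abspath(path)
print(full)
```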
pathlib¶
pathlib is a package for handling paths in an OO way:
http://pathlib.readthedocs.org/en/pep428/
All the stuff in os.path and more:
In [64]: import pathlib
In [65]: pth = pathlib.Path('./')
In [66]: pth.is_dir()
Out[66]: True
In [67]: pth.absolute()
Out[67]: PosixPath('/Users/Chris/PythonStuff/UWPCE/IntroPython2015')
In [68]: for f in pth.iterdir():
    print(f)
junk2.txt
junkfile.txt
...
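A few more Path tricks, as a sketch. It works in a temp directory, the file names are made up, and write_text / read_text need Python 3.5 or newer.

```python
import pathlib
import tempfile

workdir = pathlib.Path(tempfile.mkdtemp())

# The / operator joins path pieces, like os.path.join().
target = workdir / "notes" / "todo.txt"
target.parent.mkdir()                    # create the "notes" directory
target.write_text("finish the lab\n")    # open, write, close in one call

print(target.name)      # todo.txt
print(target.stem)      # todo
print(target.suffix)    # .txt
print(target.exists())  # True

contents = target.read_text()
names = sorted([p.name for p in workdir.iterdir()])
print(names)            # ['notes']
```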
Lab: Files¶
In the class repo, in:
Examples\students.txt
You will find the list I generated of all the students in the class, and what programming languages they have used in the past.
Write a little script that reads that file, and generates a list of all the languages that have been used.
Extra credit: keep track of how many students specified each language.
Homework¶
Catch up!¶
- Finish the LABs from today
- Catch up from last week.
- Add Exception handling to mailroom
- and add some tests
- and list (and dict, and set) comprehensions...
- If you’ve done all that – check out the collections module:
- https://docs.python.org/3.5/library/collections.html
- here’s a good overview: https://pymotw.com/3/collections/
Paths and File Processing¶
- write a program which prints the full path to all files in the current directory, one per line
- write a program which copies a file from a source, to a destination
(without using shutil, or the OS copy command)
- advanced: make it work for any size file: i.e. don’t read the entire contents of the file into memory at once.
- Note that if you want it to do any kind of file, you need to open the files in binary mode: open(filename, 'rb') (or 'wb' for writing.)
- update mailroom from last week to:
- Use dicts where appropriate
- Write a full set of letters to everyone to individual files on disk
- See if you can use a dict to switch between the user’s selections
- Try to use a dict and the .format() method to do the letter as one big template – rather than building up a big string in parts.
Material to review before next week¶
- Dive into Python3: 7.2 – 7.3 http://www.diveintopython3.net/iterators.html#defining-classes
- Think Python: 15 – 18 http://www.greenteapress.com/thinkpython/html/thinkpython016.html
- LPTHW: 40 – 44 http://learnpythonthehardway.org/book/ex40.html
[Note that in py3 you don’t need to inherit from object]
Talk by Raymond Hettinger:
https://speakerdeck.com/pyconslides/pythons-class-development-toolkit-by-raymond-hettinger