Part 1 of 43
This is the first in a series of questions and answers about core data science topics. See more standards on my Data Science Standards page.
1. Python
- Explain the difference between mutable and immutable types and their relationship to dictionaries.
- Mutable - Refers to a data structure whose state can be altered after it has been created
- Lists - ordered, dynamic collections that can hold objects of any type, including a mix of different types (type list)
- Dictionaries - collections of key-value pairs (unordered in Python 2; insertion-ordered since Python 3.7), where each key must be unique and hashable, which in practice means immutable (type dict)
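- .get() - Takes a key and returns the associated value, or a default argument (None if not given) if that key is not in the dictionary
- .iteritems() - Returns an iterator over the (key, value) pairs of the dictionary (Python 2 only; in Python 3, .items() returns a lazy view instead)
- .update() - Updates one dict with the key, value pairs from another dict (or an iterable of pairs), overwriting the values of keys that already exist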
- Sets - unordered collections of unique elements, where each element must be hashable (immutable) (type set)
- Empty curly brackets {} are interpreted as a dict. Initialize an empty set with my_set = set()
- Immutable - Refers to a data structure whose state cannot be modified after it has been created
- Tuples - ordered, static collections that are meant for storing unchanging pieces of data (type tuple)
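A minimal sketch of the points above, assuming Python 3.7+ (so dicts keep insertion order). It shows a list being changed in place, a tuple refusing item assignment, why a tuple can be a dict key while a list cannot, the .get() and .update() methods, and the {} vs. set() distinction. All variable names are illustrative only.

```python
# Mutable: a list can be changed in place after it is created.
nums = [1, 2, 3]
nums.append(4)

# Immutable: assigning to a tuple element raises TypeError.
point = (1, 2)
try:
    point[0] = 10
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# Dict keys must be hashable: a tuple works as a key, a list does not.
labels = {point: "first point"}
try:
    labels[[1, 2]] = "list key"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# Common dict methods (Python 3: .items() replaces Python 2's .iteritems()).
ages = {"ana": 31, "ben": 27}
ana_age = ages.get("ana")           # 31
cy_age = ages.get("cy", 0)          # 0, and "cy" is NOT added to ages
ages.update({"cy": 40, "ben": 28})  # adds "cy", overwrites "ben"
pairs = list(ages.items())

# {} creates an empty dict; an empty set needs set().
empty_dict = {}
my_set = set()

print(nums, tuple_was_modified, list_key_allowed, pairs)
# [1, 2, 3, 4] False False [('ana', 31), ('ben', 28), ('cy', 40)]
```

Note that .update() keeps "ben" in its original position and only replaces its value, while the new key "cy" is appended at the end.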
- Compare the strengths and weaknesses of lists vs. dictionaries.
- Lists and dictionaries are both mutable data storage "containers" for storing a variety of objects. Lists are ordered by position, and searching a list for a value may require scanning every element (O(n)). Dictionaries hash their immutable keys, which gives average O(1) lookup by key, but storing the hash table of key:value pairs takes up more memory than a list of the same values.
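To make the trade-off concrete, here is a small sketch (hypothetical data and a hypothetical find_score helper) that looks up the same record by a linear scan of a list and by a single dict lookup, then compares container sizes. sys.getsizeof measures only the container itself, not the objects it holds, which is enough to show the hash table's overhead.

```python
import sys

# Hypothetical data: the same (name, score) records stored two ways.
records = [("ana", 88), ("ben", 92), ("cy", 75)]
scores = dict(records)

def find_score(pairs, name):
    """Linear search: may have to inspect every pair, O(n)."""
    for key, value in pairs:
        if key == name:
            return value
    return None

list_result = find_score(records, "cy")  # scans all three pairs
dict_result = scores["cy"]               # one hash lookup, average O(1)

# The hash table costs extra memory compared with a plain list of values.
n = 10_000
values_list = list(range(n))
values_dict = {i: i for i in range(n)}
print(sys.getsizeof(values_list) < sys.getsizeof(values_dict))  # True
```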
- Choose the appropriate collection (dict, Counter, defaultdict) to simplify a problem.
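- dict() is a standard Python dictionary; looking up a missing key with d[key] raises a KeyError.
- defaultdict() automatically adds a missing key, with a value created by its default factory (e.g. int or list), when that key is accessed.
- Counter does not add a missing key when it is looked up (it simply returns 0), but it does come with built-in multiset functions such as most_common() and counter addition and subtraction.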
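A sketch of the same word-counting problem solved three ways, using only the standard library collections module. The word list is made-up sample data. It also shows the key-insertion difference described above: reading a missing key inserts it into a defaultdict but not into a Counter.

```python
from collections import Counter, defaultdict

words = ["spam", "eggs", "spam", "ham", "spam", "eggs"]

# Plain dict: missing keys must be handled explicitly (here with .get()).
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

# defaultdict(int): a missing key is created with int() == 0 on first access.
dd = defaultdict(int)
for w in words:
    dd[w] += 1
toast_dd = dd["toast"]           # 0, and "toast" is now a key in dd

# Counter: counts in one call; a missing key reads as 0 but is not inserted.
c = Counter(words)
toast_c = c["toast"]             # 0, and "toast" is still not a key in c
top = c.most_common(1)           # [('spam', 3)]
more_ham = c + Counter(["ham"])  # multiset addition: "ham" goes from 1 to 2

# defaultdict(list) is handy for grouping.
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)
```

In short: reach for Counter when the task is counting, defaultdict when every new key needs the same starting value (an empty list, a zero), and a plain dict when missing keys should be treated as errors.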
- Compare the strengths and weaknesses of lists vs. generators.
- Generators - Allow us to build up an iterator that evaluates lazily (only loads values into memory when explicitly needed to perform some calculation/operation)
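- xrange() is the lazy equivalent of range()
- izip() (from itertools) is the lazy equivalent of zip()
- iteritems() on a dictionary is the lazy equivalent of items()
- These are Python 2 names: in Python 3, range(), zip(), and dict.items() are already lazy, and xrange(), izip(), and iteritems() no longer exist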
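- Use a generator unless you need all items at once, need to index into them, or need to iterate over them more than once (a generator can only be consumed once)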
- If the data can’t all fit in memory at the same time, then we are forced to use a generator (common in image processing or large data applications)
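The sketch below (Python 3; names are illustrative) contrasts a list comprehension with a generator expression over the same values, shows a generator function written with yield, demonstrates that a generator is exhausted after one pass, and shows that Python 3's range() is lazy in the way Python 2's xrange() was.

```python
import sys

def squares_up_to(n):
    """Generator function: yields one square at a time instead of building a list."""
    for i in range(n):
        yield i * i

n = 100_000
squares_list = [i * i for i in range(n)]  # every value stored in memory at once
squares_gen = (i * i for i in range(n))   # values produced on demand

# The generator object stays small no matter how large n is; the list grows with n.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

total = sum(squares_gen)        # consumes the generator
total_again = sum(squares_gen)  # 0: an exhausted generator yields nothing more

first_three = list(squares_up_to(3))  # [0, 1, 4]

# In Python 3, range() is already lazy (like Python 2's xrange()).
huge = range(10**12)
last = huge[-1]  # computed arithmetically, without building a list
```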
- Write pythonic code.
- Writing Pythonic code means using the language's idioms so that our code is more readable and, often, more efficient as well
- Using enumerate() if we need the index while iterating
- Using with statements when working with files
- Using izip() (zip() in Python 3) to iterate over two lists at the same time
- Using a set to check membership
- Using list (and other) comprehensions:
squares = [x**2 for x in range(1000)]
- Writing if x: instead of if x == True:
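A compact sketch that puts each idiom above into one runnable Python 3 script (3.6+ for the f-string). The names, scores, and scores.txt file are hypothetical sample data; tempfile.TemporaryDirectory is used only so the file example cleans up after itself. One caveat on the last idiom: if x: tests truthiness, so it is not strictly identical to x == True (for example, 2 is truthy but 2 == True is False).

```python
import os
import tempfile

names = ["ana", "ben", "cy"]
scores = [88, 92, 75]

# enumerate(): get the index without maintaining a manual counter.
indexed = [(i, name) for i, name in enumerate(names)]

# zip() (izip() in Python 2): walk two lists in lockstep.
report = {name: score for name, score in zip(names, scores)}

# A set gives average O(1) membership checks instead of scanning a list.
allowed = set(names)
ben_allowed = "ben" in allowed

# List comprehension instead of an explicit loop with .append().
squares = [x**2 for x in range(1000)]

def has_items(seq):
    # Rely on truthiness ("if seq:") rather than comparing with == True.
    if seq:
        return True
    return False

# with: the files are closed and the temporary directory removed automatically,
# even if an exception is raised inside the block.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.txt")
    with open(path, "w") as f:
        for name, score in zip(names, scores):
            f.write(f"{name},{score}\n")
    with open(path) as f:
        lines = f.read().splitlines()
```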
- PEP 8 Style Guide
- Python 2 vs 3 Import Guide