10.8. File Read

  • Works with both relative and absolute paths

  • Fails when the directory containing the file cannot be accessed

  • Fails when the file cannot be accessed

  • Uses context manager

  • mode parameter to open() function is optional (defaults to mode='rt')
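
The failure modes listed above surface as exceptions. A minimal sketch, assuming a hypothetical path that does not exist on the machine running it:

```python
# Hypothetical path used only for illustration; it is assumed not to exist
FILE = r'/tmp/_no_such_directory/myfile.txt'

try:
    with open(FILE) as file:      # mode is omitted, so the file is opened as text for reading
        data = file.read()
except FileNotFoundError:
    # Raised when the file or its parent directory does not exist
    data = None
except PermissionError:
    # Raised when the file or directory exists but cannot be accessed
    data = None
```

Catching the two exceptions separately makes it possible to report a missing file differently from a permission problem.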

10.8.1. SetUp

>>> from pathlib import Path
>>> Path('/tmp/myfile.txt').unlink(missing_ok=True)
>>> Path('/tmp/myfile.txt').touch()
>>>
>>>
>>> DATA = """Sepal length,Sepal width,Petal length,Petal width,Species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... 6.3,2.9,5.6,1.8,virginica
... 6.4,3.2,4.5,1.5,versicolor
... 4.7,3.2,1.3,0.2,setosa
... """
>>>
>>> with open('/tmp/myfile.txt', mode='w') as file:
...     _ = file.write(DATA)

10.8.2. Read From File

  • Always remember to close file

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> file = open(FILE)
>>> data = file.read()
>>> file.close()

10.8.3. Read Using Context Manager

  • Context managers use with ... as ...: syntax

  • It closes file automatically upon block exit (dedent)

  • Using context manager is best practice

  • More information in Protocol Context Manager

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     data = file.read()
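
The automatic close can be observed through the closed attribute of the file object. A minimal sketch, assuming a hypothetical demo file created in the system temporary directory:

```python
from pathlib import Path
from tempfile import gettempdir

# Hypothetical file used only for this sketch
FILE = Path(gettempdir()) / '_context_manager_demo.txt'
FILE.write_text('hello\n')

with open(FILE) as file:
    inside = file.closed      # False: the file is open inside the block
    data = file.read()

outside = file.closed         # True: the file was closed on block exit

FILE.unlink()                 # clean up the demo file
```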

10.8.4. Read File at Once

  • Note that the whole file must fit into memory

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     data = file.read()

10.8.5. Read File as List of Lines

  • Note that the whole file must fit into memory

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     data = file.readlines()
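
Each element returned by readlines() keeps its trailing newline character. A minimal sketch, assuming illustrative three-line content and using StringIO as an in-memory stand-in for a file opened with open():

```python
from io import StringIO

# In-memory stand-in for a real file; the content is illustrative only
DATA = 'first\nsecond\nthird\n'

with StringIO(DATA) as file:
    raw = file.readlines()                 # each line keeps its '\n'

clean = [line.strip() for line in raw]     # remove surrounding whitespace
selected = raw[1:3]                        # the slice skips the first line (index 0)
```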

Read lines 2 to 30 from file (the slice [1:30] skips the first line):

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     lines = file.readlines()[1:30]

Iterate over lines 2 to 30 and strip each line:

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     for line in file.readlines()[1:30]:
...         line = line.strip()

Read whole file and split by lines, separate header from content:

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     header, *content = file.readlines()
...
...     for line in content:
...         line = line.strip()

10.8.6. Reading File as Generator

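  • Iterate over the file object to read it line by line

  • In these examples, file behaves like a generator: it is an iterator that yields one line at a time, so the whole file does not have to fit into memory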

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     for line in file:
...         line = line.strip()

Read the header separately, then iterate over the remaining lines:

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     header = file.readline()
...
...     for line in file:
...         line = line.strip()
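
A file object is its own iterator, so next() can consume a single line and a following loop continues from that position. A minimal sketch, assuming illustrative content and using StringIO as an in-memory stand-in for a file opened with open():

```python
from io import StringIO

# In-memory stand-in for a real file; the content is illustrative only
DATA = 'header\nrow1\nrow2\n'

with StringIO(DATA) as file:
    is_iterator = iter(file) is file          # file objects are their own iterators
    header = next(file).strip()               # consume the first line only
    rows = [line.strip() for line in file]    # remaining lines, fetched one at a time
```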

10.8.7. Examples

>>> FILE = r'/tmp/myfile.txt'
... # Sepal length,Sepal width,Petal length,Petal width,Species
... # 5.8,2.7,5.1,1.9,virginica
... # 5.1,3.5,1.4,0.2,setosa
... # 5.7,2.8,4.1,1.3,versicolor
... # 6.3,2.9,5.6,1.8,virginica
... # 6.4,3.2,4.5,1.5,versicolor
... # 4.7,3.2,1.3,0.2,setosa
>>>
>>>
>>> result = []
>>>
>>> with open(FILE) as file:
...     header = file.readline().strip().split(',')
...
...     for line in file:
...         *features, label = line.strip().split(',')
...         features = [float(x) for x in features]
...         row = features + [label]
...         pairs = zip(header, row)
...         result.append(dict(pairs))
>>>
>>> result
[{'Sepal length': 5.8, 'Sepal width': 2.7, 'Petal length': 5.1, 'Petal width': 1.9, 'Species': 'virginica'}, {'Sepal length': 5.1, 'Sepal width': 3.5, 'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'}, {'Sepal length': 5.7, 'Sepal width': 2.8, 'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'}, {'Sepal length': 6.3, 'Sepal width': 2.9, 'Petal length': 5.6, 'Petal width': 1.8, 'Species': 'virginica'}, {'Sepal length': 6.4, 'Sepal width': 3.2, 'Petal length': 4.5, 'Petal width': 1.5, 'Species': 'versicolor'}, {'Sepal length': 4.7, 'Sepal width': 3.2, 'Petal length': 1.3, 'Petal width': 0.2, 'Species': 'setosa'}]

10.8.8. StringIO

>>> from io import StringIO
>>>
>>>
>>> DATA = """Sepal length,Sepal width,Petal length,Petal width,Species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... 6.3,2.9,5.6,1.8,virginica
... 6.4,3.2,4.5,1.5,versicolor
... 4.7,3.2,1.3,0.2,setosa
... """
>>>
>>>
>>> with StringIO(DATA) as file:
...     result = file.readline()
...
>>> result
'Sepal length,Sepal width,Petal length,Petal width,Species\n'

Read part of the content, rewind with seek() and read again:

>>> from io import StringIO
>>>
>>>
>>> DATA = """Sepal length,Sepal width,Petal length,Petal width,Species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... 6.3,2.9,5.6,1.8,virginica
... 6.4,3.2,4.5,1.5,versicolor
... 4.7,3.2,1.3,0.2,setosa
... """
>>>
>>>
>>> file = StringIO(DATA)
>>>
>>> file.read(50)
'Sepal length,Sepal width,Petal length,Petal width,'
>>> file.seek(0)
0
>>> file.readline()
'Sepal length,Sepal width,Petal length,Petal width,Species\n'
>>> file.close()
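
Because StringIO offers the same reading methods as a file opened with open(), code that accepts a file object can be exercised without touching the disk. A minimal sketch; count_rows() is a hypothetical helper introduced only for illustration:

```python
from io import StringIO


def count_rows(file):
    """Hypothetical helper: count non-empty data rows after the header."""
    file.readline()                                   # skip the header line
    return sum(1 for line in file if line.strip())    # count remaining rows


DATA = """Sepal length,Sepal width,Petal length,Petal width,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
"""

with StringIO(DATA) as file:
    total = count_rows(file)
```

The same function should work unchanged when given the object returned by open().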

10.8.9. Use Case - 0x01

>>> DATA = """A,B,C,red,green,blue
... 1,2,3,0
... 4,5,6,1
... 7,8,9,2"""
>>>
>>> header, *lines = DATA.splitlines()
>>> colors = header.strip().split(',')[3:]
>>> colors = dict(enumerate(colors))
>>> result = []
>>>
>>> for line in lines:
...     line = line.strip().split(',')
...     *numbers, color = map(int, line)
...     line = numbers + [colors.get(color)]
...     result.append(tuple(line))
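
A self-contained sketch of the same decoding, wrapped in a hypothetical function decode() (the name is an assumption, not part of the original listing) so that the outcome can be checked:

```python
def decode(data):
    """Hypothetical wrapper: replace the color index in each row with its name."""
    header, *lines = data.splitlines()
    # Column names after the first three are color names, indexed from 0
    colors = dict(enumerate(header.strip().split(',')[3:]))
    result = []
    for line in lines:
        *numbers, color = map(int, line.strip().split(','))
        result.append((*numbers, colors.get(color)))
    return result


DATA = """A,B,C,red,green,blue
1,2,3,0
4,5,6,1
7,8,9,2"""

result = decode(DATA)
# [(1, 2, 3, 'red'), (4, 5, 6, 'green'), (7, 8, 9, 'blue')]
```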

10.8.10. Assignments

Code 10.15. Solution
"""
* Assignment: File Read Str
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Read `FILE` to `result: str`
    2. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: str`
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is str, \
    'Variable `result` has invalid type, should be str'

    >>> result
    'hello'
"""

FILE = '_temporary.txt'
DATA = 'hello'

with open(FILE, mode='wt') as file:
    file.write(DATA)

# Define `result` with FILE content
# type: str
result = ...

Code 10.16. Solution
"""
* Assignment: File Read Multiline
* Required: yes
* Complexity: easy
* Lines of code: 3 lines
* Time: 3 min

English:
    1. Read `FILE` to `result: list[str]`
    2. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: list[str]`
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`
    * `[x for x in data]`
    * `str.strip()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is str for x in result), \
    'All rows in `result` should be str'

    >>> result
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
"""

FILE = '_temporary.txt'
DATA = 'sepal_length\nsepal_width\npetal_length\npetal_width\nspecies\n'

with open(FILE, mode='wt') as file:
    file.write(DATA)

# Define `result` with FILE content as a list of stripped lines
# type: list[str]
result = ...

Code 10.17. Solution
"""
* Assignment: File Read CSV
* Required: yes
* Complexity: easy
* Lines of code: 15 lines
* Time: 8 min

English:
    1. Read `FILE`
    2. Separate header from data
    3. Write header (first line) to `header`
    4. Read file and for each line:
        a. Strip whitespace
        b. Split line by comma `,`
        c. Convert measurements to `tuple[float]`
        d. Append measurements to `features`
        e. Append species name to `labels`
    5. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE`
    2. Odseparuj nagłówek od danych
    3. Zapisz nagłówek (pierwsza linia) do `header`
    4. Zaczytaj plik i dla każdej linii:
        a. Usuń białe znaki z początku i końca linii
        b. Podziel linię po przecinku `,`
        c. Przekonwertuj pomiary do `tuple[float]`
        d. Dodaj pomiary do `features`
        e. Dodaj gatunek do `labels`
    5. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `tuple(float(x) for x in X)`
    * `str.split()`
    * `str.strip()`
    * `with`
    * `open()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert header is not Ellipsis, \
    'Assign your result to variable `header`'
    >>> assert features is not Ellipsis, \
    'Assign your result to variable `features`'
    >>> assert labels is not Ellipsis, \
    'Assign your result to variable `labels`'
    >>> assert type(header) is list, \
    'Variable `header` has invalid type, should be list'
    >>> assert type(features) is list, \
    'Variable `features` has invalid type, should be list'
    >>> assert type(labels) is list, \
    'Variable `labels` has invalid type, should be list'
    >>> assert all(type(x) is str for x in header), \
    'All rows in `header` should be str'
    >>> assert all(type(x) is tuple for x in features), \
    'All rows in `features` should be tuple'
    >>> assert all(type(x) is str for x in labels), \
    'All rows in `labels` should be str'

    >>> header
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

    >>> features  # doctest: +NORMALIZE_WHITESPACE
    [(5.4, 3.9, 1.3, 0.4),
     (5.9, 3.0, 5.1, 1.8),
     (6.0, 3.4, 4.5, 1.6),
     (7.3, 2.9, 6.3, 1.8),
     (5.6, 2.5, 3.9, 1.1),
     (5.4, 3.9, 1.3, 0.4)]

    >>> labels
    ['setosa', 'virginica', 'versicolor', 'virginica', 'versicolor', 'setosa']
"""

FILE = '_temporary.csv'

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.4,3.9,1.3,0.4,setosa
5.9,3.0,5.1,1.8,virginica
6.0,3.4,4.5,1.6,versicolor
7.3,2.9,6.3,1.8,virginica
5.6,2.5,3.9,1.1,versicolor
5.4,3.9,1.3,0.4,setosa
"""

header = []
features = []
labels = []

with open(FILE, mode='w') as file:
    file.write(DATA)

Code 10.18. Solution
"""
* Assignment: File Read CleanFile
* Required: no
* Complexity: medium
* Lines of code: 10 lines
* Time: 8 min

English:
    1. Read `FILE` and for each line:
        a. Remove leading and trailing whitespace
        b. Split line by whitespace
        c. Separate IP address and host names
        d. Append IP address and host names to `result`
    2. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` i dla każdej linii:
        a. Usuń białe znaki na początku i końcu linii
        b. Podziel linię po białych znakach
        c. Odseparuj adres IP i nazwy hostów
        d. Dodaj adres IP i nazwy hostów do `result`
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `str.isspace()`
    * `str.split()`
    * `str.strip()`
    * `with`
    * `open()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is dict, \
    'Variable `result` has invalid type, should be dict'
    >>> assert all(type(x) is str for x in result.keys()), \
    'All keys in `result` should be str'
    >>> assert all(type(x) is list for x in result.values()), \
    'All values in `result` should be list'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    {'127.0.0.1': ['localhost'],
     '10.13.37.1': ['nasa.gov', 'esa.int'],
     '255.255.255.255': ['broadcasthost'],
     '::1': ['localhost']}
"""

FILE = '_temporary.txt'

DATA = """127.0.0.1       localhost
10.13.37.1      nasa.gov esa.int
255.255.255.255 broadcasthost
::1             localhost
"""

with open(FILE, mode='w') as file:
    file.write(DATA)

# Example {'10.13.37.1': ['nasa.gov', 'esa.int'], ...}
# type: dict[str,list[str]]
result = ...

Code 10.19. Solution
"""
* Assignment: File Read DirtyFile
* Required: no
* Complexity: easy
* Lines of code: 4 lines
* Time: 3 min

English:
    1. Modify code below:
        a. Remove leading and trailing whitespace
        b. Skip line if it's empty, is whitespace or starts with comment `#`
        c. Split line by whitespace
        d. Separate IP address and host names
        e. Append IP address and host names to `result`
    2. Run doctests - all must succeed

Polish:
    1. Zmodyfikuj kod poniżej:
        a. Usuń białe znaki na początku i końcu linii
        b. Pomiń linię jeżeli jest pusta, jest białym znakiem
           lub zaczyna się od komentarza `#`
        c. Podziel linię po białych znakach
        d. Odseparuj adres IP i nazwy hostów
        e. Dodaj adres IP i nazwy hostów do `result`
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`
    * `str.strip()`
    * `str.split()` - without an argument
    * `len()`
    * `str.startswith()`
    * `result = True if ... else False`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is dict, \
    'Variable `result` has invalid type, should be dict'
    >>> assert all(type(x) is str for x in result.keys()), \
    'All keys in `result` should be str'
    >>> assert all(type(x) is list for x in result.values()), \
    'All values in `result` should be list'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    {'127.0.0.1': ['localhost'],
     '10.13.37.1': ['nasa.gov', 'esa.int'],
     '255.255.255.255': ['broadcasthost'],
     '::1': ['localhost']}
"""

FILE = '_temporary.txt'

DATA = """
##
# `/etc/hosts` structure:
#   - IPv4 or IPv6
#   - Hostnames
 ##

127.0.0.1       localhost
10.13.37.1      nasa.gov esa.int
255.255.255.255 broadcasthost
::1             localhost
"""

with open(FILE, mode='w') as file:
    file.write(DATA)


# Example {'10.13.37.1': ['nasa.gov', 'esa.int'], ...}
# type: dict[str,list[str]]
result = {}

with open(FILE) as file:
    for line in file:
        ...


Code 10.20. Solution
"""
* Assignment: File Read List of Dicts
* Required: no
* Complexity: hard
* Lines of code: 19 lines
* Time: 21 min

English:
    1. Read file and for each line:
        a. Skip line if it's empty, is whitespace or starts with comment `#`
        b. Remove leading and trailing whitespace
        c. Split line by whitespace
        d. Separate IP address and host names
        e. Use one line `if` to check whether dot `.` is in the IP address
        f. If it is present, then the protocol is IPv4, otherwise IPv6
        g. Append IP address and host names to `result`
    2. Run doctests - all must succeed

Polish:
    1. Przeczytaj plik i dla każdej linii:
        a. Pomiń linię jeżeli jest pusta, jest białym znakiem
           lub zaczyna się od komentarza `#`
        b. Usuń białe znaki na początku i końcu linii
        c. Podziel linię po białych znakach
        d. Odseparuj adres IP i nazwy hostów
        e. Wykorzystaj jednolinikowego `if` do sprawdzenia czy jest
           kropka `.` w adresie IP
        f. Jeżeli jest obecna to protokół jest IPv4,
           w przeciwnym przypadku IPv6
        g. Dodaj adres IP i nazwy hostów do `result`
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`
    * `str.strip()`
    * `str.split()` - without an argument
    * `len()`
    * `str.startswith()`
    * `result = True if ... else False`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is dict for x in result), \
    'All rows in `result` should be dict'
    >>> assert [x['ip'] for x in result].count('127.0.0.1') == 1, \
    'You did not merge hostnames for the same ip (127.0.0.1)'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [{'ip': '127.0.0.1', 'hostnames': ['localhost', 'astromatt'], 'protocol': 'IPv4'},
     {'ip': '10.13.37.1', 'hostnames': ['nasa.gov', 'esa.int'], 'protocol': 'IPv4'},
     {'ip': '255.255.255.255', 'hostnames': ['broadcasthost'], 'protocol': 'IPv4'},
     {'ip': '::1', 'hostnames': ['localhost'], 'protocol': 'IPv6'}]
"""

FILE = '_temporary.txt'

DATA = """
##
# `/etc/hosts` structure:
#   - IPv4 or IPv6
#   - Hostnames
 ##

127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int
255.255.255.255 broadcasthost
::1             localhost
"""

with open(FILE, mode='w') as file:
    file.write(DATA)

# Example [{'ip': '127.0.0.1', 'hostnames': ['localhost', 'astromatt'], 'protocol': 'IPv4'}, ...]
# type: list[dict]
result = []

with open(FILE) as file:
    for line in file:
        line = line.strip()
        if len(line) == 0:
            continue
        if line.startswith('#'):
            continue
        ip, *hosts = line.split()



Code 10.21. Solution
"""
* Assignment: File Read Passwd
* Required: no
* Complexity: hard
* Lines of code: 100 lines
* Time: 55 min

English:
    1. Save listings content to files:
        a. `etc-passwd.txt`
        b. `etc-shadow.txt`
        c. `etc-group.txt`
    2. Copy also comments and empty lines
    3. Parse files and convert them to `result: list[dict]`
    4. Return list of users with `UID` greater than or equal to 1000
    5. User dict should contain data collected from all files
    6. Run doctests - all must succeed

Polish:
    1. Zapisz treści listingów do plików:
        a. `etc-passwd.txt`
        b. `etc-shadow.txt`
        c. `etc-group.txt`
    2. Skopiuj również komentarze i puste linie
    3. Sparsuj plik i przedstaw go w formacie `result: list[dict]`
    4. Zwróć listę użytkowników, których `UID` jest większy lub równy 1000
    5. Dict użytkownika powinien zawierać dane z wszystkich plików
    6. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `from datetime import date`
    * `date.fromtimestamp(timestamp: int)`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [{'username': 'watney',
      'uid': 1000,
      'gid': 1000,
      'home': '/home/watney',
      'shell': '/bin/bash',
      'algorithm': None,
      'password': None,
      'groups': ['astronauts', 'mars'],
      'last_changed': datetime.date(2015, 4, 25),
      'locked': True},
     {'username': 'lewis',
      'uid': 1001,
      'gid': 1001,
      'home': '/home/lewis',
      'shell': '/bin/bash',
      'algorithm': 'SHA-512',
      'password': 'tgfvvFWJJ5FKmoXiP5rXWOjwoEBOEoAuBi3EphRbJqqjWYvhEM2wa67L9XgQ7W591FxUNklkDIQsk4kijuhE50',
      'groups': ['astronauts', 'sysadmin', 'moon'],
      'last_changed': datetime.date(2015, 7, 16),
      'locked': False},
     {'username': 'martinez',
      'uid': 1002,
      'gid': 1002,
      'home': '/home/martinez',
      'shell': '/bin/bash',
      'algorithm': 'MD5',
      'password': 'SWlkjRWexrXYgc98F.',
      'groups': ['astronauts', 'sysadmin'],
      'last_changed': datetime.date(2005, 2, 11),
      'locked': False}]

    >>> from os import remove
    >>> remove(FILE_GROUP)
    >>> remove(FILE_SHADOW)
    >>> remove(FILE_PASSWD)
"""

from datetime import date
from os.path import dirname, join


BASE_DIR = dirname(__file__)
FILE_GROUP = join(BASE_DIR, 'etc-group.txt')
FILE_SHADOW = join(BASE_DIR, 'etc-shadow.txt')
FILE_PASSWD = join(BASE_DIR, 'etc-passwd.txt')


CONTENT_GROUP = """##
# `/etc/group` structure
#   - Group Name: from `/etc/passwd`
#   - Group Password: `x` indicates that shadow passwords are used
#   - GID: Group ID
#   - Members: usernames from `/etc/passwd`
##

root::0:root
other::1:
bin::2:root,bin,daemon
sys::3:root,bin,sys,adm
adm::4:root,adm,daemon
mail::6:root
astronauts::10:watney,lewis,martinez
daemon::12:root,daemon
sysadmin::14:martinez,lewis
mars::1000:watney
moon::1001:lewis
nobody::60001:
noaccess::60002:
nogroup::65534:"""


CONTENT_PASSWD = """##
# `/etc/passwd` structure:
#   - Username
#   - Password: `x` indicates that shadow passwords are used
#   - UID: User ID number
#   - GID: User's group ID number
#   - GECOS: Full name of the user
#   - Home directory
#   - Login shell
##

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
nobody:x:99:99:Nobody:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
watney:x:1000:1000:Mark Watney:/home/watney:/bin/bash
lewis:x:1001:1001:Melissa Lewis:/home/lewis:/bin/bash
martinez:x:1002:1002:Rick Martinez:/home/martinez:/bin/bash"""


CONTENT_SHADOW = """##
# `/etc/shadow` structure
#   - Username: from `/etc/passwd`
#   - Password
#   - Last Password Change: Days since 1970-01-01
#   - Minimum days between password changes: 0 - changed at any time
#   - Password validity: Days after which password must be changed, 99999 - many, many years
#   - Warning threshold: Days to warn user of an expiring password, 7 - full week
#   - Account inactive: Days after password expires and account is disabled
#   - Time since account is disabled: Days since 1970-01-01
#   - A reserved field for possible future use
#
# Password field (split by `$`):
#   - algorithm
#   - salt
#   - password hash
#
# Password algorithms:
#   - `1` - MD5
#   - `2a` - Blowfish
#   - `2y` - Blowfish
#   - `5` - SHA-256
#   - `6` - SHA-512
#
# Password special chars:
#   - ` ` (blank entry) - password is not required to log in
#   - `*` (asterisk) - account is disabled, cannot be unlocked, no password has ever been set
#   - `!` (exclamation mark) - account is locked, can be unlocked, no password has ever been set
#   - `!<password_hash>` - account is locked, can be unlocked, but password is set
#   - `!!` (two exclamation marks) - account created, waiting for initial password to be set by admin
##

root:$6$Ke02nYgo.9v0SF4p$hjztYvo/M4buqO4oBX8KZTftjCn6fE4cV5o/I95QPekeQpITwFTRbDUBYBLIUx2mhorQoj9bLN8v.w6btE9xy1:16431:0:99999:7:::
adm:$6$5H0QpwprRiJQR19Y$bXGOh7dIfOWpUb/Tuqr7yQVCqL3UkrJns9.7msfvMg4ZO/PsFC5Tbt32PXAw9qRFEBs1254aLimFeNM8YsYOv.:16431:0:99999:7:::
watney:!!:16550::::::
lewis:$6$P9zn0KwR$tgfvvFWJJ5FKmoXiP5rXWOjwoEBOEoAuBi3EphRbJqqjWYvhEM2wa67L9XgQ7W591FxUNklkDIQsk4kijuhE50:16632:0:99999:7:::
martinez:$1$.QKDPc5E$SWlkjRWexrXYgc98F.:12825:0:90:5:30:13096:"""

with open(FILE_GROUP, mode='w') as file:
    file.write(CONTENT_GROUP)

with open(FILE_PASSWD, mode='w') as file:
    file.write(CONTENT_PASSWD)

with open(FILE_SHADOW, mode='w') as file:
    file.write(CONTENT_SHADOW)

SECOND = 1
MINUTE = 60 * SECOND
HOUR = 60 * MINUTE
DAY = 24 * HOUR

ALGORITHMS = {
    '1': 'MD5',
    '2a': 'Blowfish',
    '2y': 'Blowfish',
    '5': 'SHA-256',
    '6': 'SHA-512',
}

# Joined data from all files for users with `UID` greater than or equal to 1000
# type: list[dict]
result = ...