Handling the fear of Asynchronous file handling in Python
File handling in Python with aiofiles and asyncio
Introduction
Let's face it, the clock speed of our CPU doesn't really affect the performance of simple programs highly in the modern age. Things like concurrency
, multiprocessing
or multithreading
are the leading factors affecting the performance of those programs. Luckily with most of the current programming languages, it's really easy to implement those features. In this article, we're going to focus more on asynchronous
(concurrency) programming, and more specifically the overlooked part in asynchronous programming; file handling
.
Click here to learn more about multiprocessing
or multithreading
What is Asynchronous Programming?
Asynchronous code just means that the language has a way to tell the computer/program that at some point in the code, it will have to wait for something else to finish somewhere else. Let's say that something else is called "slow-file". So, during that time, the computer can go and do some other work, while "slow-file" finishes.
In async, every function has a protocol of saying "I will take some time doing this please continue running other things'"
We can think of asynchronous programming in terms of hotels and waiters. Waiters come to you and give you a menu then they go away. They periodically come to you to check if you have completed picking the food, but while you pick the food, they go to other tables and take other orders.
In the same way, your async program also doesn't stop doing other tasks while waiting for a function to complete, it jumps back and forth from one task to another while waiting for the tasks to complete.
Why would you want to use async code?
One of the main uses of asynchronous codes is while I/O (hence the name asycnio
). It takes around 126.95 seconds to retrieve data 500 times from an API in sync code. Whereas it takes 0.4880 seconds to retrieve data 500 times from an API in async code (For the same API). It's even faster and more efficient than threading
and multiprocessing
in the case of I/O because spawning threads
and processes
takes more time and resources than adding another coroutine in the event loop
.
Click here for further reading.
Async Code Examples
Awaiting a function
await some_function()
NOTE: You can only await inside of async functions
Creating an Async function
async def some_function():
...
Running async function in root python
import asyncio
async def some_function():
...
if __name__ == '__main__':
asyncio.run(some_function())
Key Points
- Even though
multiprocessing
,multithreading
andconcurrency
may seem similar, they are different. You can check out the link below to learn more. - async functions are also commonly referred to as
coroutines
. - asynchronous programming is also known as
concurrency
.
Click here for further reading.
Credits: Nirjal Paudel, Mahesh C. Regmi
What is File Handling?
Simply put, file handling means reading
and writing
in the file through code (or specifically python
in our case.) Through python, we can operate on the files. The concept of file handling is slightly different over various programming languages. In python's case, we can handle file handling with the help of the built-in function open()
.
We can open files in python in various modes such as:
- Read (r)
- Write (w)
- Append (a)
NOTE: There are a lot of other modes, you can learn about them here.
File Handling Code Examples
Assume we have an empty file 'random.txt':
Writing Content
# Opening the file 'random.txt' in write mode
with open('random.txt', 'w') as f:
# Writing the contents of the file
for number in range(1, 4):
f.write(f"hello {number}\n")
If you check 'random.txt' you'll see these contents there:
hello 1
hello 2
hello 3
Reading content
# Opening the file 'random.txt' in read mode
with open('random.txt', 'r') as file:
# Reading the contents of the file
content = file.read()
# Printing the contents
print(content)
OUTPUT:
hello 1
hello 2
hello 3
Click here for further reading.
You may be confused by the use of with
while opening the file. with
shows the usage of context manager
in python.
What are Context Managers?
Usage of resources like file operations or database connections is very common in programming. But those resources are limited in supply. If we don't release these resources after using them, it may lead to resource leaking. This is the perfect use case for context managers, which automatically set up and teardown resources.
Click here for further reading.
Asynchronous file handing in python
The default file handling in python is synchronous
hence blocking. It stops the execution of the whole program while doing any operations related to the file. The bigger the file, the more resource and time intensive it is, which may severely degrade the performance of the whole program. This is where asynchronous
file handling comes into play.
We're going to use asyncio
as the asynchronous runtime provider in python, and aiofiles
python module for asynchronous file handling.
NOTE: You need to install
aiofiles
usingpip install aiofiles
for running the code examples below.
Using aiofiles
Asynchronously reading file contents
# Importing required modules
import asyncio
import aiofiles
# Defining the entrypoint of our asynchronous program
async def main():
# Opening random.txt with aiofiles in read mode
async with aiofiles.open('random.txt') as f:
# Asynchronously reading the files contents
contents = await f.read()
# Printing the file contents
print(contents)
# Explanation: https://www.freecodecamp.org/news/if-name-main-python-example/
if __name__ == '__main__':
# Running the async function main() using asyncio
asyncio.run(main())
NOTE: You might have noticed that I haven't passed
r
as the mode. That is becauser
(read-mode) is the default mode, so we don't have to explicitly pass it every time.
It's going to have the same output as the file handling example above
Asynchronously iterating through lines of the file
# Opening 'random.txt' in read mode
async with aiofiles.open('random.txt') as f:
# Asynchronously iterating through the lines
async for line in f:
# Printing each line in the file with seperator of ', '
print(f"New Line: '{line}'", sep=', ')
NOTE: I have removed the boilerplate code of imports, defining the main function and running of the function, if you want to run the code examples, you'll have to add that yourself.
OUTPUT:
New Line: 'hello 1', New Line: 'hello 2', New Line: 'hello 3'
Asynchronously writing to multiple files
# Importing modules
import asyncio
import aiofiles
# Creating a function that writes to files asynchronously
async def write_to_file(filename: str, content: str) -> None:
async with aiofiles.open(filename, "w+") as f:
await f.write(content)
# Creating an entry point for our async program
async def main():
# Creating an empty list of tasks
tasks = []
# Running a loop for 100 times
for number in range(100):
# Appending a coroutine of write_to_file function
# to the list of tasks
tasks.append(
asyncio.ensure_future(
write_to_file(f"{number}.txt", f"File Number: {number}")
)
)
# Running all the tasks using asyncio.gather
await asyncio.gather(*tasks)
# Running the main() function
if __name__ == "__main__":
asyncio.run(main())
What is asyncio.ensure_future
?
The function ensure_future
lets us execute a coroutine
(async function) in the background, without explicitly waiting for it to finish. If we need, we can wait for it later or poll for the result. In other words, this is a way of executing code in asyncio without await. Click here to learn more.
What is asyncio.gather
?
The function gather
lets you fire off a bunch of coroutines simultaneously. In other words, it returns the results of each coroutine (async function) passed The return value is a list of responses from each coroutine. Click here to learn more.
Task for the reader
- The reader can write the synchronous version of the above code and find out how the time difference between the 2 versions of code.
- The reader can write a program to read those 100 files you just created and keep the contents of every file in a list.
Resources for Learning More
- Aynchronous Programming in Python
- Asynchronous file handling in Python
- Python asyncio Documentation
- Awesome Python async modules
Conclusion
File handling is often overlooked while doing asynchronous programming which leads to unintended slow ups as file handling is very resource and time-intensive for big files. Using aiofiles
for file handling drastically decreases the execution time of the program due to its asynchronous nature. It does not have to block the whole program while reading a file, hence other files can be read/ write and modified in the meantime, so it's a lot faster than the default way of file handling in python.