Vectorization in Python can be used to replace loops

The for-loop is the usual way to execute code repeatedly. In Python, however, looping over many elements is slow, and nested loops make it worse. Iterating over a large dataset with a for-loop, as often happens in deep learning, can take a long time, so the loop frequently becomes the main bottleneck of a program.

So what can replace it? The answer is vectorization, and we can verify this by timing both approaches on the same task.

Things that will be covered

  1. What is Vectorization?
  2. for-loop vs vectorization
  3. Why is NumPy (vectorization) faster than a loop?
  4. Details of the functions/modules used in the code

What is Vectorization?

Vectorization is a way of writing code that operates on whole arrays at once instead of looping over individual elements. In deep learning, vectorized code usually runs much faster. To achieve this, we use array functions provided by libraries such as NumPy, which cuts the execution time.
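To make the idea concrete before the timing test, here is a minimal sketch that computes the same element-wise product twice, once with a loop and once with a single array expression. The names prices and quantities are made up for illustration and do not appear elsewhere in this article.

```python
import numpy as np

# Hypothetical example data, not taken from the article.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1.0, 2.0, 3.0])

# Loop version: one multiplication per Python-level iteration.
totals_loop = []
for p, q in zip(prices, quantities):
    totals_loop.append(p * q)

# Vectorized version: one expression applied to the whole arrays.
totals_vec = prices * quantities

print(totals_loop)
print(totals_vec)
```

Both versions hold the same three values; the vectorized one simply leaves the iteration to NumPy.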

We will create two variables, each holding a one-dimensional array.

import time
import numpy as np

# generate 1,000,000 random floats in [0, 1) and store them in the one-dimensional array array1
array1 = np.random.rand(1000000)
# generate 1,000,000 random floats in [0, 1) and store them in the one-dimensional array array2
array2 = np.random.rand(1000000)

Now, using the values of array1 and array2, we will compute a third value called array3. Despite its name, array3 is a single number (a scalar), not an array: it stores the dot product of array1 and array2, that is, the sum of their element-wise products.

array3 = 0
tic = time.time()          # start time
for i in range(1000000):
    # accumulate the product of each pair of elements
    array3 += array1[i] * array2[i]
toc = time.time()          # end time
print("array3: " + str(array3))
# for-loop execution time = end time - start time, converted to ms
print("\nFor loop exec time :" + str(1000 * (toc - tic)) + " ms")

OUTPUT

array3: 250177.2599928903  

For loop exec time :650.6094932556152 ms

Vectorization Example:

Let’s try the same example from before by replacing the loop and checking the execution time.

array3 = 0
tic = time.time()          # start time
# replacing the for-loop with np.dot
array3 = np.dot(array1, array2)
toc = time.time()          # end time
print("array3: " + str(array3))
# vectorized execution time = end time - start time, converted to ms
print("\nVectorized exec time :" + str(1000 * (toc - tic)) + " ms")

OUTPUT

array3: 250177.25999289122

Vectorized exec time :4.674673080444336 ms
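A single time.time() measurement can be noisy, so the numbers above will vary from run to run and from machine to machine. As a rough sketch of a more repeatable comparison, the snippet below uses the standard-library timeit module and takes the best of three runs. The helper names dot_loop and dot_vectorized, the seed, and the smaller array size are assumptions made for this illustration, not part of the original test.

```python
import timeit
import numpy as np

# Seeded generator so that repeated runs use the same data (assumed setup).
rng = np.random.default_rng(seed=0)
a = rng.random(100_000)
b = rng.random(100_000)

def dot_loop(x, y):
    """Dot product computed with an explicit Python loop."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def dot_vectorized(x, y):
    """Dot product computed by NumPy in one call."""
    return np.dot(x, y)

# Both approaches should agree up to floating-point rounding.
assert np.isclose(dot_loop(a, b), dot_vectorized(a, b))

# Best of three single runs for each approach, in seconds.
loop_time = min(timeit.repeat(lambda: dot_loop(a, b), number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: dot_vectorized(a, b), number=1, repeat=3))

print(f"loop:       {1000 * loop_time:.3f} ms")
print(f"vectorized: {1000 * vec_time:.3f} ms")
```

The two results differ only in the last digits, as in the outputs above, because the additions are carried out in a different order.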

Why is NumPy (vectorization) faster than a loop?

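NumPy runs array operations in compiled C loops rather than in the Python interpreter, so a single call can process the whole array. NumPy arrays are also homogeneous: every element has the same type, and the data is typically stored in one contiguous block of memory. This lets the compiled code use the CPU's SIMD instructions, and np.dot in particular is usually handed off to an optimized BLAS library, which may also use several threads.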

A Python for-loop, on the other hand, processes the elements one by one in the interpreter. Python cannot know the data type of each element beforehand, because a list can hold heterogeneous data.

So on every iteration the interpreter has to work out the type of each operand and pick the matching operation. This per-element overhead is what makes the loop slow in Python.
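The difference between a heterogeneous list and a homogeneous array can be seen directly. The following is a small sketch with made-up values: the list keeps three different Python types, while NumPy converts its input to one common dtype.

```python
import numpy as np

# A Python list may mix types freely (illustrative values).
mixed_list = [1, 2.5, "three"]
element_types = [type(v).__name__ for v in mixed_list]
print(element_types)   # ['int', 'float', 'str']

# NumPy chooses a single dtype for every element of the array.
arr = np.array([1, 2.5, 3])
print(arr.dtype)       # float64
print(arr.itemsize)    # 8 bytes per element
```

Because every element of arr is a float64 of the same size, the compiled code can step through memory without checking types along the way.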

Functions and libraries used
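The code in this article relies on three calls: np.random.rand, np.dot and time.time. As a brief sketch of what each one does, using small made-up inputs rather than the million-element arrays from the test:

```python
import time
import numpy as np

# np.random.rand(n): n random floats drawn uniformly from [0, 1).
samples = np.random.rand(5)
print(samples.shape)   # (5,)

# np.dot on two 1-D arrays: the sum of the element-wise products.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
result = np.dot(x, y)  # 1*4 + 2*5 + 3*6
print(result)          # 32.0

# time.time(): current time in seconds as a float;
# the difference of two calls gives the elapsed time.
start = time.time()
elapsed_ms = 1000 * (time.time() - start)
print(elapsed_ms)
```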

Special thanks to Nasifa Alam
