Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / Python

Shallow Copy vs. Deep Copy in Python 3

5.00/5 (7 votes)
24 Nov 2019CPOL4 min read 10.3K   174  
This article describes the difference between shallow and deep copy using Python 3.

Introduction

In this article, we will discuss the difference between shallow and deep copy in Python.

Simple Copying

Let's look at an example:

Python
lst_first  = ['Yesterday', ['we', 'used'], 'the', ['Python', '2']] 
lst_second = lst_first
lst_third  = lst_first
lst_fourth = lst_first
          
print("\nOriginal list and three copies of it:")
print("lst_first  =", lst_first) 
print("lst_second =", lst_second) 
print("lst_third  =", lst_third) 
print("lst_fourth =", lst_fourth) 
                    
print("\nLet's change the first word to "Today" and add the word "also" 
       ONLY at the end of the 4th list...")
lst_fourth [0] = 'Today' # replace the word 'Yesterday' with 'Today' 
lst_fourth.append('too') # add the word 'too'
          
print("\nLet's see the contents of the lists again:")
print("lst_first  =", lst_first) 
print("lst_second =", lst_second) 
print("lst_third  =", lst_third) 
print("lst_fourth =", lst_fourth)

Let's see the output of the above code:

Simple copying

Wow .... We only changed the values in the ‘lst_fourth’ list, but the values in ALL lists changed.
Why?!

In the first line of the script, we create the lst_first list, and then it initializes the other three lists.
After assigning lst_first to the other three lists, all four lists refer to the same memory area.
Later modifications to the content of either are immediately reflected on the contents of ALL other lists.

Let's use the id() built-in Python's function to get the "identity" (unique integer) of each of the list objects:

Python
print()
print("Let's show "identities" (the addresses) of the objects in memory:")
print("id(lst_first)  = ",  id(lst_first))
print("id(lst_second) = ",  id(lst_second))
print("id(lst_third)  = ",  id(lst_third))
print("id(lst_fourth) = ",  id(lst_fourth))

The result for the above example would be something like this:

Simple copying

We can see that the value returned by the id() function for DIFFERENT OBJECTS is the SAME!
This confirms the above, that ALL four objects of list refer to the SAME address space!
Schematically, it looks like this:

Simple copying

It is clear that for the actual copying of lists, we need to use a different way.
Let's try another approach to create independent lists...

Shallow Copy

The following example illustrates 3 different ways to create a shallow copy:

Python
import copy

lst_first  = ['Yesterday', ['we', 'used'], 'the', ['Python', '2']] 
lst_second = copy.copy(lst_first) # make a shallow copy by using copy module 
lst_third  = list(lst_first)  # make a shallow copy by using the factory function
lst_fourth = lst_first[:] # make a shallow copy by using the slice operator 
          
print("\nOriginal list and three copies of it:")
print("lst_first  =", lst_first) 
print("lst_second =", lst_second) 
print("lst_third  =", lst_third) 
print("lst_fourth =", lst_fourth) 
                    
print()
print("Let's show "identities" (the addresses) of the objects in memory:")
print("id(lst_first)  = ",  id(lst_first))
print("id(lst_second) = ",  id(lst_second))
print("id(lst_third)  = ",  id(lst_third))
print("id(lst_fourth) = ",  id(lst_fourth))

Now let's look at the result of the program:

Simple copying

OK. As we can see, all lists (lst_second, lst_third and lst_fourth) store the same values as the original list (lst_first).
However, the obtained values of the id() function for each of the four lists show DIFFERENT values.
This means that this time, each list's object has its own, independent address space.

Note

  1. To copy to lst_second, the 'copy' function of the 'copy module' provides generic shallow copy operation.
  2. The list() takes sequence of lst_first object and converts it to lst_third list. The list() constructor returns a mutable sequence list of elements.
  3. Shallow copies of lists can be made by assigning a slice of the entire list, for example, lst_fourth = lst_first[:]. Here, the lst_first list return a new list containing the requested elements. This means that the above slice operator returns a copy of the list, where all elements will be duplicated into lst_fourth list (i.e., to another area in memory).

Great!
Now everything works as we expected. All three methods created independent address spaces.
Now let's modify the elements of the above lists, adding more specific information to the created lists:

Python
lst_first [0] = 'In 2014'
lst_first.append(".7.9")

lst_second[0] = 'In 2015'
lst_second.append(".7.10")

lst_third [0] = 'In 2016'
lst_third.append(".7.13")

lst_fourth [0] = 'In 2017'
lst_fourth.append(".7.14")

print("\nLists after modification:")
print("lst_first  =", lst_first) 
print("lst_second =", lst_second) 
print("lst_third  =", lst_third) 
print("lst_fourth =", lst_fourth) 

And run the code:

Shallow copying

Well, it worked really well, didn't it?
As expected, each list contains its own historical data on the use of Python releases.

But for now, however, let's inform the user that a new release, Python 3.8, was released on October 14th, 2019.
For this purpose, we will change the content of only ONE of the lists.

Python
lst_fourth [0] = 'From October 14th, 2019'
lst_fourth [1][1] = 'are using'
lst_fourth [2] = 'a new release is'
lst_fourth [3][1] = '3.8'
lst_fourth.remove('.7.14') #remove the version extension, because we don't need it anymore 

print("\nLet's see ALL the lists again, after modifying the 'lst_fourth' list:")
print("lst_first  =", lst_first) 
print("lst_second =", lst_second) 
print("lst_third  =", lst_third) 
print("lst_fourth =", lst_fourth) 

Now let's run the script again to see what the last 4 lines print:

Shallow copying

Ops..... This is not at all what we expected to see!
We expected to see changes only in ‘lst_fourth’, but all lists changed!!!
Why were the values added in all the lists?

The 'lst_first' list contained three elements and two pointers to other (internal) lists, which look like this: ['we', 'used'] and ['Python', '2']. When we did a shallow copy of lst_first to create the lists, then only simple string elements were duplicated, but NOT compound objects (our lists). A shallow copy constructed a new list and then inserted simple strings objects and references into it to the compound objects found in the original. So, that all these references pointing to SAME address space (i.e., pointers were cloned, and NOT the address spaces to which these pointers referred) As a result, 'lst_first', 'lst_second', 'lst_third' and 'lst_fourth' lists are stored in separate address spaces, but at the same time, pointers to internal lists refer to ONE address. The problem arises whenever someone modifies "their" internal lists, because this change will affect the contents of ALL internal lists.

Schematically, it looks like this:

Shallow copy

Deep Copy

Clearly, we need a different approach when creating object clones to force Python to make a full copy of ALL the list's elements (simple and compound objects) that are contained in the list and its sublists or objects?
The cure for the problems of copying ALL list items is to make a deep copy!
That is, rather than just copy the address of the compound objects, we need to force Python to make a full copy of ALL the list's elements (simple and compound objects).That way, each object gets its address space rather than refer to one common object.
Let's see how deep copying is performed by using the deepcopy () function:

Python
import copy

lst_first  = ['Yesterday', ['we', 'used'], 'the', ['Python', '2']] 
lst_second = copy.deepcopy(lst_first)  
lst_third  = copy.deepcopy(lst_first)  
lst_fourth = copy.deepcopy(lst_first)   
          
print("\nOriginal list and three copies of it:")
print("lst_first  =", lst_first) 
print("lst_second =", lst_second) 
print("lst_third  =", lst_third) 
print("lst_fourth =", lst_fourth) 
                    
print()
print("Let's show "identities" (the addresses) of the objects in memory:")
print("id(lst_first)  = ",  id(lst_first))
print("id(lst_second) = ",  id(lst_second))
print("id(lst_third)  = ",  id(lst_third))
print("id(lst_fourth) = ",  id(lst_fourth))

lst_fourth [0] = 'From October 14th, 2019'
lst_fourth [1][1] = 'are using'
lst_fourth [2] = 'a new release is'
lst_fourth [3][1] = '3.8'
lst_fourth.remove('.7.14') #remove the version extension, because we don't need it anymore 

print("\nLet's see ALL the lists again, after modifying the 'lst_fourth' list:")
print("lst_first  =", lst_first) 
print("lst_second =", lst_second) 
print("lst_third  =", lst_third) 
print("lst_fourth =", lst_fourth) 

Let's look at the output of the above code:

Deep copying

Schematically, it looks like this:

Simple copying

Summary

Instead of a summary, I will provide what is written in the Python documentation here:

“The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.”

History

  • 23rd November, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)