Ever wondered why some Python classes call methods out of nowhere? Or implement some methods just to pass?
If you have ever encountered a Scikit-Learn custom transformer, you are very likely familiar with this phenomenon. If so, this article is for you. We will dive into polymorphism, the concept that enables such behavior, and build some custom classes to get some hands-on experience.
Scikit-learn transformers are a great set of tools for setting up a pipeline that prepares data for a production model. Though the list of built-in transformers is fairly extensive, building your own custom transformer is a great way to automate custom feature transformation and experimentation. If you have ever worked with scikit-learn transformers, you have very likely encountered this commonly used pattern:
from sklearn.base import BaseEstimator, TransformerMixin

# defining a custom transformer
class CustomTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        ......

# calling fit_transform()
customTransformer = CustomTransformer()
data = customTransformer.fit_transform(data)
But if you come from a non-programming background, it may seem puzzling that the fit_transform() method was never defined in the CustomTransformer class, yet it was callable from that class. And even once you have figured out that the method comes from one of the parent classes, it may still be puzzling that it can use methods that are not defined in the class where it lives. For example, if you check out the source of the TransformerMixin class in the official GitHub repo, you won't find fit() or transform() defined anywhere inside it.
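To make the puzzle concrete, here is a minimal sketch in plain Python. The names SketchTransformerMixin and DoubleTransformer are hypothetical and exist only for illustration; this is not scikit-learn's actual source, just one way a mixin method can rely on methods that only the final class supplies.

```python
# Hypothetical stand-in for a mixin; NOT scikit-learn's actual code.
class SketchTransformerMixin:
    def fit_transform(self, X, y=None):
        # fit() and transform() are not defined in this class. They are
        # looked up on `self` at call time, so the concrete class must
        # provide them.
        return self.fit(X, y).transform(X)


class DoubleTransformer(SketchTransformerMixin):
    def fit(self, X, y=None):
        # Nothing to learn in this toy example.
        return self

    def transform(self, X):
        # Double every value.
        return [2 * x for x in X]


doubled = DoubleTransformer().fit_transform([1, 2, 3])
print(doubled)  # [2, 4, 6]
```

DoubleTransformer never defines fit_transform(), and the mixin never defines fit() or transform(), yet the call works because all three are found on the same object.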
In this article we will try to understand the two concepts that enable these behaviors: polymorphism and duck typing. We will also do some hands-on exercises to deepen our understanding.
What is Polymorphism?¶
In general terms, polymorphism means the ability of an object to take different forms. A concrete example is when objects of different classes expose the same method but exhibit different behavior when it is called.
For example, in Python we can run operations like 1 + 2 or 'a' + 'b' and get 3 and 'ab' respectively. Behind the scenes, Python calls a magic method named __add__() that is already implemented in the integer and string classes. For details on how Python translates this core syntax into special methods, check out my last post, Python Core Syntax and the Magic Behind Them!
This magic method, __add__(), is an example of polymorphism: it is the same method name, but depending on the class of the object it is called on, it either sums numbers or concatenates strings.
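As a rough sketch of this idea, the snippet below calls __add__() directly on an integer and a string, then defines it on a small class of our own. The Money class is hypothetical and only serves to show that any class can give + its own meaning.

```python
# Roughly speaking, `a + b` first tries the left operand's __add__ method.
int_sum = (1).__add__(2)      # same result as 1 + 2
str_sum = "a".__add__("b")    # same result as "a" + "b"


class Money:
    """Hypothetical class used only to illustrate __add__."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Define what + means for two Money objects.
        return Money(self.cents + other.cents)


total = Money(150) + Money(275)
print(int_sum, str_sum, total.cents)  # 3 ab 425
```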
In the context of Python classes, we can achieve polymorphism in two ways: inheritance and duck typing.
Polymorphism Through Inheritance¶
In object-oriented programming, inheritance means that one class inherits properties from another class. The class we inherit from is called the superclass, and the class that inherits the properties is called the subclass. Since inheritance is not the focus of this write-up, let's jump into a quick example and hopefully it will make sense as we go. If it doesn't, or you need a quick refresher, feel free to read my previous post on Object Oriented Programming in Python — Inheritance and Subclass.
For our example, we will create a superclass called InheritList and three subclasses, DefaultList, EvenList and OddList, to run examples of inheritance and polymorphism.
# A custom class to create a list from user input and modify the list
class InheritList:
    def __init__(self):
        self.list = []

    def add_value(self, val):
        self.list.append(val)

    def remove_value(self):
        # remove and return the last item
        rv = self.list[-1]
        del self.list[-1]
        return rv

    def get_list(self):
        return self.list

    def do_all(self, val):
        self.add_value(val)
        self.remove_value()
        return self.get_list()

# Example subclass 01: leaves superclass intact
class DefaultList(InheritList):
    def __init__(self):
        super().__init__()

# Example subclass 02: removes odd numbers from list
class EvenList(InheritList):
    def __init__(self):
        super().__init__()

    def remove_value(self):
        self.list = [x for x in self.list if x % 2 == 0]
        return self.list

# Example subclass 03: removes even numbers from list
class OddList(InheritList):
    def __init__(self):
        super().__init__()

    def remove_value(self):
        self.list = [x for x in self.list if x % 2 != 0]
        return self.list
Inheritance¶
In the above code block, notice that we didn't implement any methods of our own inside the DefaultList class beyond the constructor. Yet in the following code block we can call methods such as add_value() and get_list() on an instance of that class, because the DefaultList subclass inherited them from its superclass, InheritList. This is inheritance at play.
nums = [1, 2, 3, 4, 5]
defaultNumList = DefaultList()
for i in nums:
    defaultNumList.add_value(i)
print(f"List with all added values: {defaultNumList.get_list()}")
# removes the last item from the list
defaultNumList.remove_value()
print(f"List after removing the last item: {defaultNumList.get_list()}")

List with all added values: [1, 2, 3, 4, 5]
List after removing the last item: [1, 2, 3, 4]
The above example shows basic inheritance: we get all the properties from the superclass and use them as they are. But inside a subclass we can also change or update the inherited methods, as we did in the other two subclasses, EvenList and OddList.
Method Overriding¶
In the EvenList and OddList classes we overrode the remove_value() method so that EvenList removes all the odd values and OddList removes all the even values from the built list. By doing so we introduced polymorphism: remove_value() behaves differently in the two cases.
evenNumList = EvenList()
oddNumList = OddList()
nums = [1, 2, 3, 4, 5]
for i in nums:
    evenNumList.add_value(i)
    oddNumList.add_value(i)
print(f"evenNumList with all the values: {evenNumList.get_list()}")
print(f"evenNumList after applying remove_value(): {evenNumList.remove_value()}")
print(f"\noddNumList with all the values: {oddNumList.get_list()}")
print(f"oddNumList after applying remove_value(): {oddNumList.remove_value()}")

evenNumList with all the values: [1, 2, 3, 4, 5]
evenNumList after applying remove_value(): [2, 4]

oddNumList with all the values: [1, 2, 3, 4, 5]
oddNumList after applying remove_value(): [1, 3, 5]
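One way to see the polymorphism more directly is to run exactly the same calling code over objects of different classes. The sketch below uses trimmed-down, hypothetical stand-ins (BaseNumList, EvenOnly, OddOnly) rather than the classes above, so that it can run on its own.

```python
# Hypothetical, trimmed-down stand-ins for the list classes in this article.
class BaseNumList:
    def __init__(self, values):
        self.values = list(values)

    def remove_value(self):
        # Default behavior: drop and return the last item.
        return self.values.pop()


class EvenOnly(BaseNumList):
    def remove_value(self):
        # Overridden: keep only the even numbers.
        self.values = [v for v in self.values if v % 2 == 0]
        return self.values


class OddOnly(BaseNumList):
    def remove_value(self):
        # Overridden: keep only the odd numbers.
        self.values = [v for v in self.values if v % 2 != 0]
        return self.values


# The loop body is identical for every object; the class decides the behavior.
results = {}
for cls in (BaseNumList, EvenOnly, OddOnly):
    obj = cls([1, 2, 3, 4, 5])
    results[cls.__name__] = obj.remove_value()

print(results)
# {'BaseNumList': 5, 'EvenOnly': [2, 4], 'OddOnly': [1, 3, 5]}
```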
Polymorphism Through Duck-Typing¶
Before going into the details of duck typing, let's talk about another method, do_all(), that was implemented in the superclass, InheritList. It takes a value as input, adds it to the list, removes the unwanted values from the list, and returns the final list. To accomplish all these tasks, it depends on other methods of the class: add_value(), remove_value(), and get_list(). Check out the demo below.
print(f"evenNumList after calling do_all(58): {evenNumList.do_all(58)}")
print(f"oddNumList after calling do_all(55): {oddNumList.do_all(55)}")

evenNumList after calling do_all(58): [2, 4, 58]
oddNumList after calling do_all(55): [1, 3, 5, 55]
But Python allows us to be more flexible in how we implement this. We could have removed the remove_value() method from the superclass entirely, created a separate class with a combine_all() method, and still been able to use it without any problem. All thanks to duck typing!
Basically, we don't care whether the properties a method depends on come from the same class or not. We are good as long as those properties are available on the object. This reflects the quote widely used to describe duck typing:
"If it walks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck."
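In its simplest form, the idea behind the quote can be sketched with two unrelated classes that happen to share a method name. Duck, Robot and make_it_quack() are hypothetical names chosen for illustration.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    # Not related to Duck in any way, but it also has a quack() method.
    def quack(self):
        return "Beep-quack."


def make_it_quack(thing):
    # No isinstance() check: any object with a quack() method works.
    return thing.quack()


sounds = [make_it_quack(t) for t in (Duck(), Robot())]
print(sounds)  # ['Quack!', 'Beep-quack.']
```

The function never asks what type it received; it only needs the method to be there when it is called.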
To demonstrate, let's create a new class called ComboFunc with only one method, combine_all(), which does the same job as the do_all() method. Let's also create a new subclass, GenDuckList, that has both this new class and one of the previously created subclasses, EvenList, as its superclasses.
# Superclass to demo duck typing
class ComboFunc:
    def combine_all(self, val):
        self.add_value(val)
        self.remove_value()
        return self.get_list()

class GenDuckList(EvenList, ComboFunc):
    pass
Notice that we didn't define any of the dependency methods (add_value(), remove_value() and get_list()) inside either of these two classes. And yet we can successfully call the combine_all() method on an instance of the GenDuckList class, because the dependency methods are inherited from the EvenList class, and combine_all() doesn't care where they come from as long as they exist.
nums = [1, 2, 3, 4, 5]
genDuck = GenDuckList()
for i in nums:
    genDuck.add_value(i)
print(f"Initial list: {genDuck.get_list()}")
genDuck.combine_all(45)
genDuck.combine_all(40)
print(f"Final list: {genDuck.get_list()}")

Initial list: [1, 2, 3, 4, 5]
Final list: [2, 4, 40]
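The "as long as they exist" condition cuts both ways. The sketch below uses hypothetical classes (ComboSketch mirrors ComboFunc, and Incomplete is deliberately missing one method) to show what one would expect when a dependency is absent: the failure only appears at call time, as an AttributeError.

```python
class ComboSketch:
    """Hypothetical mixin mirroring ComboFunc; relies on methods it does not define."""

    def combine_all(self, val):
        self.add_value(val)
        self.remove_value()
        return self.get_list()


class Incomplete(ComboSketch):
    """Supplies add_value() and get_list(), but not remove_value()."""

    def __init__(self):
        self.items = []

    def add_value(self, val):
        self.items.append(val)

    def get_list(self):
        return self.items


# Creating the object is fine; the missing method is only noticed
# when combine_all() actually tries to call it.
try:
    Incomplete().combine_all(1)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```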
Notice that we could have accomplished the above task in other ways too:
1. We could avoid inheriting anything from the EvenList class and implement the dependency methods inside the GenDuckList class itself if we needed something customized. Or,
2. We could keep EvenList as a superclass and override specific dependency methods to make the class more customized. Or,
3. We could remove remove_value() from the superclass and implement it inside our GenDuckList class, and still be able to perform the same tasks.
Overall, polymorphism lets us be more flexible and easily reuse methods that are already implemented.
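The last of these alternatives, where remove_value() lives in the final class instead of a superclass, might look like the following sketch. ListBase, ComboSketch and SelfSufficientList are hypothetical names, and the even-number filter is just one possible choice of behavior.

```python
class ListBase:
    """Hypothetical superclass that has no remove_value() at all."""

    def __init__(self):
        self.items = []

    def add_value(self, val):
        self.items.append(val)

    def get_list(self):
        return self.items


class ComboSketch:
    """Hypothetical mixin mirroring ComboFunc."""

    def combine_all(self, val):
        self.add_value(val)
        self.remove_value()
        return self.get_list()


class SelfSufficientList(ListBase, ComboSketch):
    def remove_value(self):
        # Defined in the final class itself rather than inherited.
        self.items = [x for x in self.items if x % 2 == 0]
        return self.items


lst = SelfSufficientList()
for n in [1, 2, 3, 4, 5]:
    lst.add_value(n)
final = lst.combine_all(40)
print(final)  # [2, 4, 40]
```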
So to complete the circle, when we build a custom transformer in scikit-learn using the BaseEstimator and TransformerMixin classes, we are basically applying duck typing to achieve polymorphism. To relate the two, you can think of GenDuckList as a dummy custom transformer class, ComboFunc as a dummy TransformerMixin class, and EvenList as a dummy BaseEstimator class. The only difference between the duck typing example and the transformer example is that we inherited the remove_value() method from a superclass, whereas in a custom transformer we define the corresponding method inside the custom class itself, which is the third alternative noted above.