Thursday, September 21, 2006

Duplication smells

Duplication. A programmer's worst nightmare. Does that sound over the top? Maybe it does, but I guess most programmers would agree that duplication is one of the worst code smells around. So, a programmer should do everything to prevent and reduce duplication, right? Well, I think that in the case of simple duplication we all agree on that. With simple I mean that two pieces of code are textually equal or very similar, e.g.:

patterns.Publisher().registerObserver(self.onMatchAllChanged,
eventType='view.taskcategoryfiltermatchall')
patterns.Publisher().registerObserver(self.onFilteredCategoriesChanged,
eventType='view.taskcategoryfilterlist')
patterns.Publisher().registerObserver(self.onAddCategoryToTask,
eventType='task.category.add')
patterns.Publisher().registerObserver(self.onRemoveCategoryFromTask,
eventType='task.category.remove')


This is actual code from Task Coach (blush) and represents a component registering itself as observer for different event types. Of course, this type of duplication is rather easy to fix, e.g.:

callbacks = {'view.taskcategoryfiltermatchall': self.onMatchAllChanged, ...}
publisher = patterns.Publisher()
for eventType, callback in callbacks.items():
publisher.registerObserver(callback, eventType)

But then there's also duplication that's not strictly textual, but more structural. For example, two for-loops looping over the same datastructure but with a different body. An example from Task Coach again. These two event handlers update a check-listbox (a list with items that can be checked and unchecked by the user) when categories are added to or removed from a task:

def onAddCategoryToTask(self, event):
for category in event.values():
if self._checkListBox.FindString(category) == wx.NOT_FOUND:
self._checkListBox.Append(category)
self.Enable(len(self.__taskList.categories()) > 0)

def onRemoveCategoryFromTask(self, event):
for category in event.values():
if category not in self.__taskList.categories():
self._checkListBox.Delete(self._checkListBox.FindString(category))
self.Enable(len(self.__taskList.categories()) > 0)


Removing this type of duplication is more tricky. Refactoring to something like the following should work (untested):

def updateCategoriesListbox(self, categories, updateNeeded, updateListbox):
for category in categories:
if updateNeeded(category):
updateListbox(category)
self.Enable(len(self.__taskList.categories()) > 0)

def onAddCategoryToTask(self, event):
def updateNeeded(category):
return self._checkListBox.FindString(category) == wx.NOT_FOUND

def updateListbox(category):
self._checkListBox.Append(category)

self.updateCategoriesListbox(event.values(), updateNeeded, updateListbox)

def onRemoveCategoryFromTask(self, event):
def updateNeeded(category):
return category not in self.__taskList.categories()

def updateListbox(category):
self._checkListBox.Delete(self._checkListBox.FindString(category))

self.updateCategoriesListbox(event.values(), updateNeeded, updateListbox)


But this raises the question: is the increase in lines of code and number of functions worth the reduction of duplication? Or did I miss a more simple solution?

2 comments:

Greg said...

Haven't read the code in detail, Frank, but in my former life as a programmer, this code smell was a sign that I needed to use some form of table driven programming. I generally found the redesigned code to be both smaller and :strike: more obvious :/strike: easier to comprehend.

Have a look at the discussion of the versions of the trash sorter program in the 2nd edition of Thinking in Java (Bruce Eckel) - it might suggest ways to refactor your code.

Frank said...

Hi Greg, I am curious how you would refactor the example using table driven programming and arrive at a shorter and/or simpler version than my solution. Wouldn't the overhead of setting up the table be relatively large, just like the overhead of creating the local functions in my example? Cheers, Frank