An Exercise In Recursion
Without the appropriate amount of caffeine or alcohol, “just do it recursively” can be a developer’s least favorite phrase. Don’t get me wrong: there is nothing more elegant than a well-written recursive function. But when faced with a choice, most of us choose to add programming overhead and iterate before we dive into the seedy underbelly of recursive logic. Nevertheless, yesterday I was faced with a situation where I knew recursion would shine. I took the leap and was happy with the result.
A module we’re building in our current Python/Django project is in charge of communicating with a 3rd party API. Our method takes a list of input IDs, calls an API endpoint using the Python requests HTTP library, and returns a sensible data structure. It looked something like this:
Original (non-recursive) code
```python
import json  # module-level import, used to parse the endpoint's response

'''
getDocumentMeta:
Use IDs to get doc metadata from a 3rd party endpoint.
'''
def getDocumentMeta(self, ids):
    # If we didn't get a list, make it a list
    if not isinstance(ids, list):
        ids = [ids]

    # Make post call to api, self.post makes call to requests.post
    # URL gets passed from ENV file by DataProvider
    data = self.post(self.urls.get('documentsMeta'), {
        "apiKey": self.apiKey,
        "docIds": ids
    })

    # Handle 3rd party formatting
    result = json.loads(data).get('result', {})

    # Return document metadata keyed by docId
    ret = {}
    for docs in result:
        ret[docs.get('docId')] = docs.get('meta')

    return ret
```
OMG he codes Python now? Who knew.
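For context, the “sensible data structure” is a dict mapping each docId to its metadata. A hypothetical return value (every ID and field below is invented for illustration) looks like this:

```python
# Hypothetical shape of getDocumentMeta's return value; all values invented.
meta = {
    'doc-101': {'name': 'contract.pdf', 'pages': 12},
    'doc-102': {'name': 'invoice.pdf', 'pages': 2},
}
```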
In testing, this method was hunky-dory. We would pass sets of up to 100 test IDs to the endpoint and get data back, noting that the more input IDs we passed, the longer the endpoint would take to return. Enter production data… a request with over 1,000 IDs. This caused the request to hang until the API timed out completely, returning a 502.
After some poking, I learned the timeout was exactly two minutes, and that the processing time for the same batch varied noticeably from request to request. That meant I could send the IDs in smaller subsets, but there was no guarantee a given subset wouldn’t also time out. The variability took a lot of traditional solutions off the table, and given the massive codebase surrounding the module, it would be nice to keep our changes contained.
The best solution: an in-place refactor of the function. If we encounter a timeout, call our endpoint again with a smaller chunk of IDs. The higher-level architecture doesn’t even know something changed.
Here was my solution:
New (recursive) method
```python
'''
This has been written recursively to send only a chunkSize
number of ids at a time to the endpoint to prevent timeout
'''
def getDocumentMeta(self, ids, chunkSize=300, ret=None):
    # Avoid a shared mutable default argument
    if ret is None:
        ret = {}

    # Terminating condition: no ids left to process
    if not len(ids):
        return ret

    # Pop off the number of ids based on the chunk size
    nextChunkSize = min(len(ids), chunkSize)

    # Make post call to api, self.post makes call to requests.post
    # URL gets passed from ENV file by DataProvider
    data = self.post(self.urls.get('documentsMeta'), {
        "apiKey": self.apiKey,
        "docIds": ids[0:nextChunkSize]
    })

    try:
        # Handle 3rd party formatting
        result = json.loads(data).get('result', {})
        for docs in result:
            ret[docs.get('docId')] = docs.get('meta')

        # Call this function again with the remaining ids
        return self.getDocumentMeta(ids[nextChunkSize:], chunkSize, ret=ret)
    except ValueError:
        # Timeout came back as a non-JSON body; retry the same ids
        # with half the chunk size (never letting it fall below 1)
        return self.getDocumentMeta(ids, chunkSize=max(chunkSize // 2, 1), ret=ret)
```
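Since chunkSize and ret both have defaults, existing callers don’t change at all. And the except branch deserves a second look: each retry halves the chunk size, floored at 1, so an oversized batch shrinks quickly. A standalone illustration of that sequence (mine, not output from the module):

```python
# How the retry path shrinks the chunk size after each ValueError,
# starting from the default of 300.
chunkSize = 300
while chunkSize > 1:
    print(chunkSize)                       # 300, 150, 75, 37, 18, 9, 4, 2
    chunkSize = max(chunkSize // 2, 1)
print(chunkSize)                           # bottoms out at 1
```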
I’ll abstract this a little more in the future; the likely direction is a generic chunk-and-retry helper, sketched below. Maybe recursion isn’t that bad after all?
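Here’s roughly what that abstraction might look like (my own generalization, not code from our module): pull the chunk-and-retry recursion into a helper that takes the posting function as a parameter.

```python
def chunkedCall(fetch, items, chunkSize=300, results=None):
    '''
    Hypothetical generalization of the pattern above. 'fetch' posts one
    chunk of items and returns a dict of results, raising ValueError on
    a timeout.
    '''
    if results is None:
        results = {}        # avoid a shared mutable default
    if not items:
        return results      # terminating condition: nothing left to send

    size = min(len(items), chunkSize)
    try:
        results.update(fetch(items[0:size]))
        # Recurse on the remaining items with the same chunk size
        return chunkedCall(fetch, items[size:], chunkSize, results)
    except ValueError:
        # Retry the same items with half the chunk size (floored at 1);
        # like the method above, a permanently failing chunk would
        # recurse forever, so a retry cap might be worth adding
        return chunkedCall(fetch, items, max(chunkSize // 2, 1), results)
```

With a helper like that, getDocumentMeta would reduce to building the request payload and handing the post call to chunkedCall as fetch.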