Yield: A Little Background
The yield keyword in C# is pretty cool. When used within an iterator, yield lets a function hand an item, along with control of execution, back to the caller and then, on the next iteration, resume where it left off. Neat, right? The MSDN documentation lists these limitations on the use of the yield keyword:
- Unsafe blocks are not allowed.
- Parameters to the method, operator, or accessor cannot be ref or out.
- A yield return statement cannot be located anywhere inside a try-catch block. It can be located in a try block if the try block is followed by a finally block.
- A yield break statement may be located in a try block or a catch block but not a finally block.
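To make the “return an item and control, then resume” behavior concrete, here is a small sketch. The names YieldDemo, Produce and Consume are hypothetical and only for illustration; every step is written to a log so you can see how execution bounces between the iterator and its caller.

```csharp
using System.Collections.Generic;

// Hypothetical demo class: shows how an iterator hands control back to its caller
// at each yield return and resumes right after it on the next iteration.
public static class YieldDemo
{
    public static IEnumerable<int> Produce(List<string> log)
    {
        log.Add("producer: start");
        for (var i = 1; i <= 2; i++)
        {
            log.Add($"producer: yielding {i}");
            yield return i; // control returns to the caller here
            log.Add($"producer: resumed after {i}"); // runs only when the caller asks for the next item
        }
        log.Add("producer: done");
    }

    public static List<string> Consume()
    {
        var log = new List<string>();
        foreach (var item in Produce(log))
        {
            log.Add($"consumer: got {item}");
        }
        return log;
    }
}
```

Calling YieldDemo.Consume() produces a log in which the two sides alternate: the producer writes “yielding 1”, the consumer writes “got 1”, and only then does the producer write “resumed after 1”.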
So what does this have to do with API specifications?
A whole lot, really, especially if you’re dealing with collections. I personally haven’t been a big user of the yield keyword, but I’ve never really been forced to use it. After playing around with it for a bit, I saw a lot of potential. I’ve written before about what I think makes a good API. In that article, I made a point of discussing two perspectives:
- Who needs to implement your interface. You want it to be easy for them to implement.
- Who needs to call your interface. You want it to be easy for them to use.
In my opinion, the IEnumerable<T> interface was a tricky thing to work with as a return value. You can essentially only iterate over an IEnumerable<T>, and at the time you call a function, that may not be what you want to do. The flip side is that, for the person implementing the interface, IEnumerable<T> is a really easy interface to satisfy. However, the yield keyword has opened up some new doors.
In this article, I’d like to go over a couple of different approaches for an API and then explain why the yield keyword might be something you consider next time around.
Disclaimer: I’m not claiming anything I’m about to present is the only way or the best way. I’m just sharing some of my own findings and perspective.
Interface For Returning Collections
The first type of API I’d like to look at is for returning collections. Based on my own API guidelines, I’d ideally return an interface or class that gives the caller a lot of information and is also easy for the implementer of my interface to create. The List<T> class is a great choice:
- It’s easy to construct
- It’s built into the .NET Framework
- It provides many handy functions (all of the IList<T> functionality, plus things like AddRange() and functions that accept delegates)
My next choice might be to have a return type of IList<T>, which would make things slightly less convenient for the caller but even easier for the implementer of the interface. They could return an array of T, since arrays implement the IList<T> interface, or their own custom list implementation that doesn’t inherit from the List<T> class. The differences between using IList<T> and List<T> are arguably pretty small.
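A minimal sketch of why IList<T> gives implementers more freedom. The interface IItemSource and both implementing classes are hypothetical names; the point is only that one IList<int> return type can be satisfied by a plain array or by a List<int>.

```csharp
using System.Collections.Generic;

// Hypothetical API contract that returns IList<T> instead of List<T>.
public interface IItemSource
{
    IList<int> GetItems();
}

// An implementer can satisfy the contract with a plain array, because int[] implements IList<int>...
public sealed class ArrayItemSource : IItemSource
{
    public IList<int> GetItems() => new[] { 1, 2, 3 };
}

// ...or with a List<int>, which also implements IList<int>.
public sealed class ListItemSource : IItemSource
{
    public IList<int> GetItems() => new List<int> { 1, 2, 3 };
}
```

One caveat of this flexibility: an array handed out through IList<T> is fixed-size, so a caller that calls Add() on it gets a NotSupportedException, while the List<int> version accepts the new item.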
A third alternative, which I would have avoided in the past, is to return an IEnumerable<T>. I used to think this made life a bit easier for the interface implementer than returning an IList<T>, but complicated life for the caller for a couple of reasons:
- The caller would have to use the results of the function in a foreach loop.
- The caller would have to add the items to their own collection to be able to do much more with them.
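From the caller’s side, those two options look roughly like this. EnumerableCaller and its methods are hypothetical helpers; the sketch shows that an IEnumerable<T> can be consumed directly with foreach, or copied into a List<T> (here with the List<T>(IEnumerable<T>) constructor) when the caller needs Count, indexing, or Add().

```csharp
using System.Collections.Generic;

// Hypothetical caller-side helpers for an API that returns IEnumerable<T>.
public static class EnumerableCaller
{
    // Option 1: enumerate the results directly with foreach.
    public static int Sum(IEnumerable<int> items)
    {
        var total = 0;
        foreach (var item in items)
        {
            total += item;
        }
        return total;
    }

    // Option 2: copy the results into the caller's own List<int>
    // to get Count, indexing, and the ability to add more items.
    public static List<int> Materialize(IEnumerable<int> items)
    {
        return new List<int>(items);
    }
}
```

The ToList() extension method from System.Linq performs the same kind of copy; either way, the caller pays for a second collection.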
When I was forced to return an IEnumerable<T>, my naive implementations were… well… crap. I would construct a collection within the function, fill it up, and then return it as an IEnumerable<T>. Then, as the caller of my function, I’d have to re-enumerate the results (or add them to another collection):
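using System.Collections.Generic;

internal class Program
{
    public static IEnumerable<T> GetItems<T>()
    {
        // Build and fill an entire collection up front...
        var collection = new List<T>();
        return collection;
    }

    private static void Main()
    {
        // ...then the caller copies the results into yet another collection...
        var myCollection = new List<int>();
        myCollection.AddRange(GetItems<int>());

        // ...or simply enumerates them again.
        foreach (var item in GetItems<int>())
        {
        }
    }
}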
That implementation seems like overkill to me. However, we’ll examine how using yield can truly transform this into something… better. So, to reiterate, a few potential implementations for an API involving collections might be:
- Return a List<T> class
- Return an IList<T> (or even an ICollection<T>) interface
- Return an IEnumerable<T> interface
Constantly Creating Collections
My design decisions, in the past, were really driven by two guidelines:
- Make it easy for the person implementing/extending the API
- Make it easy for the person consuming the API
As I quickly illustrated in the first section, this meant that I would have a method where I would create a collection, fill it with items, and then return it. Since I would usually choose a simple collection type as the return type, I could generally return just about any concrete collection class. Easy.
One thing you might notice with this approach is that it looks pretty inefficient to keep creating new collections, filling them, and then returning them. I’ll illustrate with a simple example. We’ll create a class with a method called GetItems(). Following my reasoning from earlier, this method will return a List<T> instance, and to make this example easier to work with, we’ll pass in an IEnumerable<T> instance. For what it’s worth, the input to this function is really just for demonstration purposes. We’re really focusing on how we create our return value.
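using System.Collections.Generic;

public class CreateNewListApi<T>
{
    public List<T> GetItems(IEnumerable<T> input)
    {
        // Eagerly copy every input item into a brand-new list before returning.
        var newCollection = new List<T>();
        foreach (var item in input)
        {
            newCollection.Add(item);
        }

        return newCollection;
    }
}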
And now that we have our simple class, we can mock up a little test for performance… Just how inefficient is creating new lists every time?
using System;
using System.Diagnostics;

internal class Program
{
    private static void Main(string[] args)
    {
        const int NUM_ITEMS = 100000000;
        var inputItems = new int[NUM_ITEMS];

        Console.WriteLine("API Creating New Collections");
        var api = new CreateNewListApi<int>();

        // Time both building the list and enumerating it.
        var watch = Stopwatch.StartNew();
        var results = api.GetItems(inputItems);
        foreach (var item in results)
        {
        }

        Console.WriteLine(watch.Elapsed);
        // Private memory of the whole process, in bytes.
        Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
        Console.ReadLine();
    }
}
When I run this on my machine, I get an average of about 1.73 seconds. The memory printout I get when running is 1,615,908,864 bytes (roughly 1.5 GiB). So is that slow? Is that a lot of memory? I think it’s pretty hard to say conclusively without something to compare it against. So let’s keep these numbers in mind as we continue to investigate the alternatives.
Side Note: At this point, some readers may be saying, “Well, if the input to our function was also a list (or if whatever our function has to work with was otherwise equivalent to our return value), then we wouldn’t have to populate a new collection every time… we could just return the underlying collection!” And I would say you are absolutely correct. If your function has access to an instance of the same type as the return type, then you could always just return that instance. But what implications does this have? You’re now giving people access to your internals, and they can modify them as they please. So, if you need to control which items get added or removed, it might not make sense to expose your internal collections like this.
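A small sketch of the risk described in this side note. The Inventory class and its members are hypothetical: it routes additions through its own AddItem() method, but then returns its private list directly, which hands every caller a live reference to its internal state.

```csharp
using System.Collections.Generic;

// Hypothetical class that controls additions through its own method...
public sealed class Inventory
{
    private readonly List<string> _items = new List<string>();

    public void AddItem(string item)
    {
        // Any validation or bookkeeping would live here.
        _items.Add(item);
    }

    // ...but then exposes the underlying list itself, not a copy.
    public List<string> GetItems() => _items;
}
```

A caller can write inventory.GetItems().Clear() or inventory.GetItems().Add(...) and change the Inventory’s own list without ever going through AddItem(). One option in the spirit of this article is to return IEnumerable<string> from a yield-based method instead, so callers receive an iterator rather than the list itself.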
Yield to Incoming API Alternatives
We’ve seen how my past implementations may have looked, so how might we improve on this? If we change our API a bit, we can make our method return an IEnumerable<T> instead. Let’s see what that might look like:
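using System.Collections.Generic;

public class YieldingApi<T>
{
    public IEnumerable<T> GetItems(IEnumerable<T> input)
    {
        // Hand each item to the caller as it is requested; no intermediate collection is built.
        foreach (var item in input)
        {
            yield return item;
        }
    }
}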
So in this API implementation, all we’ll be doing is iterating over some type of collection and then yielding each result. If we run it through the same type of test as our previous API implementation, what kind of results do we end up with?
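Roughly speaking, the compiler turns a yield-based method like YieldingApi<T>.GetItems into a small state machine that implements IEnumerable<T> and IEnumerator<T>. The sketch below is a simplified, hand-written approximation for this particular pass-through case (PassThroughIterator is a hypothetical name, and the real generated code differs in detail, for example it tracks an explicit state field). It shows how each MoveNext() call “resumes where it left off” by advancing the underlying enumerator one step.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified, hand-written approximation of the iterator the compiler generates
// for YieldingApi<T>.GetItems; not the literal compiler output.
public sealed class PassThroughIterator<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _input;

    public PassThroughIterator(IEnumerable<T> input)
    {
        _input = input;
    }

    public IEnumerator<T> GetEnumerator() => new Enumerator(_input);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<T>
    {
        private readonly IEnumerable<T> _input;
        private IEnumerator<T> _source; // created lazily, on the first MoveNext

        public Enumerator(IEnumerable<T> input)
        {
            _input = input;
        }

        public T Current { get; private set; }

        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            // First call: start the foreach over the input.
            // Later calls: pick up right after the previous "yield return".
            _source ??= _input.GetEnumerator();
            if (!_source.MoveNext())
            {
                return false;
            }

            Current = _source.Current;
            return true;
        }

        public void Reset() => throw new NotSupportedException();

        public void Dispose() => _source?.Dispose();
    }
}
```

Note that nothing touches the input until the first MoveNext() call, which is the same deferral the side note below describes.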
using System;
using System.Diagnostics;

internal class Program
{
    private static void Main(string[] args)
    {
        const int NUM_ITEMS = 100000000;
        var inputItems = new int[NUM_ITEMS];

        Console.WriteLine("API Yielding");
        var api = new YieldingApi<int>();

        // Same measurement as before: call the API, then enumerate the results.
        var watch = Stopwatch.StartNew();
        var results = api.GetItems(inputItems);
        foreach (var item in results)
        {
        }

        Console.WriteLine(watch.Elapsed);
        // Private memory of the whole process, in bytes.
        Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64);
        Console.ReadLine();
    }
}
When I run this on my machine, I get an average of about 2.80 seconds. The memory printout I get when running is 449,409,024 bytes. How does this compare with our first implementation? Well, it’s certainly slower: enumerating with the yield implementation takes about 1.62x as long as it did with the first API we created. However, yield also uses less than a third (about 27.8%, actually) of the memory footprint of the first implementation. Most of that remaining footprint is the 100,000,000-element int input array itself (400,000,000 bytes), which both tests allocate; the List<T> version additionally holds a full copy of every item. Pretty cool results!
Side Note: So yield was a bit slower according to our results, but what happens if we print the elapsed time before we run that foreach loop? Well, on my machine it averages about one millisecond. Now that’s fast, right?! The cool thing about using yield with the IEnumerable<T> interface is that the work is deferred. That is, we don’t take the performance hit until the program actually runs the enumeration. Try it out! Move the time printout from after the foreach loop to before it. Put breakpoints on the line that yields. You’ll see what I mean.
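Here is a minimal sketch of that deferral that uses a counter instead of a stopwatch, so the effect does not depend on timing. CountingApi and its ItemsProduced property are hypothetical; they are a variant of YieldingApi<T>, not part of the test program above.

```csharp
using System.Collections.Generic;

// Hypothetical variant of YieldingApi<T> that counts how many items it has produced.
public class CountingApi<T>
{
    public int ItemsProduced { get; private set; }

    public IEnumerable<T> GetItems(IEnumerable<T> input)
    {
        foreach (var item in input)
        {
            ItemsProduced++; // only runs while a caller is enumerating
            yield return item;
        }
    }
}
```

Right after GetItems() returns, ItemsProduced is still 0; it only reaches the input’s length once a foreach has walked the results. The same laziness means that enumerating the results a second time runs the loop body again, so the counter would double.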
Summary
In this article, I’ve explored two different ways of implementing an API (specifically focusing on the return value). We ran a brief performance comparison of the two, and I highlighted some differences between the approaches. Let’s recap:
- Approach 1: Returning a List<T> and creating the collection ahead of time
  - Appeared to be a bit faster overall than yielding
  - Consumed much more memory than yielding
  - Callers can use the results immediately for enumeration, checking the count, or as a collection to add more things to
  - The return type of List<T> is a bit more restrictive than the IEnumerable<T> used in the second API implementation
- Approach 2: Return type of IEnumerable<T> and yielding results
  - Appeared to be a bit slower overall than the List<T> implementation
  - Lazy: we don’t actually execute any enumeration code until the caller enumerates
  - Consumed significantly less memory than the first approach using List<T>
  - Callers can enumerate the results immediately, but they need to add the results to a collection class to do much more than enumerate
So next time you’re designing an API for your interfaces and classes, try keeping these things in mind!
The post Yield! Reconsidering APIs with Collections appeared first on Dev Leader.