Saturday, 20 August 2016

C-Sharp Compiler Implementation of C-Sharp 6 Features

Introduction

C# 6.0 was formally introduced (along with .NET 4.6 and VS2015) in July 2015, and has largely been received positively.
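The most common new features in C# 6.0 are:
  • The nameof operator
  • String interpolation
  • using static
  • The null-conditional operator
  • Exception filters
  • Index initializers
  • Property (auto-property) initializers
  • Expression-bodied function members
A quick search for "new C# 6.0 features" should bring up plenty of articles that explain the features listed above.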
This article looks into how the C# 6 features are implemented behind the scenes by the C# compiler.

Background

After coming across the new C# 6 features I got curious as to how they were implemented behind the scenes. Were they implemented with new Common Intermediate Language (CIL) opcodes, or purely at the C# compiler level, with no modification to CIL?
To help me answer these questions I turned to an old project I did back in 2013 when I was learning CIL. I call it Tril (for TRanslate IL).

Tril builds a language-agnostic "model" of your .NET binary (.exe, .dll) and passes the model to a "translator" which translates it into the syntax of a specified language. Translators are implemented as plugins; this means, in theory, that with Tril I can "decompile" any .NET binary into any given language. I would love to write about Tril but I fear it won't meet the criteria for a Code Project article. If you are interested, however, you can follow the project on GitHub. Please note that Tril is still very much a work-in-progress, and I have hardly touched it for years now. It is also not exactly the best designed project since I started it years ago, before I learnt many of the things I know now.
Back to the issue at hand. With the help of Tril, I set out to see how the new C# 6.0 features are implemented. This article summarizes my findings. Each "input code" listing below is the original source I wrote and compiled with the C# compiler. Each "output code" listing is the code Tril reconstructed from the DLL produced by the C# compiler. Each "inference" is what I learnt from comparing the input to the output.
Important notes:
  • I have not had the time to fine-tune Tril, so some of the code generated by the tool still looks very machine-like. You will also still see goto statements (translated directly from CIL branch instructions) in places where I have not yet developed a reliable algorithm to decide whether they should become if blocks, while blocks or some other construct. And some of the translations are just plain bad. I plan to focus more on the project henceforth, and will resolve many of these issues with time.
  • I have left some pieces of CIL information in the output code for debug purposes. One such piece of information is the CIL code labels (like IL0000:).
  • The test DLL I used was compiled in Debug mode, as compiling it in Release mode removed some pieces of information that were needed by Tril to reconstruct the source code, especially for the methods NullConditional() and ExceptionFilters() discussed below.
  • To conserve space in this article, however, I used the CIL generated in Release mode for methods that could be reconstructed in that mode, and Debug mode only for methods that could not be reconstructed in Release mode.

My Investigations

The nameof Operator

Input code:

public void NameOf()
{
    Console.WriteLine(nameof(NameOf));
}
public void NameOf<T>()
{
    Console.WriteLine(nameof(T));
}

Output code:

public System.Void NameOf() 
{
	IL_0000:  System.Console.WriteLine("NameOf");
	IL_000a:  return;
}
public System.Void NameOf<T>() 
{
	IL_0000:  System.Console.WriteLine("T");
	IL_000a:  return;
}

Inference:

It seems the compiler resolves the name of the member supplied to the nameof operator into a string literal at compile time. No information about the member passed to the nameof operator is carried into CIL.
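One way to confirm this from the C# side, without reading CIL, is to use nameof where only a compile-time constant is accepted. The sketch below is illustrative only; the class NameOfDemo and its members are made-up names, not part of the test DLL.

```csharp
using System;

public static class NameOfDemo
{
    // A const initializer must be a compile-time constant, so this line
    // compiles only because nameof(...) is folded into the literal "Describe".
    public const string MethodName = nameof(Describe);

    public static string Describe(int value)
    {
        // nameof(value) is emitted as the literal "value".
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value));
        return MethodName + ":" + value;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(6)); // prints "Describe:6"
    }
}
```

Because the name is baked in as a literal, renaming Describe changes the string at the next compile, but nothing about the original symbol survives in the binary.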

String Interpolation

Input code:

public void StringInterpolation()
{
    string a = "first", b = "second";
    Console.WriteLine($"string a = {a}, and b = {b}");
    Console.WriteLine($"the ToSting() of this class is {this.ToString()}");
}

Output code:

public System.Void StringInterpolation() 
{
	          System.String V_0;
	          System.String V_1;
	IL_0000:  V_0 = "first";
	IL_0006:  V_1 = "second";
	IL_000c:  System.Console.WriteLine(System.String.Format("string a = {0}, and b = {1}", V_0, V_1));
	IL_001d:  System.Console.WriteLine(System.String.Format("the ToSting() of this class is {0}", ToString()));
	IL_0032:  return;
}

Inference:

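I like this one. As you may have guessed, it shows that the new string interpolation feature uses plain old System.String.Format behind the scenes: each interpolated expression becomes a numbered placeholder, and the expressions are passed as the format arguments.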
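The equivalence can be checked directly by writing both forms and comparing the results. This is a minimal sketch with made-up names (InterpolationDemo, Interpolated, Formatted). It reflects what the output above shows for the compiler I used; as far as I know, later compiler versions may pick other strategies (such as String.Concat) when that is cheaper.

```csharp
using System;

public static class InterpolationDemo
{
    // What we write in C# 6.
    public static string Interpolated(string a, string b)
        => $"string a = {a}, and b = {b}";

    // What the reconstructed output shows the compiler emitting.
    public static string Formatted(string a, string b)
        => string.Format("string a = {0}, and b = {1}", a, b);

    public static void Main()
    {
        Console.WriteLine(Interpolated("first", "second"));
        // Both forms produce the same string.
        Console.WriteLine(Interpolated("first", "second") == Formatted("first", "second")); // prints "True"
    }
}
```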

using static

Input code:

using static System.Console;
...
public void UsingStatic()
{
    Console.WriteLine($"nameof(Console.WriteLine)={nameof(Console.WriteLine)}");
    WriteLine($"nameof(WriteLine)={nameof(WriteLine)}");
}

Output code:

public System.Void UsingStatic() 
{
	IL_0000:  System.Console.WriteLine(System.String.Format("nameof(Console.WriteLine)={0}", "WriteLine"));
	IL_0014:  System.Console.WriteLine(System.String.Format("nameof(WriteLine)={0}", "WriteLine"));
	IL_0028:  return;
}

Inference:

This one is not surprising either. If you know CIL, you know that the concept of a using directive does not really exist there: type and member references are stored with their fully-qualified names. So, using static is simply compile-time syntactic sugar to make us even lazier than we already are.
Still on the example above, notice how nameof(Console.WriteLine) and nameof(WriteLine) both resolve to "WriteLine": nameof yields only the final identifier, never the qualified name.

Null Conditional

Input code:

public void NullConditional()
{
    bool? defTrueNull = new CSharp6Features()?.DefaultTrue;
    bool defTrue = new CSharp6Features()?.DefaultTrue ?? true;
    string strThis = new CSharp6Features()?[0];
}

Output code:

public System.Void NullConditional() 
{
	          System.Nullable<System.Boolean> V_0;
	          System.Boolean V_1;
	          System.String V_2;
	          System.Nullable<System.Boolean> V_3;
	IL_0000:  ;
	IL_0006:  if (new CSharp6Features() != 0)
	IL_0007:   goto IL_0015;
	IL_0001:  new CSharp6Features();
	IL_000a:  V_3 = default(System.Nullable);
	IL_0013:  goto IL_001f;
	IL_0012:  V_0 = new System.Nullable(V_3.get_DefaultTrue());
	IL_0025:  if (new CSharp6Features() != 0)
	IL_0026:   goto IL_002c;
	IL_0020:  new CSharp6Features();
	IL_002a:  goto IL_0031;
	IL_0029:  V_1 = 1.get_DefaultTrue();
	IL_0037:  if (new CSharp6Features() != 0)
	IL_0038:   goto IL_003e;
	IL_0032:  new CSharp6Features();
	IL_003c:  goto IL_0044;
	IL_003b:  V_2 = null.get_Item(0);
	IL_0045:  return;
}

Inference:

As I stated above, Tril is still a work in progress. That explains the presence of goto in the output code. It also explains why we have if (new CSharp6Features() != 0) instead of if (new CSharp6Features() != null), and why some receivers are printed as 1 or null. I will look into these issues. The target labels for the goto statements are also missing (although you can guess where they should be). These shortcomings notwithstanding, you can see what the compiler is doing: it evaluates the object once, explicitly checks it against null, and only accesses the member if the check passes; otherwise it produces null (or the right-hand side of ?? where one is given). When I work more on Tril the output will become more obvious.
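Since the reconstructed output is hard to read, here is a hand-written approximation of what the three statements appear to lower to. It is a sketch under assumptions, not Tril output: Widget is a hypothetical stand-in for CSharp6Features, and the tmp locals stand for the compiler's own temporaries.

```csharp
using System;

// Hypothetical stand-in for the article's CSharp6Features class.
public class Widget
{
    public bool DefaultTrue { get; set; } = true;
    public string this[int i] => i.ToString();
}

public static class NullConditionalDemo
{
    // Roughly: bool? r = w?.DefaultTrue;
    public static bool? ReadFlag(Widget w)
    {
        Widget tmp = w; // the receiver is evaluated once
        return tmp != null ? tmp.DefaultTrue : (bool?)null;
    }

    // Roughly: bool r = w?.DefaultTrue ?? true;
    public static bool ReadFlagOrTrue(Widget w)
    {
        Widget tmp = w;
        return tmp != null ? tmp.DefaultTrue : true;
    }

    // Roughly: string s = w?[0];
    public static string ReadFirst(Widget w)
    {
        Widget tmp = w;
        return tmp != null ? tmp[0] : null;
    }

    public static void Main()
    {
        Widget none = null;
        // The hand-written forms agree with the operator forms.
        Console.WriteLine(ReadFlag(none) == none?.DefaultTrue);          // prints "True"
        Console.WriteLine(ReadFirst(new Widget()) == new Widget()?[0]);  // prints "True"
    }
}
```

Note how the ?? true case needs no Nullable<bool> at all: the null check can jump straight to the fallback value, which matches the output above having only one extra Nullable local.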

Exception Filters

Input code:

public void ExceptionFilters()
{
    try
    {
        System.IO.File.Create("////");
    }
    catch (Exception ex) when (ex.Message == null)
    {
        Console.WriteLine("ex.Message is null");
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    finally
    {
        Console.WriteLine("finally");
    }
}

Output code:

public System.Void ExceptionFilters() 
{
	          System.Exception V_0;
	          System.Boolean V_1;
	          System.Exception V_2;
	IL_0000:  ;
	          try {
	           try {
	IL_0001:    ;
	IL_0002:    System.IO.File.Create("////");
	IL_000d:    ;
	IL_000e:    goto IL_004e;
	           }
	           //end of try block
	           /*filter*/ {
	IL_0015:    if (V_0 != null)
	IL_0016:     goto IL_001c;
	IL_001a:    goto IL_002b;
	IL_0019:    V_0 = null;
	IL_001d:    V_1 = V_0.get_Message() == null;
	IL_002b:    
	           }
	           //end of filter block
	           /*filter-handler*/ {
	IL_002e:    ;
	IL_002f:    System.Console.WriteLine("ex.Message is null");
	IL_0039:    ;
	IL_003a:    ;
	IL_003b:    goto IL_004e;
	           }
	           //end of filter handler block
	           catch(System.Exception ____ex636070160594114817) {
	            V_2 = ____ex636070160594114817;
	IL_003e:    ;
	IL_003f:    System.Console.WriteLine(V_2.get_Message());
	IL_004a:    ;
	IL_004b:    ;
	IL_004c:    goto IL_004e;
	           }
	           //end of catch block
	IL_004e:   goto IL_005e;
	          }
	          //end of try block
	          finally {
	IL_0050:   ;
	IL_0051:   System.Console.WriteLine("finally");
	IL_005b:   ;
	IL_005c:   ;
	IL_005d:   
	          }
	          //end of finally block
	IL_005e:  return;
}

Inference:

I haven't yet fine-tuned the exception filter syntax so the output code still shows how things are laid out in CIL. Exception filters are new to C# but not to CIL, and are implemented in CIL using two blocks: one for the filter itself and one for the block of code to run (the filter handler) if the filter condition is true.
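To see the filter/handler split from the C# side, the following sketch routes an exception through a filtered catch and a general catch. The names (ExceptionFilterDemo, Handle) and the "retry" message convention are invented for illustration.

```csharp
using System;

public static class ExceptionFilterDemo
{
    public static string Handle(Exception toThrow)
    {
        try
        {
            throw toThrow;
        }
        // The when (...) expression corresponds to the CIL filter block.
        catch (Exception ex) when (ex.Message.StartsWith("retry", StringComparison.Ordinal))
        {
            // This body corresponds to the filter handler block;
            // it runs only if the filter evaluated to true.
            return "filtered";
        }
        catch (Exception)
        {
            // An ordinary catch block, reached when the filter said no.
            return "general";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Handle(new InvalidOperationException("retry later"))); // prints "filtered"
        Console.WriteLine(Handle(new InvalidOperationException("fatal")));       // prints "general"
    }
}
```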

Index Initializers

Input code:

public void IndexInitializers()
{
    var numbers = new Dictionary<int, string>
    {
        [1] = "One",
        [2] = "Two",
        [3] = "Three",
        [3] = "Three again??"
    };
}

Output code:

public System.Void IndexInitializers() 
{
	          System.Collections.Generic.Dictionary<System.Int32, System.String> V_0;
	IL_0000:  ;
	IL_0006:  new System.Collections.Generic.Dictionary<System.Int32, System.String>().set_Item(1, "One");
	IL_0012:  ;
	IL_0013:  new System.Collections.Generic.Dictionary<System.Int32, System.String>().set_Item(2, "Two");
	IL_001f:  ;
	IL_0020:  new System.Collections.Generic.Dictionary<System.Int32, System.String>().set_Item(3, "Three");
	IL_002c:  ;
	IL_002d:  new System.Collections.Generic.Dictionary<System.Int32, System.String>().set_Item(3, "Three again??");
	IL_0039:  ;
	IL_0001:  V_0 = new System.Collections.Generic.Dictionary<System.Int32, System.String>();
	IL_003b:  return;
}

Inference:

Well, I got the translation here quite wrong. It appears Tril isn't good with compiler-generated temporaries: it repeats the constructor call instead of reusing the object created once. The output should be something more along these lines:
public System.Void IndexInitializers() 
{
	          System.Collections.Generic.Dictionary<System.Int32, System.String> V_0, V_1;
	IL_0000:  V_1 = new System.Collections.Generic.Dictionary<System.Int32, System.String>();
	IL_0006:  V_1.set_Item(1, "One");
	IL_0012:  ;
	IL_0013:  V_1.set_Item(2, "Two");
	IL_001f:  ;
	IL_0020:  V_1.set_Item(3, "Three");
	IL_002c:  ;
	IL_002d:  V_1.set_Item(3, "Three again??");
	IL_0039:  ;
	IL_0001:  V_0 = V_1;
	IL_003b:  return;
}
The code V_1.set_Item(1, "One"); is the CIL version of V_1[1] = "One";. I would have expected V_1.Add(1, "One") instead of V_1[1] = "One". Maybe I'm missing something.
Anyway, the compiler creates a temporary object (V_1 in this case), sets its items through the indexer in the order we listed them, and only then assigns the fully initialized temporary to our own variable (V_0 in this case).
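As for why set_Item appears rather than Add: an index initializer is by definition an indexer assignment, and for Dictionary the two behave differently on duplicate keys. The sketch below (class and method names are made up) contrasts the index initializer with the older collection-initializer syntax, which does call Add.

```csharp
using System;
using System.Collections.Generic;

public static class IndexInitializerDemo
{
    // Index initializer: each entry is an indexer assignment (set_Item),
    // so a repeated key silently overwrites the earlier value.
    public static Dictionary<int, string> WithIndexers()
    {
        return new Dictionary<int, string>
        {
            [3] = "Three",
            [3] = "Three again??"
        };
    }

    // Collection initializer: each entry is a call to Add,
    // which throws ArgumentException on a repeated key.
    public static bool CollectionInitializerThrows()
    {
        try
        {
            var d = new Dictionary<int, string>
            {
                { 3, "Three" },
                { 3, "Three again??" }
            };
            return d.Count == 0; // not reached: Add throws first
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WithIndexers()[3]);              // prints "Three again??"
        Console.WriteLine(CollectionInitializerThrows());  // prints "True"
    }
}
```

This is also why the input code above, with key 3 listed twice, runs without error.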
Fun fact:
Examining the CIL code for this method, it appears question marks "?" are escaped in CIL strings.

Property Initializers

Input code:

public bool DefaultTrue { get; set; } = true;
public bool DefaultFalse { get; set; } = false;
public bool DefaultTrueReadOnly { get; } = true;
public string First { get; set; } = "First";
public string Last { get; set; } = "Last";

Output code:

[System.Runtime.CompilerServices.CompilerGeneratedAttribute]
[System.Diagnostics.DebuggerBrowsableAttribute(0)]
private System.Boolean _DefaultTrue_k__BackingField;
[System.Runtime.CompilerServices.CompilerGeneratedAttribute]
[System.Diagnostics.DebuggerBrowsableAttribute(0)]
private System.Boolean _DefaultFalse_k__BackingField;
[System.Runtime.CompilerServices.CompilerGeneratedAttribute]
[System.Diagnostics.DebuggerBrowsableAttribute(0)]
private readonly System.Boolean _DefaultTrueReadOnly_k__BackingField;
[System.Runtime.CompilerServices.CompilerGeneratedAttribute]
[System.Diagnostics.DebuggerBrowsableAttribute(0)]
private System.String _First_k__BackingField;
[System.Runtime.CompilerServices.CompilerGeneratedAttribute]
[System.Diagnostics.DebuggerBrowsableAttribute(0)]
private System.String _Last_k__BackingField;
...
public CSharp6Features() 
{
	IL_0000:  _DefaultTrue_k__BackingField = true;
	IL_0007:  _DefaultFalse_k__BackingField = false;
	IL_000e:  _DefaultTrueReadOnly_k__BackingField = true;
	IL_0015:  _First_k__BackingField = "First";
	IL_0020:  _Last_k__BackingField = "Last";
	IL_002b:  base();
	IL_0031:  ;
	IL_0032:  return;
}
...
public System.Boolean DefaultTrue
{
	get
	{
		IL_0000:  return _DefaultTrue_k__BackingField;
	}
	set
	{
		IL_0000:  _DefaultTrue_k__BackingField = value;
		IL_0007:  return;
	}
}
public System.Boolean DefaultFalse
{
	get
	{
		IL_0000:  return _DefaultFalse_k__BackingField;
	}
	set
	{
		IL_0000:  _DefaultFalse_k__BackingField = value;
		IL_0007:  return;
	}
}
public System.Boolean DefaultTrueReadOnly
{
	get
	{
		IL_0000:  return _DefaultTrueReadOnly_k__BackingField;
	}
}
public System.String First
{
	get
	{
		IL_0000:  return _First_k__BackingField;
	}
	set
	{
		IL_0000:  _First_k__BackingField = value;
		IL_0007:  return;
	}
}
public System.String Last
{
	get
	{
		IL_0000:  return _Last_k__BackingField;
	}
	set
	{
		IL_0000:  _Last_k__BackingField = value;
		IL_0007:  return;
	}
}

Inference:

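First, the attributes [System.Diagnostics.DebuggerBrowsableAttribute(0)] should read [System.Diagnostics.DebuggerBrowsableAttribute(System.Diagnostics.DebuggerBrowsableState.Never)]. The attribute's constructor takes a DebuggerBrowsableState enum, whose Never member has the value 0, and enum values are stored in the binary as their underlying integers. For the same reason, Tril sometimes prints 0 or 1 where the source had false or true: CIL loads boolean constants as the integers 0 and 1.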
So, what the compiler does (unsurprisingly) is to:
  1. Create compiler-generated backing fields for the properties. Creating such backing fields is what the compiler has been doing since C# started allowing us to create concrete (non-abstract) properties with no bodies defined, like public string Last { get; set; }.
  2. Assign the user-specified default values to the backing fields in the constructor of the class, before the call to the base constructor (note where base() appears in the output above).
  3. Create the appropriate bodies for the properties.
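The ordering in step 2 is observable from C#. In the sketch below (all type names are hypothetical), a base constructor calls a virtual method that reads a property of the derived class. With a property initializer the value is already there; with an assignment in the constructor body it is not. Calling virtual methods from a constructor is done here purely to expose the ordering.

```csharp
using System;

public class Parent
{
    public string SeenByParentCtor { get; }

    public Parent()
    {
        // Runs before any derived constructor body.
        SeenByParentCtor = Describe();
    }

    protected virtual string Describe() => "parent";
}

public class WithInitializer : Parent
{
    // Assigned to the backing field before Parent() is called.
    public string First { get; set; } = "First";

    protected override string Describe() => First ?? "(null)";
}

public class WithCtorAssignment : Parent
{
    public string First { get; set; }

    public WithCtorAssignment()
    {
        First = "First"; // runs after Parent() has finished
    }

    protected override string Describe() => First ?? "(null)";
}

public static class PropertyInitializerDemo
{
    public static void Main()
    {
        Console.WriteLine(new WithInitializer().SeenByParentCtor);    // prints "First"
        Console.WriteLine(new WithCtorAssignment().SeenByParentCtor); // prints "(null)"
    }
}
```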

Expression-Bodied Function Members

Input code:

public string Full => $"{First} + {Last}";
public string GetFull() => $"{First} + {Last}";

public string this[int i] => i.ToString();

public static CSharp6Features operator +(CSharp6Features left, CSharp6Features right)
    => new CSharp6Features() { DefaultFalse = left.DefaultFalse || right.DefaultFalse };

Output code:

public System.String Full
{
	get
	{
		IL_0000:  return System.String.Format("{0} + {1}", get_First(), get_Last());
	}
}
public System.String GetFull() 
{
	IL_0000:  return System.String.Format("{0} + {1}", get_First(), get_Last());
}
public System.String Item
{
	get
	{
		IL_0000:  return i.ToString();
	}
}
public static CSharp6Features op_Addition(CSharp6Features left, CSharp6Features right) 
{
			  CSharp6Features V_0;
	IL_0000:  V_0 = new CSharp6Features();
	IL_0006:  if (left.get_DefaultFalse() != false)
	IL_000c:   V_0.set_DefaultFalse(1);
	IL_000f:  else
	IL_0014:   V_0.set_DefaultFalse(right.get_DefaultFalse());
	IL_001c:  ;
	IL_001d:  return V_0;
}

Inference:

As you would expect, the C# compiler generates the appropriate bodies for the members originally implemented as expression-bodied function members.

Conclusion

This concludes our examination of how the new C# 6.0 features are implemented behind the scenes. Perhaps in the near future I will examine interesting features from older versions of C#, like the dynamic keyword. Tril, the tool I built for the decompilation, is not yet perfect, so some of the generated code looks really bad. I will work on that. Beyond Tril's shortcomings, though, we see some of the little tricks the C# compiler had to learn to support the new C# 6.0 features. We also notice that nothing new was added to the Common Intermediate Language (CIL) to implement any of them, even exception filters, which CIL already supported; the new features are little more than syntactic sugar.
To get and follow Tril, the tool I used for the decompilations, see GitHub.
The input and output codes used in this article are located in these files:
I hope you enjoyed this article. To explore the field of Common Intermediate Language (CIL), I would recommend the following resources as starting points:
Note:
There may be more recent editions of the files listed above.
