Sunday, November 17, 2013

ServiceStack performance in mono, part 2


In the previous part I described some performance enhancements that can be used with ServiceStack running on the mono XSP web server. But nobody uses XSP in a production environment; the most common setups are nginx+mono-fastcgi and apache+mod_mono. What is the performance in such environments? Let's see.

Configuration

Apache

If you want to use mono with Apache, you have to install mod_mono and configure it according to this article. To install mod_mono on Ubuntu you can type
  sudo apt-get install libapache2-mod-mono
After that, if your mono web server (xsp) was compiled from sources, you have to reinstall it: change to the xsp source directory and run sudo make install there. I am going to benchmark mono under Apache in the following configurations:
  1. Direct access to a static html file from Apache without mono.
  2. The ServiceStack "Hello, World!" service through apache2-mod-mono.
  3. A static html file and a "Hello, World!" aspx page through apache2-mod-mono without ServiceStack.
To manage this I use the following config in /etc/apache2/httpd.conf. For direct static file access I placed hello.html in the web server root (/var/www).
NameVirtualHost ssbench3:80
NameVirtualHost ssbench2:80

<VirtualHost ssbench3:80>
    ServerName ssbench3
    DocumentRoot /var/www/ssbench3
#    MonoPath default "/usr/bin/mono/2.0"
    MonoServerPath ssbench3 /usr/bin/mod-mono-server4
    AddMonoApplications ssbench3 "ssbench3:/:/var/www/ssbench3"

    <Location />
        MonoSetServerAlias ssbench3
        Allow from all
        Order allow,deny
        SetHandler mono
    </Location>
</VirtualHost>

<VirtualHost ssbench2:80>
    ServerName ssbench2
    DocumentRoot /var/www/ssbench2
#    MonoPath default "/usr/bin/mono/2.0"
    MonoServerPath ssbench2 /usr/bin/mod-mono-server4
    AddMonoApplications ssbench2 "ssbench2:/:/var/www/ssbench2"

    <Location />
        MonoSetServerAlias ssbench2
        Allow from all
        Order allow,deny
        SetHandler mono
    </Location>
</VirtualHost>

Nginx

The Nginx configuration is similar to the Apache one; the only difference is the transport between mono and the front-end web server. Apache uses mod-mono-server, while nginx uses fastcgi-mono-server. You may also note that I added one more configuration: nginx as a proxy to xsp4.

To configure nginx I followed this guide. I added the following lines to /etc/nginx/fastcgi_params:

fastcgi_param  HTTP_HOST          $host;
fastcgi_param  PATH_INFO          "";
fastcgi_param  SCRIPT_FILENAME    $document_root$fastcgi_script_name;

Then I added the virtual hosts to /etc/nginx/sites-enabled/default:

server {
    listen   81;
    server_name  ssbench1;
    access_log   /var/log/nginx/ssbench1.log;

    location / {
        proxy_pass http://127.0.0.1:8080/;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   Host $http_host;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

server {
    listen   81;
    server_name  ssbench2;
    access_log   /var/log/nginx/ssbench2.log;

    location / {
        root /var/www/ssbench2/;
        index index.html index.htm default.aspx Default.aspx;
        fastcgi_index Default.aspx;
        fastcgi_pass 127.0.0.1:9000;
        include /etc/nginx/fastcgi_params;
    }
}

server {
    listen   81;
    server_name  ssbench3;
    access_log   /var/log/nginx/ssbench3.log;

    location / {
        root /var/www/ssbench3/;
        index index.html index.htm default.aspx Default.aspx;
        fastcgi_index Default.aspx;
        fastcgi_pass 127.0.0.1:9000;
        include /etc/nginx/fastcgi_params;
    }
}

After that, I ran the command:

fastcgi-mono-server4 /applications=ssbench2:/:/var/www/ssbench2/,ssbench3:/:/var/www/ssbench3 /socket=tcp:127.0.0.1:9000

I also ran an xsp4 server hosting ServiceStack on port 8080.

EDIT: after the post was written, I additionally benchmarked several configurations that were not mentioned in the first version:

  • Nginx as a frontend proxy to the Apache server with mod-mono
  • Self-hosted ServiceStack instances based on two classes: AppHostHttpListenerBase and AppHostHttpListenerLongRunningBase. You can read how to create a self-hosted ServiceStack in the ServiceStack wiki. You can also look at the test source code for additional details
  • Nginx as a frontend proxy to self-hosted ServiceStack
  • Nginx plus HyperFastCgi (a new fastcgi server I have written as a replacement for mono-webserver-fastcgi)

Benchmark results

Before I show the results I want to say a couple of words about my expectations. What did I expect? First, I predicted that nginx would be the winner at serving static html pages. That was obvious. Second, I thought that nginx+ServiceStack would get slightly better results than Apache+ServiceStack and maybe XSP+ServiceStack, thanks to nginx's async behaviour and lower processor usage. I also thought that the performance difference between Apache+ServiceStack and XSP+ServiceStack would be minimal: they both use the same threading model, so all I expected was a little overhead from apache<->mod-mono communication. But... here are the results.

Configuration | requests/sec | Standard deviation | std dev % | Comments
Apache2 direct file | 7129.95 | 217.57 | 3.05 |
Apache2+mod_mono+ServiceStack | 1314.30 | 22.40 | 1.70 |
Apache2+mod_mono hello.html | 924.02 | 12.82 | 1.39 |
Apache2+mod_mono hello.aspx | - | - | - | Memory Leaks, Crashes
Nginx direct file | 10458.71 | 147.28 | 1.41 |
Nginx+fastcgi-server+ServiceStack | 571.36 | 8.81 | 1.54 | Memory Leaks
Nginx+fastcgi-server hello.html | 409.48 | 9.14 | 2.23 | Memory Leaks
Nginx+fastcgi-server hello.aspx | 458.55 | 9.89 | 2.16 | Memory Leaks, Crashes
Nginx+proxy to Apache2+mod-mono+ServiceStack | 1143.82 | 8.49 | 0.74 |
Nginx+proxy to self-hosted ServiceStack (AppHostHttpListenerBase) | 1993.82 | 17.62 | 0.88 |
Nginx+proxy to self-hosted ServiceStack (AppHostHttpListenerLongRunningBase) | 1664.94 | 27.45 | 1.65 |
Nginx+HyperFastCgi (tcp keepalive)+ServiceStack | 2041.25 | 23.18 | 1.14 | See more info
Nginx+proxy to xsp4+ServiceStack | 1402.33 | 45.42 | 3.24 | Unstable Results, Errors
xsp4+ServiceStack | 2246.51 | 21.31 | 0.94 |
Self-hosted ServiceStack (AppHostHttpListenerBase) | 2697.1 | 30.1 | 1.12 |
Self-hosted ServiceStack (AppHostHttpListenerLongRunningBase) | 2313.11 | 33.14 | 1.43 |

What can we see? Among the configurations I tested originally, first place in serving ServiceStack goes to xsp4. Next comes Apache+mod_mono, and last is Nginx+fastcgi-server, which is about four times slower than the winner. I did not count the Nginx+proxy to xsp4 configuration here, because in half of the test runs I got errors when receiving JSON data. There were not many errors (~1500 per 100 000 requests), but they did occur, and that was the reason to drop the nginx+xsp4 configuration from the competition. By the way, the performance of that configuration is slightly better than Apache+mod_mono and much better than Nginx+fastcgi-server.
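To make the comparison above easier to check, here is a minimal Python sketch that ranks the four ServiceStack configurations discussed in this paragraph and computes the error rate of the nginx to xsp4 proxy setup. The numbers are copied from the results table; the variable names are only illustrative.

```python
# requests/sec for the ServiceStack configurations discussed in the text,
# copied from the results table
results = {
    "xsp4+ServiceStack": 2246.51,
    "Nginx+proxy to xsp4+ServiceStack": 1402.33,
    "Apache2+mod_mono+ServiceStack": 1314.30,
    "Nginx+fastcgi-server+ServiceStack": 571.36,
}

# Sort from fastest to slowest.
ranking = sorted(results.items(), key=lambda item: item[1], reverse=True)
best_name, best_rate = ranking[0]

for name, rate in ranking:
    # best_rate / rate shows how many times faster the winner is.
    print(f"{name:<36} {rate:>8.2f} req/s  winner is {best_rate / rate:.2f}x faster")

# Error rate of the nginx -> xsp4 proxy setup: ~1500 failures per 100 000 requests.
error_rate_percent = 1500 / 100_000 * 100
print(f"error rate: {error_rate_percent:.1f}%")
```

The ratio between xsp4 and Nginx+fastcgi-server comes out just under four, which matches the "four times" figure in the text.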

I also did not include the HyperFastCgi server, which shows good performance, in the chart, because it was created after these benchmarks were done. You can find benchmarks of Nginx+HyperFastCgi in the next part.

As for serving static html files, first place goes to Nginx, as expected, followed by Apache, and then come all the other configurations: xsp4 (you can see test results for static html serving with xsp4 in the previous post), Apache+mod_mono and Nginx+fastcgi. They are all very slow compared with plain Nginx or Apache.

For the .aspx page I could not get reliable results. First, there are memory leaks in the mono web server when processing aspx pages, and they are possibly the reason for the crashes I got. I could only get ~20000 requests through with Nginx+fastcgi and several thousand requests with Apache+mod_mono before mono hung or got SIGSEGV. I suspect that these faults are caused by changes in thread handling and spawning and by changes made to the mono GC. I hope this instability will be fixed in the next mono release.

I also noticed that fastcgi-mono-server produced huge memory leaks during the runs. After processing 100 000 requests it was using about 600M of memory! With such a configuration you cannot serve a large number of requests without regularly restarting the server. The performance of fastcgi-mono-server is also extremely poor compared to Apache+mod_mono. What is going on in the server? I am going to look inside it in the next posts.
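As a rough back-of-the-envelope illustration of what such a leak means in practice, the Python sketch below estimates the memory lost per request and how many requests fit into a given memory budget. It assumes that "600M" means 600 MiB, that all of it is leaked memory, and that the leak grows linearly with the number of requests; none of this was measured precisely. The 256 MiB budget is a hypothetical value chosen only for the example.

```python
MIB = 1024 * 1024

def leak_per_request(memory_bytes: float, requests: int) -> float:
    """Average bytes of memory growth per request (assumes linear growth)."""
    return memory_bytes / requests

def requests_until(limit_bytes: float, per_request_bytes: float) -> int:
    """How many requests fit before the leak alone reaches limit_bytes."""
    return int(limit_bytes // per_request_bytes)

# About 600M used after 100 000 requests (figures from the text above).
per_request = leak_per_request(600 * MIB, 100_000)

# Hypothetical budget: restart the server before the leak reaches 256 MiB.
budget = requests_until(256 * MIB, per_request)

print(f"~{per_request:.0f} bytes leaked per request")
print(f"restart needed roughly every {budget} requests")
```

Under these assumptions the server loses a bit more than 6 KB per request, so any restart schedule has to be planned in tens of thousands of requests, not millions.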


Wednesday, November 6, 2013

ServiceStack performance in mono


When I was reading the ServiceStack channel on Google+ I found a benchmark which said that ServiceStack serialization under mono is very slow. That discouraged me, because I thought that SS demonstrated very good JSON serialization performance compared with other .NET JSON serialization frameworks. Maybe the testers used a wrong configuration or a bad test case? These questions were open for me, so I decided to check it myself.

Preparing environment and measurement metrics

My environment:
CPU: Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz
OS: Ubuntu 12.04 32 bit
Mono Runtime Engine version 3.2.5 (master/6a9c585 Fri Oct 25 01:56:00 NOVT 2013)

I built mono from the github sources as described here. As the measurement tool I am going to use ab from the apache2-utils package. To install ab, run apt-get install apache2-utils. I am going to run ab 5 times, performing 100000 GET requests each time, and take the mean of the results. In every run I will use 10 concurrent requests.

The command looks like this: ab -n 100000 -c 10 http://host:port/url
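The aggregation step (five runs, then mean and standard deviation) can be sketched in Python as follows. This is only an illustration, not the script used for this post: it assumes the sample (n-1) standard deviation, and the five run values are hypothetical. The parser relies on the "Requests per second:" line that ab prints in its summary.

```python
import re
import statistics

def parse_requests_per_second(ab_output: str) -> float:
    """Extract the 'Requests per second' value from the text ab prints."""
    match = re.search(r"Requests per second:\s+([\d.]+)", ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))

def summarize(rates):
    """Return (mean, std dev, std dev as % of mean) for a list of runs."""
    mean = statistics.mean(rates)
    std_dev = statistics.stdev(rates)  # assumption: sample (n-1) deviation
    return mean, std_dev, 100.0 * std_dev / mean

# Hypothetical numbers for five runs, not measurements from this post.
example_runs = [1900.0, 1910.0, 1920.0, 1930.0, 1940.0]
mean, std_dev, std_pct = summarize(example_runs)
print(f"{mean:.2f} req/s, std dev {std_dev:.2f} ({std_pct:.2f}%)")
```

In a real run you would feed the captured output of each ab invocation to parse_requests_per_second and pass the five resulting values to summarize.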

ServiceStack was compiled from the github v3 branch as a mono release build for the Mono/.NET 4.0 platform.

Now that the environment is prepared, I have to create a test case. I chose to create a very simple ServiceStack service, similar to the benchmarks, which returns a "Hello, world!" message. You can find the source code at github. I would also like to get some metrics for comparison, so I created a simple ASP.NET application with "Hello, world" .aspx and .html files and benchmarked them too.

Start benchmarking

I ran all tests from localhost. This removes network overhead, but the benchmark tool takes processor resources, which penalizes the absolute results. The difference is not that big for mono benchmarks, though, so I decided to prefer more stable results over higher absolute values (which could be higher still on a faster processor).

Url | Web server | requests/sec | Standard deviation | std dev %
hello.aspx | xsp4 | 1659.238 | 79.39 | 4.78
hello.html | xsp4 | 1004.428 | 34.47 | 3.43
hello.html | apache2 | 7129.956 | 76.80 | 1.08
Servicestack | xsp4 | 1913.746 | 34.84 | 1.82

Amazing results. You can see that serving a static html page from apache2 performs better than serving it from xsp4, which was predictable, but not a seven-fold difference! Also, the aspx page is served 1.6x faster than static html. Did you expect this? I did not.

Also, when I ran these benchmarks, I found that xsp4 grew in memory very fast when serving aspx pages, and after some limit (~265m) it killed threads and produced a denial of service error. It seems there is a memory leak in the mono web server.

But our goal is ServiceStack. You can see that ServiceStack runs faster than an aspx page or a static html page in xsp4, but not as fast as apache2 serving static html. Why is it so slow? Can we improve the performance? You will find answers to these questions in the next chapters.

Looking inside ServiceStack runs

Why does ServiceStack not run as fast on mono as we might expect? To find the answer I turned on profiling mode for xsp4 and looked into the generated profiles. To do this, execute the following command in the shell before running xsp4:

export MONO_OPTIONS="--profile=log:noalloc,output=../output.mlpd"

log:noalloc means that we don't want to gather info about allocated objects. We are interested only in method call timing.
output=../output.mlpd sets the name of the file where profiling information is gathered. Please note that the output file is in the parent directory instead of the current one. The web server watches for changes in the current directory, and if we put the file there the web server will get a lot of notifications that the directory has changed, which hurts performance.

After that run the commands:

ab -n 500 -c 10 http://host:port/url
mprof-report output.mlpd > profile.txt

500 requests are enough for getting profiling information; mprof-report produces a human-readable form of it.

Method call summary
Total(ms) Self(ms)      Calls Method name
   56244        8       1581 (wrapper runtime-invoke) :runtime_invoke_void__this___object (object,intptr,intptr,intptr)
   54344        3        500 Mono.WebServer.XSPWorker:RunInternal (object)
   54240        3        500 (wrapper remoting-invoke-with-check) Mono.WebServer.XSPApplicationHost:ProcessRequest (int,System.Net.IPEndPoint,System.Net.IPEndPoint,string,string,string,string,byte[],string,intptr,Mono.WebServer.SslInformation)
   54237        5        500 Mono.WebServer.XSPApplicationHost:ProcessRequest (int,System.Net.IPEndPoint,System.Net.IPEndPoint,string,string,string,string,byte[],string,intptr,Mono.WebServer.SslInformation)
   53513        4        500 Mono.WebServer.BaseApplicationHost:ProcessRequest (Mono.WebServer.MonoWorkerRequest)
   53390        1        500 Mono.WebServer.MonoWorkerRequest:ProcessRequest ()
   53226        5        500 System.Web.HttpRuntime:ProcessRequest (System.Web.HttpWorkerRequest)
   53173        4        500 System.Web.HttpRuntime:RealProcessRequest (object)
   53157       14        500 System.Web.HttpRuntime:Process (System.Web.HttpWorkerRequest)
   44442       14        500 System.Web.HttpApplication:System.Web.IHttpHandler.ProcessRequest (System.Web.HttpContext)
   44403       18        500 System.Web.HttpApplication:Start (object)
   41356      416        500 System.Web.HttpApplication:Tick ()
   40940      148        500 System.Web.HttpApplication/c__Iterator1:MoveNext ()
   17158       10        500 ServiceStack.WebHost.Endpoints.Support.EndpointHandlerBase:ProcessRequest (System.Web.HttpContext)
   17136       39        500 ServiceStack.WebHost.Endpoints.RestHandler:ProcessRequest (ServiceStack.ServiceHost.IHttpRequest,ServiceStack.ServiceHost.IHttpResponse,string)
   11422       25        500 System.Web.HttpApplication:PipelineDone ()
   11047       12        500 System.Web.HttpApplication:OutputPage ()
   11033       53        500 System.Web.HttpResponse:Flush (bool)
   10996       74       2108 System.Web.Configuration.WebConfigurationManager:GetSection (string,string,System.Web.HttpContext)
   10646        5       1004 System.Configuration.Configuration:GetSectionInstance (System.Configuration.SectionInfo,bool)
    9401      569     130811 System.Collections.Hashtable:GetHash (object)
    9252      596     106531 System.Collections.Hashtable:get_Item (object)
    8405       11        500 System.Web.HttpApplicationFactory:GetApplication (System.Web.HttpContext)
    8082        1        500 System.Web.HttpApplication:GetHandler (System.Web.HttpContext,string)
    8081        9        500 System.Web.HttpApplication:GetHandler (System.Web.HttpContext,string,bool)
    6861     1760      25111 Mono.Globalization.Unicode.SimpleCollator:CompareInternal (string,int,int,string,int,int,bool&,bool&,bool,bool,Mono.Globalization.Unicode.SimpleCollator/Context&)
    6707        8       2500 ServiceStack.WebHost.Endpoints.Extensions.HttpRequestWrapper:get_HttpMethod ()
    6699       13        500 ServiceStack.WebHost.Endpoints.Extensions.HttpRequestWrapper:Param (string)

The suspicious methods are the ones with both a long execution time and a large number of calls, for example GetSection, Hashtable:GetHash and get_HttpMethod in the listing above. As you can see, only one of them is from ServiceStack code: the HttpRequestWrapper.HttpMethod property. So what can we do, and how can we increase performance, when most of the long-running calls are related to mono and the mono web server?
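Picking such methods out of a long report by eye is tedious, so here is a small Python sketch that filters the method call summary automatically. It assumes the four-column layout shown above (Total(ms), Self(ms), Calls, Method name). The thresholds (more calls than requests, at least 5000 ms total) are my own illustrative choice, not something mprof-report provides.

```python
import re

# A few rows copied from the method call summary above.
REPORT = """\
   10996       74       2108 System.Web.Configuration.WebConfigurationManager:GetSection (string,string,System.Web.HttpContext)
    9401      569     130811 System.Collections.Hashtable:GetHash (object)
    8405       11        500 System.Web.HttpApplicationFactory:GetApplication (System.Web.HttpContext)
    6707        8       2500 ServiceStack.WebHost.Endpoints.Extensions.HttpRequestWrapper:get_HttpMethod ()
"""

# Three integer columns followed by the method name.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(.+)$")

def parse_rows(text):
    """Yield (total_ms, self_ms, calls, method) for every summary row."""
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            yield int(m.group(1)), int(m.group(2)), int(m.group(3)), m.group(4)

def suspicious(text, requests=500, min_total_ms=5000):
    """Rows called more often than once per request with a large total time."""
    return [row for row in parse_rows(text)
            if row[2] > requests and row[0] >= min_total_ms]

for total_ms, self_ms, calls, method in suspicious(REPORT):
    print(f"{total_ms:>8} ms {calls:>8} calls  {method}")
```

With 500 requests in the profiling run, anything called exactly 500 times is just the normal once-per-request pipeline, so the filter keeps only the methods that are called several times per request.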

Let's have a look at which methods call the long-executing methods. To get info about backtraces, run the command

mprof-report --traces ../output.mlpd > profile-traces.txt

   10996       74       2108 System.Web.Configuration.WebConfigurationManager:GetSection (string,string,System.Web.HttpContext)
 500 calls from:
  System.Web.HttpApplication:Start (object)
  System.Web.HttpApplication:Tick ()
  System.Web.HttpApplication/c__Iterator1:MoveNext ()
  System.Web.HttpApplication:GetHandler (System.Web.HttpContext,string)
  System.Web.HttpApplication:GetHandler (System.Web.HttpContext,string,bool)
  System.Web.HttpApplication:LocateHandler (System.Web.HttpRequest,string,string)
 500 calls from:
  System.Web.HttpRuntime:RealProcessRequest (object)
  System.Web.HttpRuntime:Process (System.Web.HttpWorkerRequest)
  System.Web.HttpApplication:System.Web.IHttpHandler.ProcessRequest (System.Web.HttpContext)
  System.Web.HttpApplication:Start (object)
  System.Web.HttpApplication:PreStart ()
  System.Web.Configuration.WebConfigurationManager:GetSection (string)
 500 calls from:
  Mono.WebServer.XSPWorkerRequest:SendHeaders ()
  Mono.WebServer.XSPWorkerRequest:GetHeaders ()
  Mono.WebServer.MonoWorkerRequest:get_HeaderEncoding ()
  System.Web.HttpResponse:get_HeaderEncoding ()
  System.Web.Configuration.WebConfigurationManager:SafeGetSection (string,System.Type)
  System.Web.Configuration.WebConfigurationManager:GetSection (string)
 500 calls from:
  System.Web.HttpApplication:System.Web.IHttpHandler.ProcessRequest (System.Web.HttpContext)
  System.Web.HttpApplication:Start (object)
  System.Web.HttpApplication:Tick ()
  System.Web.HttpApplication/c__Iterator1:MoveNext ()
  System.Web.HttpApplication/c__Iterator0:MoveNext ()
  System.Web.Security.UrlAuthorizationModule:OnAuthorizeRequest (object,System.EventArgs)

Look at the first backtrace. Don't you think that locating the handler in web.config for every request looks strange? I think all info about handlers should be loaded only once at application start and then reused for each request. If you look into the mono code you will see that handlers are cached by mono, so why is the ServiceStack handler not cached?

The answer is in these lines of code:
HttpHandlersSection httpHandlersSection = WebConfigurationManager.GetSection ("system.web/httpHandlers", req.Path, req.Context) as HttpHandlersSection;
ret = httpHandlersSection.LocateHandler (verb, url, out allowCache);

IHttpHandler handler = ret as IHttpHandler;
if (allowCache && handler != null && handler.IsReusable)
        cache [id] = ret;

To be cacheable, the ServiceStack handler factory must implement the IHttpHandler interface, have its IsReusable property set to 'true', and be allowed to cache. In the mono source code you can find that allowCache means the handler path in the configuration section must not be "*", but it is allowed to be "servicestack*", for example. So I changed the httpHandlers section in web.config, replacing the attribute path="*" with path="servicestack*", and added an implementation of the IHttpHandler interface to ServiceStackHttpHandlerFactory:

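  #region IHttpHandler implementation

  // Not expected to be called: the factory implements IHttpHandler
  // only so that mono is allowed to cache it.
  void IHttpHandler.ProcessRequest(HttpContext context)
  {
      throw new NotImplementedException();
  }

  // Reusable handlers can be cached by mono between requests.
  bool IHttpHandler.IsReusable
  {
      get { return true; }
  }

  #endregion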

Then I recompiled ServiceStack and ran new benchmarks:

Url | Web server | requests/sec | Standard deviation | std dev %
Servicestack | xsp4 | 1913.746 | 34.84 | 1.82
Servicestack reusable handler factory | xsp4 | 2003.238 | 35.39 | 1.77

Performance increased by 4.68%. Not much, but this is just a start.
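For reference, the percentage above is simply the relative change between the two requests/sec values in the table. A minimal Python sketch (the function name is illustrative):

```python
def relative_gain(before: float, after: float) -> float:
    """Percentage change of `after` relative to `before`."""
    return (after - before) / before * 100.0

baseline = 1913.746   # ServiceStack on xsp4, requests/sec
reusable = 2003.238   # with the reusable handler factory, requests/sec

gain = relative_gain(baseline, reusable)
print(f"gain: {gain:.2f}%")
```

The same function can be applied to any other pair of rows when comparing a patched build against its baseline.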

In the profiler we see that GetSection is now called 1624 times instead of 2108:

   14158       33       1624 System.Web.Configuration.WebConfigurationManager:GetSection (string,string,System.Web.HttpContext)

Now we will try to remove the other overhead of calling GetSection. We can see that this method is called from the HttpApplication.PreStart method and the HttpResponse.HeaderEncoding property. Looking into the source code suggests a solution: get the globalization section only once and then reuse it. This can be done only by changing the mono sources. I did it and got these results:

Url | Web server | requests/sec | Standard deviation | std dev %
Servicestack (mono 9eda1b4) | xsp4 | 1958.37 | 21.54 | 1.10
Servicestack (patched mono 9eda1b4) | xsp4 | 2025.316 | 21.56 | 1.06

Performance gained an additional 3.46%. Unfortunately, before applying the patch I had to update mono to revision 9eda1b4, and this dropped performance by roughly 50 requests/sec compared with the previous results.

Now the profiler shows 611 calls of GetSection taking about 7500 ms, which is much better:
    7516       13        611 System.Web.Configuration.WebConfigurationManager:GetSection (string,string,System.Web.HttpContext)

Please note that this hack works only if you don't use different globalization sections in web.config files located in subdirectories of your site. If your site requires its own globalization settings for each path, don't use this hack.

Now let's look at the Hashtable:GetHash method. This method is fast, but it is called too many times. It is not simple to reduce the number of calls, but some hints can help. For example, add the following key to the appSettings section of the web.config file and you will save several thousand GetHash calls, though you should know this does not boost performance by any significant amount:

  <add key="MonoAspnetInhibitSettingsMap" value="true"/>

This key controls whether mono maps some config sections to other ones. If you do not use RoleMembership functionality or SqlServerCache you can disable the mappings by adding the key. For more information you can read the article http://www.mono-project.com/ASP.NET_Settings_Mapping

To be continued...