TCP tuning – 10G NIC on linux

I spent a lot of time on TCP tuning for SL6 (kernel 2.6.32-358.14.1.el6.x86_64) and have kept using most of these parameters since then, up through 6.5, so I am sharing the experience here.

In the past I tried several types of 10G NIC, all on SL5. Only some of them survived my tests; the rest failed either on poor performance or on data corruption during multi-stream transfers.

 

Note that my test is a multi-stream test for storage nodes, which receive and deliver data over a wide range of RTTs (0.1 to 300 ms), with a mix of clients on 1G and 10G NICs.
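To put that RTT range in perspective, the bandwidth-delay product (link rate times RTT) is the amount of data one TCP connection has to keep in flight to fill the link. A minimal sketch in shell arithmetic, assuming a 10 Gbit/s link and the two RTT extremes mentioned above; the function name bdp_bytes is only illustrative:

```shell
# bdp_bytes LINK_MBPS RTT_US
# Bandwidth-delay product in bytes.
# 1 Mbit/s equals 1 bit per microsecond, so Mbit/s * us = bits in flight.
bdp_bytes() {
    echo $(( $1 * $2 / 8 ))
}

bdp_bytes 10000 100       # 10G link, 0.1 ms RTT -> 125000 (about 122 KiB)
bdp_bytes 10000 300000    # 10G link, 300 ms RTT -> 375000000 (about 358 MiB)
```

The gap of more than three orders of magnitude between the two results is why a single fixed buffer size cannot suit every client, and why the maximum socket buffer sizes matter most for the long-RTT transfers.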

In my recent test I used a node with 32GB of memory, a Mellanox 10G NIC and 12 CPUs. It was an SL5 node that had just been upgraded to SL6. I mounted two LUNs so that it had enough I/O bandwidth for the test.

The first driver I tested was version 2.0, which came with SL6.4. It was not successful: it crashed the kernel within 3 minutes, with the following error in the kernel log.

Then I tried version 1.5.10, which also generated some memory allocation errors, but with some further tuning it passed my stress tests. Performance is also very good.

 

Tuning in sysctl.conf

Note that I increased tcp_mem to give TCP more memory, because my data server is used mainly for data transfer. If your server also does other work, you should probably lower the value to leave memory for other applications.
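One detail that is easy to miss: net.ipv4.tcp_mem takes three values (low, pressure, high) counted in memory pages, not bytes, unlike tcp_rmem and tcp_wmem. The sketch below converts byte budgets into a tcp_mem line. The 4/6/8 GiB figures are purely hypothetical placeholders, not the values used on the server described here, and tcp_mem_line is an illustrative helper name.

```shell
# tcp_mem_line LOW_BYTES PRESSURE_BYTES HIGH_BYTES [PAGE_SIZE]
# Print a sysctl.conf line for net.ipv4.tcp_mem, which is counted in pages.
# PAGE_SIZE defaults to the system page size; 4096 is an assumed fallback
# used only when getconf is not available.
tcp_mem_line() {
    page=${4:-$(getconf PAGESIZE 2>/dev/null || echo 4096)}
    echo "net.ipv4.tcp_mem = $(( $1 / page )) $(( $2 / page )) $(( $3 / page ))"
}

GIB=$(( 1024 * 1024 * 1024 ))
# Hypothetical budgets of 4, 6 and 8 GiB; size these for your own workload.
tcp_mem_line $(( 4 * GIB )) $(( 6 * GIB )) $(( 8 * GIB ))
```

Working from byte budgets makes it easier to see how much of the machine's memory is being promised to TCP before the line goes into sysctl.conf.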

 

For SACK and timestamps

They have been blamed for costing too much CPU. However, since kernel 2.6.25 there have been many patches that reduce the CPU usage of SACK, and I did not see significant CPU usage under stress tests.
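Before deciding whether to turn either feature off, it helps to confirm what the running kernel is actually using. A small sketch, assuming the standard sysctl keys net.ipv4.tcp_sack and net.ipv4.tcp_timestamps (1 means enabled); show_tcp_options is a made-up helper name:

```shell
# Print the current state of SACK and TCP timestamps (1 = on, 0 = off).
show_tcp_options() {
    for key in net.ipv4.tcp_sack net.ipv4.tcp_timestamps; do
        printf '%s = %s\n' "$key" "$(sysctl -n "$key")"
    done
}
```

Calling show_tcp_options prints one line per key. If a stress test does show SACK processing using a lot of CPU, `sysctl -w net.ipv4.tcp_sack=0` switches it off until the next reboot, which makes a before/after comparison cheap.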

 

Higher syn_backlog, max_backlog

I set syn_backlog and max_backlog to higher numbers mainly because there can be short bursts of high-rate data taking, but I did not raise txqueuelen (the default is 1000), and I did not turn Mellanox adaptive-rx off either. Try those settings if your server's traffic pattern changes all the time, for example quiet at some times and very busy at others.
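To see where a host currently stands on these queues, something like the following can be used. It is only a sketch: it assumes that "syn_backlog" and "max_backlog" above refer to the sysctl keys net.ipv4.tcp_max_syn_backlog and net.core.netdev_max_backlog, and show_queue_settings is a hypothetical helper name.

```shell
# show_queue_settings IFACE
# Print the SYN backlog, the input packet backlog, and the interface's
# transmit queue length (read from sysfs when the interface exists).
show_queue_settings() {
    echo "net.ipv4.tcp_max_syn_backlog = $(sysctl -n net.ipv4.tcp_max_syn_backlog)"
    echo "net.core.netdev_max_backlog = $(sysctl -n net.core.netdev_max_backlog)"
    qlen_file="/sys/class/net/$1/tx_queue_len"
    if [ -r "$qlen_file" ]; then
        echo "txqueuelen($1) = $(cat "$qlen_file")"
    fi
}
```

Running it as `show_queue_settings eth0` (with your own interface name) before and after a change gives a quick record of what was actually in effect during each test.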

 

General suggestion

Choose the driver carefully (the native driver or the latest driver may not give good results), then leave sysctl.conf empty, check the values the kernel picks, and start from there.

On different hardware the memory settings may be very different; however, the following parameters are always good for a high-throughput environment (switch low_latency off if it does not apply to your environment).

In most cases the kernel is capable of picking the best values for memory-related parameters; if not, tweak them a bit.

Note: you definitely need to increase min_free_kbytes if you see memory allocation errors in /var/log/messages.
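A quick way to check for such errors is to count them in the log. This is a sketch under an assumption: the search string "page allocation failure" is what the kernel usually logs in this situation, but the exact messages seen on the server described here are not shown in the text. count_alloc_failures is an illustrative name.

```shell
# count_alloc_failures [LOGFILE]
# Count kernel "page allocation failure" messages in a syslog file.
# LOGFILE defaults to /var/log/messages.
count_alloc_failures() {
    grep -c 'page allocation failure' "${1:-/var/log/messages}"
}
```

If the count grows during a stress test, compare it with `sysctl -n vm.min_free_kbytes` and raise that value in steps, retesting after each change.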

Some other options that may be related to 10G NIC performance

Note: The following options are managed by the 10G network card driver, and nowadays most drivers are already tuned according to your host configuration.

 

Network card driver option — Adaptive RX/TX

The network driver uses adaptive interrupt moderation on the receive path, which adjusts the moderation time to the traffic pattern. Use ethtool to query or change it.

To query it, run:

To change it, run:
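The query and change steps above are normally done with ethtool's coalescing options, `-c` to show and `-C` to set. A minimal sketch wrapping both, assuming the driver exposes the adaptive-rx parameter; the function names are illustrative and the interface name is whatever your 10G port is called:

```shell
# show_coalesce IFACE
# Print the interrupt coalescing settings, including whether
# adaptive RX/TX moderation is on.
show_coalesce() {
    ethtool -c "$1"
}

# set_adaptive_rx IFACE on|off
# Enable or disable adaptive RX interrupt moderation.
set_adaptive_rx() {
    case $2 in
        on|off) ethtool -C "$1" adaptive-rx "$2" ;;
        *) echo "usage: set_adaptive_rx IFACE on|off" >&2; return 2 ;;
    esac
}
```

For example, `show_coalesce eth0` followed by `set_adaptive_rx eth0 off` checks the current mode and then disables it. Changes made this way do not persist across a reboot.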

rx-usecs and rx-frames

To set interrupt coalescing settings when adaptive moderation is disabled, use:

Note: usec settings correspond to the time to wait after the *last* packet is sent/received before triggering an interrupt.
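With adaptive moderation off, the fixed values are set through the same `ethtool -C` interface using the rx-usecs and rx-frames parameters. The helper below is a sketch with an illustrative name; it deliberately has no default numbers, since suitable values depend on the NIC and the traffic.

```shell
# set_fixed_coalesce IFACE RX_USECS RX_FRAMES
# Turn adaptive RX moderation off and set fixed coalescing values.
# Both numeric arguments must be non-negative integers.
set_fixed_coalesce() {
    for n in "$2" "$3"; do
        case $n in
            ''|*[!0-9]*)
                echo "usage: set_fixed_coalesce IFACE RX_USECS RX_FRAMES" >&2
                return 2 ;;
        esac
    done
    ethtool -C "$1" adaptive-rx off rx-usecs "$2" rx-frames "$3"
}
```

Larger values mean fewer interrupts and lower CPU load at the cost of added latency, so it is worth changing one parameter at a time and rerunning the same transfer test.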

Offload feature

To query stateless offload status run:

To set stateless offload status run:
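Offload status is usually read with `ethtool -k` and changed with `ethtool -K`. A small sketch with illustrative function names; which features (for example tso, gso, gro, lro, sg, rx, tx) can actually be toggled depends on the driver, so treat the feature argument as something to check against the query output first:

```shell
# show_offloads IFACE
# List the offload features and their current state.
show_offloads() {
    ethtool -k "$1"
}

# set_offload IFACE FEATURE on|off
# Switch a single offload feature on or off.
set_offload() {
    case $3 in
        on|off) ethtool -K "$1" "$2" "$3" ;;
        *) echo "usage: set_offload IFACE FEATURE on|off" >&2; return 2 ;;
    esac
}
```

For example, `set_offload eth0 tso off` disables TCP segmentation offload on eth0 until the next reboot or driver reload.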

Ring size

To query ring size values run:

To modify the ring size run:

Number of ring entries

To query ring entries run:

To set ring entries run:

Note: some network card drivers don’t support the ‘number of ring entries’ and ‘ring size’ operations. Also, when you change the two options, usually both values cannot be set to their maximum at the same time because of hardware and driver limits.
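ethtool exposes two ring-related knobs: the number of descriptors per RX/TX ring (`-g` to show, `-G` to set) and, on ethtool versions and drivers that support it, the number of channels or queues (`-l` to show). Which of these the two headings above originally referred to is an assumption here, and the function names are illustrative.

```shell
# show_rings IFACE
# Print the pre-set maximum and current RX/TX ring sizes.
show_rings() {
    ethtool -g "$1"
}

# set_ring_size IFACE RX TX
# Set the number of descriptors in the RX and TX rings.
set_ring_size() {
    ethtool -G "$1" rx "$2" tx "$3"
}

# show_channels IFACE
# Print the number of channels (queues), where supported.
show_channels() {
    ethtool -l "$1"
}
```

Checking the maximums reported by show_rings before calling set_ring_size avoids asking the driver for a size it will reject.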

Good references

Tuning can differ a lot from one application to another; here are some good references with very good explanations.

http://www.acc.umu.se/~maswan/linux-netperf.txt

http://fasterdata.es.net/host-tuning/linux/
http://en.wikipedia.org/wiki/TCP_window_scale_option
http://www.psc.edu/index.php/networking/641-tcp-tune

http://man7.org/linux/man-pages/man7/tcp.7.html
https://www.frozentux.net/ipsysctl-tutorial/ipsysctl-tutorial.html#TCPVARIABLES
http://en.wikipedia.org/wiki/Transmission_Control_Protocol

http://www.linuxvox.com/2009/11/what-is-the-linux-kernel-parameter-tcp_low_latency
http://www.ibm.com/developerworks/library/l-tcp-sack/

More references for other platforms

AIX:  For more information, see section 4.6 in the http://www.redbooks.ibm.com/redbooks/SG247347/wwhelp/wwhimpl/js/html/wwhelp.htm document.

In addition, see the http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/tcp_streaming_workload_tuning.htm document.

HP-UX: For more information, see the ndd command information in the following documents:

http://docs.hp.com/en/B2355-91020/B2355-91020.pdf
http://docs.hp.com/en/TKP-90203/index.html

HP-UX: Also, see the _recv_hiwater_def and tcp_xmit_hiwater_def parameter information in the following document: http://docs.hp.com/en/11890/perf-whitepaper-tcpip-v1_1.pdf

Linux: For more information, see the following documents:

http://www.ibm.com/developerworks/linux/library/l-hisock.html
http://fasterdata.es.net/TCP-tuning/linux.html
http://www.onlamp.com/pub/a/onlamp/2005/11/17/tcp_tuning.html?page=2

Solaris: For more information, see section 2.2 in the following document: http://www.redbooks.ibm.com/redbooks/SG247584/wwhelp/wwhimpl/java/html/wwhelp.htm

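Permalink: http://t.yjsec.com/index.php/2018/03/02/408/ | 下一站

This post was published by admin on 2018-03-02 in the Linux category. You may leave a comment, and you may quote it on your own site or blog as long as the original URL and the author are kept.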