Sysctl optimization

You can find plenty of tuning tips for the Linux kernel on the Internet, but few of them are explained well. Here is the setup we use.

We will go through it step by step and explain what each setting does and why we change it.

All of these lines go into /etc/sysctl.conf; running sysctl -p as root applies them without a reboot.
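
To make the key names less abstract, here is a minimal Python sketch that parses lines in the format used throughout this post and shows which file under /proc/sys each key corresponds to. It is not part of our setup: parse_sysctl_conf and proc_path are hypothetical helper names, and the sketch assumes plain "key = value" lines with no dots inside interface names.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a {key: value} dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (sysctl.conf accepts '#' and ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return "/proc/sys/" + key.replace(".", "/")


sample = """
# disable ICMP redirects
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.tcp_fin_timeout = 10
"""

for key, value in parse_sysctl_conf(sample).items():
    print(proc_path(key), "=", value)
```

Reading the file that proc_path returns is a quick way to confirm that a value really took effect after applying the configuration.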

net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0

These settings tell the Linux kernel not to accept or send ICMP redirect packets. An attacker can use ICMP redirects to alter a host's routing table, so it is reasonable to disable them (set to 0).
Enabling these options only makes sense on hosts that act as routers. For a server, they are not needed.

net.ipv4.tcp_max_orphans = 65536

The tcp_max_orphans parameter sets the maximum number of TCP sockets in the system that are not attached to any user file handle.

When this threshold is reached, orphaned connections are dropped immediately and a warning is logged. The threshold only protects against simple DoS attacks. Do not lower it; if anything, raise it to match the system's resources, for example after adding memory. Each orphaned connection consumes about 64 KB of unswappable memory, so a value of 65536 can tie up as much as 4 GB of RAM.
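
The memory estimate above is simple arithmetic. As a small sketch (orphan_memory_gib is a hypothetical helper name, and the 64 KB per orphan is the approximate figure quoted above, not a measured one):

```python
ORPHAN_BYTES = 64 * 1024  # approximate per-orphan cost quoted in the text


def orphan_memory_gib(max_orphans, per_orphan=ORPHAN_BYTES):
    """Worst-case memory pinned by orphaned sockets, in GiB."""
    return max_orphans * per_orphan / 2**30


print(orphan_memory_gib(65536))  # 4.0
```

Plug in your own tcp_max_orphans value to see whether the worst case fits into the RAM you actually have.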

net.ipv4.tcp_fin_timeout = 10

The tcp_fin_timeout parameter determines how long a socket is kept in the FIN-WAIT-2 state after our (server) side has closed it. The client (a remote browser, for example) may never close its side of the connection, so the socket has to be dropped once the timeout expires. The default is 60 seconds; the 2.2 kernel series typically used 180 seconds. You can keep the default, but bear in mind that a heavily loaded web server then risks wasting a lot of memory on half-closed, dead connections.
FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1 sockets because each consumes less than 1.5 KB of memory, but they can also live longer.

net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5

The tcp_keepalive_time parameter sets how long a connection must stay idle before the kernel starts sending keepalive probes. It matters only for sockets created with the SO_KEEPALIVE flag. The tcp_keepalive_intvl parameter is the interval between probes, and the product tcp_keepalive_probes * tcp_keepalive_intvl is the time after which an unresponsive connection is dropped once probing has started. With the kernel defaults (a 75-second interval and 9 probes) that is approximately 11 minutes; with the values above it is 5 * 15 = 75 seconds.
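
Here is the same calculation as a short sketch. The helper name keepalive_drop_seconds is hypothetical, and the second call uses the stock kernel defaults (7200, 75, 9) for comparison:

```python
def keepalive_drop_seconds(time_s, intvl_s, probes):
    """Seconds of silence before a dead peer's connection is dropped.

    time_s  -- idle time before the first probe (tcp_keepalive_time)
    intvl_s -- interval between probes (tcp_keepalive_intvl)
    probes  -- unanswered probes before giving up (tcp_keepalive_probes)
    """
    return time_s + intvl_s * probes


print(keepalive_drop_seconds(1800, 15, 5))  # 1875 (values from this post)
print(keepalive_drop_seconds(7200, 75, 9))  # 7875 (kernel defaults)
```

With our values a dead peer is detected after a little over 31 minutes instead of more than two hours.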

net.ipv4.tcp_max_syn_backlog = 4096

tcp_max_syn_backlog defines the maximum number of connection requests kept in memory that the connecting client has not yet acknowledged. If the server struggles under a high rate of incoming connections, try increasing this value.

net.ipv4.tcp_synack_retries = 1

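tcp_synack_retries controls the number of SYNACK retransmissions for passive TCP connections. The value should not exceed 255, and the default is 5. Assuming the current initial retransmission timeout of 1 second with exponential backoff, a value of 1 means a single retransmission after about 1 second, and the half-open connection is abandoned roughly 3 seconds after the first SYNACK was sent.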
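
The effect of this setting is easier to see with numbers. The sketch below is a simplified model, not kernel code: it assumes an initial retransmission timeout of 1 second that doubles after every retransmission, and synack_timeline is a hypothetical helper name.

```python
def synack_timeline(retries, initial_rto=1.0):
    """Return (time of last retransmission, time the attempt is abandoned).

    Both times are in seconds, counted from the first SYNACK.
    """
    elapsed = 0.0
    rto = initial_rto
    last_retransmit = 0.0
    for _ in range(retries):
        elapsed += rto          # wait one timeout, then retransmit
        last_retransmit = elapsed
        rto *= 2                # exponential backoff
    # After the last retransmission the kernel waits one more timeout.
    return last_retransmit, elapsed + rto


print(synack_timeline(1))  # (1.0, 3.0)
print(synack_timeline(5))  # (31.0, 63.0)
```

Under these assumptions, lowering the value from 5 to 1 cuts the lifetime of an unanswered half-open connection from about a minute to a few seconds.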

net.ipv4.tcp_mem = 50576 64768 98152

The tcp_mem variable is a vector of three values (minimum, load mode, and maximum) that contains the general limits on memory consumption for the TCP protocol. It is measured in pages (usually 4 KB each), not bytes.

Minimum: While the total memory used for TCP structures is below this number of pages, the operating system does nothing.

Load mode: As soon as the number of memory pages allocated for TCP reaches this value, load mode is activated. In this mode, the operating system tries to limit memory allocations. The mode remains active until memory consumption returns to the minimum level.

Maximum: This is the maximum number of memory pages allowed for all TCP sockets.
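
Because the values are in pages, it helps to convert them to something readable. A minimal sketch, assuming 4 KB pages (pages_to_mib is a hypothetical helper name; os.sysconf("SC_PAGE_SIZE") reports the real page size on your machine):

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a tcp_mem page count to MiB."""
    return pages * page_size / 2**20


for name, pages in zip(("minimum", "load mode", "maximum"),
                       (50576, 64768, 98152)):
    print(f"{name}: {pages_to_mib(pages):.1f} MiB")
```

For the values above this prints roughly 197.6, 253.0 and 383.4 MiB, which is the total for all TCP sockets on the host.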

net.ipv4.tcp_rmem = 4096 87380 16777216

Another vector variable (minimum, default, maximum) in the tcp_rmem file. It contains 3 integers specifying the size of the TCP socket receive buffer.

Minimum: every TCP socket has the right to use this memory upon creation. The possibility of using such a buffer is guaranteed even when the limit is reached (moderate memory pressure). The default value of the minimum buffer size is 8 KB (8192).

Default: The default size of the TCP socket receive buffer. This value overrides the /proc/sys/net/core/rmem_default parameter used by other protocols. The default is usually 87380 bytes, which gives a window of 65535 with the default tcp_adv_win_scale and tcp_app_win = 0, and a slightly smaller window with the default tcp_app_win.

Maximum: The maximum size of the receive buffer that can be allocated automatically for a TCP socket. This value does not override the maximum set in the /proc/sys/net/core/rmem_max file. It does not apply when the buffer size is set “statically” with SO_RCVBUF.
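
To illustrate the “static” case, here is a small sketch that sets SO_RCVBUF on a socket and reads back what the kernel granted. The helper name request_rcvbuf and the 256 KB request are only for illustration. On Linux the kernel doubles the requested value to leave room for bookkeeping overhead and caps the request at net.core.rmem_max, so the number printed depends on your system.

```python
import socket


def request_rcvbuf(requested):
    """Ask for a fixed receive buffer and return the size the kernel granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Setting SO_RCVBUF explicitly pins the buffer size for this socket,
        # so the automatic sizing bounded by tcp_rmem no longer applies to it.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


granted = request_rcvbuf(256 * 1024)
print(granted)
```

If the granted size is smaller than you expected, net.core.rmem_max is the first place to look.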

net.ipv4.tcp_wmem = 4096 65536 16777216

Yet another vector variable in the tcp_wmem file. It contains 3 integer values that define the minimum, default, and maximum amount of memory reserved for TCP socket transmit buffers.

Minimum: every TCP socket has the right to use this memory upon creation. The default minimum buffer size is 4KB (4096).

Default: The default size of the TCP socket send buffer. This value overrides the /proc/sys/net/core/wmem_default parameter used by other protocols and is usually lower than it. The default buffer size is usually 16 KB (16384).

Maximum: The maximum amount of memory that can be automatically allocated for the TCP socket transmit buffer. This value does not override the maximum specified in the /proc/sys/net/core/wmem_max file. When allocating memory “statically” using SO_SNDBUF, this parameter is not applicable.

net.ipv4.tcp_orphan_retries = 0

The tcp_orphan_retries value specifies how many unsuccessful attempts are made before a TCP connection that was closed on the server side is destroyed. The default value is 7, which corresponds to roughly 50 seconds to 16 minutes depending on the RTO. On heavily loaded servers it makes sense to lower this value, since such closed connections can consume a lot of resources.

net.ipv4.tcp_syncookies = 0

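tcp_syncookies controls whether the kernel sends SYN cookies when the SYN backlog of a socket overflows. The kernel documentation describes syncookies as a fallback facility that must not be used to help a heavily loaded server cope with a legitimate connection rate, so following that recommendation we put 0 here.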

net.ipv4.netfilter.ip_conntrack_max = 16777216

The maximum number of connections that the connection tracking mechanism (used, for example, by iptables) can follow at once. If the value is too low, the kernel rejects incoming connections and writes a corresponding entry to the system log. On current kernels this key is named net.netfilter.nf_conntrack_max.

net.ipv4.tcp_timestamps = 1

Enables TCP timestamps (RFC 1323). Timestamps give the kernel more accurate round-trip time measurements, which helps the protocol behave well under high load (see tcp_congestion_control for details).

net.ipv4.tcp_sack = 1

Allows TCP selective acknowledgments. This option is required to use all of the available bandwidth efficiently on some networks. Say hello to AWS and GCP! 👋

net.ipv4.tcp_congestion_control = htcp

This option selects the congestion control algorithm used for TCP connections. The default bic and cubic implementations contain bugs in most versions of the Red Hat kernel and its clones, so htcp is recommended.
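
Before switching, it is worth checking that the kernel actually offers htcp. On Linux, the algorithms currently registered are listed in /proc/sys/net/ipv4/tcp_available_congestion_control; htcp usually shows up there only once the tcp_htcp module is loaded. A minimal sketch (parse_algorithms and htcp_available are hypothetical helper names):

```python
AVAILABLE = "/proc/sys/net/ipv4/tcp_available_congestion_control"


def parse_algorithms(text):
    """Split the space-separated algorithm list into names."""
    return text.split()


def htcp_available(path=AVAILABLE):
    """Return True if htcp is listed as an available algorithm."""
    try:
        with open(path) as f:
            return "htcp" in parse_algorithms(f.read())
    except OSError:
        return False  # not Linux, or the file is not readable


print(parse_algorithms("reno cubic htcp\n"))
print(htcp_available())
```

If the second line prints False, setting tcp_congestion_control to htcp is likely to fail until the module is loaded.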

net.ipv4.tcp_no_metrics_save = 1

Tells the kernel not to store TCP connection metrics in the cache when a connection closes. This sometimes improves performance, so experiment with this option to see what works best for you.

net.ipv4.route.flush = 1

This option is relevant for 2.4 kernels. For some strange reason, if a retransmission with a reduced window size occurs within a TCP session on a 2.4 kernel, all connections to that host during the next 10 minutes get the same reduced window size. This option simply flushes that state. Current Ubuntu versions ship kernel 4.15 or higher, so bear that in mind.

net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.lo.rp_filter = 1
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

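These options enable protection against IP address spoofing. With rp_filter = 1 the kernel performs strict reverse-path filtering: a packet is dropped if the route back to its source address would not leave through the interface it arrived on.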

net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.lo.accept_source_route = 0
net.ipv4.conf.eth0.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0

Next, we disable source routing.

net.ipv4.ip_local_port_range = 1024 65535

With this option, we widen the range of local ports available for establishing outgoing connections.
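
How much does this buy us? A quick count, as a sketch (port_count is a hypothetical helper name, and 32768 to 60999 is the default range on current Linux kernels):

```python
def port_count(low, high):
    """Number of local ports in an inclusive range."""
    return high - low + 1


print(port_count(1024, 65535))   # 64512 (range from this post)
print(port_count(32768, 60999))  # 28232 (default range)
```

So the wider range more than doubles the number of outgoing connections the server can hold open to the same destination address and port.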

net.ipv4.tcp_tw_reuse = 1

Allows TIME-WAIT sockets to be reused for new connections when the protocol considers it safe.

net.ipv4.tcp_window_scaling = 1

Enables TCP window scaling (RFC 1323), which allows the TCP window to grow beyond 64 KB.

net.ipv4.tcp_rfc1337 = 1

Enables protection against TIME-WAIT assassination (RFC 1337).

net.ipv4.ip_forward = 0

Disables packet forwarding, since this server is still not a router.

net.ipv4.icmp_echo_ignore_broadcasts = 1

Tells the kernel not to respond to ICMP ECHO requests sent to broadcast addresses.

net.ipv4.icmp_echo_ignore_all = 1

We can also disable responses to ICMP ECHO requests completely, so that the server does not answer PING requests at all. Decide for yourself whether you need this.

net.ipv4.icmp_ignore_bogus_error_responses = 1

Ignore bogus ICMP error responses. Some routers violate RFC 1122 by sending bogus responses to broadcast frames. Such violations are normally logged via a kernel warning. If this is set to 1, the kernel does not log these warnings, which avoids log file clutter.

net.core.somaxconn = 65535

The maximum number of connections that can be queued on a listening socket while they wait to be accepted. It makes sense to raise the default so that a busy server stays responsive during bursts of new connections.

net.core.netdev_max_backlog = 1000

The parameter defines the maximum number of packets put in the queue for processing if the network interface receives packets faster than the kernel can process them.

net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

The last values set the default receive buffer size, the default send buffer size, the maximum receive buffer size, and the maximum send buffer size. These settings apply to all connections.

Applying these changes to the VPS that hosts this site yielded the following result.

Server Software: LiteSpeed
Server Hostname:
Server Port: 80

Document Path: /
Document Length: 27437 bytes

Concurrency Level: 200
Time taken for tests: 0.905 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 277140000 bytes
HTML transferred: 274370000 bytes
Requests per second: 11046.15 #/sec
Time per request: 18.106 ms
Time per request: 0.091 [ms] (mean, across all concurrent requests)
Transfer rate: 298957.94 [Kbytes/sec] received

Yes, you read that correctly: a VPS with 8 GB of RAM, 4 x AMD EPYC 7282 cores, and an NVMe drive.
