Web Cache in Practice

http://netkiller.github.io/journal/cache.html

Mr. Neo Chen (陈景峯), netkiller, BG7NYT


Xishan Meidi, Minzhi Street, Longhua New District, Shenzhen, Guangdong, China
518131
+86 13113668890


$Id: 12dfba037a04a025e01285ca3d35a1192d9bb95d

Copyright Notice

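Please contact the author before republishing. Any republication must credit the original source and the author and must include this notice.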

Original source:
http://netkiller.github.io
http://netkiller.sourceforge.net


QQ group: 128659835 (please write "reader" in your join request)

2017-06-16

Abstract

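I wrote this article because most of the articles on this topic online simply repeat one another without checking the facts, and they mislead readers.

Below I run each experiment myself, show you the results and explain why they come out the way they do. Only hands-on testing produces convincing, hard evidence.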

2014-03-12 First published

2015-08-27 Revised: added caching of special data



1. Test Environment

CentOS 6.5

Nginx install script https://github.com/oscm/shell/blob/master/nginx/nginx.sh

PHP install script https://github.com/oscm/shell/blob/master/php/5.5.8.sh

2. File Modification Time: If-Modified-Since / Last-Modified

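If the If-Modified-Since time is earlier than Last-Modified, the server returns HTTP/1.1 200 OK. Otherwise it returns 304 Not Modified.

On a repeat request, the browser sends an If-Modified-Since header. Its value is the Last-Modified time the browser received with the previous response, not the current time. The server compares this value with the file's Last-Modified time. If the file has not changed since then, the server returns 304 Not Modified and does not send the file again. Otherwise it reads the file and returns its content.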
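RFC 7232 (section 3.3) defines this rule for a server. Below is a minimal sketch of the decision in Python, using only the standard library. is_not_modified is a hypothetical helper and is not part of nginx or PHP.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def is_not_modified(last_modified: str, if_modified_since: Optional[str]) -> bool:
    """Return True when a 304 Not Modified is appropriate (RFC 7232, 3.3).

    The resource counts as unmodified when its Last-Modified time is
    earlier than or equal to the time the client sent in If-Modified-Since.
    """
    if not if_modified_since:
        return False  # unconditional request: answer 200 with the body
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: the header is ignored
    return parsedate_to_datetime(last_modified) <= since


if __name__ == "__main__":
    lm = "Fri, 28 Feb 2014 02:56:37 GMT"
    print(is_not_modified(lm, lm))                               # True
    print(is_not_modified(lm, "Thu, 27 Feb 2014 12:00:00 GMT"))  # False
```

This is the standard's rule. As the tests below show, nginx's default comparison is stricter.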

2.1. Static Files

nginx/1.0.15 automatically adds a Last-Modified header to static files

# nginx -v
nginx version: nginx/1.0.15

# curl -I http://192.168.6.9/index.html
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 07:36:03 GMT
Content-Type: text/html
Content-Length: 6
Last-Modified: Thu, 27 Feb 2014 07:29:50 GMT
Connection: keep-alive
Accept-Ranges: bytes
			

Image file

# curl -I http://192.168.6.9/image.png
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 07:37:18 GMT
Content-Type: image/png
Content-Length: 41516
Last-Modified: Thu, 27 Feb 2014 07:36:59 GMT
Connection: keep-alive
Accept-Ranges: bytes
			

Note

Question: why does nginx/1.4.5 send no Last-Modified by default?

# nginx -v
nginx version: nginx/1.4.5

# curl -I http://192.168.2.15/index.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:13:44 GMT
Content-Type: text/html
Connection: keep-alive
				

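After some digging I found the answer. With ssi enabled, Nginx drops the Last-Modified header (nginx 1.5.1 and later provide ssi_last_modified on; to keep it). With ssi turned off, the output is: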

# curl -I  http://localhost/index.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 05:44:29 GMT
Content-Type: text/html
Content-Length: 6
Last-Modified: Wed, 25 Dec 2013 03:18:16 GMT
Connection: keep-alive
ETag: "52ba4e78-6"
Accept-Ranges: bytes
				

Test again

# curl -H "If-Modified-Since: Fir, 28 Feb 2014 07:42:55 GMT" -I http://192.168.2.15/test.html
HTTP/1.1 304 Not Modified
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:34:54 GMT
Last-Modified: Fri, 28 Feb 2014 01:55:50 GMT
Connection: keep-alive
ETag: "530feca6-8b"
			

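The test returns HTTP/1.1 304 Not Modified, but an ETag header has also appeared. This is a difference between Nginx versions. ETag support for static files was added in nginx 1.3.3, so 1.0.15 does not send it and 1.4.5 does. Version differences like this make Nginx's behavior quite confusing.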

Since an ETag has appeared, let's test it as well

# curl -H 'If-None-Match: "530feca6-8b"' -I http://192.168.2.15/test.html
HTTP/1.1 304 Not Modified
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:39:18 GMT
Last-Modified: Fri, 28 Feb 2014 01:55:50 GMT
Connection: keep-alive
ETag: "530feca6-8b"
			

That works too.
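The ETag values nginx generates for static files are not random. This Python sketch assumes nginx's format: the Unix mtime in hex, a dash, then the Content-Length in hex. It rebuilds the ETags from the Last-Modified and Content-Length headers captured in this section. nginx_static_etag is a hypothetical name.

```python
from email.utils import parsedate_to_datetime


def nginx_static_etag(last_modified: str, content_length: int) -> str:
    """Rebuild nginx's static-file ETag: "<hex mtime>-<hex size>"."""
    mtime = int(parsedate_to_datetime(last_modified).timestamp())
    return '"%x-%x"' % (mtime, content_length)


if __name__ == "__main__":
    # Headers from the /index.html response shown above
    print(nginx_static_etag("Wed, 25 Dec 2013 03:18:16 GMT", 6))  # "52ba4e78-6"
```

Because the value depends only on mtime and size, touching a file changes its ETag even when the content is identical.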

Testing an image

# curl -I http://localhost/logo.jpg
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:59:04 GMT
Content-Type: image/jpeg
Content-Length: 10103
Last-Modified: Fri, 28 Feb 2014 02:56:37 GMT
Connection: keep-alive
ETag: "530ffae5-2777"
Accept-Ranges: bytes


# curl -H 'If-None-Match: "530ffae5-2777"' -I http://localhost/logo.jpg
HTTP/1.1 304 Not Modified
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 03:03:33 GMT
Last-Modified: Fri, 28 Feb 2014 02:56:37 GMT
Connection: keep-alive
ETag: "530ffae5-2777"

# curl -H "If-Modified-Since: Fri, 28 Feb 2014 12:04:18 GMT" -I http://localhost/logo.jpg
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 03:04:45 GMT
Content-Type: image/jpeg
Content-Length: 10103
Last-Modified: Fri, 28 Feb 2014 02:56:37 GMT
Connection: keep-alive
ETag: "530ffae5-2777"
Accept-Ranges: bytes
			

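Result: the ETag check passes. However, no If-Modified-Since value I tried with curl produced a 304, while the same check passes in browsers, which get HTTP/1.1 304 Not Modified. A likely explanation is nginx's default if_modified_since exact;. With this setting, nginx returns 304 only when If-Modified-Since is exactly equal to Last-Modified. The date sent here (12:04:18) is later than Last-Modified (02:56:37) but not equal to it. A browser, by contrast, echoes back the exact Last-Modified value.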

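Browser tests in Chrome and Firefox succeed. A browser never sends If-Modified-Since on its own. Only after it has received a Last-Modified header does it send If-Modified-Since on the next request, so you must load the page twice.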
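The nginx documentation describes three modes for the if_modified_since directive: off, exact (the default) and before. The sketch below is a simplified Python model of those modes. It is written from the documentation rather than from nginx's source, and the function name nginx_status is hypothetical. It reproduces the curl result above: a later If-Modified-Since gets 200 under exact but would get 304 under before.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def nginx_status(last_modified: str, if_modified_since: Optional[str],
                 mode: str = "exact") -> int:
    """Model of how if_modified_since chooses between 200 and 304.

    off:    never compare, always 200
    exact:  304 only if both times are identical (nginx default)
    before: 304 if Last-Modified <= If-Modified-Since
    """
    if mode == "off" or not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: serve the full response
    modified = parsedate_to_datetime(last_modified)
    if mode == "exact":
        return 304 if modified == since else 200
    if mode == "before":
        return 304 if modified <= since else 200
    raise ValueError("unknown mode: " + mode)
```

Under exact, a hand-written curl date almost never matches. A browser that echoes back Last-Modified always does.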

2.1.1. if_modified_since

With ssi enabled, I tried the if_modified_since directive to turn Last-Modified back on:

server {
    listen       80;
    server_name  192.168.2.15;
    if_modified_since before;
}
				

The result shows no Last-Modified. if_modified_since only controls how nginx compares If-Modified-Since with the file's modification time. It does not restore the Last-Modified header that ssi removes.

# curl -I http://192.168.2.15/test.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:39:42 GMT
Content-Type: text/html
Connection: keep-alive
				

In the end, if_modified_since before; had no effect.

Next, set the directive to if_modified_since exact;

# curl -I http://192.168.2.15/test.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:45:40 GMT
Content-Type: text/html
Connection: keep-alive

# curl -H 'If-None-Match: "530feca6-8b"' -I http://192.168.2.15/test.html
HTTP/1.1 304 Not Modified
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:45:44 GMT
Last-Modified: Fri, 28 Feb 2014 01:55:50 GMT
Connection: keep-alive
ETag: "530feca6-8b"

# curl -H "If-Modified-Since: Fir, 28 Feb 2014 07:42:55 GMT" -I http://192.168.2.15/test.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 02:45:50 GMT
Content-Type: text/html
Connection: keep-alive
				

The test fails, and it also fails in real browser tests, while ETag works.

2.2. Pseudo-Static URLs via rewrite

index.php is the same PHP file as before. We only put a pseudo-static URL in front of it with a rewrite rule.

			
location / {
        root   /www;
        index  index.html index.htm;
		rewrite ^/test.html$ /index.php last;
}
			
			

Now we test with curl and with Chrome/Firefox

			
# curl -H "If-Modified-Since: Fri, 28 Feb 2014 08:42:55 GMT" -I  http://192.168.6.9/test.html
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 08:55:19 GMT
Content-Type: text/html
Connection: keep-alive
Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT
			
			

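Whether I use curl or Chrome/Firefox, I cannot get a 304.

Below is my analysis, for reference only. When a user requests index.html, Nginx finds the file, reads its mtime and compares it with If-Modified-Since to decide between 304 and 200.

Why doesn't the same thing work for test.html behind a rewrite? My analysis is as follows. When a user requests test.html, Nginx first applies the rewrite and then hands the request to index.php. At no point does Nginx touch a physical file named test.html, so there is no mtime, and Nginx returns 200.

For Nginx to return 304 as hoped, it would have to read the HTTP headers returned by the program, and in this setup Nginx has no such logic.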

2.3. Dynamic Files

Dynamic files have no Last-Modified header, but we can fake one

			
# curl -I http://192.168.6.9/index.php
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 07:57:59 GMT
Content-Type: text/html
Connection: keep-alive
			
			

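Add code to the program that sends the header. The Last-Modified time is the 27th and the current time is the 28th, so Last-Modified is earlier than the current time, as it must be.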

			
# cat index.php
<?php
header('Last-Modified: Thu, 27 Feb 2014 08:39:35 GMT' );
//header('Last-Modified: ' .gmdate('D, d M Y H:i:s') . ' GMT' );
?>
Hello
			
			

Now you will see Last-Modified

			
# curl -I http://localhost/modified.php
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 05:59:28 GMT
Content-Type: text/html
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT
			
			

Note

Although the dynamic program now returns Last-Modified, browsers do not act on it. In my tests, neither Chrome nor Firefox used it to cache the .php page.

				
# curl -I http://localhost/modified.php
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 05:59:28 GMT
Content-Type: text/html
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT

# curl -H "If-Modified-Since: Fri, 28 Feb 2014 08:42:55 GMT" -I  http://localhost/modified.php
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Fri, 28 Feb 2014 05:32:30 GMT
Content-Type: text/html
Connection: keep-alive
Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT
				
				

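On its own, Last-Modified has no practical effect for dynamic programs.

Last-Modified is produced by the program and Nginx cannot see it, so the program must return the right status itself. Modify the program as follows.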

			
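# cat modified.php
<?php
$mtime = 'Fri, 28 Feb 2014 12:04:18 GMT';
cache($mtime);

// Answer 304 when the client's copy is at least as new as $mtime
function cache($mtime)
{
	$http_if_modified_since = null;
	if (array_key_exists('HTTP_IF_MODIFIED_SINCE', $_SERVER)) {
		$http_if_modified_since = $_SERVER['HTTP_IF_MODIFIED_SINCE'];
	}
	// Compare timestamps, not strings: HTTP dates do not sort lexicographically.
	// (No echo before header(): output sent first would break the headers.)
	if ($http_if_modified_since !== null
		&& strtotime($http_if_modified_since) >= strtotime($mtime))
	{
		header('Last-Modified: ' . $mtime, true, 304);
		exit;
	} else {
		header('Last-Modified: ' . $mtime);
	}
}
print_r($_SERVER);
echo date("Y-m-d H:i:s");
?>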
			
			

Results

			
# curl -I http://localhost/modified.php
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 05:22:28 GMT
Content-Type: text/html
Connection: keep-alive
			
			

Send a forged If-Modified-Since that is earlier than our date. The program returns HTTP/1.1 200 OK.

			
# curl -H "If-Modified-Since: Fri, 28 Feb 2014 10:04:18 GMT" -I http://localhost/modified.php
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 05:22:13 GMT
Content-Type: text/html
Connection: keep-alive
			
			

Send a forged If-Modified-Since that is later than our date. The program returns HTTP/1.1 304 Not Modified.

			
# curl -H "If-Modified-Since: Fri, 28 Feb 2014 20:04:18 GMT" -I http://localhost/modified.php
HTTP/1.1 304 Not Modified
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 05:21:31 GMT
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 12:04:18 GMT
			
			

Success. The browser test also returns HTTP/1.1 304 Not Modified.

Now put modified.php behind a pseudo-static URL

    location / {
        root   /www;
        index  index.html index.htm;
		rewrite ^/modified.html$ /modified.php last;
    }
			

Test

# curl -I http://localhost/modified.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 06:21:10 GMT
Content-Type: text/html
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT

# curl -H "If-Modified-Since: Fri, 28 Feb 2014 12:04:18 GMT" -I http://localhost/modified.html
HTTP/1.1 304 Not Modified
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 06:21:22 GMT
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT
			

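This works as intended.

3. ETag / If-None-Match

The Last-Modified tests above showed that ETag, although never explicitly enabled, was quietly working :)

etag on; enables ETag support in Nginx (it is on by default in nginx 1.3.3 and later). lighttpd enables ETags by default.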

		
server {
    listen       80;
    server_name phalcon;

    charset utf-8;

    access_log  /var/log/nginx/host.access.log  main;
	etag on;
    location / {
        root   /www/phalcon/public;
        index  index.html index.php;
    }
}
		
		

Check the ETag output

# curl -I http://localhost/index.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 03:08:28 GMT
Content-Type: text/html
Connection: keep-alive

# curl -I http://phalcon/img/css.png
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 27 Feb 2014 09:20:49 GMT
Content-Type: image/png
Content-Length: 1133
Last-Modified: Fri, 14 Feb 2014 08:05:03 GMT
Connection: keep-alive
ETag: "52fdce2f-46d"
Accept-Ranges: bytes
		

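Even with ETag enabled, Nginx did not add an ETag to the HTML file in this test, although with ssi turned off index.html did get one in section 2.1. I eventually found an nginx-static-etags module on a foreign site. Try it yourself if you are interested; I will not cover it here.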

3.1. Static Files

First, look up the ETag value

# curl -I http://phalcon/img/css.png
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 27 Feb 2014 09:25:41 GMT
Content-Type: image/png
Content-Length: 1133
Last-Modified: Fri, 14 Feb 2014 08:05:03 GMT
Connection: keep-alive
ETag: "52fdce2f-46d"
Accept-Ranges: bytes
			

Then send an If-None-Match HTTP header to the server

# curl -H 'If-None-Match: "52fdce2f-46d"' -I http://phalcon/img/css.png
HTTP/1.1 304 Not Modified
Server: nginx
Date: Thu, 27 Feb 2014 09:25:44 GMT
Last-Modified: Fri, 14 Feb 2014 08:05:03 GMT
Connection: keep-alive
ETag: "52fdce2f-46d"
			

This time it goes smoothly and returns HTTP/1.1 304 Not Modified

3.2. Dynamic Programs

Default output:

# curl -I http://192.168.6.9/index.php
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 27 Feb 2014 09:29:13 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
			

Test program

			
<?php
header('Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT' );
header('Etag: "abcdefg"');
#header('Last-Modified: ' .gmdate('D, d M Y H:i:s') . ' GMT' );
?>
Hello
			
			

Results

# curl -I http://192.168.6.9/index.php
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 09:41:06 GMT
Content-Type: text/html
Connection: keep-alive
Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT
Etag: "abcdefg"

# curl -H 'If-None-Match: "abcdefg"' -I http://192.168.6.9/index.php
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 09:41:42 GMT
Content-Type: text/html
Connection: keep-alive
Last-Modified: Thu, 26 Feb 2014 08:39:35 GMT
Etag: "abcdefg"
			

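The result is the same as with Last-Modified earlier.

So is an ETag returned by a dynamic program useless?

No. There is a way to make an ETag returned by a dynamic program take effect. Modify the program as follows: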

			
<?php
$etag = md5('http://netkiller.github.io');
cache($etag);
function cache($etag)
{
        $http_if_none_match = null;
        if(array_key_exists ('HTTP_IF_NONE_MATCH',$_SERVER)){
                $http_if_none_match = $_SERVER['HTTP_IF_NONE_MATCH'];
        }

        if ($http_if_none_match == $etag)
        {
                header('Etag: '.$etag, true, 304);
                exit;
        } else {
                header('Etag: '.$etag);
        }

}
print_r($_SERVER);
echo date("Y-m-d H:i:s");
?>
			
			

First, check the ETag value

# curl  -I http://192.168.6.9/test.php
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 10:07:19 GMT
Content-Type: text/html
Connection: keep-alive
Etag: 7467675324d0f7a3e01ce5151848fedb
			

Send an If-None-Match header

# curl -H 'If-None-Match: 7467675324d0f7a3e01ce5151848fedb' -I http://192.168.6.9/test.php
HTTP/1.1 304 Not Modified
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 10:07:39 GMT
Connection: keep-alive
Etag: 7467675324d0f7a3e01ce5151848fedb
			

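This achieves the intended result. The same approach works for Last-Modified, and it works even better combined with pseudo-static URLs.

To compute the ETag value, I usually use the URL together with a pseudo-static URL, for example

$etag = $_SERVER['REQUEST_URI'];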
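When a program compares ETags itself, two details matter. First, RFC 7232 defines an entity tag as a quoted string. The example above sends it unquoted, and that works only because curl echoes back the same bytes. Second, a client may send several tags, weak tags (W/"...") or * in If-None-Match. Below is a minimal Python sketch of the matching rule; the helper names are hypothetical.

```python
from typing import Optional


def _opaque(tag: str) -> str:
    """Strip whitespace and a weak-validator prefix W/ from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header: Optional[str], etag: str) -> bool:
    """True if If-None-Match matches etag, i.e. a 304 may be sent.

    Uses the weak comparison RFC 7232 prescribes for If-None-Match and
    accepts "*" and comma-separated lists. Simplification: it splits on
    commas, so it would mishandle an entity tag that itself contains one.
    """
    if not header:
        return False
    if header.strip() == "*":
        return True
    target = _opaque(etag)
    return any(_opaque(candidate) == target for candidate in header.split(","))
```

A plain string comparison, as in the PHP code, fails as soon as a browser or proxy sends a list or a weak tag.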
			

With a URL like http://www.example.com/news/100/1000.html, the page is cached after a single request. This makes updates a problem, so I also did the following:

http://www.example.com/news/100/1000.1.html
			

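The .1. is a version number, incremented after each edit. It carries no meaning of its own and the rewrite rule discards it. It exists only so that new content always has a new URL.

4. Expires / Cache-Control

Last-Modified and ETag, discussed above, mainly tell whether a file has changed. They cannot control how long the browser caches a page. Expires and Cache-Control control the caching period.

Expires comes from HTTP/1.0 and Cache-Control from HTTP/1.1. Both work, and in HTTP/1.1 max-age takes precedence over Expires. Servers often emit both together for compatibility. For example, nginx's expires directive sets both Expires and Cache-Control, as the output below shows.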
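The versioning trick can be sketched in Python as follows. The URL pattern <path>.<N>.html and both helper names are assumptions for illustration. In practice the nginx rewrite rule drops the version segment. The ETag is taken from the full URI, so each new version also gets a new ETag.

```python
import hashlib
import re

# Hypothetical pattern: "<path>.<version>.html", e.g. /news/100/1000.1.html
VERSIONED = re.compile(r"^(?P<path>.+?)\.(?P<version>\d+)\.html$")


def strip_version(uri: str) -> str:
    """Drop the .N. version segment, as the rewrite rule would."""
    match = VERSIONED.match(uri)
    return f"{match.group('path')}.html" if match else uri


def etag_for(uri: str) -> str:
    """Quoted ETag derived from the full request URI (version included)."""
    return '"' + hashlib.md5(uri.encode("utf-8")).hexdigest() + '"'
```

With this design, the version number only invalidates caches. The backend always serves the same unversioned resource.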
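A browser combines the two headers according to the freshness-lifetime rule of RFC 7234 (section 4.2.1): max-age wins, otherwise the lifetime is Expires minus Date. The Python sketch below models a private (browser) cache only. s-maxage and heuristic freshness are deliberately left out, and the function name is hypothetical.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[int]:
    """Seconds a response stays fresh in a browser cache.

    Cache-Control max-age takes precedence; otherwise Expires - Date.
    Returns None when neither header gives an explicit lifetime.
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return 0  # simplification: an invalid date means "already expired"
        return max(0, int((expires - date).total_seconds()))
    return None
```

While the age of a stored response is below this lifetime, the browser may reuse it without contacting the server at all.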

4.1. Static Files

First, configure nginx to cache html and png files for 1 day

location ~ .*\.(html|png)$
{
    expires      1d;
}
			

Before the change:

# curl -I http://192.168.6.9/index.html
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 10:47:08 GMT
Content-Type: text/html
Content-Length: 6
Last-Modified: Thu, 27 Feb 2014 07:29:50 GMT
Connection: keep-alive
Accept-Ranges: bytes
			

After restarting Nginx, the response headers include Expires and Cache-Control

# curl -I http://192.168.6.9/index.html
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 10:42:09 GMT
Content-Type: text/html
Content-Length: 3698
Last-Modified: Fri, 26 Apr 2013 20:36:51 GMT
Connection: keep-alive
Expires: Fri, 28 Feb 2014 10:42:09 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes
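The expires 1d; directive is plain arithmetic on the response time. Expires is the current time plus 86400 seconds, and Cache-Control carries the same number as max-age. The small Python sketch below reproduces the headers above from the Date value; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def expires_headers(now: datetime, seconds: int) -> Dict[str, str]:
    """Headers equivalent to nginx "expires <seconds>" for a response at `now`."""
    return {
        "Expires": format_datetime(now + timedelta(seconds=seconds), usegmt=True),
        "Cache-Control": "max-age=%d" % seconds,
    }


if __name__ == "__main__":
    now = datetime(2014, 2, 27, 10, 42, 9, tzinfo=timezone.utc)
    print(expires_headers(now, 86400)["Expires"])  # Fri, 28 Feb 2014 10:42:09 GMT
```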
			

4.2. Dynamic Files

Default response

# curl -I http://192.168.6.9/index.php
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 11:45:05 GMT
Content-Type: text/html
Connection: keep-alive
			

Make index.php send a Cache-Control header

			
header('Cache-Control: max-age=259200');
			
			

Check again

# curl -I http://192.168.6.9/index.php
HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Thu, 27 Feb 2014 11:53:48 GMT
Content-Type: text/html
Connection: keep-alive
Cache-Control: max-age=259200
			

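Now test with Chrome and Firefox. The response is always 200, and max-age=259200 never changes.

The reason is that the program, not Nginx, outputs Cache-Control. Nginx knows nothing about it, so Nginx will not return a 304 for you. Also note that while a response is fresh under max-age, the browser serves it from its own cache without contacting the server, so a fresh cache hit never shows up as a 304. Pressing Reload makes the browser revalidate with the server instead. Because this script does not handle If-Modified-Since or If-None-Match, it then always answers 200.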

			
header('Last-Modified: ' .gmdate('D, d M Y H:i:s') . ' GMT' );

$offset = 60 * 60 * 24;
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $offset) . ' GMT');

$ttl=3600;
header("Cache-Control: max-age=$ttl, must-revalidate");
			
			

In my tests, this approach did not achieve the caching goal.

5. FastCGI and Caching

Let's try adding expires 1d; to location ~ \.php$ and see whether that achieves caching.

    location ~ \.php$ {
        root           /www;
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  /www$fastcgi_script_name;
        include        fastcgi_params;
		expires      1d;
    }
		

Test program

		
# cat expires.php
<?php
echo date("Y-m-d H:i:s");
?>
		
		

Result

# curl -I http://localhost/expires.php
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 04:39:57 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Sat, 01 Mar 2014 04:39:57 GMT
Cache-Control: max-age=86400
		

Although Cache-Control: max-age=86400 is now sent, IE, Chrome and Firefox still do not cache the page.

6. HTML META and Caching

Create a test file as follows

		
<html>
<head>
	<title>Hello</title>
	<meta http-equiv="Cache-Control" content="max-age=7200" />
	<meta http-equiv="expires" content="Fri, 28 Feb 2014 12:04:18 GMT" />
</head>
<body>
	Helloworld
</body>
</html>
		
		

Test the HTML page

		
# curl -i http://localhost/test.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 03:30:45 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive

<html>
<head>
	<title>Hello</title>
	<meta http-equiv="Cache-Control" content="max-age=7200" />
	<meta http-equiv="expires" content="Fri, 28 Feb 2014 12:04:18 GMT" />
</head>
<body>
	Helloworld
</body>
</html>
		
		

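The cache settings in the HTML meta tags have no effect on Nginx. Many people will say they work in the browser!

This time I tested IE11, Chrome and Firefox, and none of them cached the page. The tags may still matter for something like IE5, but I have no environment to test that. Ten years ago, in B/S development, we often used them like this: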

		
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
		
		

At least back then, IE honored these meta tags. Much has changed in the HTML5 era, so you cannot generalize.

7. gzip

deflate is associated with Apache httpd (mod_deflate). Here we only discuss gzip.

First, create a gzip.html

# curl -I http://localhost/gzip.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Mon, 03 Mar 2014 01:49:45 GMT
Content-Type: text/html
Content-Length: 19644
Last-Modified: Mon, 03 Mar 2014 01:49:02 GMT
Connection: keep-alive
ETag: "5313df8e-4cbc"
Accept-Ranges: bytes
		

Enable gzip on;

server {
    listen       80;
    server_name  localhost;

    #charset utf-8;
    #access_log  /var/log/nginx/log/host.access.log  main;
    #etag on;
    #ssi on;
    gzip on;
		

Now let's see the effect

# curl -I http://localhost/gzip.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Mon, 03 Mar 2014 01:51:56 GMT
Content-Type: text/html
Content-Length: 19644
Last-Modified: Mon, 03 Mar 2014 01:49:02 GMT
Connection: keep-alive
ETag: "5313df8e-4cbc"
Accept-Ranges: bytes
		

Nothing has changed. Now add the HTTP header Accept-Encoding: gzip,deflate and try again

		
# curl -H 'Accept-Encoding: gzip,deflate' http://localhost/gzip.html
		
		

If you see non-text content (what people usually call garbage), it worked. The output is gzip-compressed binary data, which we can decompress with gunzip

# curl -H 'Accept-Encoding: gzip,deflate' http://localhost/gzip.html | gunzip
		

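If the HTML prints normally, the compression is correct.

7.1. gzip Summary

Once gzip on; is set, text/html is compressed by default. Do not list text/html again in gzip_types, or nginx warns about a duplicate MIME type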
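You can also reproduce the curl ... | gunzip check in Python with the standard gzip module. This is only a sketch. Python's zlib output will not be byte-identical to nginx's, and level 6 is assumed to mirror the gzip_comp_level 6 in the advanced configuration. The helper names are hypothetical.

```python
import gzip
from typing import Tuple


def looks_gzipped(data: bytes) -> bool:
    """gzip streams start with the magic bytes 1f 8b (the "garbage" curl prints)."""
    return data[:2] == b"\x1f\x8b"


def gzip_roundtrip(body: bytes, level: int = 6) -> Tuple[int, int, bytes]:
    """Compress like a server at `level`, then decompress like gunzip."""
    compressed = gzip.compress(body, compresslevel=level)
    return len(body), len(compressed), gzip.decompress(compressed)


if __name__ == "__main__":
    page = b"<html><body>" + b"<p>Helloworld</p>\n" * 1000 + b"</body></html>"
    original, packed, restored = gzip_roundtrip(page)
    print(f"{original} bytes -> {packed} bytes, round trip ok: {restored == page}")
```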

Starting nginx: nginx: [warn] duplicate MIME type "text/html" in /etc/nginx/conf.d/localhost.conf:16
			

Advanced configuration for reference

    gzip  on;
    gzip_http_version 1.0;
    gzip_types        text/plain text/xml text/css application/xml application/xhtml+xml application/rss+xml application/atom+xml application/javascript application/x-javascript application/json;
    gzip_disable      "MSIE [1-6]\.";
    gzip_disable      "Mozilla/4";
    gzip_comp_level   6;
    gzip_proxied      any;
    gzip_vary         on;
    gzip_buffers      4 8k;
    gzip_min_length   1000;		
			

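8. Reverse Proxy and Caching

Reverse proxy servers cache in one of these ways:

Forced caching: cache times set per file, extension or URL

Caching that follows the standard HTTP headers

Default configuration: proxy only, no caching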

server {
    listen       80;
    server_name  192.168.2.15;
    #access_log  /var/log/nginx/log/host.access.log  main;

	location / {
	  proxy_pass        http://localhost:80;
	  proxy_set_header  X-Real-IP  $remote_addr;
	}
}
 		

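A reverse proxy request produces two log lines. Here both are written to the same access_log file; with separate log files they would be written separately.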

192.168.2.15 - - [28/Feb/2014:18:09:33 +0800] "HEAD /modified.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
127.0.0.1 - - [28/Feb/2014:18:09:33 +0800] "HEAD /modified.html HTTP/1.0" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
		

Last-Modified and ETag are passed through

# curl -H "If-Modified-Since: Fri, 28 Feb 2014 12:04:18 GMT" -I http://192.168.2.15/modified.html
HTTP/1.1 304 Not Modified
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 10:17:30 GMT
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 10:04:18 GMT
		

Both log lines show 304

192.168.2.15 - - [28/Feb/2014:18:17:30 +0800] "HEAD /modified.html HTTP/1.1" 304 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
127.0.0.1 - - [28/Feb/2014:18:17:30 +0800] "HEAD /modified.html HTTP/1.0" 304 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
		

Now add caching to the reverse proxy

proxy_temp_path   /tmp/proxy_temp_dir;
proxy_cache_path  /tmp/proxy_cache_dir  levels=1:2   keys_zone=nginx_cache:200m inactive=3d max_size=30g;

server {
    listen       80;
    server_name  192.168.2.15;

	location / {
		proxy_cache nginx_cache;
		proxy_cache_key $host$uri$is_args$args;
		proxy_set_header  X-Real-IP  $remote_addr;
		proxy_set_header  X-Forwarded-For  $proxy_add_x_forwarded_for;
		proxy_cache_valid 200 10m;
		proxy_pass        http://localhost;
	}

	location ~ \.(php|jsp|cgi)$
	{
	     proxy_set_header Host  $host;
	     proxy_set_header X-Forwarded-For  $remote_addr;
	     proxy_pass http://backend_server;
	}
}
		

# curl  -I http://192.168.2.15/index.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 10:57:35 GMT
Content-Type: text/html
Content-Length: 12
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 06:54:45 GMT
ETag: "531032b5-c"
Expires: Sat, 01 Mar 2014 10:57:35 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes

# curl  -I http://192.168.2.15/index.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 10:57:41 GMT
Content-Type: text/html
Content-Length: 12
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 06:54:45 GMT
ETag: "531032b5-c"
Expires: Sat, 01 Mar 2014 10:57:35 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes

# curl  -I http://192.168.2.15/index.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Fri, 28 Feb 2014 10:57:46 GMT
Content-Type: text/html
Content-Length: 12
Connection: keep-alive
Last-Modified: Fri, 28 Feb 2014 06:54:45 GMT
ETag: "531032b5-c"
Expires: Sat, 01 Mar 2014 10:57:35 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes
		

The server was requested 3 times above

192.168.2.15 - - [28/Feb/2014:18:57:35 +0800] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
127.0.0.1 - - [28/Feb/2014:18:57:35 +0800] "GET /index.html HTTP/1.0" 200 12 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "192.168.2.15"
192.168.2.15 - - [28/Feb/2014:18:57:41 +0800] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
192.168.2.15 - - [28/Feb/2014:18:57:46 +0800] "HEAD /index.html HTTP/1.1" 200 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" "-"
		

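The first request reached 192.168.2.15, which forwarded it to 127.0.0.1 and returned HTTP/1.1 200 OK

The next two requests reached 192.168.2.15 and were not forwarded to 127.0.0.1. The proxy returned HTTP/1.1 200 OK directly from its cache.

Look in the cache directory and you can see the generated cache file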

# find /tmp/proxy_*
/tmp/proxy_cache_dir
/tmp/proxy_cache_dir/1
/tmp/proxy_cache_dir/1/79
/tmp/proxy_cache_dir/1/79/b47a0009c531900de2a15ba80c0e3791
/tmp/proxy_temp_dir
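The file name above is derived from the cache key. This Python sketch assumes nginx's documented layout: the file name is the md5 of proxy_cache_key, and with levels=1:2 the first directory is the last hex character and the second directory is the two characters before it. The sketch places a digest in that layout. The helper names are hypothetical. I have not checked whether the md5 of 192.168.2.15/index.html equals the digest listed above, so the tests only check the layout.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path_for_digest(root: str, digest: str, levels: str = "1:2") -> PurePosixPath:
    """Place an md5 hex digest in nginx's levels= directory layout.

    Each level takes characters from the END of the digest: with levels=1:2
    the first directory is the last character, the second the two before it.
    """
    path = PurePosixPath(root)
    end = len(digest)
    for width in (int(w) for w in levels.split(":")):
        path = path / digest[end - width:end]
        end -= width
    return path / digest


def nginx_cache_file(root: str, cache_key: str, levels: str = "1:2") -> PurePosixPath:
    """Cache file for a key such as "$host$uri$is_args$args" after expansion."""
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    return cache_path_for_digest(root, digest, levels)
```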
		

8.1. Handling gzip

http://localhost/gzip.html supports compression, and 192.168.2.15 does proxy_pass http://localhost

# curl -H 'Accept-Encoding: gzip,deflate' http://localhost/gzip.html
			

This prints garbage (the compressed data)

# curl -H 'Accept-Encoding: gzip,deflate' http://192.168.2.15/gzip.html
			

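Now request through the reverse proxy. gzip has no effect, and the output is plain HTML. Why? My first guess was that the reverse proxy cannot tell whether the server behind it supports gzip, so it treats every request as a plain HTML request. So I enabled gzip_vary on;, which adds a Vary: Accept-Encoding header to every response.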

	gzip  on;
	gzip_vary on;
			

After reloading nginx, check for the Vary: Accept-Encoding header

# curl -I http://localhost/gzip.html
HTTP/1.1 200 OK
Server: nginx/1.4.5
Date: Mon, 03 Mar 2014 02:09:16 GMT
Content-Type: text/html
Content-Length: 19644
Last-Modified: Mon, 03 Mar 2014 01:49:02 GMT
Connection: keep-alive
Vary: Accept-Encoding
ETag: "5313df8e-4cbc"
Accept-Ranges: bytes
			

The Vary: Accept-Encoding header is there. Test again

			
# curl -H "Accept-Encoding: gzip" http://192.168.2.15/gzip.html
<html>
<head>
	<title>Hello</title>
			
			

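The test fails and the expected result does not appear. I searched for an answer and read everything I could find in Chinese and English, but I could not solve it.

In the end, my only option was to have the reverse proxy compress the data again after receiving it, by enabling gzip on; on the proxy: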
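Here is a likely cause, which I have not verified on this setup. By default nginx talks to the backend with HTTP/1.0 (proxy_http_version defaults to 1.0, and the backend log lines earlier in section 8 show HTTP/1.0). Meanwhile gzip_http_version defaults to 1.1, so the backend never compresses for the proxy. The simplified Python model below makes that check explicit. It is hypothetical and written from the nginx documentation rather than from its source. If this is the cause, gzip_http_version 1.0; on the backend or proxy_http_version 1.1; on the proxy should help, but treat both as untested suggestions.

```python
from typing import Iterable, Optional, Tuple


def _version(v: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))


def nginx_would_gzip(accept_encoding: Optional[str], request_version: str,
                     content_type: str, content_length: int,
                     gzip_types: Iterable[str] = (),
                     gzip_http_version: str = "1.1",
                     gzip_min_length: int = 20) -> bool:
    """Simplified model of the checks nginx's gzip filter makes.

    Defaults follow the nginx documentation: gzip_http_version 1.1,
    gzip_min_length 20, and text/html is always compressible.
    gzip_proxied, gzip_disable and other directives are not modelled.
    """
    if not accept_encoding or "gzip" not in accept_encoding.lower():
        return False
    if _version(request_version) < _version(gzip_http_version):
        return False  # e.g. the HTTP/1.0 request an nginx proxy sends upstream
    mime = content_type.split(";")[0].strip().lower()
    if mime != "text/html" and mime not in set(gzip_types):
        return False
    return content_length >= gzip_min_length
```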

proxy_temp_path   /tmp/proxy_temp_dir;
proxy_cache_path  /tmp/proxy_cache_dir  levels=1:2   keys_zone=nginx_cache:200m inactive=3d max_size=30g;

server {
    listen       80;
    server_name  192.168.2.15;

	gzip on;
	
	location / {
		proxy_set_header X-Real-IP  $remote_addr;
		proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for; 
		# proxy_set_header Accept-Encoding "gzip";  (this had no effect)
		proxy_pass       http://localhost;
	}
}
			

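As a plain reverse proxy, Nginx is more than enough. For a caching server, use squid or varnish instead.

9. Caching Special Data

Caching is not limited to static content. Data other than HTML, CSS, JS and images can be cached too.

All it takes is getting the HTTP headers right, for example when caching dynamic Ajax content or JSON data.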

9.1. json

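When a user requests a JSON URL, we attach HTTP headers (Cache-Control, Expires, ETag) to the JSON data and return it. The user's device follows these HTTP declarations and caches the data accordingly.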

curl -I http://api.example.com/article/json/2/20/0.html
HTTP/1.1 200 OK
Expires: Wed, 26 Aug 2015 05:40:57 GMT
Date: Wed, 26 Aug 2015 05:39:57 GMT
Server: nginx
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Cache-Control: max-age=60
ETag: 4238111283
Age: 69475
X-Via: 1.1 kaifeng45:3 (Cdn Cache Server V2.0)
Connection: keep-alive
			

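Note that this uses a pseudo-static URL, /article/json/2/20/0.html. The pseudo-static URL has nothing to do with caching; what actually matters is the HTTP headers.

The Content-Type: application/json; charset=utf-8 declaration shows that this is JSON data, not HTML.

Now let's demonstrate JSON caching. Note first that http://api.example.com/article/json/2/20/0.html is not a file named 0.html. It is a program written with the Phalcon framework: article is the controller class name, json is the jsonAction method, and 2/20/0 are the parameters passed to jsonAction.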

$ curl -I http://api.example.com/article/json/2/20/0.html
HTTP/1.1 200 OK
Expires: Thu, 27 Aug 2015 05:24:21 GMT
Date: Thu, 27 Aug 2015 05:23:21 GMT
Server: nginx/1.5.7
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Cache-Control: max-age=60
ETag: 558918903
Age: 1
X-Via: 1.1 kaifeng45:3 (Cdn Cache Server V2.0)
Connection: keep-alive
			

The data from this first request will be cached. On the second request we send the If-None-Match HTTP header.

$ curl -H 'If-None-Match: 558918903' -I http://api.example.com/article/json/2/20/0.html
HTTP/1.0 304 Not Modified
Date: Thu, 27 Aug 2015 05:23:22 GMT
Content-Type: application/json; charset=utf-8
Expires: Thu, 27 Aug 2015 05:24:22 GMT
ETag: 558918903
Cache-Control: max-age=60
Age: 15
X-Via: 1.0 kaifeng45:3 (Cdn Cache Server V2.0)
Connection: keep-alive			
			

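The data was cached, and the result is HTTP/1.0 304 Not Modified. The 304 code tells the client that the page or data has not changed, so it does not need to download it again.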
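The server side of this exchange can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name, the 60-second max-age, and the use of the body's CRC32 as the ETag. The numeric ETags above look like checksums, but their real scheme is unknown. Unlike the captured responses, this ETag is quoted, as RFC 7232 requires.

```python
import json
import zlib
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def json_response(payload: object, if_none_match: Optional[str] = None,
                  max_age: int = 60) -> Response:
    """Return (status, headers, body) for a cacheable JSON API response."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Hypothetical ETag scheme: CRC32 of the serialized body, quoted.
    etag = '"%d"' % zlib.crc32(body)
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Cache-Control": "max-age=%d" % max_age,
        "ETag": etag,
    }
    if if_none_match and etag in (t.strip() for t in if_none_match.split(",")):
        return 304, headers, b""  # client copy is current: headers only
    return 200, headers, body
```

Because the ETag is derived from the serialized body, any change to the data produces a new ETag, and the next conditional request gets a full 200.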

9.2. XML

This means dynamically generated XML. It is handled the same way as JSON: attach HTTP headers (Cache-Control, Expires, ETag) to the XML data and return it to the user.

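10. Summary

Detailed testing shows that different browsers, different web servers and even different versions of the same server all behave differently.

In my tests, Apache HTTPD is the most complete, followed by Lighttpd. Nginx is still developing quickly. Its versions differ a lot, and its implementation of the HTTP standard is not very rigorous. Because Nginx is the trend in mainland China, all the examples here use nginx.

I am fairly optimistic about Lighttpd. For FastCGI, I usually use php-fpm instead of Lighttpd's spawn-fcgi.

Remember: when using Nginx, watch for subtle changes between versions, or an upgrade may affect you. I usually install nginx with yum and upgrade it whenever needed with yum update.

FastCGI and mod_php also behave differently.

Further reading: Netkiller Web 手札 (Netkiller Web Notes) http://netkiller.github.io/www/index.html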