
20.6. wordcloud

https://github.com/amueller/word_cloud

		
pip install wordcloud		
		
	

Demo

		
import wordcloud

w = wordcloud.WordCloud()
w.generate(
    "Netkiller Neo Linux Nginx SSH Ubuntu CentOS MySQL PostgreSQL Java Python")
w.to_file("wordcloud.png")		
		
	

More example code: https://amueller.github.io/word_cloud/auto_examples/index.html#example-gallery
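The demo above passes raw text to generate(). When word counts are already available, a sketch like the following might be used instead. It relies on WordCloud.generate_from_frequencies; the helper names count_words and render_frequencies are illustrative assumptions and do not come from the original text.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated words, ignoring case."""
    return Counter(word.lower() for word in text.split())


def render_frequencies(frequencies, path):
    """Draw a word cloud from a word -> count mapping and save it to path."""
    # Imported here so count_words() works without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


counts = count_words("Linux Nginx Linux MySQL Python Linux Python")
print(counts.most_common(2))  # [('linux', 3), ('python', 2)]
```

Calling render_frequencies(counts, "wordcloud_freq.png") would then write the image; the file name is only an example.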

20.6.1. wordcloud_cli

		
neo@MacBook-Pro-Neo ~ % wordcloud_cli -h
usage: wordcloud_cli [-h] [--text file] [--regexp regexp] [--stopwords file] [--imagefile file]
                     [--fontfile path] [--mask file] [--colormask file] [--contour_width width]
                     [--contour_color color] [--relative_scaling rs] [--margin width]
                     [--width width] [--height height] [--color color] [--background color]
                     [--no_collocations] [--include_numbers] [--min_word_length min_word_length]
                     [--prefer_horizontal ratio] [--scale scale] [--colormap map] [--mode mode]
                     [--max_words N] [--min_font_size size] [--max_font_size size]
                     [--font_step step] [--random_state seed] [--no_normalize_plurals] [--repeat]
                     [--version]

A simple command line interface for wordcloud module.

optional arguments:
  -h, --help            show this help message and exit
  --text file           specify file of words to build the word cloud (default: stdin)
  --regexp regexp       override the regular expression defining what constitutes a word
  --stopwords file      specify file of stopwords (containing one word per line) to remove from
                        the given text after parsing
  --imagefile file      file the completed PNG image should be written to (default: stdout)
  --fontfile path       path to font file you wish to use (default: DroidSansMono)
  --mask file           mask to use for the image form
  --colormask file      color mask to use for image coloring
  --contour_width width
                        if greater than 0, draw mask contour (default: 0)
  --contour_color color
                        use given color as mask contour color - accepts any value from
                        PIL.ImageColor.getcolor
  --relative_scaling rs
                        scaling of words by frequency (0 - 1)
  --margin width        spacing to leave around words
  --width width         define output image width
  --height height       define output image height
  --color color         use given color as coloring for the image - accepts any value from
                        PIL.ImageColor.getcolor
  --background color    use given color as background color for the image - accepts any value from
                        PIL.ImageColor.getcolor
  --no_collocations     do not add collocations (bigrams) to word cloud (default: add unigrams and
                        bigrams)
  --include_numbers     include numbers in wordcloud?
  --min_word_length min_word_length
                        only include words with more than X letters
  --prefer_horizontal ratio
                        ratio of times to try horizontal fitting as opposed to vertical
  --scale scale         scaling between computation and drawing
  --colormap map        matplotlib colormap name
  --mode mode           use RGB or RGBA for transparent background
  --max_words N         maximum number of words
  --min_font_size size  smallest font size to use
  --max_font_size size  maximum font size for the largest word
  --font_step step      step size for the font
  --random_state seed   random seed
  --no_normalize_plurals
                        whether to remove trailing 's' from words
  --repeat              whether to repeat words and phrases
  --version             show program's version number and exit		
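
The options in the help output can also be assembled from a script. The following is a minimal sketch, assuming wordcloud_cli is on the PATH. The helper names build_cli_args and run_cli and the file names are hypothetical; only flags that appear in the help text above are used.

```python
import shlex
import subprocess


def build_cli_args(text_file, image_file, background="white", width=800, height=400):
    """Build the argument list for wordcloud_cli from a few common options."""
    return [
        "wordcloud_cli",
        "--text", text_file,          # input text (the CLI reads stdin if omitted)
        "--imagefile", image_file,    # output PNG (the CLI writes stdout if omitted)
        "--background", background,
        "--width", str(width),
        "--height", str(height),
    ]


def run_cli(args):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)


args = build_cli_args("text.txt", "wordcloud.png")
print(shlex.join(args))
```

Passing the list to run_cli(args) would execute the command. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.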
		
		

20.6.2. WordCloud object configuration parameters

			
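w = wordcloud.WordCloud(<parameters>)

Parameter	Description
width	Width of the generated image in pixels; default 400
height	Height of the generated image in pixels; default 200
min_font_size	Smallest font size used in the word cloud; default 4
max_font_size	Largest font size used in the word cloud; default None, in which case it is derived from the image height
font_step	Step between font sizes; default 1
font_path	Path to the font file; default None
max_words	Maximum number of words shown in the word cloud; default 200
stopwords	Set of words to exclude, that is, words that are not drawn
mask	Shape of the word cloud as an image array, for example one loaded with imread(); default None, which gives a rectangle
background_color	Background color of the image; default "black"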
			
		
			
import wordcloud

w = wordcloud.WordCloud(background_color="white")
w.generate(
    "Netkiller Neo Linux Nginx SSH Ubuntu CentOS MySQL PostgreSQL Java Python")
w.to_file("wordcloud.png")
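
Several parameters from the table above can be combined in one call. The sketch below collects them in a dictionary first; make_options and render are hypothetical helper names, and the chosen values are arbitrary. Note that the library spells the exclusion parameter stopwords, and that passing a set replaces the built-in STOPWORDS list rather than extending it.

```python
def make_options(extra_stopwords=()):
    """Collect WordCloud keyword arguments; values are illustrative only."""
    return {
        "width": 800,
        "height": 400,
        "min_font_size": 8,
        "max_words": 100,
        "background_color": "white",
        # Stopwords are lowercased here so lookups are case-insensitive.
        "stopwords": {word.lower() for word in extra_stopwords},
    }


def render(text, path, **options):
    """Generate a word cloud from text with the given options and save it."""
    # Imported here so make_options() works without wordcloud installed.
    from wordcloud import WordCloud

    WordCloud(**options).generate(text).to_file(path)


options = make_options(["Java", "SSH"])
print(sorted(options["stopwords"]))  # ['java', 'ssh']
```

A call such as render("Netkiller Neo Linux Nginx SSH Java Python", "wordcloud.png", **options) would then leave Java and SSH out of the image.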
			
			
		

20.6.3. Using wordcloud with word segmentation

			
# encoding=utf-8
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import jieba
import jieba.analyse

# background = plt.imread('background.jpg')  # mask image
# Text to build the word cloud from
with open('text.txt', 'r', encoding='utf-8') as f:
    content = f.read()

# seg_list = jieba.lcut(content)  # precise mode is the default
# Extract the 50 highest-weighted keywords
tags = jieba.analyse.extract_tags(content, topK=50)
text = ", ".join(tags)
print(text)

wordcloud = WordCloud(
    background_color='white',  # background color, chosen to suit the image; default is black
    # mask=background,  # mask image
    font_path='/Library/Fonts/AdobeSongStd-Light.otf',  # a font with Chinese glyphs is required to render Chinese
    width=1024,  # width
    height=768,  # height
    margin=5  # margin around words
).generate(text)

plt.imshow(wordcloud)
plt.axis('off')
plt.show()
plt.close()

# Save the image
wordcloud.to_file('wordcloud.jpg')
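
In the example above each keyword appears once in the joined text, so the weights computed by jieba do not carry over into the word counts. One possible alternative, sketched below, asks extract_tags for the weights (withWeight=True) and passes them to generate_from_frequencies. The helper names tags_to_frequencies and render_keywords are assumptions made for illustration.

```python
def tags_to_frequencies(weighted_tags):
    """Convert (word, weight) pairs into the word -> weight dict WordCloud expects."""
    return {word: float(weight) for word, weight in weighted_tags}


def render_keywords(content, font_path, path, top_k=50):
    """Extract weighted keywords from content and draw them sized by weight."""
    # Third-party imports are local so tags_to_frequencies() needs only the stdlib.
    import jieba.analyse
    from wordcloud import WordCloud

    # withWeight=True makes extract_tags return (word, weight) tuples.
    weighted = jieba.analyse.extract_tags(content, topK=top_k, withWeight=True)
    cloud = WordCloud(font_path=font_path, background_color="white")
    cloud.generate_from_frequencies(tags_to_frequencies(weighted))
    cloud.to_file(path)


sample = tags_to_frequencies([("Linux", 0.5), ("Nginx", 0.25)])
print(sample)  # {'Linux': 0.5, 'Nginx': 0.25}
```

render_keywords(content, '/Library/Fonts/AdobeSongStd-Light.otf', 'wordcloud.jpg') would reuse the font path from the example above.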
			
			
		

20.6.4. Mask image

		
# encoding=utf-8
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud
import jieba
import jieba.analyse

with open('text.txt', 'r', encoding='utf-8') as f:
    content = f.read()
tags = jieba.analyse.extract_tags(content, topK=50)
text = ", ".join(tags)
# print(text)

# mask = np.array(Image.open("stormtrooper_mask.png"))
mask = np.array(Image.open("background.png"))

wordcloud = WordCloud(
    background_color='white',
    font_path='/Library/Fonts/AdobeSongStd-Light.otf',
    mask=mask,  # mask image; pure white areas are left empty
    width=1024,  # width (ignored when a mask is given)
    height=768,  # height (ignored when a mask is given)
    margin=5  # margin around words
).generate(text)
# Save the image
wordcloud.to_file('wordcloud.jpg')

plt.imshow(wordcloud)
plt.axis('off')
plt.show()
plt.close()
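
A mask does not have to come from an image file. According to the wordcloud documentation, entries that are pure white (255) are masked out, all other entries can be drawn on, and the shape of the mask replaces width and height. The sketch below builds a circular mask in code under that convention. circle_mask and render_with_mask are hypothetical helper names, and the pure-Python loop is chosen for readability; a NumPy expression would be the usual vectorized form.

```python
def circle_mask(size):
    """Return size x size rows: 0 inside the circle (drawable), 255 outside (masked out)."""
    center = (size - 1) / 2
    radius_sq = (size / 2) ** 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius_sq else 255
         for x in range(size)]
        for y in range(size)
    ]


def render_with_mask(text, path, size=400):
    """Draw text inside a circle of the given size and save the image to path."""
    # Third-party imports are local so circle_mask() needs only the stdlib.
    import numpy as np
    from wordcloud import WordCloud

    mask = np.array(circle_mask(size), dtype=np.uint8)
    WordCloud(background_color="white", mask=mask).generate(text).to_file(path)


small = circle_mask(5)
print(small[0])  # [255, 0, 0, 0, 255]
```

A call such as render_with_mask(text, 'wordcloud_circle.png') would draw the keywords inside the circle; the output file name is only an example.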