...Lab/ArtificialIntelligent/DeepFake at master · hiekay/Bit...

Source: github. Published: 2021-03-25

Website: www.BitTiger.io
Original: github.com/Fabsqrt/BitTigerLab
Email: Qinyuan@BitTiger.io

Watch the video

[Video]

Introduction

What is DeepFake? It is a technique that replaces one person's face in a video with another person's face.

[Figure: giphy]

(Source: Family fun with deepfakes. Or how I got my wife onto the Tonight Show)

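Can you tell that this is a synthesized video?

[Video]

If this is the first time you have heard of DeepFake, be sure to click the video above and see for yourself how Nicolas's face takes over every film in the world.

[Video]

Let's also look at a singing version. Pay close attention to how the facial expressions stay in sync with the voice, and to how the result resembles and differs from the original video.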

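Hands-on project

How do we swap a face in a video?

A video is a sequence of images, so if we swap the face in every image we get a new, face-swapped video. How do we swap the face in a single image? We first have to locate the face in the image and then replace it. The face-swap problem therefore breaks down into the following pipeline.

[Figure: Flow]

The sections that follow walk through these five steps in order.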

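Video to images

FFmpeg

FFmpeg provides libraries and tools for processing audio, video, subtitles, and related metadata.

The core libraries are:

- libavcodec: encoding and decoding
- libavformat: streaming protocols, container formats, and basic I/O access
- libavutil: assorted utilities such as hashing and decompression
- libavfilter: modifying audio and video through a chain of filters
- libavdevice: an abstraction for accessing devices
- libswresample: audio mixing and related routines
- libswscale: color conversion and scaling

It ships three main command-line tools:

- ffmpeg: processes multimedia content
- ffplay: a minimal player
- ffprobe: an analysis tool for multimedia content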

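The video-to-images step can therefore be done with the following command:

ffmpeg -i clipname -vf fps=framerate -qscale:v 2 "imagename%04d.jpg"

Specifically, this command reads a video and writes out images at a fixed frame rate.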
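As a minimal sketch, the same command can be assembled and run from Python. The function names, the `frame` prefix, and the output directory layout are illustrative assumptions, not part of the original project; only the ffmpeg flags come from the command above.

```python
import subprocess
from pathlib import Path


def build_extract_command(clip, fps, out_dir, prefix="frame"):
    """Build the ffmpeg argument list that dumps frames as numbered JPEGs.

    `prefix` and `out_dir` are hypothetical names chosen for this sketch.
    """
    pattern = str(Path(out_dir) / f"{prefix}%04d.jpg")
    return [
        "ffmpeg",
        "-i", str(clip),      # input video
        "-vf", f"fps={fps}",  # keep this many frames per second
        "-qscale:v", "2",     # JPEG quality scale (lower means better quality)
        pattern,              # numbered output files: frame0001.jpg, ...
    ]


def extract_frames(clip, fps, out_dir):
    """Run ffmpeg; this needs the ffmpeg binary to be available on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_command(clip, fps, out_dir), check=True)


if __name__ == "__main__":
    # Only prints the command, so it works without ffmpeg installed.
    print(build_extract_command("clip.mp4", 5, "frames"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.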

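Face localization

Basic algorithms

Face localization is a relatively mature field, and here we mainly rely on the dlib library. We could write a custom face detection algorithm, but an existing general-purpose face detection library does the job.

There are two kinds of algorithms. The first is HOG-based facial landmark detection.

[Figure: Facial]

(Source: Facial landmarks with dlib, OpenCV, and Python)

The figure above shows the result of this algorithm. It divides the face into the following regions:

- Eyes (left/right)
- Eyebrows (left/right)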

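With these landmarks we can not only perform the face swap later on, but also detect the shape of the face, whether the eyes are blinking, and so on. For example, we can connect the points to derive more features.

[Figure: Facial]

(Source: Real-Time Face Pose Estimation)

Finding facial landmarks is a prediction problem: the input is an image and a region of interest, and the output is the set of key points inside that region.

How does HOG find the face? It is a general-purpose detection algorithm:

1. Take positive samples from the dataset and compute their HOG descriptors.
2. Take negative samples from the dataset and compute their HOG descriptors.
3. Train a classifier on the HOG descriptors.
4. Run the classifier over the negative samples at different starting positions and scales, and collect the HOG descriptors that are misclassified.
5. Retrain the model using the misclassified negative samples from the previous step.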

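This raises a question: how is the HOG descriptor computed? We compute the brightness of every pixel and then represent each pixel as a vector pointing in the direction in which the image gets darker, as shown below:

[Figure: face1]

(Source: Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning)

[Figure: face2]

(Source: Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning)

Why do it this way? The absolute value of each pixel is affected by the environment, whereas relative values are much more stable. Representing the image by its gradients therefore gives us higher-quality data. We can go one step further and aggregate neighboring pixels to produce more representative data.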
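The gradient idea can be sketched with NumPy. This is a simplified illustration for a single cell of pixels, assuming unsigned orientations in 9 bins; the function name and bin count are choices made for this example and not necessarily what dlib does internally. With unsigned orientations, it does not matter whether the vector points toward darker or brighter pixels.

```python
import numpy as np


def gradient_orientation_histogram(gray, bins=9):
    """Histogram of gradient orientations for one grayscale cell.

    Each pixel votes for its gradient direction, weighted by gradient strength.
    """
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)          # brightness change along rows and columns
    magnitude = np.hypot(gx, gy)        # how strong the change is
    # Unsigned orientation in degrees, folded into the range 0..180.
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(
        orientation, bins=bins, range=(0.0, 180.0), weights=magnitude
    )
    return hist


if __name__ == "__main__":
    # A horizontal brightness ramp, once dark and once uniformly brighter.
    ramp = np.tile(np.arange(8.0), (8, 1))
    print(gradient_orientation_histogram(ramp))
    print(gradient_orientation_histogram(ramp + 100.0))
```

The two printed histograms are the same, because adding a constant brightness to every pixel leaves the gradients unchanged. That is the stability property described above.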

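Now we can run detection:

1. On the new image, search for candidate regions at different starting positions and scales.
2. Use non-maximum suppression to remove redundant, duplicate detections. The figure below shows a case with redundant boxes and the result after removing them. Put simply, the method picks the box with the highest probability, discards the boxes that overlap it too much, and repeats the process.

[Figure: Facial]

(Source: Histogram of Oriented Gradients and Object Detection)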
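The non-maximum suppression step can be sketched as a greedy loop. This is a generic version, assuming boxes given as (x1, y1, x2, y2) with non-zero area and an overlap threshold of 0.5 measured by intersection over union; both the box format and the threshold are assumptions for illustration.

```python
import numpy as np


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes is an (N, 4) array of x1, y1, x2, y2.

    Returns the indices of the boxes that are kept, best score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]    # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection rectangle between the best box and every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop the boxes that overlap the best one too much, then repeat.
        order = rest[iou <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300]]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))
```

In this example the second box almost coincides with the first and is suppressed, while the third box does not overlap and survives.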

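Once we have the outline of the face, we can find the facial landmarks. The landmark algorithm is based on the paper "One Millisecond Face Alignment with an Ensemble of Regression Trees". In short, it uses a labeled training set to train an ensemble of regression trees, which is then used for prediction.

[Figure: Facedot]

(Source: One Millisecond Face Alignment with an Ensemble of Regression Trees)

On this basis, the 68 points can be marked.

[Figure: Facedot]

(Source: Facial landmarks with dlib, OpenCV, and Python)

From the coordinates of the 68 facial landmarks we can compute the angle of the face and crop out an upright, aligned face. However, dlib requires the whole face to be visible, which shrinks our sample set and rules out some scenarios. Also, because the face is 64*64 pixels in size, we have to deal with sharpness issues.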
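One simple way to estimate the in-plane tilt of a face is the angle of the line between the two eye centers. The sketch below assumes the common dlib 68-point ordering, where indices 36 to 41 are the subject's right eye and 42 to 47 the left eye; the function name is hypothetical, and the alignment in the project itself may be computed differently.

```python
import numpy as np

# Index ranges in the 68-point scheme (assumed ordering, see the lead-in).
RIGHT_EYE = slice(36, 42)
LEFT_EYE = slice(42, 48)


def roll_angle_degrees(landmarks):
    """Angle of the eye line in image coordinates, in degrees.

    landmarks: sequence of 68 (x, y) points. 0 means the eyes are level.
    """
    pts = np.asarray(landmarks, dtype=float)
    right_center = pts[RIGHT_EYE].mean(axis=0)
    left_center = pts[LEFT_EYE].mean(axis=0)
    dx, dy = left_center - right_center
    return float(np.degrees(np.arctan2(dy, dx)))


if __name__ == "__main__":
    points = np.zeros((68, 2))
    points[RIGHT_EYE] = (30.0, 50.0)   # appears on the left side of the image
    points[LEFT_EYE] = (70.0, 90.0)    # 40 px to the right and 40 px lower
    print(roll_angle_degrees(points))
```

Rotating the image by the negative of this angle around the face center would level the eyes before cropping.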

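The other approach is to train a CNN face detection model. A CNN can detect faces at more angles, but it needs more resources and may fail on large files.

Data preparation

Our goal is to convert the source face into the target face, so we need to collect images of both. If you pick a celebrity, you can get the images you need straight from Google image. Frames from the video itself can be used too, but it also helps to collect more varied data. In my case I used pictures of my wife and me, so I simply exported them from our Photo. Once the face data has been generated, check it carefully so that no unwanted faces or other objects end up in your training set.

extract.py

The code Deepfake uses to locate faces is as follows: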

    import cv2  # open-source computer vision library
    from pathlib import Path  # object-oriented file system paths
    from tqdm import tqdm  # progress bar
    import os  # operating system access
    import numpy as np  # scientific computing
    # DirectoryProcessor: process the files of one directory and save them to a new one
    # rotate_image: rotate an image (it actually lives in utils)
    from lib.cli import DirectoryProcessor, rotate_image
    from lib.utils import get_folder  # get a folder, creating it if it does not exist
    from lib.multithreading import pool_process  # multi-process parallel computation
    from lib.detect_blur import is_blurry  # check whether an image is blurry
    from plugins.PluginLoader import PluginLoader  # load the selected algorithm


    class ExtractTrainingData(DirectoryProcessor):  # extract faces from the training set
        def create_parser(self, subparser, command, description):
            self.optional_arguments = self.get_optional_arguments()
            self.parser = subparser.add_parser(
                command,
                help="Extract the faces from a pictures.",
                description=description,
                epilog="Questions and feedback: "
                       "https://github.com/deepfakes/faceswap-playground"
            )

        @staticmethod
        def get_optional_arguments():  # arguments of the extractor
            ''' Put the arguments in a list so that they are accessible from both argparse and gui '''
            argument_list = []
            # choose hog or cnn; cnn detects more angles but needs more
            # resources and may fail on large files
            argument_list.append({"opts": ('-D', '--detector'),
                                  "type": str,
                                  "choices": ("hog", "cnn", "all"),
                                  "default": "hog",
                                  "help": "Detector to use. 'cnn' detects much more angles but will be much more resource intensive and may fail on large files."})
            # threshold for accepting a face
            argument_list.append({"opts": ('-l', '--ref_threshold'),
                                  "type": float,
                                  "dest": "ref_threshold",
                                  "default": 0.6,
                                  "help": "Threshold for positive face recognition"})
            # faces to filter out
            argument_list.append({"opts": ('-n', '--nfilter'),
                                  "type": str,
                                  "dest": "nfilter",
                                  "nargs": '+',
                                  "default": "nfilter.jpg",
                                  "help": "Reference image for the persons you do not want to process. Should be a front portrait"})
            # face to process
            argument_list.append({"opts": ('-f', '--filter'),
                                  "type": str,
                                  "dest": "filter",
                                  "nargs": '+',
                                  "default": "filter.jpg",
                                  "help": "Reference image for the person you want to process. Should be a front portrait"})
            # number of processes to use
            argument_list.append({"opts": ('-j', '--processes'),
                                  "type": int,
                                  "default": 1,
                                  "help": "Number of processes to use."})
            argument_list.append({"opts": ('-s', '--skip-existing'),
                                  "action": 'store_true',
                                  "dest": 'skip_existing',
                                  "default": False,
                                  "help": "Skips frames already extracted."})
            # whether to draw the facial landmarks
            argument_list.append({"opts": ('-dl', '--debug-landmarks'),
                                  "action": "store_true",
                                  "dest": "debug_landmarks",
                                  "default": False,
                                  "help": "Draw landmarks for debug."})
            # angles by which to rotate the image
            argument_list.append({"opts": ('-r', '--rotate-images'),
                                  "type": str,
                                  "dest": "rotate_images",
                                  "default": None,
                                  "help": "If a face isn't found, rotate the images to try to find a face. Can find more faces at the "
                                          "cost of extraction speed. Pass in a single number to use increments of that size up to 360, "
                                          "or pass in a list of numbers to enumerate exactly what angles to check."})
            # whether to level the height of the eyes
            argument_list.append({"opts": ('-ae', '--align-eyes'),
                                  "action": "store_true",
                                  "dest": "align_eyes",
                                  "default": False,
                                  "help": "Perform extra alignment to ensure left/right eyes lie at the same height"})
            # automatically discard blurry images
            argument_list.append({"opts": ('-bt', '--blur-threshold'),
                                  "type": int,
                                  "dest": "blur_thresh",
                                  "default": None,
                                  "help": "Automatically discard images blurrier than the specified threshold. Discarded images are moved into a \"blurry\" sub-folder. Lower values allow more blur"})
            return argument_list

        def process(self):
            extractor_name = "Align"  # corresponds to Extract_Align.py
            self.extractor = PluginLoader.get_extractor(extractor_name)()
            processes = self.arguments.processes
            try:
                if processes != 1:  # process the images with multiple processes
                    files = list(self.read_directory())
                    for filename, faces in tqdm(pool_process(self.processFiles, files, processes=processes), total=len(files)):
                        self.num_faces_detected += 1
                        self.faces_detected[os.path.basename(filename)] = faces
                else:  # process the images in a single process
                    for filename in tqdm(self.read_directory()):
                        try:
                            image = cv2.imread(filename)
                            self.faces_detected[os.path.basename(filename)] = self.handleImage(image, filename)
                        except Exception as e:
                            if self.arguments.verbose:
                                print('Failed to extract from image: {}. Reason: {}'.format(filename, e))
                            pass
            finally:
                self.write_alignments()

        def processFiles(self, filename):  # handle one single image
            try:
                image = cv2.imread(filename)
                return filename, self.handleImage(image, filename)
            except Exception as e:
                if self.arguments.verbose:
                    print('Failed to extract from image: {}. Reason: {}'.format(filename, e))
                pass
            return filename, []

        def getRotatedImageFaces(self, image, angle):  # faces found after rotating by a fixed angle
            rotated_image = rotate_image(image, angle)
            faces = self.get_faces(rotated_image, rotation=angle)
            rotated_faces = [(idx, face) for idx, face in faces]
            return rotated_faces, rotated_image

        def imageRotator(self, image):  # try a series of rotations until a face is found
            ''' rotates the image through rotation_angles to try to find a face '''
            for angle in self.rotation_angles:
                rotated_faces, rotated_image = self.getRotatedImageFaces(image, angle)
                if len(rotated_faces) > 0:
                    if self.arguments.verbose:
                        print('found face(s) by rotating image {} degrees'.format(angle))
                    break
            return rotated_faces, rotated_image

        def handleImage(self, image, filename):
            faces = self.get_faces(image)
            process_faces = [(idx, face) for idx, face in faces]

            # no face found: try rotating the image
            if self.rotation_angles is not None and len(process_faces) == 0:
                process_faces, image = self.imageRotator(image)

            rvals = []
            for idx, face in process_faces:
                # draw the facial landmarks
                if self.arguments.debug_landmarks:
                    for (x, y) in face.landmarksAsXY():
                        cv2.circle(image, (x, y), 2, (0, 0, 255), -1)

                resized_image, t_mat = self.extractor.extract(image, face, 256, self.arguments.align_eyes)
                output_file = get_folder(self.output_dir) / Path(filename).stem

                # check whether the image is blurry
                if self.arguments.blur_thresh is not None:
                    aligned_landmarks = self.extractor.transform_points(face.landmarksAsXY(), t_mat, 256, 48)
                    feature_mask = self.extractor.get_feature_mask(aligned_landmarks / 256, 256, 48)
                    feature_mask = cv2.blur(feature_mask, (10, 10))
                    isolated_face = cv2.multiply(feature_mask, resized_image.astype(float)).astype(np.uint8)
                    blurry, focus_measure = is_blurry(isolated_face, self.arguments.blur_thresh)
                    # print("{} focus measure: {}".format(Path(filename).stem, focus_measure))
                    # cv2.imshow("Isolated Face", isolated_face)
                    # cv2.waitKey(0)
                    # cv2.destroyAllWindows()
                    if blurry:
                        print("{}'s focus measure of {} was below the blur threshold, moving to \"blurry\"".format(Path(filename).stem, focus_measure))
                        output_file = get_folder(Path(self.output_dir) / Path("blurry")) / Path(filename).stem

                # write the new image
                cv2.imwrite('{}_{}{}'.format(str(output_file), str(idx), Path(filename).suffix), resized_image)
                f = {
                    "r": face.r,
                    "x": face.x,
                    "w": face.w,
                    "y": face.y,
                    "h": face.h,
                    "landmarksXY": face.landmarksAsXY()
                }
                rvals.append(f)
            return rvals

Note that landmark-based algorithms do not work well on tilted faces; a CNN can be used as well.
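The `--rotate-images` option in the code above is another way to cope with tilted faces: if no face is found, the image is rotated and detection is retried. Its help text says a single number means "increments of that size up to 360" and a list enumerates exact angles. A minimal sketch of such parsing follows. The function name and the comma-separated list format are assumptions, and the project's own parsing may differ.

```python
def parse_rotation_angles(value):
    """Turn a --rotate-images style string into a list of angles in degrees.

    Assumed formats: "90" (step size) or "45,135" (explicit list).
    Returns None when the option was not given.
    """
    if value is None:
        return None
    parts = [p.strip() for p in str(value).split(",") if p.strip()]
    if not parts:
        raise ValueError("no angles given")
    if len(parts) == 1:
        step = int(parts[0])
        if step <= 0:
            raise ValueError("step must be a positive number of degrees")
        # 0 is skipped: the unrotated image has already been tried.
        return list(range(step, 360, step))
    return [int(p) for p in parts]


if __name__ == "__main__":
    print(parse_rotation_angles("90"))
    print(parse_rotation_angles("45, 135"))
```

A smaller step finds more faces but costs extraction speed, since every extra angle means another detection pass over the image.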

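Face conversion

What is the basic principle behind face conversion? Suppose you stare at a video of one person for 100 hours straight, are then shown a single glance of a photo of someone else, and are asked to draw that photo from memory. Your drawing will almost certainly look a lot like the first person.

The model we use is an autoencoder. Interestingly, all this model does is take the original image and generate the original image again. The encoder compresses the image and the decoder restores it, as in the example below:

[Figure: Autoencoder]

(Source: Building Autoencoders in Keras)

Building on this, even if we feed in another person's face, the autoencoder will turn it into a face that resembles the one it was trained on.

To improve the final result, we also need to learn separately the attributes that all faces share and the attributes that are specific to each face. We therefore use one shared encoder for all faces, whose purpose is to learn what faces have in common, and a separate decoder for each face, whose purpose is to learn what is distinctive about that face. When you pass B's face through the encoder and then through A's decoder, you get A's face with B's expression.

Expressed as formulas, the process is: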

X' = Decoder(Encoder(Shuffle(X)))
Loss = L1Loss(X' - X)

A' = Decoder_A(Encoder(Shuffle(A)))
Loss_A = L1Loss(A' - A)

B' = Decoder_B(Encoder(Shuffle(B)))
Loss_B = L1Loss(B' - B)
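The structure of these formulas can be sketched with NumPy: one shared encoder, one decoder per person, and an L1 loss. This is only a forward pass with untrained single-layer weights and random stand-in data; the latent size, the layer shapes, and the `shuffle` placeholder (small random noise, since the text does not define Shuffle) are all assumptions for illustration. The real model would use deeper networks and be trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64 * 64    # a flattened 64*64 face
LATENT = 128     # hypothetical size of the compressed code


def init(shape):
    """Small random weights (untrained)."""
    return rng.normal(0.0, 0.01, size=shape)


encoder_w = init((DIM, LATENT))      # shared by A and B: what faces have in common
decoder_a_w = init((LATENT, DIM))    # would be trained only on faces of A
decoder_b_w = init((LATENT, DIM))    # would be trained only on faces of B


def shuffle(x):
    """Placeholder for Shuffle: perturb the input with small noise (assumption)."""
    return x + rng.normal(0.0, 0.05, size=x.shape)


def encode(x):
    return np.tanh(x @ encoder_w)


def decode(z, decoder_w):
    return z @ decoder_w


def l1_loss(pred, target):
    return float(np.mean(np.abs(pred - target)))


# Random stand-ins for batches of face images of A and B.
faces_a = rng.random((8, DIM))
faces_b = rng.random((8, DIM))

# Loss_A and Loss_B: each decoder reconstructs its own person.
loss_a = l1_loss(decode(encode(shuffle(faces_a)), decoder_a_w), faces_a)
loss_b = l1_loss(decode(encode(shuffle(faces_b)), decoder_b_w), faces_b)

# The swap: encode B's faces, then decode them with A's decoder.
swapped = decode(encode(faces_b), decoder_a_w)
```

Because the encoder is shared, both losses push it toward a code that works for either person, while each decoder only ever learns to draw one face. That is why decoding B's code with A's decoder produces A's face.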
