
[Translation in progress] Yahoo open source: an introduction to open_nsfw

Author: 鼎足而立网 · Source: 探索 · 2024-04-29 04:58:51

This repo contains code for running Not Suitable for Work (NSFW) classification deep neural network Caffe models. Please refer to our blog post, which describes this work and the experiments in more detail.

Not suitable for work classifier

Detecting offensive / adult images is an important problem which researchers have tackled for decades. With the evolution of computer vision and deep learning, the algorithms have matured and we are now able to classify an image as not suitable for work with greater precision.

Defining NSFW material is subjective, and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.

Since images and user-generated content dominate the internet today, filtering nudity and other not-suitable-for-work images becomes an important problem. In this repository we open-source a Caffe deep neural network for the preliminary filtering of NSFW images.


Demo Image

Usage

  • The network takes in an image and outputs a probability (a score between 0 and 1) which can be used to filter not-suitable-for-work images. Scores below 0.2 indicate that the image is very likely safe; scores above 0.8 indicate that it is highly likely to be NSFW. Scores in the middle range may be binned into different NSFW levels.
  • Depending on the dataset, use case and types of images, we advise developers to choose suitable thresholds. Due to the difficult nature of the problem, there will be errors, which depend on the use case, definition and tolerance of NSFW. Ideally, developers should create an evaluation set according to the definition of what is safe for their application, then fit an ROC curve to choose a suitable threshold if they are using the model as-is.
  • Results can be improved by fine-tuning the model for your dataset, use case, or definition of NSFW. We do not provide any guarantees of the accuracy of results. Please read the disclaimer below.
  • Using human moderation for edge cases, in combination with the machine-learned solution, will help improve performance.
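The threshold advice above can be sketched in code. A minimal illustration with made-up scores and editorial labels (not real model output): compute an ROC point for each candidate threshold and keep the threshold whose point lies closest to the ideal corner (false positive rate 0, true positive rate 1).

```python
# Made-up evaluation scores and labels (1 = NSFW) for illustration only;
# in practice these come from running the model on your own evaluation set.
scores = [0.05, 0.10, 0.35, 0.55, 0.70, 0.85, 0.92, 0.97]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

def roc_point(threshold):
    """Return (false positive rate, true positive rate) at a threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos

def distance_to_ideal(threshold):
    """Squared distance from this threshold's ROC point to (FPR=0, TPR=1)."""
    fpr, tpr = roc_point(threshold)
    return fpr ** 2 + (1 - tpr) ** 2

best = min([0.2, 0.4, 0.6, 0.8], key=distance_to_ideal)
print(best)  # 0.4 separates this toy evaluation set perfectly
```

On real data no threshold will be perfect; the same selection loop simply picks the least bad one for your tolerance.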

Description of model

We trained the model on a dataset with NSFW images as positives and SFW (suitable for work) images as negatives. These images were editorially labelled. We cannot release the dataset or other details due to the nature of the data.

For our experiments we used CaffeOnSpark, a wonderful framework for distributed learning that brings deep learning to Hadoop and Spark clusters for training models. Big thanks to the CaffeOnSpark team!

The deep model was first pretrained on the ImageNet 1000-class dataset. We then fine-tuned the weights on the NSFW dataset, using the thin resnet 50 1by2 architecture as the pretrained network. The model was generated using the pynetbuilder tool and replicates the residual network paper's 50-layer network (with half the number of filters in each layer). You can find more details on how the model was generated and trained here.

Please note that deeper networks, or networks with more filters, can improve accuracy. We train the model using a thin residual network architecture, since it provides a good tradeoff in terms of accuracy while remaining light-weight in terms of runtime (flops) and memory (number of parameters).
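The "half the filters" tradeoff can be made concrete with a back-of-envelope count: a convolution layer with C_in input channels, C_out output filters and k×k kernels has roughly C_in · C_out · k² weights, so halving the filter count everywhere shrinks each layer's weights (and its multiply-accumulates) to about a quarter. The layer sizes below are illustrative, not the actual resnet 50 1by2 dimensions.

```python
def conv_weights(c_in, c_out, k):
    """Approximate weight count of a conv layer (ignoring biases)."""
    return c_in * c_out * k * k

# A full-width residual-network layer vs. its "1by2" (half-filter) version.
full = conv_weights(256, 256, 3)
thin = conv_weights(128, 128, 3)

print(full, thin, full / thin)  # halving both channel counts quarters the weights
```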

Docker Quickstart

This Docker quickstart guide can be used for evaluating the model quickly with minimal dependency installation.

Install Docker Engine:

  • Windows Installation
  • Mac OSX Installation
  • Ubuntu Installation

Build a caffe docker image (CPU)

Check the caffe installation

Run the docker image with a volume mapped to your open_nsfw repository. Your test_image.jpg should be located in this same directory.

We will get the NSFW score returned:
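The script prints the score to standard output; the exact text of that line is an assumption here, but parsing it and applying the thresholds from the Usage section might look like this sketch:

```python
import re

# Hypothetical output line from classify.py; the real format may differ.
output_line = "NSFW score: 0.9123"

score = float(re.search(r"\d+(?:\.\d+)?", output_line).group())

# Bin the score using the thresholds suggested in the Usage section.
if score < 0.2:
    verdict = "likely SFW"
elif score > 0.8:
    verdict = "likely NSFW"
else:
    verdict = "needs review"

print(score, verdict)
```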

Running the model

To run this model, please install Caffe and its Python extension, and make sure pycaffe is available in your PYTHONPATH.

We can use the classify.py script to run the NSFW model. For convenience, we have provided the script in this repo as well; it prints the NSFW score.
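Before the network sees an image, classify.py runs standard Caffe-transformer-style preprocessing. The sketch below mimics the usual steps with plain NumPy; the 224×224 input size, BGR channel order and per-channel means are assumptions about the deploy configuration, not values taken from this repo.

```python
import numpy as np

def preprocess(image_rgb, size=224, mean_bgr=(104.0, 117.0, 123.0)):
    """Center-crop an H x W x 3 RGB uint8 image, swap channels to BGR,
    subtract per-channel means and return a 1 x 3 x size x size batch."""
    h, w, _ = image_rgb.shape
    top, left = (h - size) // 2, (w - size) // 2
    crop = image_rgb[top:top + size, left:left + size].astype(np.float32)
    bgr = crop[:, :, ::-1]                 # RGB -> BGR
    bgr = bgr - np.asarray(mean_bgr)       # per-channel mean subtraction
    chw = bgr.transpose(2, 0, 1)           # HWC -> CHW, as Caffe expects
    return chw[np.newaxis, ...]            # add the batch dimension

# A dummy 256 x 256 "image" stands in for a decoded test_image.jpg.
dummy = np.zeros((256, 256, 3), dtype=np.uint8)
batch = preprocess(dummy)
print(batch.shape)  # (1, 3, 224, 224)
```

The resulting batch is what would be fed to the network's input blob before reading the NSFW probability from its output.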

Disclaimer

The definition of NSFW is subjective and contextual. This model is a general-purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not provide guarantees of the accuracy of output; rather, we make this available for developers to explore and enhance as an open source project. Results can be improved by fine-tuning the model for your dataset.

License

The code is licensed under the BSD 2-clause license; please refer to the linked LICENSE file for details.

Contact

The model was trained by [Jay Mahadeokar](https://github.com/jay-mahadeokar/), in collaboration with Sachin Farfade, Amar Ramesh Kamat, Armin Kappeler and others. Special thanks to Gerry Pesavento for taking the initiative of open-sourcing this model. If you have any queries, please raise an issue and we will get back ASAP.
