
TensorFlow graph optimization with Grappler


Contents

  • Overview
  • Available graph optimizers
  • Setup
  • Compare execution performance with and without Grappler
  • Constant folding optimizer
  • Debug stripper optimizer
  • Summary

Overview

TensorFlow uses both graph and eager execution to run computations. A tf.Graph contains a set of tf.Operation objects (ops), which represent units of computation, and tf.Tensor objects, which represent the units of data that flow between ops.
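To make the op and tensor vocabulary concrete, here is a toy dataflow graph in plain Python. It is only a sketch and not TensorFlow's data structure: the Node class and the evaluate function are hypothetical names, and only three op types are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in a toy dataflow graph (hypothetical, not tf.Operation)."""
    name: str
    op: str                                      # "Const", "Add" or "Mul"
    inputs: list = field(default_factory=list)   # names of the producer nodes
    value: float = 0.0                           # only used by "Const" nodes


def evaluate(graph, target):
    """Compute the value flowing out of `target` by evaluating its inputs first."""
    node = graph[target]
    if node.op == "Const":
        return node.value
    args = [evaluate(graph, name) for name in node.inputs]
    if node.op == "Add":
        return args[0] + args[1]
    if node.op == "Mul":
        return args[0] * args[1]
    raise ValueError(f"unknown op: {node.op}")


graph = {
    "a": Node("a", "Const", value=2.0),
    "b": Node("b", "Const", value=3.0),
    "sum": Node("sum", "Add", ["a", "b"]),
    "out": Node("out", "Mul", ["sum", "b"]),
}
print(evaluate(graph, "out"))  # (2 + 3) * 3
```

The nodes play the role of ops and the names listed in inputs play the role of tensors that connect them. A graph optimizer rewrites this kind of structure before anything is executed.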

Grappler is the default graph optimization system in the TensorFlow runtime. Grappler applies optimizations in graph mode (within tf.function) to improve the performance of your TensorFlow computations through graph simplifications and other high-level optimizations, such as inlining function bodies to enable inter-procedural optimizations. Optimizing the tf.Graph also reduces the device peak memory usage and improves hardware utilization by optimizing the mapping of graph nodes to compute resources.

Use tf.config.optimizer.set_experimental_options() for finer control over your tf.Graph optimizations.

Available graph optimizers

Grappler performs graph optimizations through a top-level driver called the MetaOptimizer. The following graph optimizers are available with TensorFlow:

  • Constant folding optimizer – Statically infers the value of tensors when possible by folding constant nodes in the graph and materializes the result using constants.
  • Arithmetic optimizer – Simplifies arithmetic operations by eliminating common subexpressions and simplifying arithmetic statements.
  • Layout optimizer – Optimizes tensor layouts to execute data-format-dependent operations, such as convolutions, more efficiently.
  • Remapper optimizer – Remaps subgraphs onto more efficient implementations by replacing commonly occurring subgraphs with optimized fused monolithic kernels.
  • Memory optimizer – Analyzes the graph to inspect the peak memory usage of each operation and inserts CPU-GPU memory copy operations that swap GPU memory to CPU to reduce peak memory usage.
  • Dependency optimizer – Removes or rearranges control dependencies to shorten the critical path for a model step or to enable other optimizations. Also removes nodes that are effectively no-ops, such as Identity.
  • Pruning optimizer – Prunes nodes that have no effect on the output of the graph. It is usually run first to reduce the size of the graph and speed up processing in other Grappler passes.
  • Function optimizer – Optimizes the function library of a TensorFlow program and inlines function bodies to enable other inter-procedural optimizations.
  • Shape optimizer – Optimizes subgraphs that operate on shapes and shape-related information.
  • Autoparallel optimizer – Automatically parallelizes graphs by splitting along the batch dimension. This optimizer is turned off by default.
  • Loop optimizer – Optimizes the graph control flow by hoisting loop-invariant subgraphs out of loops and by removing redundant stack operations in loops. Also optimizes loops with statically known trip counts and removes statically known dead branches in conditionals.
  • Scoped allocator optimizer – Introduces scoped allocators to reduce data movement and to consolidate some operations.
  • Pin to host optimizer – Swaps small operations onto the CPU. This optimizer is turned off by default.
  • Auto mixed precision optimizer – Converts data types to float16 where applicable to improve performance. Currently applies to GPUs and the latest Intel Xeon CPUs.
  • Debug stripper – Strips nodes related to debugging operations such as tf.debugging.Assert, tf.debugging.check_numerics, and tf.print from the graph. This optimizer is turned off by default.
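Two of the passes above, constant folding and pruning, are easy to sketch on a toy graph. The code below is a simplified illustration in plain Python and not Grappler's implementation: the graph format, fold_constants, prune, and the run_passes driver are all hypothetical, and the driver only loosely mirrors the idea of a top-level component that runs several passes.

```python
# Toy graph: node name -> (op, payload).
#   ("Const", value)          a constant
#   ("Input", None)           a value supplied at run time
#   ("Add" | "Mul", [a, b])   an op reading two other nodes
def fold_constants(graph):
    """Replace Add/Mul nodes whose inputs are all constants with a Const node."""
    folded = dict(graph)
    changed = False
    for name, (op, inputs) in graph.items():
        if op in ("Add", "Mul") and all(folded[i][0] == "Const" for i in inputs):
            x, y = (folded[i][1] for i in inputs)
            folded[name] = ("Const", x + y if op == "Add" else x * y)
            changed = True
    return folded, changed


def prune(graph, outputs):
    """Keep only the nodes that are reachable from the requested outputs."""
    keep, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        op, inputs = graph[name]
        if op in ("Add", "Mul"):
            stack.extend(inputs)
    pruned = {n: v for n, v in graph.items() if n in keep}
    return pruned, len(pruned) != len(graph)


def run_passes(graph, outputs, max_iterations=10):
    """Hypothetical driver: repeat both passes until neither changes the graph."""
    for _ in range(max_iterations):
        graph, folded = fold_constants(graph)
        graph, pruned = prune(graph, outputs)
        if not (folded or pruned):
            break
    return graph


graph = {
    "a": ("Const", 2.0),
    "b": ("Const", 3.0),
    "ab": ("Mul", ["a", "b"]),      # foldable: both inputs are constants
    "x": ("Input", None),
    "out": ("Add", ["ab", "x"]),    # not foldable: depends on a runtime input
    "unused": ("Add", ["a", "a"]),  # never contributes to "out"
}
optimized = run_passes(graph, outputs=["out"])
print(optimized)
```

After the passes run, the product of the two constants is stored as a single constant, and the nodes that no longer feed the output are gone. The remaining graph does less work each time it is executed.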

Setup

import numpy as np
import timeit
import traceback
import contextlib


import tensorflow as tf

Create a context manager to easily toggle optimizer states.

@contextlib.contextmanager
def options(options):
  # Save the current settings so they can be restored on exit.
  old_opts = tf.config.optimizer.get_experimental_options()
  tf.config.optimizer.set_experimental_options(options)
  try:
    yield
  finally:
    tf.config.optimizer.set_experimental_options(old_opts)

Compare execution performance with and without Grappler

TensorFlow 2 and later executes eagerly by default. Use tf.function to switch the default execution to graph mode. Grappler runs automatically in the background to apply the graph optimizations above and improve execution performance.

Constant folding optimizer

As a preliminary example, consider a function that performs operations on constants and returns an output.

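def test_function_1():
  @tf.function
  def simple_function(input_arg):
    # Printed only while the function is being traced into a graph.
    print('Tracing!')
    a = tf.constant(np.random.randn(2000,2000), dtype = tf.float32)
    c = a
    # Every operand in this loop is a constant, so the result can be folded.
    for n in range(50):
      c = c@a
    return tf.reduce_mean(c+input_arg)

  return simple_function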

Turn off the constant folding optimizer and execute the function:

with options({'constant_folding': False}):
  print(tf.config.optimizer.get_experimental_options())
  simple_function = test_function_1()
  # Trace once
  x = tf.constant(2.2)
  simple_function(x)
  print("Vanilla execution:", timeit.timeit(lambda: simple_function(x), number = 1), "s")
{'constant_folding': False, 'disable_model_pruning': False, 'disable_meta_optimizer': False}
Tracing!
Vanilla execution: 0.002112719000024299 s

Enable the constant folding optimizer and execute the function again to observe a speed-up in function execution.

with options({'constant_folding': True}):
  print(tf.config.optimizer.get_experimental_options())
  simple_function = test_function_1()
  # Trace once
  x = tf.constant(2.2)
  simple_function(x)
  print("Constant folded execution:", timeit.timeit(lambda: simple_function(x), number = 1), "s")
{'constant_folding': True, 'disable_model_pruning': False, 'disable_meta_optimizer': False}
Tracing!
Constant folded execution: 0.0007726810000576734 s
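Each measurement above times a single call, so the numbers can vary noticeably from run to run. A common way to get a steadier figure, sketched below with Python's standard timeit.repeat, is to time several batches and keep the fastest one. The best_time helper is a hypothetical name, and the workload here is a stand-in. In this tutorial the callable would be lambda: simple_function(x).

```python
import timeit


def best_time(fn, repeats=5, number=10):
    """Return the fastest per-call time in seconds over several repeated runs."""
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    return min(totals) / number


# Stand-in workload so the sketch runs without TensorFlow.
elapsed = best_time(lambda: sum(range(1000)))
print(f"best of 5 batches: {elapsed:.2e} s per call")
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity on the result.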

Debug stripper optimizer

Consider a simple function that checks the numeric value of its input argument and returns it.

def test_function_2():
  @tf.function
  def simple_func(input_arg):
    output = input_arg
    # Raises InvalidArgumentError if the tensor contains NaN or Inf.
    tf.debugging.check_numerics(output, "Unhealthy!")
    return output
  return simple_func

First, execute the function with the debug stripper optimizer turned off.

test_func = test_function_2()
p1 = tf.constant(float('inf'))
try:
  test_func(p1)
except tf.errors.InvalidArgumentError as e:
  traceback.print_exc(limit=2)
2024-10-22 01:22:00.656591: E tensorflow/core/kernels/check_numerics_op.cc:299] abnormal_detected_host @0x7f5f72c00100 = {0, 1} Unhealthy!
Traceback (most recent call last):
  File "/tmpfs/tmp/ipykernel_10375/3616845043.py", line 4, in 
    test_func(p1)
  File "/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node CheckNumerics defined at (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main

  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code

tf.debugging.check_numerics raises an invalid argument error because of the Inf argument passed to test_func.

Enable the debug stripper optimizer and execute the function again.

with options({'debug_stripper': True}):
  test_func2 = test_function_2()
  p1 = tf.constant(float('inf'))
  try:
    test_func2(p1)
  except tf.errors.InvalidArgumentError as e:
    traceback.print_exc(limit=2)

The debug stripper optimizer strips the tf.debugging.check_numerics node from the graph and executes the function without raising any errors.
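The effect of stripping debug nodes can be pictured on a toy node list. The sketch below is plain Python and does not reproduce how Grappler rewrites the graph internally: the node format, the DEBUG_OPS set, and strip_debug_nodes are hypothetical. It simply drops the debug nodes and reconnects any consumer to the tensor that the debug node was checking.

```python
# Op types treated as debug-only in this toy model (an assumption for the sketch).
DEBUG_OPS = {"Assert", "CheckNumerics", "Print"}


def strip_debug_nodes(nodes):
    """Drop debug nodes and rewire their consumers to the checked input."""
    # Map each debug node to the tensor it was checking (its first input).
    forward = {name: inputs[0] for name, op, inputs in nodes if op in DEBUG_OPS}

    def resolve(name):
        # Follow chains of debug nodes until a non-debug producer is reached.
        while name in forward:
            name = forward[name]
        return name

    return [
        (name, op, [resolve(i) for i in inputs])
        for name, op, inputs in nodes
        if op not in DEBUG_OPS
    ]


nodes = [
    ("input_arg", "Placeholder", []),
    ("check", "CheckNumerics", ["input_arg"]),
    ("output", "Identity", ["check"]),
]
stripped = strip_debug_nodes(nodes)
print(stripped)
```

With the check removed, nothing in the graph inspects the value any more, which is why an Inf input passes through without an error.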

Summary

The TensorFlow runtime uses Grappler to optimize graphs automatically before execution. Use tf.config.optimizer.set_experimental_options to enable or disable the various graph optimizers.

For more information on Grappler, see TensorFlow graph optimizations.

Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples are shared under the Apache 2.0 License.
