Content overview
- Overview
- Available graph optimizers
- Setup
- Compare execution performance with and without Grappler
- Constant folding optimizer
- Debug stripper optimizer
- Summary
Overview
TensorFlow uses both graph and eager execution to run computations. A tf.Graph contains a set of tf.Operation objects (ops), which represent units of computation, and tf.Tensor objects, which represent the units of data that flow between ops.
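To make the relationship between a tf.Graph, its ops, and its tensors concrete, the following sketch traces a small function and lists the operations in the resulting graph. The function name `affine` and the input shapes are illustrative assumptions, not part of the original guide. As far as I understand, the graph inspected here is the one recorded at trace time, before Grappler rewrites anything.

```python
import tensorflow as tf


@tf.function
def affine(x, w, b):
    # Hypothetical example function: one matmul followed by an addition.
    return tf.matmul(x, w) + b


x = tf.ones((2, 3))
w = tf.ones((3, 4))
b = tf.zeros((4,))

# Tracing produces a concrete function whose .graph is a tf.Graph.
concrete = affine.get_concrete_function(x, w, b)
graph = concrete.graph

# Each op is a unit of computation; its outputs are the tensors
# that flow to other ops.
for op in graph.get_operations():
    print(op.type, op.name, [t.shape for t in op.outputs])

op_types = [op.type for op in graph.get_operations()]
```

The listing should include placeholder ops for the three inputs and a MatMul op for the matrix product.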
Grappler is the default graph optimization system in the TensorFlow runtime. Grappler applies optimizations in graph mode (within tf.function) to improve the performance of your TensorFlow computations through graph simplifications and other high-level optimizations, such as inlining function bodies to enable inter-procedural optimizations. Optimizing the tf.Graph also reduces the device peak memory usage and improves hardware utilization by optimizing the mapping of graph nodes to compute resources.
Use tf.config.optimizer.set_experimental_options() for finer control over your tf.Graph optimizations.
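As a minimal sketch of that call, the snippet below sets two options and reads them back. The option keys `constant_folding` and `debug_stripper` are the ones used later in this guide; the choice of values here is only for illustration.

```python
import tensorflow as tf

# Options that were never set explicitly may be absent from this dict.
before = tf.config.optimizer.get_experimental_options()
print("before:", before)

# Turn constant folding off and the debug stripper on.
tf.config.optimizer.set_experimental_options(
    {"constant_folding": False, "debug_stripper": True}
)

after = tf.config.optimizer.get_experimental_options()
print("after:", after)
```

The settings stay in effect for the running program until they are changed again, which is why the guide wraps them in a context manager further down.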
Available graph optimizers
Grappler performs graph optimizations through a top-level driver called the MetaOptimizer. The following graph optimizers are available with TensorFlow:
- Constant folding optimizer - Statically infers the value of tensors when possible by folding constant nodes in the graph, and materializes the result using constants.
- Arithmetic optimizer - Simplifies arithmetic operations by eliminating common subexpressions and simplifying arithmetic statements.
- Layout optimizer - Optimizes tensor layouts to execute data-format-dependent operations, such as convolutions, more efficiently.
- Remapper optimizer - Remaps subgraphs onto more efficient implementations by replacing commonly occurring subgraphs with optimized fused monolithic kernels.
- Memory optimizer - Analyzes the graph to inspect the peak memory usage of each operation and inserts CPU-GPU memory copy operations that swap GPU memory to CPU to reduce the peak memory usage.
- Dependency optimizer - Removes or rearranges control dependencies to shorten the critical path for a model step or to enable other optimizations. Also removes nodes that are effectively no-ops, such as Identity.
- Pruning optimizer - Prunes nodes that have no effect on the output of the graph. It is usually run first to reduce the size of the graph and speed up processing in other Grappler passes.
- Function optimizer - Optimizes the function library of a TensorFlow program and inlines function bodies to enable other inter-procedural optimizations.
- Shape optimizer - Optimizes subgraphs that operate on shape and shape-related information.
- Autoparallel optimizer - Automatically parallelizes graphs by splitting along the batch dimension. This optimizer is turned off by default.
- Loop optimizer - Optimizes the graph control flow by hoisting loop-invariant subgraphs out of loops and by removing redundant stack operations in loops. Also optimizes loops with statically known trip counts and removes statically known dead branches in conditionals.
- Scoped allocator optimizer - Introduces scoped allocators to reduce data movement and to consolidate some operations.
- Pin to host optimizer - Swaps small operations onto the CPU. This optimizer is turned off by default.
- Auto mixed precision optimizer - Converts data types to float16 where applicable to improve performance. Currently applies to GPUs and the latest Intel Xeon CPUs.
- Debug stripper - Strips nodes related to debugging operations such as tf.debugging.Assert, tf.debugging.check_numerics, and tf.print from the graph. This optimizer is turned off by default.
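To give a feel for what two of these passes do, here is a toy, pure-Python sketch of constant folding and pruning on a hand-written graph. It is not Grappler's implementation: the dictionary-based graph format, the node names, and the helper functions `fold_constants` and `prune` are all assumptions made up for this illustration, and the order in which the two passes run here is arbitrary.

```python
import operator

# Toy graph format (hypothetical): name -> (op, payload).
# "const" nodes carry a value, "input" nodes carry None,
# and arithmetic nodes carry a tuple of input node names.
OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(graph):
    """Replace arithmetic nodes whose inputs are all constants by a constant."""
    folded = dict(graph)
    changed = True
    while changed:
        changed = False
        for name, (op, args) in list(folded.items()):
            if op in OPS and all(folded[a][0] == "const" for a in args):
                values = [folded[a][1] for a in args]
                folded[name] = ("const", OPS[op](*values))
                changed = True
    return folded


def prune(graph, outputs):
    """Keep only the nodes that the requested outputs depend on."""
    keep, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        op, args = graph[name]
        if op in OPS:
            stack.extend(args)
    return {name: node for name, node in graph.items() if name in keep}


graph = {
    "two": ("const", 2),
    "three": ("const", 3),
    "six": ("mul", ("two", "three")),   # depends only on constants
    "x": ("input", None),
    "y": ("add", ("x", "six")),         # the output we care about
    "unused": ("add", ("x", "two")),    # never reaches the output
}

folded = fold_constants(graph)
pruned = prune(folded, ["y"])
print(pruned)
```

After folding, `six` becomes a constant, so `two` and `three` no longer feed the output and are pruned together with `unused`.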
Setup
import numpy as np
import timeit
import traceback
import contextlib
import tensorflow as tf
Create a context manager to easily toggle optimizer states.
@contextlib.contextmanager
def choices(new_opts):
    # Remember the current options so they can be reapplied on exit.
    old_opts = tf.config.optimizer.get_experimental_options()
    tf.config.optimizer.set_experimental_options(new_opts)
    try:
        yield
    finally:
        tf.config.optimizer.set_experimental_options(old_opts)
Compare execution performance with and without Grappler
TensorFlow 2 and later executes eagerly by default. Use tf.function to switch the default execution to graph mode. Grappler runs automatically in the background to apply the graph optimizations above and improve execution performance.
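The sketch below illustrates what switching to graph mode means in practice: the Python body of a tf.function runs while the function is traced into a graph, and later calls with the same input signature reuse that graph. The function `scale` and the `trace_log` list are hypothetical names introduced only for this example.

```python
import tensorflow as tf

trace_log = []


@tf.function
def scale(x):
    # Python side effect: runs only while the function is being traced.
    trace_log.append("traced")
    return x * 2.0


first = scale(tf.constant(1.0))   # traces, then runs the graph
second = scale(tf.constant(3.0))  # same dtype and shape: reuses the graph

print("number of traces:", len(trace_log))
print("results:", float(first), float(second))
```

This is also why the examples in this guide print 'Tracing!' once and then time a later call: the timed call runs the already-built graph.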
Constant folding optimizer
As a preliminary example, consider a function that performs operations on constants and returns an output.
def test_function_1():
    @tf.function
    def simple_function(input_arg):
        print('Tracing!')
        # A constant matrix: every matmul below depends only on constants.
        a = tf.constant(np.random.randn(2000, 2000), dtype=tf.float32)
        c = a
        for n in range(50):
            c = c @ a
        return tf.reduce_mean(c + input_arg)

    return simple_function
Turn off the constant folding optimizer and execute the function:
with choices({'constant_folding': False}):
    print(tf.config.optimizer.get_experimental_options())
    simple_function = test_function_1()
    # Trace once
    x = tf.constant(2.2)
    simple_function(x)
    print("Vanilla execution:", timeit.timeit(lambda: simple_function(x), number=1), "s")
{'constant_folding': False, 'disable_model_pruning': False, 'disable_meta_optimizer': False}
Tracing!
Vanilla execution: 0.002112719000024299 s
Enable the constant folding optimizer and execute the function again to observe a speed-up in function execution.
with choices({'constant_folding': True}):
    print(tf.config.optimizer.get_experimental_options())
    simple_function = test_function_1()
    # Trace once
    x = tf.constant(2.2)
    simple_function(x)
    print("Constant folded execution:", timeit.timeit(lambda: simple_function(x), number=1), "s")
{'constant_folding': True, 'disable_model_pruning': False, 'disable_meta_optimizer': False}
Tracing!
Constant folded execution: 0.0007726810000576734 s
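Each of the two timings above comes from a single timed call, so the comparison can be noisy. One possible way to get a steadier number is to repeat the measurement and take the median, as sketched below with the standard-library timeit and statistics modules. The helper `median_seconds` and the stand-in `workload` function are assumptions for illustration; in the guide's setting you would pass something like `lambda: simple_function(x)` instead.

```python
import statistics
import timeit


def median_seconds(fn, repeats=5, number=10):
    """Return the median time per call of fn, in seconds."""
    fn()  # warm-up call, so one-time setup cost is not measured
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    return statistics.median(totals) / number


def workload():
    # Stand-in for the function being measured (hypothetical).
    return sum(i * i for i in range(10_000))


per_call = median_seconds(workload)
print(f"median time per call: {per_call:.6f} s")
```

The median is used rather than the mean so that a single slow run, for example one interrupted by another process, has less influence on the result.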
Debug stripper optimizer
Consider a simple function that checks the numeric value of its input argument and returns it.
def test_function_2():
    @tf.function
    def simple_func(input_arg):
        output = input_arg
        # Raises InvalidArgumentError if the tensor contains NaN or Inf.
        tf.debugging.check_numerics(output, "Unhealthy!")
        return output

    return simple_func
First, execute the function with the debug stripper optimizer turned off.
test_func = test_function_2()
p1 = tf.constant(float('inf'))
try:
    test_func(p1)
except tf.errors.InvalidArgumentError as e:
    traceback.print_exc(limit=2)
2024-10-22 01:22:00.656591: E tensorflow/core/kernels/check_numerics_op.cc:299] abnormal_detected_host @0x7f5f72c00100 = {0, 1} Unhealthy!
Traceback (most recent call last):
File "/tmpfs/tmp/ipykernel_10375/3616845043.py", line 4, in
test_func(p1)
File "/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node CheckNumerics defined at (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
tf.debugging.check_numerics raises an invalid argument error because of the Inf argument passed to test_func.
Enable the debug stripper optimizer and execute the function again.
with choices({'debug_stripper': True}):
    test_func2 = test_function_2()
    p1 = tf.constant(float('inf'))
    try:
        test_func2(p1)
    except tf.errors.InvalidArgumentError as e:
        traceback.print_exc(limit=2)
The debug stripper optimizer strips the tf.debugging.check_numerics node from the graph, so the function executes without raising any errors.
Summary
The TensorFlow runtime uses Grappler to optimize graphs automatically before execution. Use tf.config.optimizer.set_experimental_options to enable or disable the various graph optimizers.
For more information on Grappler, see TensorFlow graph optimizations.