Section 3 - Identify and Resolve LTM Device Issues
===================================================
|

====

|
Objective - 3.01 Interpret log file messages and/or command line output to identify LTM device issues
-----------------------------------------------------------------------------------------------------
|
|
**3.01 - Interpret log file messages to identify LTM device issues**
https://support.f5.com/csp/article/K14426
**Identifying hardware issues**

This can be a very broad topic because a very large number of hardware
errors can occur. Many log messages contain an ID number followed by a
description of the event.

For example, BIG-IP 11.4 added the pendsect feature to the TMOS
software. Pendsect periodically checks for pending sector alerts and
resolves them. The feature is configured to run daily and provides
improved disk error detection and correction. The system logs pendsect
messages to the /var/log/user.log file.

When the pendsect process runs and no errors are detected or corrected,
the system logs messages similar to the following example:

.. code-block:: console

   warning pendsect[21788]: pendsect: /dev/sdb no Pending Sectors detected

When the pendsect process detects and corrects an error, the system logs
messages similar to the following example:

.. code-block:: console

   warning pendsect[19772]: Recovered LBA:230000007
   warning pendsect[19772]: Drive /dev/sda partition UNKNOWN
   warning pendsect[19772]: File affected NONE

When the pendsect process detects an error but cannot correct it, the
system logs messages similar to the following example:

.. code-block:: console

   warning pendsect[20702]: seek(1) error[25] recovery of LBA:226300793 not complete
   warning pendsect[20702]: Drive: /dev/sda filesystem type: Undetermined
   warning pendsect[20702]: File affected: NONE

Recommended Actions

If pendsect reports an uncorrectable error, or if you suspect a possible
disk failure, you can run the End User Diagnostics (EUD) SMART test on
the drive. For information about the EUD utility, and links to the
latest release notes, refer to K7172: Overview of the End User
Diagnostics software.

Beginning in BIG-IP 11.4.0, you can also use the platform_check command
to collect SMART test data from the drive. The disk portion of the
command output indicates a Pass or Fail status for the drive, and the
command logs detailed information to the /var/log/platform_check file.
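The three pendsect message shapes shown above can be told apart mechanically, which helps when reviewing many days of /var/log/user.log entries. The following Python sketch is an illustration, not an F5 tool: it assumes Python 3 on an administrative workstation working on a copy of the log, and its regular expressions are derived only from the sample messages in this section, so messages on your BIG-IP version may need adjusted patterns.

.. code-block:: python

   import re
   from collections import Counter

   # Illustrative patterns taken from the sample pendsect messages above;
   # exact wording may differ between BIG-IP versions.
   NO_ERRORS = re.compile(r"pendsect\[\d+\]: .*no Pending Sectors detected")
   RECOVERED = re.compile(r"pendsect\[\d+\]: Recovered LBA:(\d+)")
   NOT_COMPLETE = re.compile(r"pendsect\[\d+\]: .*recovery of LBA:(\d+) not complete")


   def classify_pendsect(lines):
       """Count clean runs, corrected sectors, and uncorrectable sectors.

       Returns a Counter of outcomes and the list of LBAs whose recovery
       did not complete (the case that calls for an EUD SMART test).
       """
       outcomes = Counter()
       uncorrectable_lbas = []
       for line in lines:
           if NO_ERRORS.search(line):
               outcomes["clean"] += 1
           elif RECOVERED.search(line):
               outcomes["recovered"] += 1
           else:
               match = NOT_COMPLETE.search(line)
               if match:
                   outcomes["uncorrectable"] += 1
                   uncorrectable_lbas.append(int(match.group(1)))
       return outcomes, uncorrectable_lbas

A non-empty list of LBAs corresponds to the uncorrectable case, which is when the EUD SMART test or platform_check is worth running.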
----
|
**3.01 - Interpret the qkview heuristic results**
https://support.f5.com/kb/en-us/products/big-iq-centralized-mgmt/manuals/product/bigiq-central-mgmt-monitoring-reports-5-3-0/9.html
**Troubleshooting using iHealth**

The F5 iHealth server is a tool that helps you troubleshoot potential
issues by analyzing configuration, logs, command output, password
security, license compliance, and more.

From F5 BIG-IQ Centralized Management, you can create a snapshot of a
configuration in the form of a QKView file and upload it to the F5
iHealth server. The server compares the file against the iHealth
database, which contains known issues, common configuration errors, and
F5-published best practices. F5 then returns an iHealth report that you
can use to identify potential issues that need your attention.

*Troubleshoot potential issues by viewing an iHealth device report*

After you upload a QKView file for one or more BIG-IP devices, the F5
iHealth server returns a device report.

Review the device report so you can address any potential issues or
vulnerabilities. From the report, you can access and sort the
heuristics associated with a device.

1. At the top of the screen, click Monitoring.
2. On the left, click REPORTS > Device > iHealth > Device Reports.
3. Click the Open link next to the report you want to view.
4. To sort the heuristics for a report you've opened, select an option
   from the All Importance list, the All Flags list, or both.
5. To add a flag to a specific heuristic, select the check box next to
   it, and then select a flag from the All Flags list.
6. To view more details about a specific heuristic, click its link.
7. To view the AskF5 Knowledge Center article with more information
   about a heuristic, click its solution link.
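Steps 4 and 5 amount to filtering the heuristics and ordering them by importance. The following Python sketch models that on a hypothetical in-memory list. The ``Heuristic`` class, the importance levels (Critical, High, Medium, Low), and the flag name are assumptions for illustration only; they do not describe the iHealth or BIG-IQ data model or any F5 API.

.. code-block:: python

   from dataclasses import dataclass, field

   # Assumed importance levels, most severe first (not taken from an F5 API).
   IMPORTANCE_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


   @dataclass
   class Heuristic:
       """Hypothetical stand-in for one heuristic row in a device report."""
       name: str
       importance: str
       solution: str                      # article the solution link would open
       flags: set = field(default_factory=set)


   def add_flag(heuristic, flag):
       """Mirror step 5: attach a flag to a single heuristic."""
       heuristic.flags.add(flag)


   def triage(heuristics, importance=None, flag=None):
       """Filter like the All Importance / All Flags lists, most severe first."""
       selected = [
           h for h in heuristics
           if (importance is None or h.importance == importance)
           and (flag is None or flag in h.flags)
       ]
       # Unknown importance values sort after the known ones.
       return sorted(
           selected,
           key=lambda h: IMPORTANCE_RANK.get(h.importance, len(IMPORTANCE_RANK)),
       )

For example, ``triage(report, importance="Critical")`` on such a list keeps only the critical heuristics, in the same way that selecting an option from the All Importance list narrows the report.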
----
|
**3.01 - Identify appropriate methods to troubleshoot NTP**
https://support.f5.com/csp/article/K14120
**NTP**
NTP is a protocol for synchronizing the clocks of computer systems over
the network. On BIG-IP systems, accurate timestamps are essential to
guarantee the correct behavior of a number of features. While in most
cases it is sufficient to configure a couple of time servers that the
BIG-IP system will use to update its system time, it is also possible to
define more advanced NTP configurations on the BIG-IP system.
----
https://support.f5.com/csp/article/K10240
If the BIG-IP system clock shows the wrong time zone, or the date and
time are not synchronized correctly, the cause may be an incorrect NTP
configuration or a communication issue with a valid NTP peer server. The
procedures in K10240 show how to check the NTP daemon process, verify
the NTP configuration, query the NTP peer server, and check network
connectivity to the NTP peer server.

To verify communication with the NTP peer servers, you can use the ntpq
utility (for example, ``ntpq -np``). The command generates output with
the fields explained in the following table.

+--------------------+------------------------------------------------------------------------+
| **Field**          | **Definition**                                                         |
+====================+========================================================================+
| prefix to the      | - An asterisk (``*``) indicates that the peer has been declared the    |
| **remote** field   |   system peer and lends its variables to the system variables.         |
|                    |                                                                        |
|                    | - A plus sign (``+``) indicates that the peer is a survivor and a      |
|                    |   candidate for the combining algorithm.                               |
|                    |                                                                        |
|                    | - A space, ``x``, period (``.``), dash (``-``), or hash (``#``)        |
|                    |   indicates that the peer is not being used for synchronization        |
|                    |   because it does not meet the requirements, is unreachable, or is     |
|                    |   not needed.                                                          |
+--------------------+------------------------------------------------------------------------+
| **remote**         | The **remote** field is the address of the remote peer.                |
+--------------------+------------------------------------------------------------------------+
| **refid**          | The **refid** field is the reference ID, which identifies the server   |
|                    | or reference clock with which the remote peer synchronizes. Its        |
|                    | interpretation depends on the value of the stratum field (see          |
|                    | **st**). For stratum 0 (unspecified or invalid), the refid is an       |
|                    | ASCII value used for debugging, for example INIT or STEP. For          |
|                    | stratum 1 (reference clock), the refid is an ASCII value that          |
|                    | identifies the type of external clock source; for example, NIST        |
|                    | refers to the NIST telephone modem. For strata 2 through 15, the       |
|                    | refid is the address of the next lower stratum server used for         |
|                    | synchronization.                                                       |
+--------------------+------------------------------------------------------------------------+
| **st**             | The **st** field is the stratum of the remote peer. Primary servers    |
|                    | (servers with an external reference clock such as GPS) are assigned    |
|                    | stratum 1. A secondary NTP server that synchronizes with a stratum 1   |
|                    | server is assigned stratum 2, a server that synchronizes with a        |
|                    | stratum 2 server is assigned stratum 3, and so on. Stratum 16 is       |
|                    | referred to as "MAXSTRAT", is customarily mapped to stratum value 0,   |
|                    | and indicates that the peer is unsynchronized. Strata 17 through 255   |
|                    | are reserved.                                                          |
+--------------------+------------------------------------------------------------------------+
| **t**              | The **t** field is the type of peer: local, unicast, multicast, or     |
|                    | broadcast.                                                             |
+--------------------+------------------------------------------------------------------------+
| **when**           | The **when** field is the time, in seconds, since the last response    |
|                    | to a poll was received.                                                |
+--------------------+------------------------------------------------------------------------+
| **poll**           | The **poll** field is the polling interval, in seconds. The value      |
|                    | starts low (for example, 64) and, as no changes are detected,          |
|                    | increases in powers of two up to the configured maximum polling        |
|                    | value (for example, 1024).                                             |
+--------------------+------------------------------------------------------------------------+
| **reach**          | The **reach** field is the reachability register: an 8-bit shift       |
|                    | register, displayed in octal, that records the results of the last     |
|                    | eight poll attempts. A value of 377 means that all of the last eight   |
|                    | polls succeeded.                                                       |
+--------------------+------------------------------------------------------------------------+
| **delay**          | The **delay** field is the current estimated delay: the transit time   |
|                    | between these peers, in milliseconds.                                  |
+--------------------+------------------------------------------------------------------------+
| **offset**         | The **offset** field is the current estimated offset: the time         |
|                    | difference between these peers, in milliseconds.                       |
+--------------------+------------------------------------------------------------------------+
| **jitter**         | The **jitter** field is the current estimated dispersion: the          |
|                    | variation in delay between these peers, in milliseconds.               |
+--------------------+------------------------------------------------------------------------+
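The table can be applied mechanically to each peer row. The following Python sketch assumes output captured with ``ntpq -np`` (``-n`` keeps addresses numeric so that each peer fits on one line), the ten-column layout described in the table, and a copy of that output saved off the BIG-IP system. It is an illustration, not an F5 utility, and the class and function names are hypothetical.

.. code-block:: python

   from dataclasses import dataclass

   # Tally codes from the first column of the peer table (see the table above).
   TALLY_ROLE = {
       "*": "system peer",
       "+": "candidate for the combining algorithm",
   }


   @dataclass
   class NtpPeer:
       tally: str
       remote: str
       refid: str
       stratum: int
       reach: int        # reach register converted from octal to an integer
       offset_ms: float

       @property
       def role(self) -> str:
           # A space, x, period, dash, or hash: not used for synchronization.
           return TALLY_ROLE.get(self.tally, "not used for synchronization")

       @property
       def polls_answered(self) -> int:
           # Each 1 bit is a recent poll that got a response; octal 377 -> 8.
           return bin(self.reach & 0xFF).count("1")

       @property
       def unsynchronized(self) -> bool:
           # Stratum 16 (MAXSTRAT) means the peer is not synchronized.
           return self.stratum >= 16


   def parse_peer_line(line: str) -> NtpPeer:
       """Parse one peer row; assumes the ten columns fit on a single line."""
       tally, rest = line[0], line[1:]
       remote, refid, st, _t, _when, _poll, reach, _delay, offset, _jitter = rest.split()
       return NtpPeer(tally, remote, refid, int(st), int(reach, 8), float(offset))


   def parse_ntpq_output(text: str) -> list:
       """Skip the header and the '=====' separator, then parse each peer row."""
       lines = text.splitlines()
       for index, line in enumerate(lines):
           if line.startswith("="):
               return [parse_peer_line(row) for row in lines[index + 1:] if row.strip()]
       return []

A peer whose role is not "system peer" and whose ``polls_answered`` is 0 has not answered any of its last eight polls, which points to checking network connectivity to that NTP peer server.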
----
|
**3.01 - Identify license problems based on the log file messages and statistics**
https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/related/log-messages.html#A01010044
**Licensing based log messages**
Several types of log messages relate to licensing, for example:

.. code-block:: console

   01010044 : "%s feature %s licensed"

*Location:*

/var/log/ltm

*Conditions:*

This message does not necessarily indicate a problem. It reports the
license status of a BIG-IP system component.

When component X is licensed, the system logs the following message:

.. code-block:: console

   Component X is licensed.

When component X is not licensed, the message is:

.. code-block:: console

   Component X is NOT licensed.

*Impact:*

If the message is "Component X is licensed", there is no impact; the
message is informational.

If the message is "Component X is NOT licensed", you cannot use that
component or feature.

*Recommended Action:*

To use a component that is not currently licensed, activate a license
that includes it.
----
When system statistics show that the licensed bandwidth is running at
its maximum level, you may see log messages indicating that the system
is approaching or exceeding its licensed limit:

.. code-block:: console

   01010045 : Bandwidth utilization is %d Mbps, exceeded %d%% of Licensed %d Mbps

*Location:*

/var/log/ltm

*Conditions:*

This message appears when the bandwidth the system is using exceeds the
stated percentage of its licensed bandwidth.

*Impact:*

With a bandwidth-limited license, the system cannot perform at its full
potential.

*Recommended Action:*

A license with a higher bandwidth limit stops this message from
appearing.
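Because the three ``%d`` values are filled in when the message is logged, the rendered message can be parsed to see how close to, or above, its licensed bandwidth the system has run. A minimal Python sketch, assuming the rendered text follows the format string exactly (the ``%%`` in the format string prints as a single ``%``); the function names are hypothetical.

.. code-block:: python

   import re

   # Rendered form of message 01010045. In the format string
   # "exceeded %d%% of Licensed %d Mbps", "%%" prints a literal "%".
   BANDWIDTH = re.compile(
       r"Bandwidth utilization is (\d+) Mbps, exceeded (\d+)% of Licensed (\d+) Mbps"
   )


   def bandwidth_events(lines):
       """Yield (used_mbps, threshold_pct, licensed_mbps) for each 01010045 line."""
       for line in lines:
           match = BANDWIDTH.search(line)
           if match:
               yield tuple(int(value) for value in match.groups())


   def peak_utilization(lines):
       """Highest used/licensed ratio seen, or None if no such message exists."""
       ratios = [used / licensed for used, _, licensed in bandwidth_events(lines) if licensed]
       return max(ratios, default=None)

A peak ratio above 1.0 means that the system reported using more bandwidth than its licensed rate, which supports the case for a license with a higher bandwidth limit.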
|

====

|
Objective - 3.02 Identify the appropriate command to use to determine the cause of an LTM device problem
--------------------------------------------------------------------------------------------------------
|
|
**3.02 - Identify hardware problems based on the log file messages and statistics**
https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/related/log-messages.html
**Identify Hardware Problems**

This can be a very broad topic because a very large number of hardware
errors can occur. Every message in the BIG-IP log message reference
begins with an ID number followed by a description. The list of
possible log messages is long and you do not need to memorize it, but
you should understand how to read the messages and where to find the
logs. Many hardware-related log messages appear in /var/log/ltm. When a
message's location includes LCD, the message is also displayed on the
device's LCD panel.

**Log Message Example**

.. code-block:: console

   012a0028 : %s

*Location:*

/var/log/ltm, LCD

*Conditions:*

The Always-On Management (AOM) subsystem has indicated that a
temperature sensor has crossed a 'warning' threshold.

*Impact:*

Hardware integrity could be at risk if the overheating is not mitigated.

*Recommended Action:*

- Check the fan status of the unit using ``tmsh show sys hardware``.
- Inspect the LCD and/or /var/log/ltm for any fan-related problems.
- Ensure that the room in which the device is located has sufficient
  cooling.
- Inspect /var/log/ltm and /var/log/sel around the time of the message
  for any additional indications of why the unit might be starting to
  overheat.

You can also correlate information in the performance statistics with
hardware errors in the logs.
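Because the message ID is the stable part of each entry, it is a convenient key for searching logs. The Python sketch below extracts it from lines shaped like the examples in this section (such as ``01010038:4:`` or ``0107142f:3:``). It assumes that the ID is eight hexadecimal digits followed by a colon and a one-digit level, and that the level follows syslog severity numbering (4 = warning and 5 = notice, which matches the examples); treat both as assumptions.

.. code-block:: python

   import re

   # Message-ID prefix as it appears in this section's examples, such as
   # "01010038:4:" or "0107142f:3:": eight hex digits, a colon, one digit.
   MSG_ID = re.compile(r"\b([0-9a-fA-F]{8}):(\d):\s*(.*)")

   # Assumption: the digit after the ID follows syslog severity numbering.
   SEVERITY = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


   def parse_message(line):
       """Return (message_id, severity, text), or None if no ID is present."""
       match = MSG_ID.search(line)
       if not match:
           return None
       msg_id, level, text = match.group(1).lower(), int(match.group(2)), match.group(3)
       severity = SEVERITY[level] if level < len(SEVERITY) else str(level)
       return msg_id, severity, text


   def find_by_id(lines, msg_id):
       """Return the lines whose message ID matches, e.g. '012a0028'."""
       wanted = msg_id.lower()
       matches = []
       for line in lines:
           parsed = parse_message(line)
           if parsed and parsed[0] == wanted:
               matches.append(line)
       return matches

For example, ``find_by_id(lines, "012a0028")`` on a copy of /var/log/ltm returns the temperature warnings, whose timestamps you can then compare with /var/log/sel.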
----
|
**3.02 - Identify resource exhaustion problems based on the log file messages and statistics**
https://support.f5.com/csp/article/K14813
**Identify resource exhaustion problems**

There can be many types of resource exhaustion issues to troubleshoot.
This example is based on memory exhaustion caused by a SYN flood; your
exam may contain other types.

Detecting DoS and DDoS attacks

The BIG-IP system provides methods to detect ongoing or previous DoS and
DDoS attacks on the system. To detect these attacks, perform the
following procedures.

The BIG-IP SYN cookie feature protects the system against SYN flood
attacks and allows the BIG-IP system to maintain connections when the
SYN queue begins to fill up during an attack.

Reviewing SYN cookie threshold log messages

The BIG-IP system may log one or more messages related to SYN cookie
protection to the /var/log/ltm file. These messages appear similar to
the following examples:

- When the virtual server exceeds the SYN Check Activation Threshold,
  the system logs a message similar to the following example:

  .. code-block:: console

     warning tmm5[18388]: 01010038:4: Syncookie threshold 0 exceeded, virtual = 10.11.16.238:80

- When hardware SYN cookie mode is activated for a virtual server, the
  system logs a message similar to the following example:

  .. code-block:: console

     notice tmm5[18388]: 01010240:5: Syncookie HW mode activated, server = 10.11.16.238:80, HSB modId = 1

- When hardware SYN cookie mode is exited for a virtual server, the
  system logs a message similar to the following example:

  .. code-block:: console

     notice tmm5[18388]: 01010241:5: Syncookie HW mode exited, server = 10.11.16.238:80, HSB modId = 1 from HSB
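When several virtual servers are involved, counting the threshold messages per virtual server shows which ones are under pressure, and replaying the HW mode messages in order shows which ones are still in hardware SYN cookie mode. The following Python sketch is built only on the three sample messages above; it assumes each message occupies a single log line, and its patterns and function names are illustrative, to be adjusted for your version.

.. code-block:: python

   import re
   from collections import Counter

   # Patterns follow the 01010038, 01010240, and 01010241 examples above.
   THRESHOLD = re.compile(r"01010038:\d+: Syncookie threshold \d+ exceeded, virtual = (\S+)")
   HW_MODE = re.compile(r"0101024[01]:\d+: Syncookie HW mode (activated|exited), server = ([^,\s]+)")


   def threshold_hits(lines):
       """Count SYN Check threshold messages per virtual server address."""
       hits = Counter()
       for line in lines:
           match = THRESHOLD.search(line)
           if match:
               hits[match.group(1)] += 1
       return hits


   def hw_mode_active(lines):
       """Replay activated/exited messages in order; return servers still in HW mode."""
       active = set()
       for line in lines:
           match = HW_MODE.search(line)
           if not match:
               continue
           state, server = match.groups()
           if state == "activated":
               active.add(server)
           else:
               active.discard(server)
       return active

The virtual server at the top of ``threshold_hits(lines).most_common()`` is the first place to look when confirming a SYN flood.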
Reviewing maximum reject rate log messages

The tm.maxrejectrate db key allows you to adjust the number of TCP RSTs
or ICMP unreachable packets that the BIG-IP system sends in response to
incoming client-side or server-side packets that cannot be matched with
existing connections to BIG-IP virtual servers, self IP addresses, or
Secure Network Address Translations (SNATs). A high number of maximum
reject rate messages may indicate that the BIG-IP system is experiencing
a DoS/DDoS attack.

The BIG-IP system may log messages related to the maximum reject rate to
the /var/log/ltm file. These messages appear similar to the following
examples:

- When the number of packets that match a virtual IP address or a self
  IP address exceeds the tm.maxrejectrate threshold, but the packets
  specify an invalid port, the system limits the RST packets it sends in
  response to the unmatched packets and logs a message to the
  /var/log/ltm file similar to the following example:

  .. code-block:: console

     011e0001:4: Limiting closed port RST response from 299 to 250 packets/sec

- When the number of packets that match a virtual address and port, or
  a self IP address and port, exceeds the tm.maxrejectrate threshold,
  but the packet is not a TCP SYN packet and does not match an
  established connection, the system limits the RST packets it sends in
  response to the unmatched packets. The system also logs a message to
  the /var/log/ltm file similar to the following example:

  .. code-block:: console

     011e0001:4: Limiting open port RST response from 251 to 250 packets/sec
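The two numbers in each 011e0001 message are the RST rate the system would have sent and the limit it applied, so their difference measures how far the unmatched traffic exceeded tm.maxrejectrate. A minimal Python sketch, assuming messages shaped like the two examples above; the function name is hypothetical.

.. code-block:: python

   import re

   # Pattern follows the two 011e0001 examples above.
   REJECT_LIMIT = re.compile(
       r"011e0001:\d+: Limiting (open|closed) port RST response from (\d+) to (\d+) packets/sec"
   )


   def peak_suppression(lines):
       """Per port type, the largest gap (packets/sec) between RSTs wanted and the limit.

       A value of 0 means no limiting message was seen for that port type.
       """
       peaks = {"open": 0, "closed": 0}
       for line in lines:
           match = REJECT_LIMIT.search(line)
           if match:
               kind, wanted, limit = match.group(1), int(match.group(2)), int(match.group(3))
               peaks[kind] = max(peaks[kind], wanted - limit)
       return peaks

For the two sample messages, the closed-port gap is 49 packets/sec (299 minus 250) and the open-port gap is 1; large or recurring gaps are the pattern that may point to a DoS/DDoS attack.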
----
|
**3.02 - Identify connectivity problems based on the log files**
https://support.f5.com/csp/article/K53419416
**Identify connectivity problems**

There can be many types of connectivity issues to troubleshoot. This
example is based on a connectivity failure between the members of an HA
pair; your exam may contain other types.

Error Message

.. code-block:: console

   01071431:5: Attempting to connect to CMI peer <IP address> port <port>

In this error message, note the following:

- ``<IP address>`` is the remote BIG-IP system's configured failover IP
  address, used for failover operations.
- ``<port>`` is the remote BIG-IP system's configured failover TCP
  service port, used for failover operations.

For example:

.. code-block:: console

   01071431:5: Attempting to connect to CMI peer 192.168.10.100 port 6699

Message Location

You may encounter this message in the following location:

- /var/log/ltm

Description

This message occurs when all of the following conditions are met:

- You have multiple BIG-IP systems in a high availability (HA)
  configuration.
- Either the master control process daemon (mcpd) starts and attempts
  to connect to a peer BIG-IP system in the trust domain, or general
  network issues, such as routing or switching failures, prevent
  connectivity between BIG-IP systems in the trust domain.

A trust domain is a collection of BIG-IP devices that trust each other.
The devices can synchronize and fail over their BIG-IP configuration
data, and exchange status and failover messages on a regular basis.

Impact

If this message appears without any accompanying messages, there is no
impact on the BIG-IP system. If other messages are logged along with
it, you can use those messages to troubleshoot the impact on the BIG-IP
system. For example, if a general network issue occurs and the local
BIG-IP system is unable to connect to a remote peer BIG-IP system,
messages similar to the following example are logged:

.. code-block:: console

   01071431:5: Attempting to connect to CMI peer 192.168.10.100 port 6699
   0107142f:3: Can't connect to CMI peer 192.168.10.100, port:6699, Transport endpoint is not connected

Recommended Actions

If logged messages indicate that the BIG-IP system is impacted, ensure
that the self IP addresses for the BIG-IP devices in the cluster are
correct and that the network allows proper connectivity between the
devices.
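An attempt message on its own has no impact, while an attempt accompanied by a "Can't connect" message does, so pairing the two per peer is a quick first check. The following Python sketch assumes messages shaped like the 01071431 and 0107142f examples above; the function name is hypothetical.

.. code-block:: python

   import re
   from collections import Counter

   # Patterns follow the 01071431 and 0107142f examples above.
   ATTEMPT = re.compile(r"01071431:\d+: Attempting to connect to CMI peer (\S+) port (\d+)")
   FAILURE = re.compile(r"0107142f:\d+: Can't connect to CMI peer ([^,\s]+), port:(\d+)")


   def cmi_summary(lines):
       """Map (peer, port) to (attempts, failures) seen in /var/log/ltm lines."""
       attempts, failures = Counter(), Counter()
       for line in lines:
           match = ATTEMPT.search(line)
           if match:
               attempts[(match.group(1), int(match.group(2)))] += 1
               continue
           match = FAILURE.search(line)
           if match:
               failures[(match.group(1), int(match.group(2)))] += 1
       peers = set(attempts) | set(failures)
       return {peer: (attempts[peer], failures[peer]) for peer in peers}

A peer with one or more failures is the case that calls for checking the self IP addresses and network connectivity; attempts with no failures match the no-impact case.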
----
|
**3.02 - Determine the appropriate log file to examine to determine the cause of the problem**
https://support.f5.com/csp/article/K16197
**Logging**

BIG-IP log files include important diagnostic information about the
events that occur on the BIG-IP system. Some events pertain to the Linux
host. For example, the Linux host generates system messages that pertain
to the Linux host operating system, including messages logged during
system startup and information logged by the background daemons that
run on the system. Other events are specific to the BIG-IP operating
system. For example, the BIG-IP operating system generates messages that
pertain to local and global traffic events and configuration changes
(audit logging).

|

Local logging

By default, the BIG-IP system logs events locally and stores messages in
the /var/log directory. For BIG-IP events, the system routes messages
from the errdefs subsystem through syslog-ng to the local log files. For
non-BIG-IP events, the system routes messages directly through syslog-ng
to the local log files. In addition, you can configure the system to use
the high-speed logging (HSL) mechanism to store the logs in either
syslog or the MySQL database.

|

Remote logging

You can configure the system to use the HSL mechanism to log messages to
a pool of remote log servers. If the BIG-IP system processes a high
volume of traffic or generates a large volume of log messages, F5
recommends that you configure remote logging.
----
**BIG-IP log types**

Each type of event is stored locally in a separate log file, and the
information stored in each log file varies depending on the event type.
All log files for these event types are in the /var/log directory.

+---------------+--------------------------------------------------------------+---------------------------------+
| **Type**      | **Description**                                              | **Log file**                    |
+===============+==============================================================+=================================+
| audit         | The audit event messages are messages that the BIG-IP system | **/var/log/audit**              |
|               | logs as a result of changes to the BIG-IP system             |                                 |
|               | configuration. Logging audit events is optional.             |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| boot          | The boot messages contain information that is logged when    | **/var/log/boot.log**           |
|               | the system boots.                                            |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| cron          | When the **cron** daemon starts a **cron** job, the daemon   | **/var/log/cron**               |
|               | logs information about the **cron** job in this file.        |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| daemon        | The daemon messages are logged by various daemons that run   | **/var/log/daemon.log**         |
|               | on the system.                                               |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| dmesg         | The dmesg messages contain kernel ring buffer information    | **/var/log/dmesg**              |
|               | that pertains to the hardware devices that the kernel        |                                 |
|               | detects during the boot process.                             |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| GSLB          | The GSLB messages pertain to global traffic management       | **/var/log/gtm**                |
|               | events.                                                      |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| httpd         | The httpd messages contain the Apache Web server error log.  | **/var/log/httpd/httpd_errors** |
+---------------+--------------------------------------------------------------+---------------------------------+
| kernel        | The kernel messages are logged by the Linux kernel.          | **/var/log/kern.log**           |
+---------------+--------------------------------------------------------------+---------------------------------+
| local traffic | The local traffic messages pertain specifically to the       | **/var/log/ltm**                |
|               | BIG-IP local traffic management events.                      |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| mail          | The mail messages contain the log information from the mail  | **/var/log/maillog**            |
|               | server that is running on the system.                        |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| packet filter | The packet filter messages are those that result from the    | **/var/log/pktfilter**          |
|               | use of packet filters and packet-filter rules.               |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| security      | The secure log messages contain information related to       | **/var/log/secure**             |
|               | authentication and authorization privileges.                 |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| system        | The system event messages are based on global Linux events   | **/var/log/messages**           |
|               | and are not specific to BIG-IP local traffic management      |                                 |
|               | events.                                                      |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| TMM           | The TMM log messages are those that pertain to Traffic       | **/var/log/tmm**                |
|               | Management Microkernel events.                               |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| user          | The user log messages contain information about all          | **/var/log/user.log**           |
|               | user-level logs.                                             |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
| webui         | The webui log messages display errors and exception details  | **/var/log/webui.log**          |
|               | that pertain to the Configuration utility.                   |                                 |
+---------------+--------------------------------------------------------------+---------------------------------+
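For scripts that collect logs for a given kind of event, the table can be carried over as a lookup. The following Python sketch simply transcribes the table; the lowercase keys and the ``log_file_for`` helper are illustrative names, not part of any F5 tool.

.. code-block:: python

   # Event type -> local log file, transcribed from the table above.
   LOG_FILES = {
       "audit": "/var/log/audit",
       "boot": "/var/log/boot.log",
       "cron": "/var/log/cron",
       "daemon": "/var/log/daemon.log",
       "dmesg": "/var/log/dmesg",
       "gslb": "/var/log/gtm",
       "httpd": "/var/log/httpd/httpd_errors",
       "kernel": "/var/log/kern.log",
       "local traffic": "/var/log/ltm",
       "mail": "/var/log/maillog",
       "packet filter": "/var/log/pktfilter",
       "security": "/var/log/secure",
       "system": "/var/log/messages",
       "tmm": "/var/log/tmm",
       "user": "/var/log/user.log",
       "webui": "/var/log/webui.log",
   }


   def log_file_for(event_type):
       """Return the log file for an event type such as 'local traffic' or 'TMM'."""
       key = event_type.strip().lower()
       if key not in LOG_FILES:
           raise KeyError(f"unknown event type: {event_type!r}")
       return LOG_FILES[key]

Normalizing the key means that the table's own spellings, such as "GSLB" or "TMM", work as lookups unchanged.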
----
**Log message format**
Log messages are formatted differently depending on the type of log and
the component that generated the event messages. The log formats are
discussed in the following sections.
|
Local traffic log message format
The local traffic (ltm) log messages generated by the BIG-IP system
include the following types of information:
.. code-block:: console