Section 3 - Identify and Resolve LTM Device Issues
===================================================

|

.. raw:: html

|

====

|

Objective - 3.01 Interpret log file messages and/or command line output to identify LTM device issues
---------------------------------------------------------------------------------------------------------

|
|

**3.01 - Interpret log file messages to identify LTM device issues**

https://support.f5.com/csp/article/K14426

**Identifying hardware issues**

This can be a very broad topic because a very large number of hardware errors can occur. Every log message contains an ID number followed by a description.

Example:

Beginning in version 11.4, TMOS includes the pendsect feature, which periodically checks for pending sector alerts and resolves them. The pendsect feature is configured to run daily, and its messages provide improved disk error detection and correction. The system logs the pendsect messages to the /var/log/user.log file.

When the pendsect process runs and no errors are detected or corrected, the system logs messages that appear similar to the following example:

.. code-block:: console

   warning pendsect[21788]: pendsect: /dev/sdb no Pending Sectors detected

When the pendsect process detects and corrects an error, the system logs messages that appear similar to the following example:

.. code-block:: console

   warning pendsect[19772]: Recovered LBA:230000007
   warning pendsect[19772]: Drive /dev/sda partition UNKNOWN
   warning pendsect[19772]: File affected NONE

When the pendsect process detects an error and is unable to correct it, the system logs messages that appear similar to the following example:

.. code-block:: console

   warning pendsect[20702]: seek(1) error[25] recovery of LBA:226300793 not complete
   warning pendsect[20702]: Drive: /dev/sda filesystem type: Undetermined
   warning pendsect[20702]: File affected: NONE

Recommended Actions

If pendsect reports an uncorrectable error, or if you suspect a possible disk failure, you can run the End-User Diagnostic (EUD) SMART test to test the drive. For information about the EUD utility, and links to the latest release notes, refer to K7172: Overview of the End User Diagnostics software.

Beginning in BIG-IP 11.4.0, you can also use the platform_check command to collect the SMART test data from the drive. The disk portion of the command output indicates a Pass or Fail status for the drive and logs detailed information to the /var/log/platform_check file.

----

|

**3.01 - Interpret the qkview heuristic results**

https://support.f5.com/kb/en-us/products/big-iq-centralized-mgmt/manuals/product/bigiq-central-mgmt-monitoring-reports-5-3-0/9.html

**Troubleshooting using iHealth**

The F5 iHealth server is a tool that helps you troubleshoot potential issues. It does this by analyzing configuration, logs, command output, password security, license compliance, and so on. From F5 BIG-IQ Centralized Management, you can create a snapshot of a configuration in the form of a QKView file and then upload it to the F5 iHealth server. The file is compared to the iHealth database, which contains known issues, common configuration errors, and F5 published best practices. F5 returns an iHealth report that you can use to identify any potential issues that need your attention.

*Troubleshoot potential issues by viewing an iHealth device report*

After you upload a QKView file for one or more BIG-IP devices, the F5 iHealth server returns a device report. Review the device report so you can address any potential issues or vulnerabilities. From the report, you can access and sort heuristics associated with a device.

1. At the top of the screen, click Monitoring.
2. On the left, click REPORTS > Device > iHealth > Device Reports.
3. Click the Open link next to the report you want to view.
4. To sort the heuristics for a report you've opened, select an option from the All Importance and/or the All Flags list.
5. To add a flag to a specific heuristic, select the check box next to it, and then select a flag from the All Flags list.
6. To view more details about a specific heuristic, click its link.
7. To view an article in the AskF5 Knowledge Center that gives more information about this heuristic, click the solution link.

----

|

**3.01 - Identify appropriate methods to troubleshoot NTP**

https://support.f5.com/csp/article/K14120

**NTP**

NTP is a protocol for synchronizing the clocks of computer systems over the network. On BIG-IP systems, accurate timestamps are essential to guarantee the correct behavior of a number of features. In most cases it is sufficient to configure a couple of time servers that the BIG-IP system will use to update its system time, but it is also possible to define more advanced NTP configurations on the BIG-IP system.

----

https://support.f5.com/csp/article/K10240

When the BIG-IP system clock does not show the correct time zone, or the date and time are not synchronized correctly, the cause may be an incorrect NTP configuration or a communication issue with a valid NTP peer server. The procedures in this article show how you may check the NTP daemon process, verify the NTP configuration, query the NTP peer server, and check the network connectivity to the NTP peer server.

When verifying the NTP peer server communication, you can use the ntpq utility. The command generates output with the fields that are explained in the table below.

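
As a small illustration of reading ntpq output, the following minimal Python sketch decodes the **reach** value, which the field table describes as a register covering the last eight poll attempts. This is not an F5 tool: the function name ``decode_reach`` is hypothetical, and the sketch assumes the usual ntpq convention that the register is printed in octal with the most recent poll stored in the lowest bit.

.. code-block:: python

   from typing import List


   def decode_reach(reach: str) -> List[bool]:
       """Decode an ntpq reach value (octal string) into eight poll results.

       Index 0 is the most recent poll; True means a response was received.
       Assumption: the lowest bit of the register holds the newest poll.
       """
       register = int(reach, 8) & 0xFF  # keep only the 8-bit register
       return [bool((register >> bit) & 1) for bit in range(8)]


   if __name__ == "__main__":
       for value in ("377", "376", "1", "0"):
           history = decode_reach(value)
           print(value, history.count(True), "of the last 8 polls answered")

Under these assumptions, a reach value of 377 means that all eight recent polls were answered, 376 means that only the most recent poll was missed, and 0 means that the peer has not answered any of the last eight polls.
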

+--------------------------+--------------------------------------------------------------------------+
| **Field**                | **Definition**                                                           |
+==========================+==========================================================================+
| prefix to the **remote** | - An asterisk (*) character indicates that the peer has been declared    |
| field                    |   the system peer and lends its variables to the system variables.       |
|                          |                                                                          |
|                          | - A plus sign (+) indicates that the peer is a survivor and a candidate  |
|                          |   for the combining algorithm.                                           |
|                          |                                                                          |
|                          | - A space, x, period (.), dash (-), or hash (#) character indicates that |
|                          |   this peer is not being used for synchronization because it either does |
|                          |   not meet the requirements, is unreachable, or is not needed.           |
+--------------------------+--------------------------------------------------------------------------+
| **remote**               | The **remote** field is the address of the remote peer.                  |
+--------------------------+--------------------------------------------------------------------------+
| **refid**                | The **refid** field is the Reference ID, which identifies the server or  |
|                          | reference clock with which the remote peer synchronizes, and its         |
|                          | interpretation depends on the value of the stratum field (explained in   |
|                          | the **st** definition). For stratum 0 (unspecified or invalid), the      |
|                          | refid is an ASCII value used for debugging. Example: INIT or STEP. For   |
|                          | stratum 1 (reference clock), the refid is an ASCII value used to         |
|                          | specify the type of external clock source. Example: NIST refers to NIST  |
|                          | telephone modem. For strata 2 through 15, the refid is the address of    |
|                          | the next lower stratum server used for synchronization.                  |
+--------------------------+--------------------------------------------------------------------------+
| **st**                   | The **st** field is the stratum of the remote peer. Primary servers      |
|                          | (servers with an external reference clock such as GPS) are assigned      |
|                          | stratum 1. A secondary NTP server which synchronizes with a stratum 1    |
|                          | server is assigned stratum 2. A secondary NTP server which synchronizes  |
|                          | with a stratum 2 server is assigned stratum 3. Stratum 16 is referred    |
|                          | to as "MAXSTRAT," is customarily mapped to stratum value 0, and          |
|                          | therefore indicates being unsynchronized. Strata 17 through 255 are      |
|                          | reserved.                                                                |
+--------------------------+--------------------------------------------------------------------------+
| **t**                    | The **t** field is the type of peer: local, unicast, multicast, or       |
|                          | broadcast.                                                               |
+--------------------------+--------------------------------------------------------------------------+
| **when**                 | The **when** field is the time since the last response to a poll was     |
|                          | received (in seconds).                                                   |
+--------------------------+--------------------------------------------------------------------------+
| **poll**                 | The **poll** field is the polling interval (in seconds). This value      |
|                          | starts low (example: 64) and over time, as no changes are detected,      |
|                          | this polling value increases incrementally to the configured max         |
|                          | polling value (example: 1024).                                           |
+--------------------------+--------------------------------------------------------------------------+
| **reach**                | The **reach** field is the reachability register. This 8-bit shift       |
|                          | register, displayed in octal, records the results of the last eight      |
|                          | poll attempts.                                                           |
+--------------------------+--------------------------------------------------------------------------+
| **delay**                | The **delay** field is the current estimated delay; the transit time     |
|                          | between these peers in milliseconds.                                     |
+--------------------------+--------------------------------------------------------------------------+
| **offset**               | The **offset** field is the current estimated offset; the time           |
|                          | difference between these peers in milliseconds.                          |
+--------------------------+--------------------------------------------------------------------------+
| **jitter**               | The **jitter** field is the current estimated dispersion; the            |
|                          | variation in delay between these peers in milliseconds.                  |
+--------------------------+--------------------------------------------------------------------------+

----

|

**3.01 - Identify license problems based on the log file messages and statistics**

https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/related/log-messages.html#A01010044

**Licensing based log messages**

There are multiple types of log messages that could occur around licensing.

.. code-block:: console

   01010044 : "%s feature %s licensed"

*Location:*

.. code-block:: console

   /var/log/ltm

*Conditions:*

This message does not necessarily denote a problem. It displays the license status of a component of the BIG-IP device. When the status for component X is "licensed", the log displays the message:

.. code-block:: console

   Component X is licensed.

When the component is not licensed, the message is:

.. code-block:: console

   Component X is NOT licensed.

*Impact:*

If the message is "Component X is licensed", there is no impact. It is an informative message. If the message is "Component X is NOT licensed", then you cannot use the mentioned component or feature.

*Recommended Action:*

If you want to use a component that is not currently licensed, you need to activate the license.

----

When the system statistics show that the bandwidth of the licensed feature is running at the maximum level, you may see logs reflecting that the system is exceeding the licensed limit.

.. code-block:: console

   01010045 : Bandwidth utilization is %d Mbps, exceeded %d%% of Licensed %d Mbps

*Location:*

/var/log/ltm

*Conditions:*

This message appears when the system is using more bandwidth than it was licensed to use.

*Impact:*

The system will not perform at its full potential with a limited license.

*Recommended Action:*

A license with a higher bandwidth limit would stop this message from appearing.

|

.. raw:: html

|

====

|

Objective - 3.02 Identify the appropriate command to use to determine the cause of an LTM device problem
--------------------------------------------------------------------------------------------------------------

|
|

**3.02 - Identify hardware problems based on the log file messages and statistics**

https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/related/log-messages.html

**Identify Hardware Problems**

This can be a very broad topic because a very large number of hardware errors can occur. Every log message begins with an ID number followed by a description. The list of possible log messages is long and memorizing them is not required, but understanding how to read the messages and where the logs can be found is important. You will find many hardware-related log messages in /var/log/ltm. When LCD appears in the location of a message, the message is also echoed to the LCD screen of the device.

**Log Message Example**

.. code-block:: console

   012a0028 : %s

*Location:*

/var/log/ltm, LCD

*Conditions:*

AOM has indicated that a temperature sensor has crossed a 'warning' threshold.

*Impact:*

Integrity of the hardware could be at risk if overheating is not mitigated.

*Recommended Action:*

- Check the fan status of the unit using 'tmsh show sys hardware'.
- Inspect the LCD and/or /var/log/ltm for any fan related problems.
- Ensure that the room in which the device is located has sufficient cooling.
- Inspect /var/log/ltm and /var/log/sel around the time of the message for any additional indications as to why the unit might be starting to overheat.

You can also correlate information in the performance statistics with hardware errors in the logs.

----

|

**3.02 - Identify resource exhaustion problems based on the log file messages and statistics**

https://support.f5.com/csp/article/K14813

**Identify resource exhaustion problems**

There can be many types of resource exhaustion issues to troubleshoot. This example is based on memory exhaustion due to a SYN flood. Your exam may contain other types.

Detecting DoS and DDoS attacks

The BIG-IP system provides methods to detect ongoing or previous DoS and DDoS attacks on the system. To detect these attacks, perform the following procedures:

The BIG-IP SYN cookie feature protects the system against SYN flood attacks and allows the BIG-IP system to maintain connections when the SYN queue begins to fill up during an attack.

Reviewing SYN cookie threshold log messages

The BIG-IP system may log one or more error messages that relate to SYN cookie protection to the /var/log/ltm file. Messages that relate to SYN cookie protection appear similar to the following examples:

- When the virtual server exceeds the SYN Check Activation Threshold, the system logs an error message similar to the following example:

.. code-block:: console

   warning tmm5[18388]: 01010038:4: Syncookie threshold 0 exceeded, virtual = 10.11.16.238:80

- When hardware SYN cookie mode is active for a virtual server, the system logs an error message similar to the following example:

.. code-block:: console

   notice tmm5[18388]: 01010240:5: Syncookie HW mode activated, server = 10.11.16.238:80, HSB modId = 1

- When hardware SYN cookie mode is no longer active for a virtual server, the system logs an error message similar to the following example:

.. code-block:: console

   notice tmm5[18388]: 01010241:5: Syncookie HW mode exited, server = 10.11.16.238:80, HSB modId = 1 from HSB

Reviewing maximum reject rate log messages

The tm.maxrejectrate db key allows you to adjust the number of TCP RSTs or ICMP unreachable packets that the BIG-IP system sends in response to incoming client-side or server-side packets that cannot be matched with existing connections to BIG-IP virtual servers, self IP addresses, or Secure Network Address Translations (SNATs). A high number of maximum reject rate messages may indicate that the BIG-IP system is experiencing a DoS/DDoS attack.

The BIG-IP system may log error messages that relate to the maximum reject rate to the /var/log/ltm file. Messages that relate to the maximum reject rate appear similar to the following examples:

- When the number of packets that match a virtual IP address or a self IP address exceeds the tm.maxrejectrate threshold, but the packets specify an invalid port, the system stops sending RST packets in response to the unmatched packets and logs an error message to the /var/log/ltm file that appears similar to the following example:

.. code-block:: console

   011e0001:4: Limiting closed port RST response from 299 to 250 packets/sec

- When the number of packets that match a virtual address and port, or a self IP address and port, exceeds the tm.maxrejectrate threshold, but the packet is not a TCP SYN packet and does not match an established connection, the system stops sending RST packets in response to the unmatched packets. The system also logs an error message to the /var/log/ltm file that appears similar to the following example:

.. code-block:: console

   011e0001:4: Limiting open port RST response from 251 to 250 packets/sec

----

|

**3.02 - Identify connectivity problems based on the log files**

https://support.f5.com/csp/article/K53419416

**Virtual Server Processing Order**

There can be many types of connectivity issues to troubleshoot.
This error message example is based on a connectivity failure between the members of an HA pair. Your exam may contain other types.

Error Message

.. code-block:: console

   01071431:5: Attempting to connect to CMI peer <IP address> port <port>

In this error message, note the following:

- ``<IP address>`` is the remote BIG-IP system's configured failover IP address, used for failover operations.
- ``<port>`` is the remote BIG-IP system's configured failover TCP service port, used for failover operations.

For example:

.. code-block:: console

   01071431:5: Attempting to connect to CMI peer 192.168.10.100 port 6699

Message Location

You may encounter this message in the following location:

- /var/log/ltm

Description

This message occurs when all of the following conditions are met:

- You have multiple BIG-IP systems in a high availability (HA) configuration.
- The master control process daemon (mcpd) starts and attempts to connect to a peer BIG-IP system in the trust domain, or general network issues, such as routing or switching failures, prevent connectivity between BIG-IP systems in the trust domain.

A trust domain is a collection of BIG-IP devices that trust each other. The devices can synchronize and fail over their BIG-IP configuration data, and exchange status and failover messages on a regular basis.

Impact

If this error message appears unaccompanied by other messages, then there is no impact on the BIG-IP system. If other messages are logged along with this error message, you can use those messages to troubleshoot the impact on the BIG-IP system.

For example, if a general network issue occurs and the local BIG-IP system is unable to connect to a remote peer BIG-IP system, messages similar to the following example are logged:

.. code-block:: console

   01071431:5: Attempting to connect to CMI peer 192.168.10.100 port 6699
   0107142f:3: Can't connect to CMI peer 192.168.10.100, port:6699, Transport endpoint is not connected

Recommended Actions

If logged messages indicate that the BIG-IP system is impacted, ensure that the self IP addresses for the BIG-IP devices in the cluster are correct and that the network allows proper connectivity between the devices.

----

|

**3.02 - Determine the appropriate log file to examine to determine the cause of the problem**

https://support.f5.com/csp/article/K16197

**Logging**

BIG-IP log files include important diagnostic information about the events that are occurring on the BIG-IP system. Some of the events pertain to the Linux host. For example, the Linux host generates system messages that pertain to the Linux host operating system, including messages that are logged during system startup, and information logged by the background daemons that run on the system. Other events are specific to the BIG-IP operating system. For example, the BIG-IP operating system generates messages that pertain to local and global traffic events, and configuration changes (audit logging).

|

Local logging

By default, the BIG-IP system logs events locally and stores messages in the /var/log directory. For BIG-IP events, the system routes messages from the errdefs subsystem through syslog-ng to the local log files. For non-BIG-IP events, the system routes messages directly through syslog-ng to the local log files. In addition, you can configure the system to use the high-speed logging (HSL) mechanism to store the logs in either the syslog or the MySQL database.

|

Remote logging

You can configure the system to use the HSL mechanism to log messages to a pool of remote log servers. If the BIG-IP system processes a high volume of traffic or generates an excessive amount of log files, F5 recommends that you configure remote logging.

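
The log examples in this objective share a prefix of the form ``01071431:5:``, that is, a message ID followed by a single digit. As a rough aid for reading such entries, the following minimal Python sketch extracts both parts from a log line. It is an illustration only, not an F5 utility: the names ``parse_event`` and ``LogEvent`` are hypothetical, and the sketch assumes that the digit after the message ID is the standard syslog severity level, which is consistent with the ``warning`` (4) and ``notice`` (5) examples shown earlier.

.. code-block:: python

   import re
   from typing import NamedTuple, Optional

   # Standard syslog severity names, indexed by numeric level (0-7).
   SYSLOG_LEVELS = ("emerg", "alert", "crit", "err",
                    "warning", "notice", "info", "debug")

   # Matches a prefix such as "01071431:5: " anywhere in a log line.
   MESSAGE_CODE = re.compile(r"\b([0-9a-f]{8}):([0-7]): (.*)$")


   class LogEvent(NamedTuple):
       message_id: str
       level: str
       text: str


   def parse_event(line: str) -> Optional[LogEvent]:
       """Return the message ID, severity name, and text, or None if absent."""
       match = MESSAGE_CODE.search(line)
       if match is None:
           return None
       message_id, level, text = match.groups()
       return LogEvent(message_id, SYSLOG_LEVELS[int(level)], text)


   if __name__ == "__main__":
       samples = [
           "01071431:5: Attempting to connect to CMI peer 192.168.10.100 port 6699",
           "warning tmm5[18388]: 01010038:4: Syncookie threshold 0 exceeded, "
           "virtual = 10.11.16.238:80",
       ]
       for sample in samples:
           print(parse_event(sample))

Grouping parsed events by ``message_id`` is one way to spot a burst of the same condition, for example repeated ``011e0001`` reject-rate messages, before reading the individual entries.
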

----

**BIG-IP log types**

Each type of event is stored locally in a separate log file, and the information stored in each log file varies depending on the event type. All log files for these event types are in the /var/log directory.

+-----------------+------------------------------------------------------------------------+---------------------------------+
| **Type**        | **Description**                                                        | **Log file**                    |
+=================+========================================================================+=================================+
| audit           | The audit event messages are messages that the BIG-IP system logs as   | **/var/log/audit**              |
|                 | a result of changes to the BIG-IP system configuration. Logging audit  |                                 |
|                 | events is optional.                                                    |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| boot            | The boot messages contain information that is logged when the system   | **/var/log/boot.log**           |
|                 | boots.                                                                 |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| cron            | When the **cron** daemon starts a **cron** job, the daemon logs the    | **/var/log/cron**               |
|                 | information about the **cron** job in this file.                       |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| daemon          | The daemon messages are logged by various daemons that run on the      | **/var/log/daemon.log**         |
|                 | system.                                                                |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| dmesg           | The dmesg messages contain kernel ring buffer information that         | **/var/log/dmesg**              |
|                 | pertains to the hardware devices that the kernel detects during the    |                                 |
|                 | boot process.                                                          |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| GSLB            | The GSLB messages pertain to global traffic management events.         | **/var/log/gtm**                |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| httpd           | The httpd messages contain the Apache Web server error log.            | **/var/log/httpd/httpd_errors** |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| kernel          | The kernel messages are logged by the Linux kernel.                    | **/var/log/kern.log**           |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| local traffic   | The local traffic messages pertain specifically to the BIG-IP local    | **/var/log/ltm**                |
|                 | traffic management events.                                             |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| mail            | The mail messages contain the log information from the mail server     | **/var/log/maillog**            |
|                 | that is running on the system.                                         |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| packet filter   | The packet filter messages are those that result from the use of       | **/var/log/pktfilter**          |
|                 | packet filters and packet-filter rules.                                |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| security        | The secure log messages contain information related to authentication  | **/var/log/secure**             |
|                 | and authorization privileges.                                          |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| system          | The system event messages are based on global Linux events, and are    | **/var/log/messages**           |
|                 | not specific to BIG-IP local traffic management events.                |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| TMM             | The TMM log messages are those that pertain to Traffic Management      | **/var/log/tmm**                |
|                 | Microkernel events.                                                    |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| user            | The user log messages contain information about all user level logs.   | **/var/log/user.log**           |
+-----------------+------------------------------------------------------------------------+---------------------------------+
| webui           | The webui log messages display errors and exception details that       | **/var/log/webui.log**          |
|                 | pertain to the Configuration utility.                                  |                                 |
+-----------------+------------------------------------------------------------------------+---------------------------------+

----

**Log message format**

Log messages are formatted differently depending on the type of log and the component that generated the event messages. The log formats are discussed in the following sections.

|

Local traffic log message format

The local traffic (ltm) log messages generated by the BIG-IP system include the following types of information:

.. code-block:: console