[ts-7000] Re: Diagnosing a system lockup

From: "Robert" <>
Date: Wed, 14 Jan 2009 16:47:05 -0000
I'm dealing with a similar problem with my 7800: no response to console logins, no ssh, no ftp. My approach is to record the overall system state leading up to the lockup, rather than to inspect the system while it is locked up.

I am running a stats script from cron once a minute to print out things like memory usage, cpu usage, open file handles, disk usage, etc. Then I can grab this file after the system has locked up and been rebooted.
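
A crontab entry along these lines would run the script once a minute. The script path shown is hypothetical; adjust it to wherever the script is actually installed.

```
# m h dom mon dow  command
* * * * * /usr/local/bin/stats.sh
```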

Here's my script. (Note: not all of these variables may be applicable to your setup; also, my SD card is mounted at /var.)

#!/bin/bash

###########################################
# File Versioning Information
###########################################
# Version Date     Author  Comments
# 1.0     01/07/09 RU      Initial Release
###########################################

# uncomment this line for debugging
#set -vx

# This script will take a snapshot of the current system resources and print an entry into a stats file. If no stats file
# exists for the current day, then one will be created.
VERSION=1.0

if [ "${1}" = "-v" ]; then
  echo ${VERSION}
  exit 0
fi

timestamp=`date +%s`
year=`date +%Y`
mon=`date +%m`
date=`date +%d`
file=/var/log/stats/${year}-${mon}-${date}.csv

# check to see if the stats file for today exists
if [ ! -e $file ]
then
    # create a new stats file for today
    touch $file
   
    # print the header into the stats file
    echo -n "Timestamp, User CPU time, System CPU time, CPU idle time, IO wait time, Total processes, Max user processes, Runnable processes, " >> $file
    echo -n "Uninterruptable sleeping processes, Processes waiting on IO, IOs in progress (SD card), Total time spent in IO (SD card), Context switches/second, Pages paged out, Pages paged in, " >> $file
    echo -n "Total open files, Max open files, Total open file desc, Max open file desc, Free memory, Actual Free memory (free + buffers + cache), Flash file system usage, SD file system usage" >> $file
    echo " " >> $file
fi

#############
# gather all the stats
#############

tot_open_files=`cat /proc/sys/fs/file-nr  | awk '{print $1}'`
max_open_files=`cat /proc/sys/fs/file-nr  | awk '{print $3}'`

free_mem=`free | grep 'Mem' | awk '{print $4}'`
act_free_mem=`free | grep '-' | awk '{print $4}'`

#tot_open_file_desc=`lsof | wc -l`
max_open_file_desc=`ulimit -n`

tot_processes=`ps -eaf | wc -l`
max_processes=`ulimit -u`

processes_blocked=`cat /proc/stat | grep procs_blocked | awk '{print $2}'`

IOs_in_progress=`cat /proc/diskstats | grep tssdcard | awk '{print $12}'`
time_spent_in_IO=`cat /proc/diskstats | grep tssdcard | awk '{print $14}'`

vmstat=`vmstat 1 2 | sed -n '4 p'`
processes_runnable=`echo $vmstat | awk '{print $1}'`
processes_uninterruptable=`echo $vmstat | awk '{print $2}'`
context_switches=`echo $vmstat | awk '{print $12}'`
cpu_time_user=`echo $vmstat | awk '{print $13}'`
cpu_time_sys=`echo $vmstat | awk '{print $14}'`
cpu_time_idle=`echo $vmstat | awk '{print $15}'`
cpu_time_wait=`echo $vmstat | awk '{print $16}'`

pages_po=`vmstat -s | grep 'pages paged out' | awk '{print $1}'`
pages_pi=`vmstat -s | grep 'pages paged in' | awk '{print $1}'`

flash_usage=`df / | sed -n '2 p' | awk '{print $5}'`
sd_usage=`df /var | sed -n '2 p' | awk '{print $5}'`


################
# print the stats to the file
################

echo -n "$timestamp, $cpu_time_user, $cpu_time_sys, $cpu_time_idle, $cpu_time_wait, $tot_processes, $max_processes, $processes_runnable, " >> $file
echo -n "$processes_uninterruptable, $processes_blocked, $IOs_in_progress, $time_spent_in_IO, $context_switches, $pages_po, $pages_pi, " >> $file
echo -n "$tot_open_files, $max_open_files, ??, $max_open_file_desc, $free_mem, $act_free_mem, $flash_usage, $sd_usage " >> $file
echo "" >> $file
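
After the board has locked up and been rebooted, the interesting rows are the last few in the newest stats file. A minimal sketch for pulling them out is below. The function name `last_stats` is hypothetical and not part of the script above, and it assumes the YYYY-MM-DD.csv naming used there, so that the shell's sorted glob expansion puts the newest file last.

```shell
#!/bin/bash
# Hypothetical helper: print the header row plus the last N data rows
# of the most recent stats file in a directory.
last_stats() {
    local dir=${1:-/var/log/stats}
    local rows=${2:-10}
    local newest="" f

    # Globs expand in sorted order, so with YYYY-MM-DD.csv names the
    # last match is the most recent day.
    for f in "$dir"/*.csv; do
        [ -e "$f" ] && newest=$f
    done

    if [ -z "$newest" ]; then
        echo "no stats files found in $dir" >&2
        return 1
    fi

    head -n 1 "$newest"                      # header row
    tail -n +2 "$newest" | tail -n "$rows"   # final samples, header excluded
}
```

For example, `last_stats /var/log/stats 5` would show the header and the five samples recorded just before the lockup.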

--- In sean <> wrote:
>
> Hi All,
>
> I'm testing some fairly simple C code on a TS-7350, but after about 5 days
> of running the program my board totally locks up, no response to the serial
> console
> or network pings. Can anyone recommend a way of diagnosing what is wrong
> when the system is in this state?
> I kept an eye on free RAM and free NANDFlash and neither were exhausted
> when the lockup occurred.
>
> Thanks,
> Sean
>