Thanks Robert,
Nice script! I'm running it on my board now; hopefully it'll capture some
useful info.
I'm dealing with a similar problem on my 7800: no response to
console logins, no ssh, no ftp. My approach is to try to record
the overall system state leading up to the lockup, rather than
inspecting the system while it is locked up.
I am running a stats script from cron once a minute to print out things
like memory usage, cpu usage, open file handles, disk usage, etc. Then
I can grab this file after the system has locked up and been rebooted.
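For pulling out the last few samples after a reboot, something like the following could work. It is only a rough sketch: the function name last_samples and its two arguments are made up for illustration, and the default directory is simply the one my script writes to.

```bash
#!/bin/bash
# Print the header line plus the last few samples from the newest stats file.
# stats_dir and count are illustrative parameters, not part of the stats script.
last_samples() {
    local stats_dir="${1:-/var/log/stats}"
    local count="${2:-10}"
    local newest
    # File names are YYYY-MM-DD.csv, so the sorted listing puts the newest last.
    newest=$(ls -1 "$stats_dir"/*.csv 2>/dev/null | tail -n 1)
    [ -n "$newest" ] || { echo "no stats files in $stats_dir" >&2; return 1; }
    # Header row first, then the final $count data rows.
    head -n 1 "$newest"
    tail -n +2 "$newest" | tail -n "$count"
}
```

Calling last_samples /var/log/stats 5 would print the column names followed by the five most recent rows, which are usually the ones recorded just before the lockup.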
Here's my script. (Note: not all of these variables may be applicable;
also, my SD card is mounted at /var.)
#!/bin/bash
###########################################
# File Versioning Information
###########################################
# Version   Date       Author   Comments
# 1.0       01/07/09   RU       Initial Release
###########################################
# uncomment this line for debugging
#set -vx
# This script will take a snapshot of the current system resources and
# print an entry into a stats file. If no stats file exists for the
# current day, then one will be created.
VERSION=1.0
if [ "${1}" = "-v" ]; then
echo ${VERSION}
exit 0
fi
timestamp=`date +%s`
year=`date +%Y`
mon=`date +%m`
date=`date +%d`
file=/var/log/stats/${year}-${mon}-${date}.csv
# check to see if the stats file for today exists
if [ ! -e "$file" ]
then
    # create a new stats file for today
    touch "$file"
    # print header into the stats file
    echo -n "Timestamp, User CPU time, System CPU time, CPU idle time, IO wait time, Total processes, Max user processes, Runnable processes, " >> "$file"
    echo -n "Uninterruptible sleeping processes, Processes waiting on IO, IOs in progress (SD card), Total time spent in IO (SD card), Context switches/second, Pages paged out, Pages paged in, " >> "$file"
    echo -n "Total open files, Max open files, Total open file desc, Max open file desc, Free memory, Actual Free memory (free + buffers + cache), Flash file system usage, SD file system usage" >> "$file"
    echo "" >> "$file"
fi
#############
# gather all the stats
#############
# /proc/sys/fs/file-nr: allocated handles, unused handles, system maximum
tot_open_files=`cat /proc/sys/fs/file-nr | awk '{print $1}'`
max_open_files=`cat /proc/sys/fs/file-nr | awk '{print $3}'`
free_mem=`free | grep 'Mem' | awk '{print $4}'`
# the "-/+ buffers/cache" line of free
act_free_mem=`free | grep '-' | awk '{print $4}'`
#tot_open_file_desc=`lsof | wc -l`
max_open_file_desc=`ulimit -n`
# note: this count includes the ps header line
tot_processes=`ps -eaf | wc -l`
max_processes=`ulimit -u`
processes_blocked=`cat /proc/stat | grep procs_blocked | awk '{print $2}'`
# /proc/diskstats: field 12 is I/Os in progress; field 14 is the weighted
# I/O time (field 13 is the unweighted time spent doing I/O)
IOs_in_progress=`cat /proc/diskstats | grep tssdcard | awk '{print $12}'`
time_spent_in_IO=`cat /proc/diskstats | grep tssdcard | awk '{print $14}'`
# take two samples one second apart; the first line reports averages since
# boot, so use the second sample (line 4 of the output)
vmstat=`vmstat 1 2 | sed -n '4 p'`
processes_runnable=`echo $vmstat | awk '{print $1}'`
processes_uninterruptable=`echo $vmstat | awk '{print $2}'`
context_switches=`echo $vmstat | awk '{print $12}'`
cpu_time_user=`echo $vmstat | awk '{print $13}'`
cpu_time_sys=`echo $vmstat | awk '{print $14}'`
cpu_time_idle=`echo $vmstat | awk '{print $15}'`
cpu_time_wait=`echo $vmstat | awk '{print $16}'`
pages_po=`vmstat -s | grep 'pages paged out' | awk '{print $1}'`
pages_pi=`vmstat -s | grep 'pages paged in' | awk '{print $1}'`
flash_usage=`df / | sed -n '2 p' | awk '{print $5}'`
sd_usage=`df /var | sed -n '2 p' | awk '{print $5}'`
################
# print the stats to the file
################
echo -n "$timestamp, $cpu_time_user, $cpu_time_sys, $cpu_time_idle, $cpu_time_wait, $tot_processes, $max_processes, $processes_runnable, " >> "$file"
echo -n "$processes_uninterruptable, $processes_blocked, $IOs_in_progress, $time_spent_in_IO, $context_switches, $pages_po, $pages_pi, " >> "$file"
# ?? is a placeholder for tot_open_file_desc, which is commented out above
echo -n "$tot_open_files, $max_open_files, ??, $max_open_file_desc, $free_mem, $act_free_mem, $flash_usage, $sd_usage" >> "$file"
echo "" >> "$file"
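The lsof line is commented out in the script, so the "Total open file desc" column only ever gets the ?? placeholder. One possible way to fill it without lsof is to count the entries under /proc/<pid>/fd directly. This is just a sketch: the function name count_open_fds and its proc_root argument are my own invention (the argument only exists so it can be pointed at a test directory), and the number will not match lsof | wc -l exactly, since lsof also lists a header and non-descriptor entries such as cwd and mapped files.

```bash
#!/bin/bash
# Count open file descriptors by listing /proc/<pid>/fd for every process.
# proc_root is a hypothetical parameter that defaults to /proc.
count_open_fds() {
    local proc_root="${1:-/proc}"
    local total=0 n dir
    for dir in "$proc_root"/[0-9]*/fd; do
        # Skip the unexpanded glob and processes that have already exited.
        [ -d "$dir" ] || continue
        # fd directories we are not allowed to read count as zero,
        # so run this as root to see every process.
        n=$(ls -1 "$dir" 2>/dev/null | wc -l)
        total=$((total + n))
    done
    echo "$total"
}
```

In the stats script this could be used as tot_open_file_desc=$(count_open_fds), with $tot_open_file_desc then taking the place of ?? in the output line.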
--- In .com, sean <smachin1000@...> wrote:
>
> Hi All,
>
> I'm testing some fairly simple C code on a TS-7350, but after about 5 days
> of running the program my board totally locks up, no response to the serial
> console or network pings. Can anyone recommend a way of diagnosing what is
> wrong when the system is in this state?
> I kept an eye on free RAM and free NANDFlash and neither were exhausted
> when the lockup occurred.
>
> Thanks,
> Sean
>