
[ts-7000] major Linux 2.4 kernel bug

Subject: [ts-7000] major Linux 2.4 kernel bug
From: Jesse Off
Date: Wed, 25 Jan 2006 12:06:25 -0700 (MST)

A lot of people have been noticing the kernel printing "__alloc_pages: 
0-order allocation failed", followed by kernel freezes, in low-memory 
conditions.  It turns out that the kernel actually still has plenty of 
memory free when these messages appear.  Besides, the intended Linux 
behavior in an out-of-memory condition is to kill processes (*not* to 
lock up with these messages).

This patch fixes a bug that was originally fixed by a Linux contracting 
firm for one of our customers.  Both were unwilling to give the GPL'ed 
code back to the community (they wanted to sell it), so we chose to 
engineer our own fix.  The bug exists in all versions of Linux 2.4 that 
use the CONFIG_DISCONTIGMEM option and actually has nothing to do with 
the modifications Cirrus Logic or Technologic Systems applied for TS-7xxx 
support, other than the fact that CONFIG_DISCONTIGMEM is uniquely enabled 
on these processors.

Also, a note to anyone thinking of hiring expensive contractors to work on 
Linux for the TS-7xxx boards: keep in mind that TS has very competent 
professional services people who can often save you money by 
accomplishing tasks in a fraction of the time they would otherwise take. 
(This patch took 2 hours once we realized the original fix wasn't going to 
be freely published.)  We don't advertise this capability much since we 
are usually so busy, but it does exist.

If you can, please try the patch below.  If everything goes well with 
testing this experimental patch, it will likely be in the next kernel 
(-ts10 now, I think?).  Due to the severity of the bug, I wanted to get 
this patch out here as soon as I wrote it.

//Jesse Off


Index: numa.c
===================================================================
RCS file: /cvsroot/ts-7200/dist/linux24/mm/numa.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 numa.c
--- numa.c      22 Jul 2004 19:48:53 -0000      1.1.1.1
+++ numa.c      25 Jan 2006 18:11:51 -0000
@@ -85,7 +85,7 @@
  static struct page * alloc_pages_pgdat(pg_data_t *pgdat, unsigned int gfp_mask,
        unsigned int order)
  {
-       return __alloc_pages(gfp_mask, order, pgdat->node_zonelists + (gfp_mask & GFP_ZONEMASK));
+       return __alloc_pages(gfp_mask, order, pgdat->node_zonelists);
  }

  /*
@@ -95,31 +95,50 @@
  struct page * _alloc_pages(unsigned int gfp_mask, unsigned int order)
  {
        struct page *ret = 0;
-       pg_data_t *start, *temp;
-#ifndef CONFIG_NUMA
+       pg_data_t *temp, *temp2;
+       int min, class_idx, retried = 0;
        unsigned long flags;
        static pg_data_t *next = 0;
-#endif
+

        if (order >= MAX_ORDER)
                return NULL;
-#ifdef CONFIG_NUMA
-       temp = NODE_DATA(numa_node_id());
-#else
        spin_lock_irqsave(&node_lock, flags);
        if (!next) next = pgdat_list;
-       temp = next;
+       temp2 = temp = next;
        next = next->node_next;
        spin_unlock_irqrestore(&node_lock, flags);
-#endif
-       start = temp;
+retry:
+       /* First attempt nodes with ample free pages */
        while (temp) {
-               if ((ret = alloc_pages_pgdat(temp, gfp_mask, order)))
-                       return(ret);
+               class_idx = zone_idx(temp->node_zonelists->zones[0]);
+               if ((temp->node_zonelists->zones[0]->free_pages - (1UL << order)) >
+                 temp->node_zonelists->zones[0]->watermarks[class_idx].low) {
+                       if ((ret = alloc_pages_pgdat(temp, gfp_mask, order)))
+                               return(ret);
+               } 
+               temp = temp->node_next;
+       }
+       temp = temp2;
+       /* Next, try harder */
+       while (temp) {
+               class_idx = zone_idx(temp->node_zonelists->zones[0]);
+               min = temp->node_zonelists->zones[0]->watermarks[class_idx].min;
+               if (!(gfp_mask & __GFP_WAIT)) min >>= 2;
+               if ((temp->node_zonelists->zones[0]->free_pages - (1UL << order)) > min) {
+                       if ((ret = alloc_pages_pgdat(temp, gfp_mask, order)))
+                               return(ret);
+               }
                temp = temp->node_next;
        }
-       temp = pgdat_list;
-       while (temp != start) {
+       if (!retried) {
+               temp2 = temp = pgdat_list;
+               retried = 1;
+               goto retry;
+       }
+       temp = temp2;
+       /* Last chance for success */
+       while (temp) {
                if ((ret = alloc_pages_pgdat(temp, gfp_mask, order)))
                        return(ret);
                temp = temp->node_next;
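
For anyone who wants to follow the new control flow without the diff 
markers, the order in which the patched _alloc_pages() tries the nodes can 
be sketched as a small user-space model.  This is only an illustration 
under stated assumptions, not kernel code: struct sim_node, above_mark(), 
take_pages() and sim_alloc() are hypothetical names, each node is reduced 
to a single zone with one "low" and one "min" watermark, and the round-robin 
update of the static "next" pointer is left to the caller.  The watermark 
test is also written as an addition instead of the patch's subtraction, so 
that the unsigned arithmetic cannot wrap on a nearly empty node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for pg_data_t and its first zone. */
struct sim_node {
    unsigned long free_pages;   /* pages currently free on this node */
    unsigned long low;          /* "low" watermark, in pages */
    unsigned long min;          /* "min" watermark, in pages */
    struct sim_node *next;      /* NULL-terminated, like node_next */
};

/* Nonzero if the node stays above `mark` after giving up 2^order pages. */
static int above_mark(const struct sim_node *n, unsigned int order,
                      unsigned long mark)
{
    return n->free_pages > mark + (1UL << order);
}

/* Stand-in for alloc_pages_pgdat(): succeeds while enough pages remain. */
static int take_pages(struct sim_node *n, unsigned int order)
{
    unsigned long want = 1UL << order;

    if (n->free_pages < want)
        return 0;
    n->free_pages -= want;
    return 1;
}

/*
 * Returns the node that satisfied the request, or NULL.
 * `list` plays the role of pgdat_list, `start` the role of `next`
 * (NULL means "start at the head of the list").
 */
struct sim_node *sim_alloc(struct sim_node *list, struct sim_node *start,
                           unsigned int order, int can_wait)
{
    struct sim_node *first = start ? start : list;
    struct sim_node *n;
    int retried = 0;

    assert(list != NULL);

    for (;;) {
        /* First attempt nodes with ample free pages. */
        for (n = first; n; n = n->next)
            if (above_mark(n, order, n->low) && take_pages(n, order))
                return n;

        /* Next, try harder: accept nodes above the "min" watermark,
         * quartered when the caller cannot wait. */
        for (n = first; n; n = n->next) {
            unsigned long min = n->min;

            if (!can_wait)
                min >>= 2;
            if (above_mark(n, order, min) && take_pages(n, order))
                return n;
        }

        if (retried)
            break;
        /* Repeat both passes once, this time from the head of the list. */
        first = list;
        retried = 1;
    }

    /* Last chance: ignore the watermarks entirely. */
    for (n = first; n; n = n->next)
        if (take_pages(n, order))
            return n;
    return NULL;
}
```

In this model a request is only refused after every node has been tried 
with no watermark check at all, which corresponds to the "last chance" loop 
in the patch.  A node that sits below its "low" watermark is skipped in the 
first pass as long as some other node still has room.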


//Jesse Off


 