?

Log in

No account? Create an account

Join Limit

Regarding the problem where we have a limit of 32 joins, this is the change I recently made to ticket 1516:

************************************************************************************************

 Jeff and I have have been able to reason that any attempt to reduce the
 number of joins must be done in the database side of things; i.e. you
 cannot split the queries, execute the sub-queries then and use set
 theoretic operations to build the resulting set yourself.  [This is because
you need to be able to do orderings and you will need the target table to
be joined, but you don't know what the target table is so all the joins need
to be present. You could probably do some work to figure this out, but
it's really hacky and not what good programmers do.]

 We have also reasoned mathematically that, with the current database
 schema, it is not possible to reduce the number of joins by more than a
 factor of two.  So the worst case filter limit would be 16 filters.

 Options:

  * Change the database schema to have fewer tables and joins
  * Locate the join limit and change it

 The second option is clearly retarded.  Moreover, the limit is set by
 SQLite, not Alchemy, so it "can't" be done.

 We tried changing the schema and discovered that the one we have is quite
 elegant and we run into much greater problems if we use anything
 different.  Besides, changing the schema will most likely only give a
 constant multiplicative benefit; i.e. we can't do infinitely many joins
 without changing the query mechanism.

************************************************************************************************

In high school I was naive enough to think that programmers didn't need to know much math. Boy was I wrong. But Jeff says that the problem might not occur in Postgres... but we know there are portals using SQLite.

What to do?

I've made this a low priority problem and will move on to test the system starting next week. Luke is also testing, but since I know the ticket system like the back of my hand, I should also test. I will also create a screen cast.

There are some bugs that need to be fixed (e.g. tags, email notification preferences, etc), and some CSS that should be made pretty, but I'm comfortable handing all of this off to anyone else.

Ringing the Death Knell

As of today, Wednesday, July 9th, all the large components of the new ticketing system have seen light. The ticketing system is ready to move from core development into pre-QA and QA.

What needs to be done:
  • There is a join cap set by Alchemy of 32 joins. Because of the very hefty new database schema, this can easily be reached with just 8 ticket queries. Therefore, the join mechanism in the TicketQuery needs to be upgraded. This is actually the highest priority item right now.
  • Add the Dojo font-end for creating, deleting, and changing fields. This will be done by Jeff and is probably a small step in the development process despite the fact that it is central to the system
  • Integrate the Dojo upgrade into newt
  • There needs to be extensive functional testing
  • The unit tests should be run to ensure that newt fails exactly when trunk fails and for the same reasons
Bug issues seen so far:
  • The field name and the field preview on the Ticket New and Ticket View pages should be wrapped and scrollable, not overflow the table
  • Tags do not work for new tickets
  • Users cannot change email notification defaults in project preferences
  • The ticket query sub-system needs major testing
Enhancements I foresee in the near future:
  • The kid templates make judicious use of CSS, but there are still a few tables used (namely when assignig field values); perhaps layout can be implemented with CSS, not tables.
  • There is currently no option to view the cross-project ticket query results in other formats (e.g. comma-delimited)
  • The tags box on the new ticket page is awkwardly placed and will probably be changed by someone
  • Many unexpected errors can be raised in the myriad of sub-components and only DrProject errors (and some others) are handled by the request object majestically. Perhaps the req object will be retrofitted to catch any unexpected errors (I anticipate mainly parse errors) and warn the user without crashing the portal
  • A brief code review would be useful
  • We use JSON to communicate between JavaScript and Python. This will likely be changed to something else until JSON is adopted. This is not so hard since the use of JSON is *highly* localized.
In the past half-a-week, I've made great strides in making the code much more modular and readable, and I've tried to anticipate security concerns as wells future extension. Security really doesn't seem to be a big issue (or maybe I'm just not well versed enough in the field).

Enhancements I forsee in the distant future:
  • Cross-project fields. Fields that are, not only named the same across projects, but actually are the same in the database. Probably useful for cross-project ticket queries. I have made no effort to support this.
  • Cross-project ticket queries can be filtered by a very minute set of fields since fields cannot be assumed to be the same across multiple projects. This may be changed in the future by playing some database tricks. Changes here are either easy or they're not; I can't anticipate what will be done and so I haven't made any effort to this end.
  • The ticket query sub-system and its associated controller permit somewhat narrow query formulations. The user can select a field, then select an operation (*is*, for example), then she can select any number of possible values. Notce that she cannot query a particular field in more than one mode (e.g. A *is* x _or_ A *is not* y). Moreover, there was never any need for AND in the old system (almost), but in the new system AND can be very useful (e.g. A *starts with* x _and_ A *ends with* y). This is currently not possible without boolean logic "tricks". Changes of this nature should not be difficult given the new modularity of the query sub-system.
  • A user is currently notified about a ticket creation/change if she is mentioned in any of the fields of type user for that ticket. This may change so that a user can choose which field changes she would be liked to notified about. I've have made the system extensible to this end.
  • The fate of the 'All' project and the 'Annon' & 'Nobody' users are not known and the system may need to change to accomodate this. I've made the code extensible to this end.
I've also made a great effort to separate 'model', 'view' and 'controller'. Recently I moved all the business logic out of the controllers and the databse schemas into the ticket and milestone APIs. "All the business logic" has a subjective conotation of course. I've removed all the mapper extensions and have moved user input validation into the controllers. I've moved GUI stuff into the kid templates and the CSS files. I feel the system is much better segmented now into logical divisions based on purpose.

Update

I haven't posted in a while since I've been rushing to meet my July 18 deadline.

All I have to do is complete the ticket query; both in the project-specific mode and the cross-project mode. I have 2.5 weeks, but even that might not be enough.

After this is finished, most everything will be done except that project managers still need an interface to add, remove, rename, reorder, etc ticket fields. But this comes under the jurisdiction of Jeff's form editor.

A lot of the DrP screens have fallen under my jurisdiction, but I've made an effort make a lot of them look the same in the new system as they id in the old. So even though my code touches a huge portion of DrP, the screens that are affected number about 5.

Here's a look at all the screens that I've changed...

Creating a new ticket



View / Edit Existing Ticket



Ticket Notifications



Milestone Statistics



Project Statistics


Optimizing Database Transaction Tests

Using the profiler, Jeff and I discovered, as expected, that the unit tests were taking as long as they were because of the setup. The test fixtures involved setting up the in-memory database and populating it with some values. The mechanics of this involved a lot more than one would expect. A mock environment is created, a test database is created, system calls are invoked (which, as we know, take very long), and much else is done.

If the database and environment were setup only once before running all tests, then that would provide a significant speedup.

So we know that nose has hooks like setup_module, setup_class, etc that get run at the respective levels. So the class setup function gets run first and then takes a seat and all the tests run. So we can setup the database and mock environment in that function.

But what about classes that don't just query the database, but create records and delete records as well? Well my solution to this was to create a savepoint using Alchemy before each test then do a rollback after th test.

A First Solution
import unittest
from elixir import objectstore, session

class MyTestCase(unittest.TestCase):
    @classmethod
    def setup_class(self):
       <setup>

    def setUp(self):
       session.begin()
    def tearDown(self):
       objectstore.clear()
       session.rollback()

    <Test Cases>

But there was a problem with this. If somewhere in the middle of a test I expected an exception to be raised, the exception would cause the session to come out of the sub-transaction I had set up and the rest of the test function would then be running outside the sub-transaction, thus messing up the database for other tests. You get an error from Alchemy that tells you to close the session.

A Second Solution
import unittest
from elixir import objectstore, session

class MyTestCase(unittest.TestCase):
    @classmethod
    def setup_class(self):
       <setup>

    def setUp(self):
       session.begin()
    def tearDown(self):
       objectstore.clear()
       session.rollback()

    def test_x(self):
       self.assertRaises(<Exception>, <callable>, <args>, ...)
       session.close()

    <Other Test Cases>


I knew this was bad but I loved the speedup of the tests so I tried to convince Jeff to allow this. But deep down I knew that it's horrible to have the programmer worry about the fixture within the tests case function.

So this was my solution: I'll create an intermediate class between unittest.TestCase and the test class. It'll override the assertRaises function and close the session if the expected exception is raised.

A Working Solution
import unittest
from elixir import objectstore, session

class TestCase(unittest.TestCase):

    def setup_func(self):
       pass
    def teardown_func(self):
       pass

    def setUp(self):
       session.begin()
       self.setup_func()

    def tearDown(self):
       objectstore.clear()
       session.rollback()
       self.teardown_func()

    def failUnlessRaises(self, excClass, callableObj, *args, **kwargs):
       try:
          callableObj(*args, **kwargs)
       except excClass:
          session.close()
          return
       else:
          if hasattr(excClass, '__name__')
             excName = excClass.__name__
          else:
             excName = str(excClass)
          raise self.failureException, "%s not raised." % excName

    assertRaises = failUnlessRaises

------

from drproject.test import TestCase

class MyTestCase(TestCase):
    @classmethod
    def setup_class(self):
       <setup fixture>

    def setup_func(self):
      
<setup fixture>

    <Test Cases>

But Jeff made a good point in that we want to be able to use the existing API, not have our own functions called setup_func or whatever. Instead of going through the details of how I arrived at this solution, here is the solution itself

The Final Solution
import unittest
from elixir import objectstore, session

class TestCase(unittest.TestCase):

    def __init__(self, methodName='runTest'):
       unittest.TestCase.__init__(self, methodName)
      
       old_setup = self.setUp
       def our_setup():
          session.begin
          old_setup()
       self.setUp = our_setup

       old_teardown = self.tearDown
       def our_teardown():
          objectstore.clear()
          session.rollback()
          old_teardown()
       self.tearDown = old_teardown

    def failUnlessRaises(self, excClass, callableObj, *args, **kwargs):
       try:
          callableObj(*args, **kwargs)
       except excClass:
          session.close()
          return
       else:
          if hasattr(excClass, '__name__')
             excName = excClass.__name__
          else:
             excName = str(excClass)
          raise self.failureException, "%s not raised." % excName

    assertRaises = failUnlessRaises

------

from drproject.test import TestCase

class MyTestCase(TestCase):
    @classmethod
    def setup_class(self):
       <setup fixture>

    def setUp(self):
      
<setup fixture>

    <Test Cases>

Based on the current number of tests, their resource usage, and this computer, the speedup was 5-fold. They're lightning fast now!

NEWT Update

The Database Schema
Here is (probably) the final version of the ticketing system's new database schema, baring any difficulties in developing the UI.



Files in the New System
The following is a cheap diagram giving a general idea of which files are associated with which functionality on a DrP portal.

The red files are core files, the files under template/ directories have been excluded because it's all too obvious what they do, and the other files have poor quality images beside them describing which screens they're associated with. Of course, the admin system has functionality for a lot more than just tickets, so those images have been excluded.



Progress and Next Steps
I have finished:
  1. initial design and functionality decisions
  2. database schema design
  3. back end implementation and core functions
  4. top layer functions in drproject/ticket
  5. testing of core model and top layer functions
  6. fixing whatever newt broke in drproject; this includes email notifications, users' per-project statistics, and some admin system functions
  7. using profiler to optimize code and tests
  8. using coverage analysis to ensure tests are thorough (this is a gray area but I consider this done)
  9. ensuring code style is compliant with PEP 8
Now I have to:
  1. include functions in the admin system to create ticket fields, edit, delete, etc
  2. test those functions via unittest
  3. move on to build/modify the interface
I anticipate 1 and 2 will collectively take about a week, and 3 will (and might as well) take the entire rest of my time here.

Supplemental

Previewing Ticket ChangesSo I solved the problem of name collisions easily by assigning the fields to a dictionary.

web_ui.py is actually coming along now. The problem with assigning objects, as I mentioned before has also been solved, albeit in a roundabout way. The problem, more specifically, is what happens when a user decides to preview a new ticket or changes to an existing one. In the previous version, the ticket was created/updated but no changes were flushed to the database. If it was a preview, the ticket (and its associated milestone) are expunged so the database sees no change while the request object sees this newly updated ticket. Everyone's happy. But in the new version, creating a ticket is very involved; you have to update it with all the existing fields, many of which may be foreign keys. So you can't just expunge a ticket. So here's my roundabout solution (better ones are welcome with open arms):

When a user opts to create a new ticket, the ticket is created in the database. Necessary information is then passed back to the request object, and the ticket is deleted if it was a preview.

When a user opts to update a ticket, the changes are applied in the database and the necessary data is passed to the request object. If it was a preview, we do a rollback and assign the old values of the ticket fields.

I have to test what happens when the user clicks preview multiple times one after the other.

June 16

Working from home
It's about 12:30 right now and I'm working from home because I'm sick. I started work at about 10:00 this morning. Although I'm feeling well enough to think about DrP problems, I don't think I would've been able to give up the comfort of my home and actually travel to work.

A Problem Solved
I don't work or think about work over the weekend, so the ticketing system was on hold with a certain problem. The HTML sends a request object to Python. The object has many fields including two dictionaries, req.data and req.args. The name of fields are in req.args and their values are stored in different ways (e.g. strings, integers). When I was extracting the values of the fields, sometimes I was getting strings which I could parse (by sending them through the parse functions for the data types), and sometimes they were coming out as objects like Enums (which I couldn't really send to any parse function).

This caused a problem since the TIcket.setvalue function takes strings. I could try converting the objects I was getting to their string representation, but that turned out to be pretty much impossible (and besides, I would be changing objects to strings back to objects).

So what I did was circumvent the setvalue() function altogether. The setvalue() function changes the value of a ticket field by assigning an object to it, then it creates a TicketChange instance to log the change. So I created a function that assigns the fields from the request object to a ticket all at once and returns a dict of changes. I then used that dict to log the changes.

Name Collisions
I still need to find a place in the req object to put the values of ticket fields. In the current situation, users can create fields called "action" for example. This will collide with the existing req.data["action"] which is not a ticket field. I don't think the problem is that serious right now, but it is something I have to look into.

Adopting a DrP Component
Since I adopted the DrP ticketing system, I've had the overwhelming feeling that Ticket is now mine! I.e. even though it's open source stuff, ticket is now in my branch, it's mine, I will do with it as I please, and no one else can touch it until it's merged into trunk. I am above the law and free to ignore style conventions etc. When it's merged into trunk, people may change my style to comply with conventions and I can't stop them, but until then it's my world and I'm boss.

That's bad, right?

Jun. 13th, 2008

It seems the following (Python) files contain work that needs to be done once I finish newt:
  • drproject/admin/api.py
  • drproject/userpages/tests/test_api.py
  • drproject/userpages/api.py
That's a lot fewer than I thought, but I may be incorrect and in for a surprise; you never know.

I'm about half a week ahead of schedule.

My Challenges and Their Resolutions

Setting up DrProject on Windows
Problem: I need to use a screen magnifier which has drivers only for Windows.

Solution: So in the first week of my job I spent time modifying the DrProject install and documenting instructions on installing DrProject on Windows. Here's a more permanent link.

Tools for Developing DrProject
Problem: DrProject is larger than most software code bases I've worked with and some big-boy tools were in order.

Solution: Here are the tools I use:
  • Eclipse as my IDE, and I've set it to convert tabs to 4-spaces (Windows -> Preferences -> General -> Editors -> Text Editors)
  • Python 2.5, the bindings for SVN, Java 6, and Cygwin
  • IPython and the corresponding readline tool for debugging, testing out Python code theories, running parts of the DrProject code under a controlled environment, and many other necessary debugging features
  • nose (attainable by easy_install nose) for testing and profiling
  • SVN plugin for Eclipse (http://subclipse.tigris.org/update_1.2.x)
  • PyDev plugin for Python in Eclipse (http://pydev.sourceforge.net/updates/)
Shortcuts
Problem: Even with the given tools, you need ways to do things quickly.

Solution: I'm working on the ticketing system for DrProject so I've added the following line to my ~/.bashrc in Cygwin so I'm where I want to be as soon as I open Cygwin:

cd /DrProject/drproject/ticket/tests

I've set up alias in ~/.bashrc as follows:

alias tst='nosetests'
    (-v for verbose output which is often useful)
    (follow tst with the file name to run a particular test file only)
    (follow the filename with :<ClassName>.<test_function> to run a particular test)
alias profile='tst --with-profile --profile-stats-file=<output filename>'
    (here's how to analyze the data in the output file)
alias setup='/setup.sh'
alias init='/init.sh'
alias go='/go.sh'
    (these aliases point to shell scripts which record the CWD so I can setup DrProject from any directory)
    (here is an example of the contents of one of these scripts; this could be done in other ways)
    #!/bin/sh
  DIR=`pwd`
  cd /DrProject/
  python setup.py develop
  cd $DIR
alias flake='c:/cygwin/Pyflakes/bin/pyflakes'

I find it very useful to fold and unfold functions and classes in Eclipse, which can be done using CTRL+9 and CTRL+0

I also have a macro for IPython to set up a fixture so I can immediately start using the DrProject code there. Using TAB for completion is very useful and can shed light on available functions. "?" following a a class name or function name will give the docstring and "??" will give the code if available. Here's my macro as an exmaple:

from datetime import *
from drproject import *
from drproject.api import *
from drproject.db.util import flush
from drproject.scripting import *
from drproject.ticket.model import *
from drproject.ticket.api import *
from drproject.ticket.types import *
from drproject.ticket.roadmap import *
from elixir import *
from sqlalchemy import *

env = Environment()
env.configure('/cygwin/DrProject/hacking/drp_root/')
setup_all()

p = Project.query.first()

t1 = TicketSystem().add_ticket(p, 'me1')
t2 = TicketSystem().add_ticket(p, 'me2')
t3 = TicketSystem().add_ticket(p, 'me3')
t4 = TicketSystem().add_ticket(p, 'me4')
t5 = TicketSystem().add_ticket(p, 'me5')
t6 = TicketSystem().add_ticket(p, 'me6')
t7 = TicketSystem().add_ticket(p, 'me7')
t8 = TicketSystem().add_ticket(p, 'me8')
t9 = TicketSystem().add_ticket(p, 'me9')

It can be created using %macro <macro name> <IPython line numbers with the instructions to include; can be given sequentially with spaces between them or as a range sing -; e.g. 1 2 3 4-10>

Then you can save the macro using store <macro name>, and edit it using notepad by ed <macro name>; don't forget to store it again when you change it.

Managing Imports
Problem: Imports can sometimes cause problems

Solution: Try to avoid from X import * because you want to be explicit in what imports you're using, and importing everything may lead to cyclic imports and other problems. Use PyFlakes to help with determining which imports are being used, which are unused, etc.

If you get an error like "cannot import X", you may have a cyclic import. If this cannot be avoided, you can move the import from one of the files down from the top of the file into the function that's using it.

When using Elixir, you can create ManyToOne, OneToMany, etc relationships. When you do this, you need to pass in the name of the class which represents the table being referenced. E.g. f = ManyToOne('Y', primary_key=True). Even though "Y" is a string here, that class still needs to be imported.

Merging Trunk into your Branch
Problem: SVN's built-in merging is retarded.

Solution: Use the svnmerge script.

Libraries that DrProject Uses
Problem: DrProject uses a lot of libraries and third-party stuff; learning to use all this is very necessary since these tools make life a lot easier.

Solution: DrProject uses at least the following:
  • the Python tutorial
  • SQLAlchemy for interacting with the database in an object oriented way
  • Elixir for creating the database schema using Python classes
  • Dojo as a layer on top of JavaScript; here are some demos
  • Kid to send Python objects back and forth between Python and HTML files and resolve Python code in HTML
Making Tests Run Faster
Problem: Some tests were just too slow.

Solution: In query.py, all I needed to do was create the database once since I wasn't doing any updates (just queries). But the setUp() function gets run for each test in unittest.TestCase. You canreplace the setUp() function signature with this:

    @classmethod
    def setup_class(self)
       ...

Eclipse will no longer be able to run the tests correctly, but you can use nose to run them and it'll run the setup only once per class (not per test method). There is a similar function for module level setup.

Defining Default behaviour in Elixir
Problem: Elixir allows many options (including table names) using the using_options and using_table_options functions. E.g.

class MilestoneChange(Entity):
    milestone = ManyToOne('Milestone', primary_key=True)
    time = elixir.Field(DateTime, default=datetime.now, primary_key=True)
    ...

    using_options(tablename='milestone_change', order_by=['time'])

But some options persist for each class and you don't want to declare them each time.

Solution: You can import and use (at the top of the file outside each class) the function options_defaults. E.g.

    options_defaults['shortnames']=True
    options_defaults['inheritance']='multi'

Database Polymorphism
Problem: Sometimes you want classes which extend Elixir's Entity class (and hence are part of the database schema) to use polymorphism, inheritance, all that good stuff.

Solution: Simply extend the parent class and remember to set the 'multi' option default. The parent class will automatically be given a a row called 'row_type' which it will use to differentiate between between its descendants.

You can have many layers of inheritance. E.g.

class Field(Entity):
    details = ManyToOne('FieldDescription')
    ticket = ManyToOne('Ticket')

    def parse(self, val):
       ...

    def html(self):
       ...

    def __unicode__(self):
       ...

class PrimitiveField(Field):
    pass

class ComplexField(Field):
    def get_filter_col(self):
       ...

class WikiField(PrimitiveField):
    value = elixir.Field(UnicodeText)

    def parse(self, val):
        ...

    def html(self):
       ...

    def __unicode__(self):
       ...

class TicketField(ComplexField):
    value = ManyToOne('Ticket')
    filter_col = 'id'

    def parse(self, val):
        ...

    def __unicode__(self):
       ...

    def get_filter_col(self):
       ...

Querying Polymorphic Types
Problem: If I had a variable f of type Field and I knew that it was more specifically of type WikiField, I couldn't just assume that; the queries wouldn't work.

Solution: When dealing with polymorphism, use Alchemy's of_type(), with_parent(), and any() functions. Alchemy will do a join with the appropriate tables. E.g.

Assume the Ticket class has a row called "fields" of type Field and I want THE WikiField called X with value Y:
        Ticket.fields.of_type(WikiField).any(and_(WikiField.name==X, value==Y))

If I want ANY field of type WikiField with value Y:
        Ticket.fields.of_type(WikiField).any(value==Y)

Getting Specific Module Members
Problem: I wanted all classes that extended Field in a dictionary that maps the name of the class to the class itself; e.g. 'WikiField' => WikiField.

Solution:
from inspect import isclass
def get_field_classes(cls, members=locals()):
    return dict((c.__name__, c) for c in filter(isclass, members.values())
                if issubclass(c, cls) and c is not cls)

Notice the named parameter "members" assigned locals(). This is done because locals() within the function is different from locals() outside of it.

Merging Dictionaries
Problem: Sometimes I have two Python dictionaries which I want to combine into one big one.

Solution: I use the following idiom to merge dict a and b

c = dict(a.items() + b.items())

Shadowing
Problem: I wanted to call a class Field but Elixir already uses that name.

Solution: So I called my class Field and imported elixir. I then used elixir.Field for elixir's version.

Accessing Table Properties
Problem: I want to access information about a database table.

Solution:
  • list of column names: <Table>.c.keys()
  • a particular column object: <Table>.c.<column name>
  • the value of a column for a particular row: <table_instance>.<column name>
  • the table name: <Table>.table.name
  • the table properties: <Table>.table
  • pretty list of columns: <Table>._descriptor._columns
  • primary keys: <Table>._descriptor.primary_keys
  • natural ordering of the table: <Table>._descriptor.order_by
  • foreign keys: <Table>._descriptor.relationships
  • foreign key table object: <Table>._descriptor.relationships[<#X>].target
Tables with Non-Primitive Values (Foreign Keys)
Problem: If I wanted to dynamically get values from tables, it was fine so long as the `value` field I was looking for was a primitive data type. But if it was a foreign key, I didn't know which of the columns of the foreign table to return (or do whatever with).

Solution: I added a variable (table column) to the class that held the reference. It was a string which represented the column of the foreign table to use. Its parent class had a function which resolved this string and rturned the appropriate foreign column object.

 I also added a __unicode__ function to the classes which returned the desired column from the foreign table. So unicode(X) where X is of type MilestoneField will return the name of the milestone which X is referencing.