Design notes
Foundation

2/7/86

Problem: References to mutable objects get screwed up when realloc makes
them bigger through copying.

Solution: Refer to mutable objects through one known place which will
get the new pointer. Or else signal relocation up through the tree.
Hmmmm. That may work even when offsets change. But it may take a while
for relocations to get propagated...

----
2/10/86

It occurs to me that arrays should be mutable, AND they should be typed.
In this way we can deal with mutations in the following way: When
something in an array moves, the owner of the array is signaled. The
owner is responsible for taking action on the change depending on the
type of array it is. I think that perhaps a RAW should actually go into
an array of raw.

Types of arrays:
    Raw characters
    Pointers to nodes
    Pointers to procedures
(so far)

----
2/12/86

I REALLY don't like the way Regions are shaping up. When one little
Array gets a new first element, every Region that references every
element in the Array must be changed. It is easier to walk up the tree
and find all the parents in such a case. Provided Raw text can only have
one parent Node!

Here is an idea for enforcing the hierarchy: Require that a particular
TYPE be defined to have numerical significance. A document is 10, a
chapter is 19, a Section is 18, a Paragraph is 17, and so on. When
nesting is attempted, compare the tens digit (or whatever) for equality
so that nesting can occur at all, and then test to see if the units
digit (or whatever) is smaller than the parent. If not, the nesting is
invalid.

----
2/13/86

I noticed the confusion I got into in the design document talking about
raw data and the Raw abstract data type. I think it is best to describe
raw character data as char * and get rid of the typedef to Raw.

Graphics, bitmaps and text have fundamentally different ways of being
viewed, being changed, and being utilized.
Therefore, although there will be a conceptual similarity between the
three owing to their being raw data, the system should be designed to
make the DIFFERENCES between the three most clear. This means that the
old idea of generic procedures manipulating Raw data of all types is not
valid. In fact, just the opposite happens.

I am re-thinking raws and nodes. I am trying to simplify the nesting and
the operations of nodes. How about:

Region of text
    Array of blocks of raw text
    symbolic name for region
    Generic type of region
    Specific type of region
    Other Regions that care about this Region

Operations
    Each operation can run on one or n characters at a time.
    insert
    delete
    copyout (to another block of raw or another node or another region?)
    transform
        like a pipe operation?
        replace text in this region, or copyout
    Position cursor (for insert, etc)
        offset in region
        find (match a string and position cursor there)

The insert operation might want an optional transformation.
The delete operation might want an optional transformation.
Copyout might want an optional transformation.
There might want to be a replace operation.

These four ideas are just different ways of organizing the basic
operations given.

----
2/13/86

I am beginning to think that having a float in the middle of a block of
text is a bad idea.
It makes all the accessing code more complex. Special routines need to
be called to read across the hole, and the information on where the hole
is isn't kept in the array. Putting the float at the end would make
things simpler: anything that knows how to scan a C string would work
unmodified. That is a considerable saving in the programmer effort and
understanding needed to use the system.

Putting the float at the end of the array of characters can be made
arbitrarily efficient. If, whenever we insert new text in a new place,
we open a new array, and then consolidate arrays when the system is not
doing anything, we will probably end up winning. This consolidation and
new application of float requires more thought...

There is no good reason to use the zeroth element of the Array as a size
parameter. It is a bad idea for the following reason: Some routine
responsible for filling arrays needs to know both how many elements
there are and how filled the array is. Knowing one without the other is
almost useless. The only time it helps is when iterating over all
elements of the array: it tells when to stop. Either both must be
maintained in a user visible way, or routines for iterating and
accessing must be provided.

What should the effect be of accessing beyond the end of an array? What
cost are we willing to bear in preventing the accessing beyond the end?
In Clu, we would just write routines to do it. In C, we would just let
people access beyond the end of the array. I think the solution may be
to use macros which expand to in-line code. It affords protection, and
it is simple to use, and it is fast.

More on operations on low level data:

Funny Copyout - needs destination and optional transformation.
    (source, destination, offset)
    (insert backwards)
Transform - in insert, in-line, to other area???
    (source, destination, offset, transform)
    onto itself?
    do we check s/d (or have a flag for overwrite)
        or allocate (or let the proc do it)
        or assume (if overwrite is needed, the proc will do it)
    is order important? forw/back through characters?
        most things go forward, but...

Specify:
    Source
    Destination
    Transformation if any
    Blocksize - number of chars in a unit block count ALL First Last One

Cases:
    Insert
    Overwrite
    Source and Destination same
    Source and Destination different

--------
Final decision:

Primops:
    Find (source) -> offset (direction)? or range?
    Delete (destination, offset)
    Insert (source, destination, offset)
    Size
    Create
    Destroy

The primops are hooked into the storage allocator for each mutable data
type. The user is free to mung the data as she sees fit. A special
debugging mode will be added to the storage allocator which will write a
magic constant into ALL the "unused" words of a block, and then check to
see if they are touched. This is the only range checking you will get!

Low level data: Array, char *, graph *, bitmap *.

----
2/14/86

In doing the design on the generic primop procedures for the dynamic
data I decided that Insert was too complicated to use, and not in
keeping with the C philosophy. Instead there should be a widen command
which puts zeros into a hole it opens in a block:

Widen
    Enlarge a data block, shifting data down if necessary. Takes a
    pointer to the block pointer, the offset at which to begin opening
    new space, and the count of space to add. Returns -1 if the offset
    is past the end of the block, or if the widen fails to get more
    space.

----
2/19/86

Additional debugging mode for allocator: Check each freed pointer
against the array of allocated pointers. Report any free of a
non-allocated block.

----
2/20/86

With the basic text storage management out of the way, it is time to
think of I/O.

What will we want to do with I/O?

We shall want a central control over file descriptors, at least for
input.
We shall want to read text into buffers.
We shall want to write text from buffers.
We shall want to re-format text into buffers.
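
As a rough illustration of "read text into buffers", here is a minimal
sketch in C. Everything named in it is an assumption made for the
example and not part of the design above: the helper name
read_into_buffer, the starting capacity of 64 bytes, and the doubling
growth policy. It relies on the POSIX read call. Note that the buffer
may move every time it grows, which is the relocation problem from the
2/7/86 entry; the caller only ever sees the final pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not from the notes: read everything available on
 * file descriptor fd into a freshly allocated, NUL-terminated buffer.
 * Stores the byte count in *len and returns the buffer (which the
 * caller must free), or NULL on a read or allocation error. */
char *read_into_buffer(int fd, size_t *len)
{
    size_t cap = 64, used = 0;      /* assumed starting capacity */
    char *buf = malloc(cap);
    ssize_t n;

    assert(len != NULL);
    if (buf == NULL)
        return NULL;

    /* Always keep one byte free for the terminating NUL. */
    while ((n = read(fd, buf + used, cap - used - 1)) > 0) {
        used += (size_t)n;
        if (cap - used <= 1) {      /* full: double the buffer */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;           /* the block may have moved */
            cap *= 2;
        }
    }
    if (n < 0) {                    /* read error */
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

Because the result is NUL-terminated, anything that knows how to scan a
C string can work on it unmodified.
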
Can a Region function as a buffer? hmmmmmmmmm...

----
2/27/86

When designing re-display, additional things had to be added to regions
to create blocks of copied text derived from a region. This addition
seems to work OK for redisplay, but right now seems a bit too ad hoc. It
may turn out to help for outline and annotation modes and for diff. If
ever fancy diff is added...
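
Looking back at the Widen primop from the 2/14/86 entry, a minimal
sketch in C might look like the following. The notes only say that the
primops are hooked into the storage allocator, so the Block header that
carries the size, and the helper block_create, are assumptions made for
this example. The sketch treats an offset equal to the block size as
valid (it appends), which is one reading of "past the end".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the size lives in a header in front of the data. */
typedef struct {
    size_t size;        /* bytes currently in data[] */
    char   data[];      /* the raw characters */
} Block;

/* Hypothetical helper: make a block holding a copy of n bytes of src. */
Block *block_create(const char *src, size_t n)
{
    Block *b = malloc(sizeof *b + n);
    if (b == NULL)
        return NULL;
    b->size = n;
    if (n > 0)
        memcpy(b->data, src, n);
    return b;
}

/* Open a zero-filled hole of count bytes at offset, shifting the rest
 * of the data down. Takes a pointer to the block pointer so that the
 * one known place holding the pointer is updated if realloc moves the
 * block. Returns 0 on success, -1 if offset is past the end of the
 * block or if more space cannot be obtained. */
int widen(Block **bp, size_t offset, size_t count)
{
    Block *b, *bigger;

    assert(bp != NULL && *bp != NULL);
    b = *bp;
    if (offset > b->size)
        return -1;
    bigger = realloc(b, sizeof *b + b->size + count);
    if (bigger == NULL)
        return -1;                  /* the original block is untouched */
    memmove(bigger->data + offset + count,
            bigger->data + offset,
            bigger->size - offset);
    memset(bigger->data + offset, 0, count);
    bigger->size += count;
    *bp = bigger;
    return 0;
}
```

An insert then becomes a widen followed by an ordinary memcpy into the
hole, which keeps the primop itself small.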