+ Reply to Thread
Results 1 to 8 of 8
  1. #1
    Member
    Join Date
    Nov 2000
    Location
    Malden, MA
    Posts
    140

    Excel Macro Help

    Hi all,

    I was wondering if any of you programming mavens could extend some kindness?

    I am using a column of data: 15 digit numbers treated essentially as 15 letter words, I am looking for cells with less than a 3 letter difference. I've got a macro that will search a column, after selecting a cell, and highlight like values; and then print the integer difference offset by two columns. I'd like for is to search the column automatically.

    Which I don't know how to do. This is the problem.

    I'd like highlight Column A, or take (start with) the first value in Column A, and then have it search the whole column for like numbers within that 3 integer threshold. What I have now on will do that and then display the integer difference value about two columns over.

    What'd be new would be this:
    1) After the search (of the first value in colA) anything with a Integer Difference <=3 would get a group number or counter. So all "like" values would display say, a 1 (for group 1), instead of their integer difference value.
    2) Then the macro would then drop down to the next unlike (not like Cell1) cell, an Integer Diff >=4, and then do a search for like cells (IntDiff <=3)among the rest of the unchosen, those like cells would be given a group or number 2 offset.
    3) And then so on and so on. Until all cells have group value. If the cells are unique- they'd have no group. Or rather they'd be unique groups of one.

    Example:
    ColA ------------- ColB -- Explanation
    515563435640731 X1 unique
    777777777720771 X2 Group 1
    777777777760731 X3 "
    776377777760771 X4 "
    777777777760771 X5 "
    777777777721110 X6 Group 2
    777777777731110 X7 "

    I'll get this:
    colA ---------------colB ---- colC
    515563435640731 X1 (unique, could be given a group number, too)
    777777777720771 X2 1
    777777777760731 X3 1
    776377777760771 X4 1
    777777777760771 X5 1
    777777777721110 X6 2
    777777777731110 X7 2

    X1 has no likes.
    X2 has X3,4,5 within 3. Group 1.
    X6 has X7 within 3. Group 2.

    It'd be great if anyone could help-- I'd appreciate it very much. Please reply here, or my email.

    Thanks, Mike


    [This message has been edited by radio1_mike (edited 06-28-2001).]

    [This message has been edited by radio1_mike (edited 06-28-2001).]

  2. #2
    Member
    Join Date
    Nov 2000
    Location
    Malden, MA
    Posts
    140
    The 15 digit number is treated as text because Excel will turn a number into scientific notation.

    I am dealing with just one column of data, that I'd like to be lumped into "related" groups. A group ID number placed a column over would help when filtering groups.

    The classic bubble sort. Ah yes, if this was 15 years ago and I was in my college Pascal class--- I might be able to help myself.

    But, thanks for the help. If you're interested I could send you a column of the data, and the macro I am already using.


  3. #3
    Banned qball's Avatar
    Join Date
    Oct 1999
    Location
    Rockaway, NJ 07866
    Posts
    1,730
    I am using a column of data: 15 digit numbers treated essentially as 15 letter words, I am looking for cells with less than a 3 letter difference
    You basically want to match 15 character strings with less than 3 'place mismatches'. Great, I can state the obvious.

    Anyway, one can do this, but it is slow and tedious. One problem is data integrity. Are they 15 digit numbers or 15 character strings? I cannot stress the importance of the last statement.

    Onward. To avoid much delay and confusion, let's just say every column (how many rows, btw?) has 15 characters that resolve to a 15 digit number. Examples:
    123456789012345
    098765432112345

    Not:
    17
    12345678901234A
    abcvdgeksbnfhfk

    We'll (really, I'll), call this the fundamental assumption, or fa. Assuming fa,
    you need to do this:
    get data in first row;
    loop through all remaining rows;
    for each loop, grab data in row, do another loop from character 1 to 15 performing comparison between the character in row one and the row you are comparing, keeping a count of non-matching characters. Once count reaches 4, you can stop comparison and move to data in next row. If count less than 4 when done with 15th character comparison, you have found a winner and need to make note of row that matches.
    Get data in row 2 and do the same thing you did with data from row 1, with one exception, no need to compare with row 1, you already did that.
    By the time you get to the data in the last row, your done, as it has already been compared with all other values.

    Ya having fun yet? Anyway, a great example of the looping and comparison of values is the classic bubble sort. You can find this code anywhere, though you don't need a character by character comparison in a bubble sort.

    Putting the data in a real DB is not gonna change the processing needed, as the condition of less than 3 character matches is a tough requirement.



  4. #4
    Banned qball's Avatar
    Join Date
    Oct 1999
    Location
    Rockaway, NJ 07866
    Posts
    1,730
    The 15 digit number is treated as text because Excel will turn a number into scientific notation.
    This is true if the cells have a 'General' format. Highlight the whole column (or cells, or whole spreadsheet)and format as 'Number' with 0 decimal places. This won't work with '000000000000001' (or any number of preceeding '0's), as number format will make '1'.

    Now, I believe I understand what you want, but I would like to ask why? Bear with me. You have a bunch of numbers with 15 digits and you would like to find 'like' values and group them. What do the numbers represent? And why does 'like' (less than 4 digit differences) have to do with this? What about data that fits into more than one group? For example (row 1-3):
    111111111111111
    111111111111555
    111111111555555

    Row 1 and row 2 are 'like', thus create group. Row 1 and row 3 are not 'like', thus nuffin. Row 2 and row 3 are 'like', thus create another group???? How do you represent data with multiple groups?

    Given the given data and requirements, you have created a rather daunting task. The relevance of the 15 digit data, may make this easier.

    With the given reqs, there is no way around a lot of processing. What I would do (with given reqs, and no other knowledge, though amount of data is relevant) is write a java (or C, or VB, or PB, or FORTRAN) class (with nifty methods for comparison) that would process the data from a text file (saved from spreadsheet). It would read the file into an array and then perform computations as described in previous post. I would then output to another text file the data and relationships. From my above example:
    111111111111111 g1
    111111111111555 g1,g2
    111111111555555 g2

    This could get messy, depending upon data. Now, my examples implies sorting by nature of my data, this most likely is not the case with the real data.

    Still, not done. One will need to then process this new file with relationships to make any sense of found relationships.

    As you may be finding, this is a data analysis problem. This may or may not be easily done in a spreadsheet. Heck, I'm claiming it taint easy using any resources, as cross groupings are problematic. If someone knows, or you find an easy way to do this, more power to ya and post solution.

    If you need someone to write a program to analyze this data, I can do that. If you neee someone to write an excel macro to accomplish this, I can do that also. The excel macro will take thrice as long as the program, due to reqs. Unfortunately, for you, this is what I do for a living, so it would cost you.

  5. #5
    Member
    Join Date
    Nov 2000
    Location
    Malden, MA
    Posts
    140
    Well, I would not want to take up any of your professional time.

    But to explain what I am doing is this:

    I am running a DNA analysis (ID) on some clinical isolates. The test in question tests for 43 different pieces of DNA. A unique sample will generate a unique pattern based on the presence of each of the 43 different fragments.

    This is a 43 digit binomial. This is also rather unwieldy to work with. So, we have chopped it up into an octal code. 3 places will give an octal number. '111' = 7

    So, 43 digits = 14 x 3 + 1
    14 octal numbers plus the leftover being 1 or 0. This 15 digit number represents a shortened but still spatial code.

    The macro I use right now will check this data column for like (less than 3 place diffs) cells based on a manually selected cell. I just want to automate it with a FIFO stack sort using the macro I already have.



  6. #6
    Banned qball's Avatar
    Join Date
    Oct 1999
    Location
    Rockaway, NJ 07866
    Posts
    1,730
    Ahh, DNA, for what it is worth, I was guessing that.

    Anyway, iffin you already have a macro that checks a manually selected cell for matches with all other cells, then improving macro to automatically do each cell is easy. The problem still lies in the output. Each run will create output, since it runs on a manually selected cell, the output is most likely put into the columns to the right. For example, 10 rows of one column 15 digit data, needs to be string (15 chars) as octal 1 is 001. I diverge, anyway, if we choose row 1 column 1 and test all other 9 rows we will have relationship data in columns to the right. If we then choose row 2 column 1 and run the same macro, that data will be overwritten (maybe!!) for this last run.

    Herein lies a data analysis quandary. The raw data needs to be analyzed, creating relationship data, which thus has to be analyzed. Either that or I'm missing the point. I don't see your process as one step, it is multiple steps. Still could be done in a spreadsheet. The data analysis needed demands a high degree of computation and that is best done programatically.

  7. #7
    Senior Member
    Join Date
    May 1999
    Location
    Denver, CO, USA
    Posts
    775
    jeez, I just typed for like 15 mins b4 I noticed that ur given example answered my question for you... sheesh

    anyway, it seems to me that there are two primary ways to do this. you could do a digit by digit comparison from the original, but as qball said, this would take FOREVER in excel. not as long if you wrote a program to do it, but it would still stink as a technique. I think that perhaps another way to do it that is essentially the same, but better is to write a program that generates a hash code based on an algorithm for the sample, and then compares that with hash codes for each other sample based on another algorithm.

    as I've been thinking about this though, I think that it would be better to have this 15 digit "code" stored as a character string instead of as a number because numbers have inherent structure in terms of decimal places. since "decimal place" is not siginificant with ur data, dealing with this issue may be more trouble than it's worth.

    finally, perhaps another way to do this is to have your macro, essentially have 2 extremely different steps

    1)start at cell 1, then proceed through each cell making the comparison, if no other cells match, label the group appropriately. proceed to cell 2 and and then check through each cell (not previous cells and not cells with group numbers already assigned) and assign group numbers accordingly. continue in this manner until all of the cells are assigned to their respective groups.

    2) sort the group row in ascending order after step 1 is completed. this requires that the output of step 1 is a number and not a character.


    regardless, I think that qball is right in saying that a program would be best for doing this. In fact, in java, there is such a thing as a vector, which could be removed from the vector array right after it is assigned a group number and placed in another vector array. Although vectors are slower than dealing with straight arrays, they have the convience of being able to have a dynamic size. I would assume that something similar to this exists in C and other languages, but I don't know how to use anything but java.

    anyway, please keep us updated with what you're doing!

    -Z

  8. #8
    Member
    Join Date
    Nov 2000
    Location
    Malden, MA
    Posts
    140
    Hey, thanks, for the ideas.

    Well, the program is actually ending. So, if I can't get this part off the ground, it won't hurt.

    Just something, that's make the analysis a little easier.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts







New Security Features Planned for Firefox 4
Another Laptop Theft Exposes 21K Patients' Data
Oracle Hits to Road to Pitch Data Center Plans
Microsoft Preps Array of Windows Patches
Microsoft Nears IE9 Beta With Final Preview
Simplified Analytics Improve CRM, BI Tools
Android Passes RIM as Top Mobile OS in 2Q
VMware Updates Hyperic System Management
File Monitoring Key to Enterprise Security
LinkedIn Snaps Up SaaS Player mSpoke