Removing Duplicates In FileMaker The Easy Way

on 16.06.2007 04:18:29 by squeed2000

If you need to identify duplicate records in your FileMaker database,
read on: here is a simple way to accomplish it with five minutes'
worth of work, all done within a script.

One of the things that FileMaker Pro (all versions) is missing is a
good duplicate-record identification and removal/merge feature.
I've seen a lot of posts here on how to identify and remove duplicates,
but they all seem very complicated and involve linking and alternate
tables.

I've figured out a way to create a simple duplicate-tagging script
that you can use to identify duplicates in your FileMaker database.

For this example, the database is called "DATA.fp7" and the script is
called "Flag Dupes". The script decides which records are duplicates
based on their having an identical address. Again, this is a simple
example; if you wanted to identify duplicates based on more than one
field, you could very easily modify this method to use multiple fields.

The database that I am using has the following fields:

Contact Name, Address, City, State, ZIP, and two fields that I created
for the purpose of identifying duplicates. Those fields are
"Duplicate" and "global".

The "global" field needs to have a storage option of global, i.e. the
same value for every record in the database.

The script is below. What isn't shown is the sort option: it sorts on
"Address", because that is the field I am checking for duplicates.

The result of running this script is that every duplicate record,
except one from each set of duplicates, is marked "Dupe". After the
script has run, do a search on "Dupe" and delete the found records.

Here is how it works:

The script starts by showing all records in the database, then sorts
them by address. It then walks through all of the addresses, and
wherever the address of a record equals the address of the prior
record, that record gets flagged as a "Dupe". That's it.

If you try this and like it, let me know - post back here; I'm
interested in your comments or suggestions.


Show All Records
Sort Records [Restore; No dialog]
Go to Record/Request/Page [First]
Copy [Select; DATA::Address]
Paste [Select; DATA::global]
Loop
Go to Record/Request/Page [Next; Exit after last]
If [DATA::Address = DATA::global]
Set Field [DATA::Duplicate; "Dupe"]
End If
Copy [Select; DATA::Address]
Paste [Select; DATA::global]
End Loop
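For anyone who wants to try the idea outside FileMaker first, here is a rough Python sketch of the same loop. The record list and field names are just stand-ins for the real table, not anything FileMaker-specific:

```python
# Rough sketch of the "Flag Dupes" loop: sort by the key field, then
# flag every record whose key matches the previous record's key.
def flag_dupes(records, key="Address"):
    records.sort(key=lambda r: r[key])           # Sort Records
    prev = None                                  # plays the role of "global"
    for rec in records:
        if prev is not None and rec[key] == prev:
            rec["Duplicate"] = "Dupe"            # Set Field
        prev = rec[key]                          # carry the key forward
    return records

data = [
    {"Address": "12 Oak St"},
    {"Address": "5 Elm Ave"},
    {"Address": "12 Oak St"},
]
flagged = [r for r in flag_dupes(data) if r.get("Duplicate") == "Dupe"]
```

One of each duplicate set is left unflagged, so only the extras get deleted afterwards.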


-Squeed

Re: Removing Duplicates In FileMaker The Easy Way

on 16.06.2007 05:42:30 by Helpful Harry

In article <1181960309.788392.267950@q75g2000hsh.googlegroups.com>,
squeed2000@yahoo.com wrote:


>
> Show All Records
> Sort Records [Restore; No dialog]
> Go to Record/Request/Page [First]
> Copy [Select; DATA::Address]
> Paste [Select; DATA::global]
> Loop
> Go to Record/Request/Page [Next; Exit after last]
> If [DATA::Address = DATA::global]
> Set Field [DATA::Duplicate; "Dupe"]
> End If
> Copy [Select; DATA::Address]
> Paste [Select; DATA::global]
> End Loop

There are no real problems with this script and it should work fine as
is, but there are a few suggestions I can think of to make the script
run faster when processing LOTS of records or on a slow computer /
network.

First, don't bother with Copy and Paste - it's a waste of time, since
the If command won't test for changes in font, colour, etc., and it
means the Global field needs to be on the current layout. Instead use
a single Set Field command.

The If test inside the Loop could do with an Else command. Instead of
changing the content of Global on every record, just change it when the
Address changes.

Lastly, a Freeze Window command would mean that FileMaker doesn't have
to keep updating the display each time it moves to the next record. The
window will automatically unfreeze when the script ends.

That would give you a script something like ...

Show All Records
Sort Records [ Restore, No Dialog]
Go To Record/Request/Page [First]
Set Field [Global, Address]
Freeze Window
Loop
Go To Record/Request/Page [Next, Exit After last]
If [Address = Global]
Set Field [Duplicate, "Dupe"]
Else
Set Field [Global, Address]
End If
End Loop



Of course, you could cut down the number of records the script has to
process by letting FileMaker find the duplicates first using the "!"
symbol. Adding a little error capture for when there are no duplicates,
you get something like ...

Enter Find Mode []
Set Field [Address, "!"]
Set Error Capture [On]
Perform Find []
Sort Records [ Restore, No Dialog]
Go To Record/Request/Page [First]
If [Status(CurrentFoundCount) = 0] <--- see note below
Message ["There are no duplicates"]
Else
Freeze Window
Set Field [Global, Address]
Loop
Go To Record/Request/Page [Next, Exit After last]
If [Address = Global]
Set Field [Duplicate, "Dupe"]
Else
Set Field [Global, Address]
End If
End Loop
End If


Note: The Status(CurrentFoundCount) function is for older versions of
FileMaker. For newer versions it has probably been changed to
Get(CurrentFoundCount).
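Outside FileMaker the "!" find has no direct equivalent, but the same pre-filtering idea (plus the Else refinement) can be sketched in rough Python. The field names are only illustrative:

```python
from collections import Counter

# Sketch of the refinement: first narrow the set to records whose key
# value occurs more than once (what FileMaker's "!" find does), then
# sort and flag all but the first record of each duplicate group.
def flag_dupes_prefiltered(records, key="Address"):
    counts = Counter(r[key] for r in records)
    found = [r for r in records if counts[r[key]] > 1]  # Perform Find on "!"
    found.sort(key=lambda r: r[key])                    # Sort Records
    prev = None                                         # the Global field
    for rec in found:
        if rec[key] == prev:
            rec["Duplicate"] = "Dupe"
        else:
            prev = rec[key]          # only update when the key changes (Else)
    return found

data = [{"Address": a} for a in ["1 A St", "2 B St", "1 A St", "1 A St"]]
found = flag_dupes_prefiltered(data)
```

The loop then only has to touch the records that can possibly be duplicates, which is where the speed-up on large files comes from.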





Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)

Re: Removing Duplicates In FileMaker The Easy Way

on 16.06.2007 10:14:28 by ursus.kirk

Finding duplicates is never going to be easy.
There are no problems with this solution in itself, but there is no way
it can tell that East Fifth Street and E 5th St are the same. In other
words, spelling differences or spelling mistakes will never, ever be
found. In my experience there will always be some (or even many) records
that should be trapped but simply aren't.

So after such a script has found all duplicate addresses, you may also
try finding duplicate postal codes (zip codes) and mark those as "to be
reviewed". You will find falsely marked records, since postal codes may
be shared, but at least you might get rid of some more double records.
As said: it is never going to be straightforward.

Keep well, Ursus


Re: Removing Duplicates In FileMaker The Easy Way

on 16.06.2007 23:35:05 by Helpful Harry

In article <46739bcd$0$99407$dbd4f001@news.wanadoo.nl>, "Ursus"
wrote:

True, you'll never be able to account for human error (typos) or
inconsistency in data entry. That's simply one of the facts of working
with databases that you have to live with.

I think the "Address" field in the scripts was really just an example
field - you can easily swap it to Name or Phone Number or whatever to
find duplicates in other fields.

There's also the problem that the same address (again, as an example)
might belong to different people whom you do want to keep in the
system. This is why you should NEVER have a script that simply deletes
supposed "duplicates". Marking them is a much better idea; then
someone can manually go through, making sure they really are duplicates
and deleting them when appropriate.









Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)

Re: Removing Duplicates In FileMaker The Easy Way

on 17.06.2007 00:02:25 by d-42

I usually use that technique, although with Harry's improvements. (And
they ARE improvements.)

That said, for applications with LOTS of records there are even faster
techniques:

Define a self-join relationship on the field in which you are going to
detect duplicates.
Define an indexed field to hold the duplicate flag.
Perform a find on ! to reduce the record set to just the records with
duplicates.
Perform a replace on the flag field with:

self-join::pkey = pkey

(It's a boolean comparison, not an assignment, and the result will be
1 or 0.)

Then just perform a find on the field for the 0s.

This works because if the field value is unique, the record will be
related only to itself, so it will find its own key on the other side
and the comparison will be true. If it's the first duplicate, it will
still see itself first and return true. But the 2nd and later
duplicates will see the 'first one' through the relationship, and thus
return 0s.

It's faster in big cases because a lot more of the activity can
potentially take place on the server, saving you from having to shunt
records back and forth with the client on each loop cycle. It also
eliminates the need for a sort, which requires super-linear time, while
all of the operations above run in linear time or faster (on 'normal'
datasets). On a big enough data set the sort operation takes more time
than everything else combined.

It's also more limited: due to the way relationships work, it's not
suitable for multiline fields, fields that exceed the maximum key
length, and so forth.
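Relationships don't translate directly outside FileMaker, but the trick can be simulated in rough Python (the pkey field and helper names here are hypothetical): the "other side" of the self-join is simply the first record carrying a given key value, and each record compares its own primary key against it.

```python
# Simulation of the self-join trick: the "related" record for a key
# value is the first record that has that value. Each record checks
# whether the related record's pkey equals its own; only the first
# record of each group gets 1, so the 0s are the extra duplicates.
def mark_with_self_join(records, key="Address"):
    first_by_key = {}                    # self-join: key value -> first pkey
    for rec in records:
        first_by_key.setdefault(rec[key], rec["pkey"])
    for rec in records:
        # boolean comparison, not assignment: yields 1 or 0
        rec["flag"] = int(first_by_key[rec[key]] == rec["pkey"])
    return [r for r in records if r["flag"] == 0]   # find on the 0s

data = [
    {"pkey": 1, "Address": "12 Oak St"},
    {"pkey": 2, "Address": "5 Elm Ave"},
    {"pkey": 3, "Address": "12 Oak St"},
]
extras = mark_with_self_join(data)
```

Note there is no sort anywhere in this version, which is exactly where the big-dataset speed-up comes from.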

-cheers,
Dave