Removing non matching files
on 17.09.2007 14:21:45 by dsphunxion
I've got an issue and am hoping someone can provide a better fix for it
(my current method works but is sloppy). I have
20,000+ directories, each containing files like these:
-rwx------ 1 root root 5.9K Sep 13 00:37 msg0010.gsm
-rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
However, at times I get only this:
-rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
This causes problems when there is a .txt file with no matching .gsm or
.wav file. I need to start from the top directory, go into every single
subdirectory, and remove each *.txt file that has no matching *.gsm or
*.wav file.
-rwx------ 1 root root 5.9K Sep 13 00:37 msg0010.gsm
-rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
-rw------- 1 root root 232 Sep 13 00:47 msg0011.txt
In this example, msg0011.txt would have to go, and so on.
for i in `seq 1 20000`
do
(
cd "$i" || exit
# List both text files and audio (wav and gsm) files
ls *.txt > txtFile 2>/dev/null ; ls *.wav *.gsm > audioFile 2>/dev/null
# Strip the extensions to leave the base names only, then sort both lists
perl -pi -e 's/\.(txt|wav|gsm)$//' txtFile audioFile
sort -o txtFile txtFile ; sort -u -o audioFile audioFile
# Since I want to remove only text files, diff both lists
# and, for those *.txt files without a matching *.wav or
# *.gsm file, delete the *.txt file
diff audioFile txtFile | grep '^>' | sed 's:^> ::;s:$:.txt:' | xargs rm -f
rm -f txtFile audioFile
)
done
It works, but it is sloppy and probably wastes resources. Ugly, yes, but I
was trying to rework something even uglier that someone else had started.
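For comparison, the same list-difference idea can be sketched with comm(1), which prints the lines unique to one of two sorted inputs, so the diff/grep/sed post-processing is not needed and the scratch files live outside the message directory. This is a minimal sketch, not the poster's code: the function name prune_dir is hypothetical, and it assumes a POSIX-style shell, mktemp(1), and file names without newlines.

```shell
# Hypothetical helper: in directory $1, delete every *.txt file whose base
# name has no matching *.gsm or *.wav file. Assumes no newlines in names.
prune_dir() {
    (
    cd "$1" || exit 1
    # Scratch directory outside the message directory.
    tmp=$(mktemp -d) || exit 1
    # Base names of the text files, one per line, sorted bytewise.
    for f in *.txt; do [ -f "$f" ] && printf '%s\n' "${f%.txt}"; done |
        LC_ALL=C sort -u > "$tmp/txt"
    # Base names of the audio files (extension stripped), sorted the same way.
    for f in *.gsm *.wav; do [ -f "$f" ] && printf '%s\n' "${f%.*}"; done |
        LC_ALL=C sort -u > "$tmp/audio"
    # comm -23 prints only the lines that appear in the first list,
    # i.e. text files with no audio partner.
    LC_ALL=C comm -23 "$tmp/txt" "$tmp/audio" |
    while IFS= read -r base; do
        rm -f -- "$base.txt"
    done
    rm -rf "$tmp"
    )
}
```

It could be driven by the original loop, for example: for i in $(seq 1 20000); do prune_dir "$i"; done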
Re: Removing non matching files
on 17.09.2007 15:14:59 by William James
#!/usr/bin/env ruby
# Assumes directories are named "1", "2", etc.
1.upto( 20_000 ){|i|
  dir = i.to_s
  break unless File.directory?( dir )
  # Create a list of txt files and a list of gsm/wav files,
  # removing the 3-character extension from each name.
  txt, other = Dir[ "#{ dir }/*.{txt,gsm,wav}" ].partition{|x| x =~ /\.txt\z/}.
    map{|a| a.map{|s| s[ /.*\./ ] }}
  txt.each{|f|
    unless other.include?( f )
      file = f + "txt"
      puts "rm " + file
      ## Uncomment next line when sure.
      # File.delete( file )
    end
  }
}
Re: Removing non matching files
on 17.09.2007 16:02:58 by Janis Papanagnou
On 17 Sep., 14:21, sil wrote:
> I've got an issue hoping someone could provide a better fix for
> (current method works but is sloppy). I've
> 20,000+ directories with the following in them:
>
> -rwx------ 1 root root 5.9K Sep 13 00:37 msg0010.gsm
> -rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
>
> However at times I get this:
>
> -rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
>
> Which is giving me issues (when there is a txt file and no gsm or wav
> to match), I need to be able to look from the top directory, go into
> every single directory and remove the *.txt file if there is no
> matching *.gsm or wav file.
If you have the regular filenames as shown above...
find . -name *.txt |
while read -r f
do [[ -f "${f%.txt}.gsm" ]] ||
[[ -f "${f%.txt}.wav" ]] ||
rm -f "$f"
done
I'd be careful with such a requirement, though: *.txt files are very
common, and you may accidentally remove *.txt files you didn't intend to
if you call it from the wrong directory. Better to use msg*.txt and build
in some check so that the find operation starts in a known directory
subtree.
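A minimal sketch of that advice: the top directory must be given explicitly, only msg*.txt files are considered, and a dry-run mode lists what would be removed before anything is deleted. The function name prune_msg_txt and the DRY_RUN variable are hypothetical; it uses the standard [ ] test and assumes file names contain no newlines.

```shell
# Hypothetical helper: remove msg*.txt files under the directory given as $1
# that have no matching .gsm or .wav file. With DRY_RUN=1 it only prints
# the names it would remove. Assumes file names contain no newlines.
prune_msg_txt() {
    top=$1
    # Refuse to run without an explicit, existing top directory.
    if [ -z "$top" ] || [ ! -d "$top" ]; then
        echo "usage: prune_msg_txt TOPDIR" >&2
        return 1
    fi
    find "$top" -type f -name 'msg*.txt' |
    while IFS= read -r f; do
        base=${f%.txt}
        if [ ! -f "$base.gsm" ] && [ ! -f "$base.wav" ]; then
            if [ "${DRY_RUN:-0}" = 1 ]; then
                printf 'would remove %s\n' "$f"
            else
                rm -f -- "$f"
            fi
        fi
    done
}
```

For example, run DRY_RUN=1 prune_msg_txt /path/to/messages first, and plain prune_msg_txt /path/to/messages once the list looks right (the path is a placeholder).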
Janis
Re: Removing non matching files
on 17.09.2007 16:06:37 by John L
"sil" wrote in message news:1190031705.593445.167780@y42g2000hsy.googlegroups.com.. .
> I've got an issue hoping someone could provide a better fix for
> (current method works but is sloppy). I've
> 20,000+ directories with the following in them:
>
> -rwx------ 1 root root 5.9K Sep 13 00:37 msg0010.gsm
> -rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
>
> However at times I get this:
>
> -rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
>
> Which is giving me issues (when there is a txt file and no gsm or wav
> to match), I need to be able to look from the top directory, go into
> every single directory and remove the *.txt file if there is no
> matching *.gsm or wav file.
>
>
> -rwx------ 1 root root 5.9K Sep 13 00:37 msg0010.gsm
> -rw------- 1 root root 232 Sep 13 00:37 msg0010.txt
> -rw------- 1 root root 232 Sep 13 00:47 msg0011.txt
>
> *11.txt would have to go and so on.
>
The approach I would take is this: use find with -printf to list the
files, padding the directory part of each name to the same width, and
then pipe the list to uniq, which should compare the names only up to,
but not including, their .txt/.wav suffixes.
Note that I am assuming GNU utilities: your system's native
find and uniq (and xargs) commands may differ.
find topdir -type f -printf "%10h:%f\n" |uniq --check-chars=17
You might need to change the details (especially the 10 and the 17,
depending on how long the name of topdir is) to suit your actual needs
(and remember the \n). When you are happy, you can pipe the output to
xargs --no-run-if-empty rm
--
John.
Re: Removing non matching files
on 17.09.2007 16:15:25 by John L
"John L" wrote in message news:46ee89cb$0$760$bed64819@news.gradwell.net...
> The approach I would take is this: use find to list the files
> with -printf to pad each name to the same length, and
> then pipe to uniq which should compare the names
> only up to but not including their .txt/.wav suffixes.
>
> Note that I am assuming Gnu utilities: your system's native
> find and uniq (and xargs) commands may differ.
>
> find topdir -type f -printf "%10h:%f\n" |uniq --check-chars=17
>
> You might need to change the details (especially the 10 and
> 17 depending on how long the name of topdir is) depending on
> your actual needs (and remember the \n). When you are happy,
> you can pipe the output to xargs --no-run-if-empty rm
>
Oops: uniq needs -u as well, so that only the unpaired names are printed!
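A hedged sketch of the padded-name idea with that correction applied. Beyond -u, it also sorts the list (uniq only compares adjacent lines, and find does not sort), prints '/' instead of ':' so each output line is still a usable path, and keeps only .txt lines so that an unpaired .gsm or .wav file is never removed. It assumes GNU find, sort, uniq and xargs, a 7-character stem such as msg0010, directory paths shorter than the pad width, and no whitespace in path names; prune_with_uniq and WIDTH are hypothetical names.

```shell
# Hypothetical helper sketching the padded-name + "uniq -u" approach.
# Assumes GNU tools, 7-character stems (msg0010), no whitespace in paths,
# and directory paths shorter than the pad width.
prune_with_uniq() {
    width=${WIDTH:-100}
    # Pad the directory part so every stem starts in the same column;
    # '/' (not ':') keeps each line a valid path for rm.
    find "$1" -type f \( -name 'msg*.txt' -o -name 'msg*.gsm' -o -name 'msg*.wav' \) \
        -printf "%${width}h/%f\n" |
    LC_ALL=C sort |
    # Compare only "padded dir" + "/" + 7-character stem; -u keeps
    # stems that occur exactly once, i.e. files with no partner.
    LC_ALL=C uniq -u --check-chars=$((width + 1 + 7)) |
    # An unpaired .gsm/.wav must not be deleted; keep only .txt lines.
    grep '\.txt$' |
    # xargs drops the leading padding blanks when splitting its input.
    xargs --no-run-if-empty rm -f --
}
```

Replacing rm -f -- with echo rm gives a dry run of the same pipeline.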
--
John.
Re: Removing non matching files
on 17.09.2007 19:36:09 by Cyrus Kriticos
Janis wrote:
>
> If you have the regular filenames as shown above...
>
> find . -name *.txt |
> while read -r f
> do [[ -f "${f%.txt}.gsm" ]] ||
> [[ -f "${f%.txt}.wav" ]] ||
> rm -f "$f"
> done
>
Very nice solution!
> Though I'd take care with such a requirement, *.txt files are just too
> common; you may accidentally remove *.txt files you didn't intend if
> you'll call it from the wrong directory. Better use msg*.txt and build
> in some control to start the find operation in a determined directory
> subtree.
e.g.
cd dir_with_20000_subdirs && find . -name *.txt |
..
..
..
--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide
Re: Removing non matching files
on 17.09.2007 22:14:45 by Dummy
Untested:
perl -e'
/^(.+)\.([^.]+)$/ and $file{$1}{$2}++ for <*/*.txt */*.wav */*.gsm>;
$file{$_}{txt} and not $file{$_}{wav} and not $file{$_}{gsm} and unlink "$_.txt" for keys %file;
'
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Re: Removing non matching files
on 17.09.2007 22:54:34 by cfajohnson
On 2007-09-17, Janis wrote:
> If you have the regular filenames as shown above...
>
> find . -name *.txt |
That will fail if there are any .txt files in the current
directory; you should quote the pattern and, as you mention below,
be more specific:
find . -name 'msg*.txt' |
> while read -r f
> do [[ -f "${f%.txt}.gsm" ]] ||
> [[ -f "${f%.txt}.wav" ]] ||
I'd avoid the non-standard [[ ... ]] and use [ ... ].
> rm -f "$f"
> done
>
> Though I'd take care with such a requirement, *.txt files are just too
> common; you may accidentally remove *.txt files you didn't intend if
> you'll call it from the wrong directory. Better use msg*.txt and build
> in some control to start the find operation in a determined directory
> subtree.
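Combining these points, one further option (a swapped-in technique, not something posted in this thread) is to let find pass the names to sh as arguments with -exec ... {} +. That avoids the while read loop, which splits names containing newlines and, without IFS=, trims leading and trailing blanks. The function name prune_exec is hypothetical; only POSIX find and sh features are used.

```shell
# Hypothetical helper: same test as the while-read loop, but find hands
# the file names to sh as arguments, so names with spaces, backslashes
# or newlines arrive intact. Uses only POSIX find and sh features.
prune_exec() {
    find "$1" -type f -name 'msg*.txt' -exec sh -c '
        for f do
            base=${f%.txt}
            # Keep the .txt if either audio partner exists.
            [ -f "$base.gsm" ] || [ -f "$base.wav" ] || rm -f -- "$f"
        done
    ' sh {} +
}
```

Usage: prune_exec /path/to/messages (the path is a placeholder).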
--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence