How to loop through a directory recursively to delete files with certain extensions

I need to loop through a directory recursively and remove all files with the extensions .pdf and .doc. I'm managing to loop through the directory recursively but not managing to filter the files by those extensions.

My code so far:

#!/bin/sh

SEARCH_FOLDER="/tmp/*"

for f in $SEARCH_FOLDER
do
    if [ -d "$f" ]
    then
        for ff in $f/*
        do      
            echo "Processing $ff"
        done
    else
        echo "Processing file $f"
    fi
done

I need help to complete the code, since I'm not getting anywhere.

asked Jan 9 '11 at 11:01

I know it's bad form to execute code without understanding it, but a lot of people come to this site to learn bash scripting. I got here by googling "bash scripting files recursively", and almost ran one of these answers (just to test the recursion) without realizing it would delete files. I know rm is a part of OP's code, but it's not actually relevant to the question asked. I think it'd be safer if answers were phrased using a harmless command like echo. -

@Keith I had a similar experience, completely agree and changed the title -

15 Answers

find is just made for that.

find /tmp -name '*.pdf' -or -name '*.doc' | xargs rm

answered Jan 9 '11 at 14:01

Or find's -delete option. - Matthew Flaschen

One should always use find ... -print0 | xargs -0 ..., not raw find | xargs to avoid problems with filenames containing newlines. - Grumbel

Using xargs with no options is almost always bad advice and this is no exception. Use find … -exec instead. - Gilles 'SO- stop being evil'

@Gilles'SO-stopbeingevil': Why is that bad advice? - Carl Winbäck

@CarlWinbäck Because the syntax of the input to xargs is not the syntax that find (or any other common command) prints. xargs expects a particular kind of quote-delimited input. - Gilles 'SO- stop being evil'
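For reference, applied to the original question, the two safer forms discussed in these comments would look roughly like this (a sketch; -print0 and -0 assume GNU or BSD find and xargs, while -exec ... + is POSIX):

# NUL-delimited pipe: safe for spaces, quotes and newlines in file names
find /tmp \( -name '*.pdf' -o -name '*.doc' \) -type f -print0 | xargs -0 rm -f

# Or let find run rm itself, no xargs needed
find /tmp \( -name '*.pdf' -o -name '*.doc' \) -type f -exec rm {} +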

As a followup to mouviciel's answer, you could also do this as a for loop, instead of using xargs. I often find xargs cumbersome, especially if I need to do something more complicated in each iteration.

for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm $f; done

As a number of people have commented, this will fail if there are spaces in filenames. You can work around this by temporarily setting the IFS (internal field separator) to the newline character. This also fails if there are wildcard characters [?* in the file names. You can work around that by temporarily disabling wildcard expansion (globbing).

IFS=$'\n'; set -f
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm "$f"; done
unset IFS; set +f

If you have newlines in your filenames, then that won't work either. You're better off with an xargs based solution:

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 rm

(The escaped parentheses are required here so that the -print0 applies to both -name clauses of the -or.)
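To illustrate the precedence (my sketch, using a harmless listing instead of rm): without the parentheses, the implicit -and binds tighter than -or, so only the second -name test would get the -print0.

# -print0 applies to both tests thanks to the parentheses
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 ls -ld

# Without them this parses as: -name '*.pdf' -or ( -name '*.doc' -print0 )
# so only .doc files would reach the pipe
find /tmp -name '*.pdf' -or -name '*.doc' -print0 | xargs -0 ls -ld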

GNU and *BSD find also have a -delete action, which would look like this:

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete

answered Jul 13 '17 at 18:07

This does not work as expected if there is a space in the file name (the for loop splits the results of find on whitespace). - Trev

How do you avoid splitting on whitespace? I'm trying a similar thing and I have a lot of directories with whitespace that screws up this loop. - Christian

because it's a very helpful answer? - Zenperttu

@Christian Fix the whitespace splitting by using quotes like this: "$(find...)". I've edited James' answer to show. - Matthew

@Matthew your edit didn't fix anything at all: it actually made the command only work if there's a unique found file. At least this version works if there are no spaces, tabs, etc. in filenames. I rolled back to the old version. Nothing sensible can really fix a for f in $(find ...). Just don't use this method. - gniourf_gniourf

Without find:

for f in /tmp/* /tmp/**/* ; do
  ...
done;

/tmp/* matches the files directly in the directory and /tmp/**/* matches files in subfolders. You may have to enable the globstar option (shopt -s globstar). So for the question the code should look like this:

shopt -s globstar
for f in /tmp/*.pdf /tmp/*.doc /tmp/**/*.pdf /tmp/**/*.doc ; do
  rm "$f"
done

Note that this requires bash ≥ 4.0 (or zsh without shopt -s globstar, or ksh with set -o globstar instead of shopt -s globstar). Furthermore, in bash < 4.3, this traverses symbolic links to directories as well as directories, which is usually not desirable.
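One refinement worth adding (my assumption, not part of the original answer): enabling nullglob so that a pattern which matches nothing simply disappears instead of being passed literally to rm:

shopt -s globstar nullglob
# With globstar, /tmp/**/*.pdf also matches PDFs directly under /tmp
for f in /tmp/**/*.pdf /tmp/**/*.doc; do
  rm -- "$f"
done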

answered Jul 13 '17 at 18:07

This method worked for me, even with filenames containing spaces on OSX - asilo de ideas

Worth noting that globstar is only available in Bash 4.0 or newer, which is not the default version on many machines. - Troy Howard

I don't think you need to specify the first argument. (At least as of today) for f in /tmp/** will be enough; it includes the files from the /tmp dir. - phil294

Wouldn't it be better like this? for f in /tmp/*.{pdf,doc} tmp/**/*.{,pdf,doc} ; do - Ice-Blaze

** is a nice extension but not portable to POSIX sh. (This question is tagged bash, but it would be nice to point out that unlike several of the solutions here, this really is Bash-only. Or, well, it works in several other extended shells, too.) - triples

If you want to do something recursively, I suggest you use recursion (yes, you can do it using stacks and so on, but hey).

recursiverm() {
  for d in *; do
    if [ -d "$d" ]; then
      (cd -- "$d" && recursiverm)
    fi
    rm -f *.pdf
    rm -f *.doc
  done
}

(cd /tmp; recursiverm)

That said, find is probably a better choice, as has already been suggested.

answered Nov 13 at 15:17

Here is an example using shell (bash):

#!/bin/bash

# loop through & print a folder recursively
print_folder_recurse() {
    for i in "$1"/*;do
        if [ -d "$i" ];then
            echo "dir: $i"
            print_folder_recurse "$i"
        elif [ -f "$i" ]; then
            echo "file: $i"
        fi
    done
}


# try to get the path from the first parameter
path=""
if [ -d "$1" ]; then
    path=$1;
else
    path="/tmp"
fi

echo "base path: $path"
print_folder_recurse "$path"

answered Dec 6 '15 at 6:12

This doesn't answer your question directly, but you can solve your problem with a one-liner:

find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -exec rm {} +

Some versions of find (GNU, BSD) have a -delete action which you can use instead of calling rm:

find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -delete

answered Jul 13 '17 at 18:07

This method handles spaces well.

files="$(find -L "$dir" -type f)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while read file; do
  echo "$file"
done

Edit, fixes off-by-one

function count() {
    files="$(find -L "$1" -type f)";
    if [[ "$files" == "" ]]; then
        echo "No files";
        return 0;
    fi
    file_count=$(echo "$files" | wc -l)
    echo "Count: $file_count"
    echo "$files" | while read file; do
        echo "$file"
    done
}

answered Jun 13 '17 at 23:06

I think "-n" flag after echo not needed. Just test it yourself: with "-n" your script gives wrong number of files. For exactly one file in directory it outputs "Count: 0" - Vaca

This doesn't work with all file names: it fails with spaces at the end of the name, with file names containing newlines and with some file names containing backslashes. These defects could be fixed but the whole approach is needlessly complex so it isn't worth bothering. - Gilles 'SO- deja de ser malvado'
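For completeness, a sketch of a loop that copes with leading/trailing blanks, backslashes and even newlines in file names, assuming a find that supports -print0 (GNU or *BSD) and bash for read -d '':

find -L "$dir" -type f -print0 |
while IFS= read -r -d '' file; do
    printf '%s\n' "$file"
done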

For bash (since version 4.0):

shopt -s globstar nullglob dotglob
echo **/*".ext"

That's all.
The trailing extension ".ext" is there to select files (or dirs) with that extension.

Option globstar activates the ** (recursive search).
Option nullglob removes a * when it matches no file/dir.
Option dotglob includes files that start with a dot (hidden files).

Beware that before bash 4.3, **/ also traverses symbolic links to directories which is not desirable.
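Applied to the question, that could look like the following sketch (echo first as a dry run, then replace it with rm; the cd to /tmp is my assumption, to match the original script):

shopt -s globstar nullglob dotglob
cd /tmp || exit 1
echo **/*.pdf **/*.doc   # dry run: list what would be removed
rm -- **/*.pdf **/*.doc  # (rm complains about a missing operand if nothing matches)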

answered Jul 13 '17 at 18:07

The following function recursively iterates through all the directories under /home/ubuntu (the whole directory structure under ubuntu) and applies the necessary checks in the else block.

function check {
    for file in "$1"/*; do
        if [ -d "$file" ]; then
            check "$file"
        else
            ## check the file (PDF magic bytes)
            if [ "$(head -c 4 "$file")" = "%PDF" ]; then
                rm -r "$file"
            fi
        fi
    done
}

domain=/home/ubuntu
check "$domain"
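A hedged alternative that performs the same %PDF magic-byte check with find instead of manual recursion (a sketch; head -c is a GNU/BSD extension, and the inline sh -c wrapper is my addition, not part of the original answer):

find /home/ubuntu -type f -exec sh -c '
  for f in "$@"; do
    [ "$(head -c 4 "$f")" = "%PDF" ] && rm -- "$f"
  done
' sh {} +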

answered Oct 8 '16 at 21:10

There is no reason to pipe the output of find into another utility. find has a -delete action built into it.

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete

(The parentheses are needed so that -delete applies to both -name tests; without them, only the .doc files would be deleted.)

answered Feb 21 '19 at 1:02

This is the simplest way I know to do this: rm **/@(*.doc|*.pdf)

** makes this work recursively

@(*.doc|*.pdf) looks for a file ending in pdf OR doc

Easy to safely test by replacing rm with ls
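A note on requirements (my assumption based on bash's documented shell options, not stated in the answer): in bash this pattern needs globstar for ** and extglob for @(...), e.g.:

shopt -s globstar extglob nullglob
ls **/@(*.doc|*.pdf)    # dry run: see what would match
rm -- **/@(*.doc|*.pdf)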

answered Feb 5 '20 at 4:02

The other answers provided will not include files or directories that start with a dot (.). The following worked for me:

#!/bin/sh
getAll()
{
  local fl1="$1"/*;
  local fl2="$1"/.[!.]*; 
  local fl3="$1"/..?*;
  for inpath in "$1"/* "$1"/.[!.]* "$1"/..?*; do
    if [ "$inpath" != "$fl1" -a "$inpath" != "$fl2" -a "$inpath" != "$fl3" ]; then 
      stat --printf="%F\0%n\0\n" -- "$inpath";
      if [ -d "$inpath" ]; then
        getAll "$inpath"
      #elif [ -f $inpath ]; then
      fi;
    fi;
  done;
}

answered Apr 13 '20 at 10:04

Just do

find . -name '*.pdf'|xargs rm

answered Feb 20 '13 at 13:02

No, don't do this. This breaks if you have filenames with spaces or other funny symbols. - gniourf_gniourf

The following will loop through the given directory recursively and list all the contents:

for d in /home/ubuntu/*; do echo "listing contents of dir: $d"; ls -l "$d"/; done

answered Oct 7 '14 at 16:10

No, this function does not traverse anything recursively. It only lists the content of the subdirectories. It's just fluff around ls -l /home/ubuntu/*/, so it's pretty useless. - Gilles 'SO- deja de ser malvado'
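If a genuinely recursive listing is wanted, a globstar-based sketch (bash ≥ 4.0; this is my variant, not the answerer's code) could be:

shopt -s globstar
for d in /home/ubuntu/**/; do   # every directory under /home/ubuntu, recursively
    echo "listing contents of dir: $d"
    ls -l -- "$d"
done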

If you can change the shell used to run the command, you can use ZSH to do the job.

#!/usr/bin/zsh

for file in /tmp/**/*
do
    echo $file
done

This will recursively loop through all files/folders.
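To restrict that to the extensions from the question, zsh glob alternation plus the N (nullglob) qualifier can be used, roughly like this (my sketch, not part of the original answer; echo as a dry run):

#!/usr/bin/zsh

for file in /tmp/**/*.(pdf|doc)(N); do
    echo "would remove: $file"
    # rm -- "$file"
done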

answered Jan 27 '20 at 16:01
