\documentstyle {amsppt}
\magnification \magstep1
\openup3\jot
\NoBlackBoxes
\pageno=1
\hsize 6 truein
%\hoffset -.5 truein
%\voffset -.5 truein
%here are the needed definitions
\font\slv=cmsl10 scaled \magstep1
\def\missingstuff{\vskip1cm\centerline{\bf MISSING STUFF}\vskip1cm}
\def\wt{\widetilde}
\def\wh{\widehat}
\def\ov{\overline}
\def\R{\Cal S}
\def\E{\Bbb E}
\def\F{\Cal F}
\def\M{\Cal M}
\def\ve{\varepsilon}
\def\C{\Cal C}
\def\Z{\Bbb Z}
\def\To{\Bbb T}
\font\sob=cmss10
\def\Id{{\hbox{\sob 1\kern-.8mm l}}} %this is the identity for matrices
%here I redefine "\enddemo"
\predefine\enddimost{\enddemo}
\redefine\enddemo {\penalty 5000\hskip15pt plus1pt minus5pt\penalty1000 \qed\enddimost}
\font\bit=cmmi10 scaled\magstep 1
\def\e{\hbox{\bit e}}
\def\tb{|\hskip-.08em |\hskip-.08em |}
\def\today{\ifcase\month\or January\or February\or March\or April\or May\or June\or July\or August\or September\or October\or November\or
December\fi \space\number\day, \number\year}
%here starts the document
\topmatter
\title CENTRAL LIMIT THEOREM FOR DETERMINISTIC SYSTEMS \endtitle
\author Carlangelo Liverani \endauthor
\affil University of Rome {\sl Tor Vergata} \endaffil
\address Liverani Carlangelo, Mathematics Department, University of Rome II, Tor Vergata, 00133 Rome, Italy. \endaddress
\email liverani\@mat.utovrm.it \endemail
\date July 27, 1995 \enddate
\abstract A unified approach to obtaining the central limit theorem for hyperbolic dynamical systems is presented. It builds on previous results for one dimensional maps but it applies to the multidimensional case as well. \endabstract
\thanks \bf This paper originated out of discussions with D. Szasz and A. Kramli, and was made possible by D. Szasz's key suggestion to use K-partitions. I wish to thank E. Olivieri, E. Presutti, B. Tot, and L. Triolo for helpful discussions. In addition, I am indebted to S. Olla for explaining to me the subtleties of the Kipnis-Varadhan approach. This work has been partially supported by grant CIPA-CT92-4016 of the Commission of the European Community. \endthanks
\endtopmatter
\vskip -.5cm
\centerline{\bf CONTENTS}
%\vskip -.5cm
\newdimen\riga \newdimen\rigat
\riga=\baselineskip \rigat=\lineskip
\baselineskip=.5\baselineskip \lineskip=.5\lineskip
\roster
\item"0." Introduction\dotfill p.\ 2
\item"1." A general probabilistic result\dotfill p.\ 3
\item"2." Non invertible maps\dotfill p.\ 11
\item"3." Invertible maps\dotfill p.\ 12
\item" " References\dotfill p.\ 16
\endroster
\baselineskip=\riga \lineskip=\rigat
\vfil\par\newpage
\document
\vskip1cm
\subhead \S 0 Introduction \endsubhead
\vskip1cm
A discrete time dynamical system consists of a measurable space $X$ together with a $\sigma$-algebra $\Cal F$, a measurable map $T:X\to X$ which describes the dynamics, and a probability measure $P$ invariant with respect to $T$.
This setting is particularly well suited to study problems involving statistical properties of the motion of deterministic systems. Typically the properties of interest are ergodicity, mixing, bounds on the decay of correlations, Central Limit Theorems (CLT) and so on. Several approaches have been developed to tackle such problems at various levels. Given a system, one first explores the weaker statistical properties and then tries to investigate the stronger ones using the already obtained results plus some extra properties. The position of this paper in the above mentioned hierarchy is between obtaining bounds on the decay of correlations and the CLT. In other words we discuss a general approach that gives checkable conditions under which, in a mixing system, an observable enjoys the CLT. Such general approaches already exist but they are either limited to one dimensional systems \cite{Ke} or rely on the existence of special partitions of the phase space \cite{Ch}, partitions whose concrete construction may be far from trivial \cite{BSC1}, \cite{BSC2}; for a very nice review of the state of affairs up to 1989 (but still current) see \cite{De}. Here, I want to put forward the following point of view: the above described dynamical systems are most naturally viewed as giving rise to a (deterministic) Markov process. It is therefore tempting to think that there should exist some general probabilistic theorem that states abstract conditions for the validity of the CLT, and that all the concrete cases can simply be obtained by the direct application of such a theorem to the system under consideration (without having to code the system in some symbolic type dynamics). General theorems of this type are well known in probability theory but they are normally not well suited for applications to the case at hand. Two such general theorems, tailored for dynamical systems, can be found in this paper.
Attempts in this direction have existed for some time \cite{Go}, \cite{IL}, but they are satisfactory only for the one dimensional case (the equivalent of Theorem 1.1 in this paper). Particular mention must be given to \cite{DG}: the results obtained there are essentially comparable to the ones presented here in section 1 and could be applied to the multidimensional case. Unfortunately, not much attention is given there to applications, so that the possibility to bypass a symbolic representation of the system is completely overlooked. The approach used here is a martingale approximation inspired by \cite{KV}. Since this is a typical probabilistic technique, I think it underlines very well the purely probabilistic nature of the result, thereby clarifying which characteristics of a deterministic system yield such a drastic statistical behavior. As we will see, a major difference with the analogous results in probability is that the CLT holds for a much smaller class of observables than the square integrable ones. This is not an artifact of the proof: it is an inevitable consequence of the deterministic nature of the systems under consideration, so that only observables that operate some ``coarse graining'' (and therefore enjoy some degree of smoothness) can yield strong statistical behavior. Here no particular attempt is made to find the most general class of observables to which the Theorems apply; nonetheless, the technique put forward lends itself to an extension in such a direction. The paper includes some concrete examples as well. Their aim is to show how the general theorems can be applied in special cases. The cases discussed belong to quite general classes (expanding one dimensional maps, area preserving piecewise smooth uniformly hyperbolic maps in two and more dimensions), yet no genuinely new result is contained in such examples. This reflects the spirit of the paper, which is to present an approach to the problem rather than new implementations.
Nevertheless, the application of the present results in technically complex situations (e.g., hyperbolic billiards) greatly simplifies the proof of the validity of the CLT. In addition, it is conceivable that some new results can be obtained by this approach since the two above mentioned theorems hold in more general cases than the ones already present in the literature (a brief comparison with previously known results is inserted after the proof of each theorem). The plan of the paper is as follows. Section 1 contains two probabilistic theorems that are well suited for the study of dynamical systems. In fact, they may seem a bit unnatural from the purely probabilistic point of view. On the one hand, both theorems deal only with functions in $L^\infty$ rather than $L^2$. The reason is that normally the decay of correlations in dynamical systems can be obtained only for classes of functions with some amount of smoothness, which makes them automatically bounded. The issue is not purely a matter of taste: a look at the proofs will show that such a hypothesis is really used and that many key estimates would not hold in $L^2$. On the other hand, Theorem 1.2 introduces $\sigma$-algebras $\F_i$ that behave nicely with respect to the dynamics. This may make little sense from the purely probabilistic point of view but it is instead a cornerstone in the treatment of hyperbolic dynamical systems. In the above sense the results of section 1, although purely probabilistic in nature, are expressly developed for applications to dynamical systems. Section 2 describes how the technique applies to non-invertible maps. The case of piecewise smooth expanding maps of the interval is discussed in detail. Section 3 deals with the most interesting applications: the multidimensional case. As an example I treat a large subclass of piecewise smooth symplectic maps. Such maps are well studied in the literature since some relevant physical models (e.g.
billiards) are naturally described in their terms. It is shown that very general considerations imply the applicability of the results developed in section 1.
\vskip1cm
\subhead \S 1 A general probabilistic result \endsubhead
\vskip1cm
Let $X$ be a complete separable metric space, $\F$ a $\sigma$-algebra, $P$ a probability measure ($P(X)=1$) and $T:X\to X$ a measurable map.\footnote{Actually, we assume that, for each $A\in\F$, not only $T^{-1}A\in\F$ but also $TA\in\F$.} We will call $\E$ the expectation with respect to $P$. In addition, we require that $P$ be invariant with respect to $T$ (i.e., $P(T^{-1}A)=P(A)$ for all $A\in\F$), and that the dynamical system $(X,\,T,\,P)$ be ergodic. Define $\wh T:L^2(X)\to L^2(X)$ by
$$
\wh T\phi=\phi\circ T ,
$$
and let $\wh T^* :L^2(X)\to L^2(X)$ be the dual of $\wh T$. If $\E(f)=0$, then by ergodicity $\lim\limits_{n\to\infty}\frac 1n \sum_{i=0}^{n-1}\wh T^i f=\E(f)=0$ almost surely. The CLT gives us information on the speed of convergence; namely, the conditions under which there exists $\sigma\in\Bbb R^+$ such that, for each interval $I\subset\Bbb R$,
$$
\lim_{n\to\infty}P\left(\left\{\frac1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f\in I\right\}\right)= \frac 1{\sqrt {2\pi}\sigma}\int_I e^{-\frac {x^2}{2\sigma^2}}dx ;
$$
this is called ``convergence in law (or distribution)'' to a Gaussian random variable of zero mean and variance $\sigma^2$. Consider a sub-$\sigma$-algebra $\F_0$ of $\F$ and define $\F_i=T^{-i}\F_0$, $i\in\Bbb Z$; then the following holds.
\proclaim{Theorem 1.1} If $\F_i$ is coarser than $\F_{i-1}$ and, for each $\phi\in L^\infty(X)$, we have
$$
\E(\wh T\wh T^*\phi|\F_1)=\E(\phi|\F_1),
$$
then, for each $f\in L^\infty(X)$ with $\E(f)=0$ and $\E(f|\F_0)=f$ such that
\roster
\item $\sum_{n=0}^\infty|\E(f\wh T^n f)|<\infty$,
\item the series $\sum_{n=0}^\infty\E(\wh T^{*n}f|\F_0)$ converges absolutely almost surely,\footnote{As we will see in the proof, this implies that there exists an almost everywhere finite $\F_0$-measurable function $g$, such that $f=g-\E(\wh T^*g|\F_0)$.}
\endroster
the sequence
$$
\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f
$$
converges in law to a Gaussian random variable of zero mean and finite variance $\sigma^2$, $\sigma^2\leq -\E(f^2)+2\sum_{n=0}^\infty\E(f\wh T^n f)$. In addition, $\sigma=0$ if and only if there exists an $\F_0$--measurable function $g$ such that
$$
\wh Tf=\wh Tg -g .
$$
Finally, if the series in (2) converges in $L^1(X)$, then $\sigma^2= -\E(f^2)+2\sum_{n=0}^\infty\E(f\wh T^n f)$.
\endproclaim
\demo{Proof} The key idea is to use a martingale approximation. That is, to find $Y_i\in L^2(X)$ and $g$ $\F_0$--measurable, and almost everywhere finite, such that
$$
\E(Y_{i-1}|\F_{i})=Y_{i-1}\; ;\;\;\;\E(Y_{i}|\F_i)=0 , \tag 1.1
$$
(i.e., $Y_i$ is a reverse martingale difference with respect to the filtration $\{\F_i\}_{i=0}^\infty$), and
$$
\wh T^if=Y_{i}+\wh T^ig-\wh T^{i-1}g \quad \forall i>0. \tag 1.2
$$
Accordingly,
$$
\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f= \frac 1{\sqrt n}\sum_{i=0}^{n-1} Y_i+ \frac 1{\sqrt n}[\wh T^ng -g]. \tag 1.3
$$
Equation (1.3) shows that we can obtain the central limit theorem for our random variable provided we have the central limit theorem for the martingale difference $Y_i$. In fact, $\frac 1{\sqrt n}[\wh T^ng -g]$ converges to zero in probability when $n\to\infty$. Note that $(1.1)$ and $(1.2)$ are equivalent to
$$
\E(\wh T^if|\F_{i})=\E(\wh T^ig|\F_{i})-\E(\wh T^{i-1}g|\F_{i}) \quad \forall i>0.
$$
Since the definition of $\F_i$ implies that, for each $\phi\in L^1(X)$,
$$
\E(\wh T^i\phi|\F_{i})=\wh T^i\E(\phi|\F_0) \quad\forall i>0 ,
$$
and since the invariance of $P$ with respect to $T$ implies $\wh T^*\wh T=\Id$, we have
$$
\aligned f&=\E(g|\F_0)-\wh T^*\E(g|\F_1)=g-\wh T^*\E(\wh T\wh T^*g|\F_1)\\ &=g-\E(\wh T^*g|\F_0). \endaligned \tag 1.4
$$
It is immediate to see that $g=\sum\limits_{n=0}^\infty\E(\wh T^{*n}f|\F_0)$ (the convergence of the series is the hypothesis (2) in the statement of the theorem) is a solution of the above equation, and therefore of (1.2) (clearly, $Y_i=\wh T^{i-1} Y_1$).\footnote{It is remarkable that, once we have $g$, the $Y_i$ are defined by (1.2) itself, and will automatically satisfy (1.1).} In fact, setting $T_0\phi=\Bbb E(\wh T^*\phi|\Cal F_0)$, the solution of $(1.4)$ is given by the Neumann series $\sum_{n=0}^\infty T_0^nf$. But $T^n_0f=\Bbb E(\wh T^{*n}f|\Cal F_0)$ since
$$
\align \E(\wh T^{*}\E(\wh T^{*n}f|\F_0)|\F_0)=& \wh T^*\wh T\E(\wh T^*\E(\wh T^{*n}f|\F_0)|\F_0) =\wh T^*\E(\wh T\wh T^*\E(\wh T^{*n}f|\F_0)|\F_1)\\ =&\wh T^*\E(\E(\wh T^{*n}f|\F_0)|\F_1)=\wh T^*\E(\wh T^{*n}f|\F_1)\\ =&\wh T^*\E(\wh T\wh T^{*(n+1)}f|\F_1)=\E(\wh T^{*(n+1)}f|\F_0) . \endalign
$$
To ensure that the central limit theorem for $Y_i$ holds, we need only show that $Y_i$ is square integrable, thanks to the following result \cite{Ne}:
\proclaim{Theorem} Let $(Y_n)_{n\geq 1}$ be a stationary, ergodic, martingale difference (or reversed martingale difference) with respect to the filtration $\{\F_n\}_{n\geq 1}$. If $Y_1\in L^2(X)$, then $\sigma^2=\E(Y_1^2)$ and the CLT holds. \endproclaim
The above theorem applies to our case since the stationarity of $(Y_n)$ is implied by the invariance of the measure with respect to $T$, while the ergodicity follows from the ergodicity of the dynamical system $(X,\,T,\,P)$.
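For the reader's convenience, here is a sketch of why a function $Y_1$ defined through (1.2) is automatically a martingale difference, as claimed in the footnote above. The computation is only formal: it assumes that $g$ is regular enough (say bounded) for the hypothesis of the theorem and for the conditional expectations below to apply to it. Setting $i=1$ in (1.2) gives $Y_1=\wh Tf-\wh Tg+g$, which is $\F_0$--measurable because $f$ and $g$ are $\F_0$--measurable and $\F_1$ is coarser than $\F_0$. Moreover, $\wh Tf$ and $\wh Tg$ are $\F_1$--measurable, while $\E(g|\F_1)=\E(\wh T\wh T^*g|\F_1)=\wh T\E(\wh T^*g|\F_0)$. Hence
$$
\E(Y_1|\F_1)=\wh T\left[f-g+\E(\wh T^*g|\F_0)\right]=0
$$
by (1.4), and the same relation for $Y_i=\wh T^{i-1}Y_1$ follows by applying $\wh T^{i-1}$ and using the invariance of $P$.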
If the series $\sum_{n=0}^\infty\E(\wh T^{*n}f|\F_0)$ converged in $L^2(X)$, then $Y_1\in L^2(X)$ would follow and the Theorem would be proven. It is however a remarkable fact that $Y_i$ can be in $L^2(X)$ without $g$ being even integrable \cite{KV}. Unfortunately, the road to this result is a bit indirect and consists of carrying out an argument similar to the one above but producing a sequence of martingale differences $Y_i(\lambda)$ that approximate $Y_i$. Let us look for $Y_i(\lambda)$, $\lambda>1$, such that
$$
\E(Y_{i-1}(\lambda)|\F_{i})=Y_{i-1}(\lambda)\; ;\;\;\;\E(Y_{i}(\lambda)|\F_i)=0 ; \tag 1.5
$$
and
$$
\wh T^i f=Y_{i}(\lambda)+\wh T^ig(\lambda)-\lambda^{-1}\wh T^{i-1}g(\lambda) \quad \forall i>0,\;\lambda>1. \tag 1.6
$$
In analogy with what we have seen before, $g(\lambda)=\sum\limits_{n=0}^\infty\lambda^{-n}\E(\wh T^{*n}f|\F_0)$, only now $g(\lambda)\in L^2(X)$ for each $\lambda>1$. Since $\lim\limits_{\lambda\to 1}g(\lambda)= g(1)=g$ almost surely, it follows that $\lim\limits_{\lambda\to 1}Y_i(\lambda)=Y_i$ almost surely. In addition,
$$
\align \E(Y_i(\lambda)^2)=&\E(Y_1(\lambda)^2)=\E([\wh Tf-\wh T g(\lambda)+\lambda^{-1}g(\lambda)]^2)\\ =&\E(\wh Tf[\wh Tf-\wh T g(\lambda)+\lambda^{-1}g(\lambda)])\\ &-\E([\wh Tg(\lambda)- \lambda^{-1}g(\lambda)][\wh Tf-\wh T g(\lambda)+\lambda^{-1}g(\lambda)]), \endalign
$$
where $\E(\wh Tf-\wh T g(\lambda)+\lambda^{-1}g(\lambda)|\F_1)=\E(Y_1(\lambda)|\F_1)=0$.
Hence,
$$
\align \E(Y_i(\lambda)^2)=&-\E((\wh T f)^2)+\E([\wh T g(\lambda)-\lambda^{-1}g(\lambda)]^2)\\ =&-\E(f^2)+\E(\wh Tg(\lambda)[\wh T g(\lambda)-\lambda^{-1}g(\lambda)])\\ &-\lambda^{-1}\E(g(\lambda)\wh T g(\lambda))+\lambda^{-2}\E(g(\lambda)^2)\\ =&-\E(f^2)+2\E(\wh Tg(\lambda)[\wh T g(\lambda)-\lambda^{-1}g(\lambda)]) -(1-\lambda^{-2})\E(g(\lambda)^2)\\ =&-\E(f^2)+ 2\E(\wh Tg(\lambda)\wh T f)-(1-\lambda^{-2})\E(g(\lambda)^2)\\ =&-\E(f^2)+2\E(g(\lambda) f)-(1-\lambda^{-2})\E(g(\lambda)^2)\\ \leq&-\E(f^2)+ 2\sum_{n=0}^\infty\lambda^{-n}\E(f\wh T^n f)\leq -\E(f^2)+2\sum_{n=0}^\infty|\E(f\wh T^n f)| . \endalign
$$
The desired estimate follows from
$$
\E(Y_1^2)=\E(\liminf_{\lambda\to 1} Y_1(\lambda)^2)\leq \liminf_{\lambda\to 1} \E(Y_1(\lambda)^2)\leq-\E(f^2)+ 2\sum_{n=0}^\infty\E(f\wh T^n f) .
$$
In conclusion, we have seen that the random variable under consideration converges in law to a Gaussian of variance $\sigma^2=\E(Y_1^2)<\infty$. If $\sigma=0$ then the second assertion of the statement follows since
$$
\E(Y_1^2)=\E([\wh Tf-\wh T g+g]^2) .
$$
If we assume that the series in (2) converges in $L^1(X)$, then it is possible to obtain the much sharper result
$$
\lim_{\lambda\to 1}\E(Y_1(\lambda)^2)=-\E(f^2)+ 2\sum_{n=0}^\infty\E(f\wh T^n f).
$$
In fact, for each $\varepsilon>0$,
$$
\aligned \bigg|\E(Y_1(\lambda)^2)+&\E(f^2)-2\sum_{n=0}^\infty \E(f\wh T^n f)\bigg| \leq 2\sum_{n=0}^\infty (1-\lambda^{-n})|\E(f\wh T^n f)| +(1-\lambda^{-2})\E(g(\lambda)^2)\\ &\leq 2(1-\lambda^{-M})\sum_{n=0}^\infty|\E(f\wh T^{n}f)|+2\sum_{n=M}^\infty |\E(f\wh T^n f)| +(1-\lambda^{-2})\E(g(\lambda)^2)\\ &\leq \varepsilon+(1-\lambda^{-2})\E(g(\lambda)^2) \endaligned
$$
where $M$ has been chosen sufficiently large and $\lambda$ sufficiently close to one. In order to continue we need to estimate the last term in the above expression.
For further use we will deal with a more general estimate: for each $\lambda,\,\mu\in(1,\,\infty)$ we have
$$
\aligned \E(&g(\lambda)g(\mu))=\sum_{n,m=0}^\infty\lambda^{-n}\mu^{-m} \E(\wh T^{*n}f\E(\wh T^{*m}f|\F_0))\\ &\leq\sum_{n=0}^\infty\lambda^{-n}\sum_{m=0}^{M-1}\|f\|_\infty\E(|\E(\wh T^{*n}f|\F_0)|) +\sum_{n=0}^\infty\lambda^{-n}\sum_{m=M}^\infty\E(|\E(\wh T^{*m}f|\F_0)|)\\ &\leq M\|f\|_\infty\sum_{n=0}^\infty\E(|\E(\wh T^{*n}f|\F_0)|)+\frac{\|f\|_\infty}{1-\lambda^{-1}}\sum_{m=M}^\infty\E(|\E(\wh T^{*m}f|\F_0)|). \endaligned \tag 1.7
$$
That is, choosing again $M$ large and $\lambda$ sufficiently close to 1,
$$
(1-\lambda^{-1})\E(g(\lambda)^2)\leq 2\varepsilon.
$$
This is not the end of the story: it is possible to prove that $Y_1$ is the limit of $Y_1(\lambda)$ in $L^2(X)$. To see this it suffices to estimate
$$
\aligned \E([Y_1(\lambda)-Y_1(\mu)]^2)&=\E([\lambda^{-1}g(\lambda)-\mu^{-1}g(\mu)] [Y_1(\lambda)-Y_1(\mu)])\\ &= \E([\lambda^{-1}g(\lambda)-\mu^{-1}g(\mu)]^2)+ \E([g(\lambda)-g(\mu)]^2)\\ &\leq (1-\mu^{-1}\lambda^{-1})\E(g(\lambda)g(\mu)), \endaligned
$$
where no generality is lost by choosing $\lambda\geq\mu>1$; the result then follows thanks to the estimate $(1.7)$.
\enddemo
Let us discuss briefly how the above result compares with the ones present in the literature. In the work of Gordin \cite{Go}, used by Keller \cite{Ke}, a very similar theorem is present. The main difference is that conditions (1) and (2) are replaced by the much stronger condition
$$
\sum_{n=0}^\infty\E(\E(\wh T^{*n}f|\F_0)^2)<\infty .
$$
A similar comment applies to \cite{DG}, where moreover there is no discussion of the case $\sigma=0$. Theorem 1.1 is often applicable in cases in which $T$ is not invertible, where sometimes it is possible to choose $\F_0=\F$ (see \S 2).
When $T$ is invertible the choice $\F_0=\F$ is likely to yield $\F_i=\F$ for each $i\in\Bbb Z$; this would undermine the possibility of capturing any type of dynamical coarse graining effect, thereby nullifying the hope of obtaining an interesting statistical behavior. In such a case, there are situations in which a natural choice for $\F_0$ exists (see \S 3), but it would be too restrictive to require $f$ to be $\F_0$--measurable. The above difficulties can be dealt with by the following Theorem.
\proclaim{Theorem 1.2} Suppose $T$ is one to one and onto. If $\F_i$ is coarser than $\F_{i-1}$, then, for each $f\in L^\infty(X)$ with $\E(f)=0$ such that
\roster
\item $\sum_{n=0}^\infty|\E(f\wh T^n f)|<\infty$,
\item the series $\sum_{n=0}^\infty|\E(\wh T^{*n}f|\F_0)|$ converges in $L^1$,
\item $\exists$ $\alpha>1$: $\sup\limits_{k\in\Bbb N}k^\alpha \E(|\E(f|\F_{-k})-f|)<\infty$,\footnote{This condition is not optimal, as can be seen by looking at the proof, yet I do not know of any application in which a weaker condition could be of interest.}
\endroster
the sequence
$$
\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f
$$
converges in law to a Gaussian random variable of zero mean and finite variance $\sigma^2$, $\sigma^2=-\E(f^2)+2\sum_{n=0}^\infty\E(f\wh T^n f)$. In addition, if $\sum_{n=0}^\infty n|\E(f\wh T^n f)|<\infty$, then $\sigma=0$ if and only if there exists $g\in L^2(X)$ such that
$$
\wh Tf=\wh Tg -g .
$$
\endproclaim
\demo{Proof} The key idea is to first approximate $f$ by $\E(f|\F_{-k})$ and then use the same type of martingale approximation introduced in Theorem 1.1.
That is, to find $Y_i(k,\,\lambda)\in L^2(X)$ and $g(k,\,\lambda)\in L^2(X)$ such that, given $k>0$, for each $i>0$ and $\lambda>1$ $$ \E(Y_{i-1}(k,\,\lambda)|\F_{i-k})=Y_{i-1}(k,\,\lambda)\; ;\;\;\; \E(Y_{i}(k,\,\lambda)|\F_{i-k})=0 , \tag 1.8 $$ (i.e., $Y_i(k,\,\lambda)$ is a reverse Martingale difference with respect to the filtration $\{\F_i\}_{i=-k}^\infty$) and $$ \wh T^i\E(f|\F_{-k})=Y_{i}(k,\,\lambda)+\wh T^ig(k,\,\lambda)-\lambda^{-1}\wh T^{i-1}g(k,\,\lambda) \quad \forall i>0,\;\lambda \geq 1. \tag 1.9 $$ Note that $(1.8)$ and $(1.9)$ are equivalent to $$ \E(f|\F_{-k})=g(k,\,\lambda)-\lambda^{-1}\E(\wh T^*g(k,\,\lambda)|\F_{-k}). $$ It is immediate to see that $g(k,\,\lambda)=\sum\limits_{n=0}^\infty \lambda^{-n}\E(\wh T^{*n}f|\F_{-k})\in L^2(X)$ for each $\lambda>1$ and in $L^1(X)$ for $\lambda=1$ (this is a consequence of hypothesis (2) in the statement of the Theorem) is a solution of the above equation (see the analogous discussion in Theorem 1.1). Again we want to show that the $Y_i(k,\,1)$ are square summable, actually, in this case, we need a uniform estimate in $k$. In partial analogy with Theorem 1.1, we have $$ \align \E(Y_i(k,\,\lambda)^2)=&\E(Y_1(k,\,\lambda)^2)=-\E(\E(f|\F_{-k})^2)+ \E([\wh Tg(k,\,\lambda)-\lambda^{-1}g(k,\,\lambda)]^2)\\ =&-\E(\E(f|\F_{-k})^2)+2\E(g(k,\,\lambda)\E(f|\F_{-k}))-(1-\lambda^{-2}) \E(g(k,\,\lambda)^2). \endalign $$ In addition, for each $\lambda>1$, $$ \align \E(g(k,\,\lambda)\E(f|\F_{-k})) \leq&\sum_{n=0}^\infty|\E(f\E(\wh T^{*n}f|\F_{-k}))| =\sum_{n=0}^\infty|\E(\wh T^{*n}f\E(f|\F_{-k}))|\\ \leq&2\|f\|_\infty k\E(|\E(f|\F_{-k})-f|)+\sum_{n=k}^\infty\E(\wh T^k f\E(\wh T^{*n}f|\F_0))\\ &+\sum_{n=0}^{2k-1}\E(\wh T^n ff)<\infty, \endalign $$ where the uniform bound follows from the hypotheses (1), (2), (3) of the Theorem. The previous estimates show that $Y_i(k,\,1)$ are uniformly square integrable martingale differences. 
Moreover,
$$
\lim_{k \to \infty}\lim_{\lambda\to 1}\E(Y_1(k,\,\lambda)^2)=-\E(f^2)+ 2\sum_{n=0}^\infty\E(f\wh T^n f)=\sigma^2.
$$
To see this, it is enough to compute
$$
\aligned \E(g(k,\,\lambda)g(k,\,\mu))&=\sum_{n,m=0}^\infty\lambda^{-n}\mu^{-m} \E(\wh T^{*n}f\E(\wh T^{*m}f|\F_{-k}))\\ &\leq\sum_{n=0}^\infty\lambda^{-n}M\|f\|_\infty\E(|\E(\wh T^{*n}f|\F_{-k})| )\\ &+\sum_{n=0}^\infty\lambda^{-n}\|f\|_\infty \sum_{m=M}^\infty\E(|\E(\wh T^{*m}f|\F_{-k})|)\\ &\leq M\|f\|_\infty^2k+M\|f\|_\infty\sum_{n=0}^\infty\E(|\E(\wh T^{*n}f| \F_0)|)\\ &+(1-\lambda^{-1})^{-1}\|f\|_\infty \sum_{m=M-k}^\infty\E(|\E(\wh T^{*m}f|\F_0)|), \endaligned
$$
so, since $M$ can be chosen arbitrarily large, $\lim\limits_{\lambda\to 1}(1-\lambda^{-1})\E(g(k,\,\lambda)^2)=0$. Furthermore, in analogy with Theorem 1.1, it easily follows that $Y_1(k,\,\lambda )$ converges to $Y_1(k,\,1)$ in $L^2(X)$. This implies that, defining
$$
S_n=\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i f\; ;\quad S_n^k=\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^i \E(f|\F_{-k}),
$$
the $S_n^k$ converge in law to a Gaussian with zero mean and variance $\E(Y_1(k,\,1)^2)$. The next step is to obtain the needed convergence as $k$ goes to infinity:
$$
\aligned \E([S_n^k-S_n]^2)=&\frac 1n \sum_{i,\,j=0}^{n-1}\E(\wh T^i[f-\E(f|\F_{-k})] \wh T^j[f-\E(f|\F_{-k})])\\ \leq& \E([f-\E(f|\F_{-k})]^2)+2\sum_{i=1}^{n-1} |\E([f-\E(f|\F_{-k})]\wh T^i[f-\E(f|\F_{-k})])|\\ \leq& 2\|f\|_\infty\E(|f-\E(f|\F_{-k})|)+2\sum_{i=1}^{n-1}|\E(\wh T^if [f-\E(f|\F_{-k})])|\\ =&2\|f\|_\infty\E(|f-\E(f|\F_{-k})|)+2\sum_{i=1}^{n-1}|\E(\wh T^{*i}f [f-\E(f|\F_{-k-i})])|\\ \leq& 2\|f\|_\infty\sum_{i=k}^\infty\E(|f-\E(f|\F_{-i})|), \endaligned
$$
which is smaller than $\varepsilon$, uniformly in $n$, provided $k$ is large enough, since (3) implies the convergence of the series $\sum_{i=0}^\infty\E(|f-\E(f|\F_{-i})|)$. Collecting the previous estimates, it follows that $S_n$ converges in law to a Gaussian of zero mean and variance $\sigma^2$.
Next, suppose that $\sigma^2=0$ and $\sum_{n=0}^\infty n|\E(f\wh T^n f)| <\infty$, then
$$
\aligned (1-\lambda^{-2})\E(g(k,\,\lambda)^2)&+\E(Y_1(k,\,\lambda)^2)\leq -\E(\E(f|\F_{-k})^2)+2\E(g(k,\,\lambda)f)\\ &\leq\E(f^2)-\E(\E(f|\F_{-k})^2)+2\sum_{n=1}^\infty(1-\lambda^{-n})\E(f\wh T^nf)\\ &+ 2\sum_{n=0}^\infty\lambda^{-n}\left[\E(\E(\wh T^{*n}f|\F_{-k})f)- \E(f\wh T^n f)\right]\\ &\leq\|f\|_\infty\E(|f-\E(f|\F_{-k})|)+2(1-\lambda^{-1})\sum_{n=0}^\infty n|\E(f\wh T^n f)|\\ &\;+ 2\|f\|_\infty\left(\sum_{n=0}^{2k-1}\E(|\E(f|\F_{-k})-f|)+ \sum_{n=k}^\infty\E(|\E(\wh T^{*n}f|\F_0)|)\right)\\ &+2\sum_{n=2k}^\infty\E(f\wh T^n f). \endaligned
$$
Accordingly, it is possible to define $\phi:(1,\,\infty)\to\Bbb N$, $\lim_{\lambda\to 1}\phi(\lambda)=\infty$, such that
$$
\aligned &\E(g(\phi(\lambda),\,\lambda)^2)\leq M\quad\forall \lambda>1\\ &\lim_{\lambda \to 1}\E(Y_1(\phi(\lambda),\,\lambda)^2)=0 , \endaligned
$$
where $M$ is some fixed positive number. Since $L^2(X)$ is a Hilbert space, and therefore reflexive, the unit ball is compact in the weak topology, so $\{g(\phi(\lambda),\,\lambda)\}_{\lambda>1}$ is a weakly relatively compact set and we can extract a subsequence $\{\lambda_j\}$, $\lim_{j\to\infty}\lambda_j=1$, such that $\{g(\phi(\lambda_j),\,\lambda_j)\}$ converges weakly to a function $g\in L^2(X)$. In addition, (1.9) implies, for each $\varphi\in L^2(X)$,
$$
\E(\wh T^*\varphi\E(f|\F_{-\phi(\lambda_j)}))=\E(Y_1(\phi(\lambda_j),\,\lambda_j)\varphi)+ \E(\wh T^*\varphi g(\phi(\lambda_j),\,\lambda_j))-\lambda_j^{-1}\E(\varphi g(\phi(\lambda_j),\,\lambda_j)),
$$
and taking the limit $j\to\infty$ yields
$$
\E(\wh T^*\varphi f)=\E(\wh T^*\varphi g)-\E(\varphi g) \quad \forall \varphi\in L^2(X).
$$
That is,
$$
\wh Tf=\wh T g - g.
$$
\enddemo
This theorem is rather similar to Theorem 4.4 in \cite{DG}; the main difference is the absence, in \cite{DG}, of a discussion of the degenerate case $\sigma=0$.
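The easy half of the characterization of the degenerate case can be checked by a direct computation, recorded here only as an illustration (the symbol $\|\cdot\|_2$, not used elsewhere, denotes the norm of $L^2(X)$). If $\wh Tf=\wh Tg-g$ with $g\in L^2(X)$, then $\wh T^if=\wh T^ig-\wh T^{i-1}g$ for each $i\geq 1$, and the sum telescopes:
$$
\sum_{i=0}^{n-1}\wh T^if=f+\wh T^{n-1}g-g .
$$
Since the invariance of $P$ implies that $\wh T$ preserves the $L^2(X)$ norm,
$$
\E\left(\left[\frac 1{\sqrt n}\sum_{i=0}^{n-1}\wh T^if\right]^2\right)^{\frac 12}\leq\frac{\|f\|_2+2\|g\|_2}{\sqrt n} ,
$$
which tends to zero as $n\to\infty$. Hence the normalized sums converge to zero in $L^2(X)$, and therefore in law, which is the case $\sigma=0$.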
The only other results known to the author that have a breadth similar to Theorem 1.2 are contained in \cite{Ch}. The comparison is not so easy because the results in \cite{Ch} are stated directly in the language of special families of finite partitions. This language is well suited for applications to the case in which the system is studied by the type of coding called Markov sieves, but it is not so transparent in an abstract context. At any rate an evident difference is that Chernov's result requires the existence of the first moment of the correlations (i.e., $\sum_{n=0}^\infty n\E(ff\circ T^n)<\infty$) in order to obtain the CLT while in Theorem 1.2 such a condition is not necessary, unless one wants the coboundary characterization of the functions that yield a degenerate limit.
\vskip1cm
\subhead \S 2 Non invertible maps \endsubhead
\vskip1cm
In this section we will see how the results of the previous section apply to the case in which $T$ is onto but not one to one. We choose $\F_0=\F$, so $\F_i=\F$ for all $i\leq 0$. Note that if $\E(\phi|\F_1)=\phi$, then $g(x)=\phi(T^{-1}x)$ is well defined, hence Range$(\wh T)$ is exactly the set of $\F_1$-measurable functions. Moreover, $\wh T\wh T^*$ is an orthogonal projection onto Range$(\wh T)$, while $\E(\cdot|\F_1)$ is an orthogonal projection onto the $\F_1$-measurable functions. That is, for each $\phi\in L^1(X)$
$$
\wh T\wh T^*\phi=\E(\phi|\F_1).
$$
The first condition of Theorem 1.1 is then satisfied quite generally. To see how the theorem works let us apply it to the case of one dimensional maps (i.e. $X=[0,\,1]$). Let us consider a partition of $[0,\,1]$ into finitely many intervals $\{I_k\}_{k=1}^p$ and a map $T:[0,\,1]\to[0,\,1]$ such that
\roster
\item $T\big|_{\overline I_k}\in\Cal C^{(2)}$ for each $k\in\{1,\,\dots,\,p \}$
\item $\inf\limits_{x\in[0,\,1]}|D_xT|\geq \lambda>1$.
\endroster
That is, $T$ is a piecewise smooth expanding map.
If the reader wants to consider a concrete example, here is a very simple one: the piecewise linear map $T:[0,\,1]\to[0,\,1]$ defined by
$$
T(x)=\left\{\aligned \frac 92 \left(\frac 19 -x\right)&\quad x\in\left(0,\,\frac 19\right)\\ \frac 92 \left(x-\frac 19\right) &\quad x\in\left(\frac 19,\, \frac 39\right)\\ \frac 92 \left(\frac 59-x\right)&\quad x\in\left(\frac 39,\, \frac 59\right)\\ \frac 92 \left(x-\frac 59\right)&\quad x\in\left(\frac 59,\, \frac 79\right)\\ \frac 92 \left(1-x\right)&\quad x\in\left(\frac 79,\,1\right) \endaligned \right.
$$
The map satisfies our assumptions since $|DT|=\frac 92>1$. The following result is well known \cite{HK}:
\proclaim{Theorem 2.1} There exists a unique probability measure $\mu$, absolutely continuous with respect to Lebesgue, which is invariant with respect to the map $T$. In addition, there exist $\Lambda\in(0,\,1)$ and $K>0$ such that, for each $f\in BV([0,\,1])$ (the space of functions of bounded variation), and $g\in L^1([0,\,1],\,\mu)$
$$
\left|\int_0^1 f g\circ T^n d\mu-\int_0^1 fd\mu\int_0^1 gd\mu\right| \leq K\Lambda^n\|f\|_{\text{BV}}\|g\|_1 .
$$
\endproclaim
Since $\mu$ is absolutely continuous with respect to the Lebesgue measure $m$, the Radon--Nikod\'ym derivative $h=\frac{d\mu}{dm}$ is in $L^1([0,\,1], \,m)$. For simplicity assume $h\geq\varepsilon>0$,\footnote{This is always verified if $T$ is continuous, as in our example; but see \cite{L2} for a discussion of the general case.} then it follows
$$
\wh T^*f(x)=h(x)^{-1}\sum_{y\in T^{-1}(x)}h(y)f(y)|D_yT|^{-1} .
$$
Such a representation implies that the last statement of Theorem 2.1 can be rephrased as follows: for each $f\in BV([0,\,1])$ with $\int_0^1 fd\mu= 0$,
$$
\|\wh T^{*n}f\|_\infty\leq K\Lambda^n\|f\|_{\text{BV}}.
$$
It is then immediate to see that Theorem 1.1 applies to this situation yielding the central limit theorem for all functions of bounded variation.
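For completeness, here is a sketch of how the hypotheses of Theorem 1.1 can be checked in this setting. It uses only the choice $\F_0=\F$ made in this section, for which $\E(\cdot|\F_0)$ is the identity, and the bound on $\|\wh T^{*n}f\|_\infty$ for $f\in BV([0,\,1])$ with $\int_0^1fd\mu=0$; here $\E$ denotes the expectation with respect to $\mu$. By duality,
$$
\sum_{n=0}^\infty|\E(f\wh T^nf)|=\sum_{n=0}^\infty|\E(f\,\wh T^{*n}f)|\leq \|f\|_1\sum_{n=0}^\infty\|\wh T^{*n}f\|_\infty\leq \frac{K\|f\|_{\text{BV}}\|f\|_1}{1-\Lambda} ,
$$
which is hypothesis (1). In the same way,
$$
\sum_{n=0}^\infty|\E(\wh T^{*n}f|\F_0)|=\sum_{n=0}^\infty|\wh T^{*n}f|\leq \frac{K\|f\|_{\text{BV}}}{1-\Lambda}
$$
almost surely, so the series in (2) converges absolutely almost surely and, being dominated by a constant, also in $L^1$. In particular the last assertion of Theorem 1.1 applies as well and gives $\sigma^2=-\E(f^2)+2\sum_{n=0}^\infty\E(f\wh T^nf)$.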
The reader can easily see that such a result can be improved, obtaining the central limit theorem for functions with less regularity (e.g., by an approximation argument), but this is not the main focus here. In addition, similar results can be obtained for several cases in which the map $T$ consists of infinitely many smooth pieces. It is also immediate to verify that the theorem will yield the CLT for BV functions also for some non-hyperbolic maps (such as the quadratic family \cite{Y}) or maps that are non-uniformly hyperbolic (\cite{LSV}).
\vskip1cm
\subhead \S 3 Invertible maps \endsubhead
\vskip1cm
In this case it would be useless to choose $\F_0=\F$: typically this would yield $\F_i=\F$ for each $i\in\Bbb Z$. So the choice of $\F_0$ must be motivated by dynamical considerations. Here we will discuss a general class of systems for which such a choice is quite natural: the hyperbolic systems.\footnote{More generally this strategy can be applied to K-systems.} For simplicity I will confine the discussion to the case in which $X$ is a compact symplectic manifold with a Riemannian structure that yields a volume form equivalent to the symplectic one and $T$ a piecewise $\Cal C^2$ symplectic map, but see \cite{KS} and \cite{LW} for more general possibilities. By hypothesis the symplectic (or Riemannian) volume $\mu$ is invariant. (The more general case of dissipative systems can also be treated with the same arguments; again the details are left to the reader.) We will assume $T$ uniformly hyperbolic, since almost nothing is known about the decay of correlations for non-uniformly hyperbolic systems.
By this we mean that at each point $x\in X$ there exist two subspaces $E^u(x),\,E^s(x)\subset\Cal T_x X$, with $E^u(x)\cap E^s(x)=\{0\}$ and $E^u(x)\oplus E^s(x)=\Cal T_xX$, invariant (i.e., $D_xT E^{u,s}(x)=E^{u,s}( Tx)$), and there exists $\lambda>1$ such that for each $x\in X$, $$ \aligned &\|D_xTv\|\geq \lambda \|v\|\quad\forall v\in E^u(x)\\ &\|D_xTv\|\leq \lambda^{-1} \|v\|\quad\forall v\in E^s(x). \endaligned $$ Also, we assume that $E^{u, s}(x)$ depend continuously on $x$ (the above systems are called Anosov, in the smooth case). In the smooth case such systems are known to be ergodic (in fact, Bernoulli); see \cite{LW} for sufficient conditions that ensure ergodicity also in the non-smooth case. To help the reader visualize the following discussion, let us consider the simplest possible non-trivial example. We consider a family of linear maps of the plane defined by $$ \aligned x_1' &= x_1 + a x_2 \\ x_2' &= x_2 , \endaligned $$ where $a$ is a real parameter. We use these linear maps to define (discontinuous if $a\not\in\Z$) maps of the torus by restricting the formulas to the strip $\{ 0 \leq x_2 \leq 1 \}$ and further taking them modulo 1. In this way we define a mapping $T_1$ of the torus $\To ^2 = \Bbb R^2/\Z^2$ which is discontinuous on the circle $\{ x_2 \in \Z \}$ (except when $a$ is equal to an integer) and preserves the Lebesgue measure $\mu$. Similarly we define another family of maps depending on the same parameter $a$ by restricting the formulas $$ \aligned x_1' &= x_1 \\ x_2' &= a x_1 + x_2 \endaligned $$ to the strip $\{ 0 \leq x_1 \leq 1 \}$ and then taking them modulo 1. Thus for each $a$ we get a mapping $T_2$ of the torus which is discontinuous on the circle $\{ x_1 \in \Z \}$ (except when $a$ is equal to an integer) and preserves the Lebesgue measure $\mu$. Finally we introduce the composition of these maps $T = T_2 T_1$, which depends on one real parameter $a$.
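For concreteness, the composition can be written as a single formula; here $\{t\}$ denotes the fractional part of $t\in\Bbb R$, a notation introduced only for this remark. Taking representatives $(x_1,\,x_2)\in[0,\,1)^2$, the two steps described above give $$ T(x_1,\,x_2)=\left(\{x_1+ax_2\},\ \{a\{x_1+ax_2\}+x_2\}\right). $$ For instance, for $a=1$ the inner fractional part can be dropped modulo 1, and one recovers the linear automorphism $(x_1,\,x_2)\mapsto(x_1+x_2,\,x_1+2x_2)$ mod 1 of the torus.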
An alternative way of describing the map $T$ is by introducing two fundamental domains for the torus $\M^+ = \{ 0 \leq x_1 + a x_2 \leq 1,\, 0 \leq x_2 \leq 1 \}$ and $\M^- = \{ 0 \leq x_1 \leq 1, \,0 \leq - a x_1 + x_2 \leq 1 \}$. The linear map defined by the matrix $$ \left(\matrix 1 & a \\ a & 1 + a^2\endmatrix\right)= \left(\matrix 1 & 0 \\ a& 1 \endmatrix\right) \left(\matrix 1 & a \\ 0 & 1 \endmatrix\right) $$ takes $\M^+$ onto $\M^-$, thus defining a map of the torus which is discontinuous at most on the boundary of $\M^+$ and preserves the Lebesgue measure. This is the map $T$ that constitutes our toy model. Let us go back to the more general case. According to \cite{KS} such systems have a natural measurable partition (in fact a K-partition): the partition into stable manifolds. Such a partition $\Cal P$ can be constructed so as to satisfy the following requirements: \roster \item there exists a finite number of codimension one smooth manifolds $\{S_i\}_{i=1}^{m_0}$, transversal to the stable direction, such that each $p\in\Cal P$ has its boundary points belonging to the set $\cup_{i=1}^{m_0}\cup_{n=0}^\infty T^{-n}S_i$;\footnote{In the discontinuous case such manifolds can be simply chosen as the set of points at which $T$ is not $\Cal C^{(2)}$.} \item for each $p\in\Cal P$ diam$(p)\leq 2\delta$;\footnote{$\delta$ is some previously fixed number.} \item for each $p\in\Cal P$ there exists $\{p_i\}_{i=1}^k\subset\Cal P$ such that $T^{-1}p= \cup_{i=1}^k p_i$. \endroster The above properties imply that, if we choose as $\F_0$ the $\sigma$-algebra generated by the partition $\Cal P$, then $\{\F_i\}_{i=0}^\infty$ has the dynamical properties requested in the hypotheses of Theorem 1.2. To make the previous statement clearer, let us see what such a partition looks like in the concrete example mentioned above. The map $T$ is piecewise linear and it has constant contracting direction $v$.
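Since the example is linear on each piece, its hyperbolicity data can be computed by hand; the following short computation is included only as an illustration. The matrix above is symmetric with determinant $1$ and trace $2+a^2$, so its eigenvalues are $\lambda$ and $\lambda^{-1}$ with $$ \lambda=\frac{2+a^2+|a|\sqrt{a^2+4}}{2}, $$ and $\lambda>1$ whenever $a\neq 0$. A contracting direction is given by an eigenvector associated with $\lambda^{-1}$, for instance $$ v=(a,\,\lambda^{-1}-1), $$ while the expanding direction is orthogonal to it. Note also that $\lambda>2$ is equivalent to $|a|\sqrt{a^2+4}>2-a^2$, that is, to $a^2>\frac 12$.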
Let us call $S$ the discontinuity set of $T^{-1}$ and $S_\infty=\cup_{n=0}^\infty T^{-n}S$. Then the stable partition is made of segments along the direction $v$ with the endpoints belonging to $S_\infty$.\footnote{See \cite{LW} for the details of such a construction and the proof that almost every point belongs to one such segment.} Since $S_\infty$ is an invariant set, properties (1)-(3) are readily verified. Further, we will assume that the manifolds $\{S_i\}$ satisfy the following property: \proclaim{Property 0} For each $i\neq j$ $\overline{S}_i\cap \overline{S}_j$ is either empty or consists of smooth submanifolds $I_{ij}$ of codimension at least two. Moreover, setting\footnote{By $\sharp B$ I mean the cardinality of the set $B$.} $$ M\equiv\sup_{ij}\sharp\{k\in\{1,\,...,\,m_0\}\;|\; \overline{S}_k\cap I_{ij}\neq\emptyset\}, $$ we require $$ \nu\equiv\lambda^{-1} M<1 . $$ \endproclaim Note that Property 0 may not be satisfied by $T$ but may be enjoyed by $T^q$, for some $q>1$. In fact, it is not so hard to see that ``generically" this will be the case (i.e., Property 0 will hold for some iterate of the map). In such a situation we can apply all the following to the dynamical system $(X,\,\mu,\,T^q)$, obtaining the same conclusions as far as the CLT is concerned. Here, for simplicity, we restrict ourselves to the case $q=1$. If we consider our model example we see that $M=2$, so that $\lambda^{-1}M<1$ if $|a|>\frac 1{\sqrt{2}}$. The reader can easily compute $M$ for powers of $T$ and see that Property 0 is satisfied for smaller and smaller values of $a$. Of course, $a=0$ corresponds to the identity, for which no hyperbolicity is present. For the systems under consideration the following holds (see \cite{KS} for details): \proclaim{Property 1} For each $p\in\Cal P$ define the measure $\mu_p$ by $$ \Bbb E(g|\Cal F_0)(x)\equiv\int_p g d\mu_p, $$ for $g\in\Cal C^{(0)}(X)$, and $x\in p$.
Then, calling $m_p$ the measure induced by the Riemannian structure on $p$, and $\phi_p=\frac{d\mu_p}{dm_p}$ the Radon--Nicod\'ym derivative, there exists $c_0$ such that $\sup_p\|\phi_p\|_\infty\leq c_0$. \endproclaim For our simple example we see that $\mu_p=\frac 1{m_p(p)}m_p$. The map is invertible, thus $\wh T^* f=f\circ T^{-1}$. A very important consequence of Property 1 is that, if $p\in\Cal P$ and $\Cal P'\subset \Cal P$ is such that $\bigcup\limits_{q\in\Cal P'}q=T^{-n}p$, then for each $f\in L^1(X,\,\mu)$ $$ \int_pf\circ T^{-n}d\mu_p=\sum_{q\in\Cal P'}\mu_p(T^nq)\int_q fd\mu_q . $$ In addition, one can prove the following (see \cite{L1} for a complete discussion of the two-dimensional case). \proclaim{Property 2} There exist $K\in \Bbb R$ and $\Lambda>1$, such that, for each $x\in X$ that belongs to $p\in\Cal P$ with diam$(p)\geq\delta$, and for each $g,\,f\in\Cal C^{\alpha}(X)$ (H\"older continuous of class $\alpha>0$), $\int_Xg=0$,\footnote{By $\|f\|_\alpha$ we mean the usual $\Cal C^{(\alpha)}$ norm, while $\|f\|_\alpha^s=\sup \limits_{p\in\Cal P}\sup\limits_{x,\,y\in p}\frac{|f(x)-f(y)|} {\|x-y\|^\alpha}+\|f\|_\infty$; and $\|f\|^u_\alpha$ is defined analogously by using the unstable partition. Essentially, these norms measure the H\"older derivative in the stable (or unstable) direction only.} $$ \E(f\wh T^{*n}g|\F_0)(x)\leq K \Lambda^{-n}\|g\|_\alpha^s\|f\|_\alpha^u $$ \endproclaim In the rest of the section we will see that Properties 0-2 imply, for the systems under consideration, the hypotheses of Theorem 1.2. \proclaim{Lemma 3.1} Calling $\Cal A_\varepsilon=\{x\in X\;|\; \text{diam}(p(x))\leq \varepsilon\}$ we have\footnote{By $m(\cdot)$ we mean the symplectic or Riemannian volume that, according to our hypotheses, is the invariant measure of the system.} $$ m(\Cal A_\varepsilon)\leq C\varepsilon $$ for some fixed $C\in\Bbb R^+$.
\endproclaim \demo{Proof} Since $\partial p$ is made up of points belonging to the preimages of the manifolds $S_i$, it follows that if diam$(p)\leq\varepsilon$ then there exist $z\in\partial p$, $n\in\Bbb N$ and $i\in\{1,\,...,\,m_0\}$ such that $T^n z\in S_i$. Accordingly, $T^np$ must lie in a $\lambda^{-n}\varepsilon$ neighborhood of $S_i$. Such a neighborhood has measure at most $c_1\lambda^{-n}\varepsilon$, for some fixed $c_1$. Since $\lambda>1$, summing the geometric series yields $$ m(\Cal A_\varepsilon)\leq \sum_{n=0}^\infty m_0c_1\lambda^{-n}\varepsilon= \frac{m_0 c_1}{1-\lambda^{-1}}\varepsilon. $$ \enddemo The problem in applying our theorem comes from the possible presence in $\Cal P$ of very small elements. On such elements Property 2 does not provide any direct control. Lemma 3.1 works to our advantage, since it informs us that the total measure of the very small pieces is small. Yet, small pieces may be present.\footnote{In fact, this is certainly the case in the non-smooth case. If $T$ is smooth, then it is possible to construct $\Cal P$ in such a way that diam$(\Cal P)\geq \delta$ for some fixed $\delta$, by using Markov partitions. When finite Markov partitions are available the present method boils down to a repackaging of well known facts.} The idea for dealing with them consists in iterating them: if $T^{-n}|_p$ is smooth, then diam$(T^{-n}p)\geq\lambda^n \text{diam}(p)$. Unfortunately, in general $T$ is not smooth, so we have to handle the iteration with more care. Fix $p\in\Cal P$; by construction there exists $\Cal P_1\subset\Cal P$ such that $T^{-1}p=\bigcup\limits_{q\in\Cal P_1}q$. Call $\Cal P_1^-=\{q\in\Cal P_1 \;|\;\text{diam}(q)\leq\delta\}$ and $p_1=\bigcup\limits_{q\in\Cal P_1^-}Tq\subset p$. In other words, $p_1$ consists of the part of $p$ that, under the action of $T^{-1}$, does not give rise to sufficiently large elements of the partition.
The process can obviously be iterated: let $\Cal P_2\subset\Cal P$ be the collection such that $T^{-2}p_1=\bigcup\limits_{q\in\Cal P_2}q$, $\Cal P_2^-=\{q\in\Cal P_2\;|\;\text{diam}(q)\leq\delta\}$, $p_2=\bigcup\limits_{q\in\Cal P_2^-}T^2q\subset p_1$ and so on. \proclaim{Lemma 3.2} If $\delta$ is chosen sufficiently small and $p\in\Cal A_\varepsilon$, then for $n\geq \frac{\log\varepsilon^{-1}\delta}{\log\nu^{-1}}+m$, $$ m_p(p_n)\leq\varepsilon\nu^m . $$ \endproclaim \demo{Proof} By choosing $\delta$ sufficiently small we can ensure, thanks to Property 0, that each element with diameter less than $\delta$ can intersect at most $M$ manifolds $S_i$. Since the $S_i$ describe all the possible discontinuities in our system, it follows that $\sharp \Cal P_1\leq M$. But the same argument applies to each connected piece of $p_j$: since the diameter of $T^{-l}p_j$ is, by definition, less than $\delta$, for $l