import functools

import torch
from torch.nn.utils._expanded_weights.expanded_weights_impl import ExpandedWeight
from torch.utils import _pytree as pytree


def call_for_per_sample_grads(
    module,
    *,
    batch_size=None,
    loss_reduction="sum",
    batch_first=True,
):
    """
    Return a forward function for a module, populating grad_sample with per sample gradients on backward invocation.

    Args:
        module: The ``nn.Module`` to get per sample gradients with respect to. All trainable
          parameters will compute per sample gradients, located in a ``grad_sample``
          field when ``backward`` is invoked
        batch_size: The batch size of the input. If None, the batch size is inferred from the inputs: all tensor
          arguments in args and kwargs must share the same size along the batch dimension (see ``batch_first``).
          Otherwise, it must be passed explicitly as a keyword argument. Default: None
        loss_reduction: Indicates if the loss reduction (for aggregating the gradients) is a sum or a mean operation. If
          "mean", per sample gradients will be scaled by the batch size to offset the cross-batch interaction from
          running mean across a batch. Must be "mean" or "sum". Default: "sum"
        batch_first: Indicates if the batch dimension is the first dimension. If True, the batch dimension is the first
          dimension. If False, it's the second dimension. Default: True.

    Examples::
        >>> # xdoctest: +SKIP
        >>> model = nn.Linear(4, 3)
        >>> batched_input = torch.randn(5, 4)  # batch size of 5
        >>> res = call_for_per_sample_grads(model)(batched_input).sum()
        >>> res.backward()
        >>> assert model.weight.shape == (3, 4)
        >>> assert model.weight.grad_sample.shape == (5, 3, 4)
        >>> assert model.weight.grad is None
        >>> assert model.bias.shape == (3,)
        >>> assert model.bias.grad_sample.shape == (5, 3)
        >>> assert model.bias.grad is None

    An example using "mean" loss reduction. The grad_sample fields will be scaled by batch_size from what they would be
    if we ran the same code with loss_reduction="sum". This is because the mean at the end will scale all
    grad_outputs by 1 / batch_size from cross batch interaction.
        >>> model = nn.Linear(4, 3)
        >>> batched_input = torch.randn(5, 4)  # batch size of 5
        >>> res = call_for_per_sample_grads(model, batch_size=5, loss_reduction="mean")(batched_input).mean()
        >>> res.backward()

    Note::
        Does not work with any `nn.RNN`, including `nn.GRU` or `nn.LSTM`. Please use custom
        rewrites that wrap an `nn.Linear` module. See Opacus for an example
    """

    def maybe_build_expanded_weight(og_tensor, batch_size):
        # Only trainable parameters are wrapped; frozen ones pass through unchanged.
        if og_tensor.requires_grad:
            return ExpandedWeight(og_tensor, batch_size, loss_reduction)
        else:
            return og_tensor

    def compute_batch_size(*args, **kwargs):
        # Infer the batch size from every tensor leaf in args and kwargs.
        args_and_kwargs = pytree.arg_tree_leaves(*args, **kwargs)
        batch_size = None
        for arg in args_and_kwargs:
            if not isinstance(arg, torch.Tensor):
                continue

            arg_batch_size = arg.shape[0] if batch_first else arg.shape[1]
            if batch_size is not None and batch_size != arg_batch_size:
                raise RuntimeError(
                    "When computing batch size, found at least one input with batch size "
                    f"{batch_size} and one with batch size {arg_batch_size}. Please specify it "
                    "explicitly using the batch size kwarg in call_for_per_sample_grads"
                )
            batch_size = arg_batch_size
        if batch_size is None:
            raise RuntimeError(
                "Unable to find a tensor in the passed args and kwargs. They may not be pytree-able "
                "and so ExpandedWeights cannot compute the batch size from the inputs. Please specify "
                "it explicitly"
            )
        return batch_size

    if loss_reduction not in ("sum", "mean"):
        raise RuntimeError(
            f"Expected loss_reduction argument to be sum or mean, got {loss_reduction}"
        )

    if not isinstance(module, torch.nn.Module):
        raise RuntimeError(
            f"Module passed must be nn.Module, got {type(module).__name__}"
        )
    if not (batch_size is None or isinstance(batch_size, int)):
        raise RuntimeError(
            f"Batch size passed must be None or an integer, got {type(batch_size).__name__}"
        )
    if batch_size is not None and batch_size < 1:
        raise RuntimeError(f"Batch size must be positive, got {batch_size}")
    for weight in module.parameters():
        # A leftover grad_sample would be accumulated into, giving wrong results.
        if hasattr(weight, "grad_sample") and weight.grad_sample is not None:
            raise RuntimeError(
                "Current Expanded Weights accumulates the gradients, which will be incorrect for multiple "
                f"calls without clearing gradients. Please clear out the grad_sample parameter of {weight} or "
                "post an issue to pytorch/pytorch to prioritize correct behavior"
            )

    @functools.wraps(module.forward)
    def wrapper(*args, **kwargs):
        wrapper_batch_size = batch_size
        if wrapper_batch_size is None:
            wrapper_batch_size = compute_batch_size(*args, **kwargs)

        params = {
            name: maybe_build_expanded_weight(value, wrapper_batch_size)
            for (name, value) in module.named_parameters()
        }
        return torch.func.functional_call(module, params, args, kwargs)

    return wrapper